
SQLite UPSERT / UPDATE OR INSERT

I need to perform UPSERT / INSERT OR UPDATE against a SQLite Database.

There is the command INSERT OR REPLACE, which is useful in many cases. But if you want to keep your autoincrement IDs in place because of foreign keys, it does not work: it deletes the row and creates a new one, so the new row gets a new ID.

This would be the table:

players - (primary key on id, user_name unique)

|  id   | user_name |  age   |
------------------------------
|  1982 |   johnny  |  23    |
|  1983 |   steven  |  29    |
|  1984 |   pepee   |  40    |
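
The ID problem described above can be reproduced with Python's built-in sqlite3 module. This is only a minimal sketch: the in-memory database and the column types are assumptions, since the question gives just the column names and keys.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT NOT NULL UNIQUE,"
    " age INTEGER NOT NULL)"
)
con.executemany(
    "INSERT INTO players (id, user_name, age) VALUES (?, ?, ?)",
    [(1982, "johnny", 23), (1983, "steven", 29), (1984, "pepee", 40)],
)

old_id = con.execute(
    "SELECT id FROM players WHERE user_name = 'steven'"
).fetchone()[0]

# The UNIQUE conflict on user_name makes SQLite delete the old row and
# insert a brand-new one, which receives a freshly generated id.
con.execute("INSERT OR REPLACE INTO players (user_name, age) VALUES ('steven', 32)")

new_id, new_age = con.execute(
    "SELECT id, age FROM players WHERE user_name = 'steven'"
).fetchone()
print(old_id, new_id, new_age)
```

The age is updated, but new_id no longer matches old_id, so any row in another table that still points at the old id is left dangling.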

Community

Q&A Style

Well, after researching and fighting with the problem for hours, I found out that there are two ways to accomplish this, depending on the structure of your table and on whether you have foreign key restrictions activated to maintain integrity. I'd like to share this in a clean format to save some time for people who may be in my situation.

Option 1: You can afford deleting the row

In other words, you don't have foreign keys, or if you have them, your SQLite engine is configured so that there are no integrity exceptions. The way to go is INSERT OR REPLACE. If you are trying to insert/update a player whose ID already exists, the SQLite engine will delete that row and insert the data you are providing. Now the question comes: what to do to keep the old ID associated?

Let's say we want to UPSERT with the data user_name='steven' and age=32.

Look at this code:

INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
    coalesce((select id from players where user_name='steven'),
             (select max(id) from players) + 1),
    'steven',
    32);

The trick is in coalesce. It returns the id of the user 'steven' if any, and otherwise, it returns a new fresh id.
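
To see the coalesce trick keep an existing ID, here is a small sketch using Python's sqlite3 module. It assumes the statement is an INSERT OR REPLACE against the players table with the user_name column, uses bound parameters instead of literals, and adds a made-up user 'maria' to show the "fresh id" branch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT NOT NULL UNIQUE,"
    " age INTEGER NOT NULL)"
)
con.executemany(
    "INSERT INTO players (id, user_name, age) VALUES (?, ?, ?)",
    [(1982, "johnny", 23), (1983, "steven", 29), (1984, "pepee", 40)],
)

# Reuse the existing id when the user is known, otherwise take max(id) + 1.
UPSERT_KEEP_ID = """
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
    coalesce((SELECT id FROM players WHERE user_name = :user_name),
             (SELECT max(id) FROM players) + 1),
    :user_name,
    :age)
"""

con.execute(UPSERT_KEEP_ID, {"user_name": "steven", "age": 32})  # existing user
con.execute(UPSERT_KEEP_ID, {"user_name": "maria", "age": 51})   # hypothetical new user

rows = con.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```

Note that the row is still deleted and re-inserted under the hood; only the id value is carried over.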

Option 2: You cannot afford deleting the row

After monkeying around with the previous solution, I realized that in my case it could end up destroying data, since this ID works as a foreign key for another table. Besides, I created the table with the clause ON DELETE CASCADE, which means that it would delete data silently. Dangerous.

So, I first thought of an IF clause, but SQLite only has CASE. And this CASE can't be used (or at least I did not manage it) to perform an UPDATE query if EXISTS(select id from players where user_name='steven'), and an INSERT if it doesn't. No go.

And then, finally, I used brute force, with success. The logic is: for each UPSERT that you want to perform, first execute an INSERT OR IGNORE to make sure there is a row with our user, and then execute an UPDATE query with exactly the same data you tried to insert.

Same data as before: user_name='steven' and age=32.

-- make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32); 

-- make sure it has the right data
UPDATE players SET user_name='steven', age=32 WHERE user_name='steven'; 

And that's all!

EDIT

As Andy has commented, trying to insert first and then update may lead to firing triggers more often than expected. This is not, in my opinion, a data safety issue, but it is true that firing unnecessary events makes little sense. Therefore, an improved solution would be:

-- Try to update any existing row
UPDATE players SET age=32 WHERE user_name='steven';

-- Make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32); 
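
If you call this from application code, the two statements can be wrapped in one helper. The sketch below uses Python's sqlite3 module; the function name upsert_player, the column types and the user 'maria' are made up for illustration, and running both statements in one transaction is my own assumption about how you would want to use it.

```python
import sqlite3

def upsert_player(con, user_name, age):
    """UPDATE first, then INSERT OR IGNORE, inside a single transaction."""
    with con:  # commits on success, rolls back if either statement raises
        # Try to update any existing row
        con.execute(
            "UPDATE players SET age = ? WHERE user_name = ?", (age, user_name)
        )
        # Make sure it exists (ignored when user_name is already present)
        con.execute(
            "INSERT OR IGNORE INTO players (user_name, age) VALUES (?, ?)",
            (user_name, age),
        )

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT NOT NULL UNIQUE,"
    " age INTEGER NOT NULL)"
)
con.execute("INSERT INTO players (id, user_name, age) VALUES (1983, 'steven', 29)")

upsert_player(con, "steven", 32)  # existing row: id stays 1983
upsert_player(con, "maria", 51)   # hypothetical new user: row is created

steven = con.execute(
    "SELECT id, age FROM players WHERE user_name = 'steven'"
).fetchone()
maria_age = con.execute(
    "SELECT age FROM players WHERE user_name = 'maria'"
).fetchone()[0]
print(steven, maria_age)
```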

Ditto... option 2 is great. Except, I did it the other way around: try an update, check if rowsAffected > 0, and if not, do an insert.
That's a pretty good approach too; the only small drawback is that you don't have a single SQL statement for the "upsert".
You don't need to re-set user_name in the UPDATE statement in the last code sample. It's enough to set age.
Mark A. Donohoe

This is a late answer. Starting from SQLite 3.24.0, released on June 4, 2018, there is finally support for an UPSERT clause following PostgreSQL syntax.

INSERT INTO players (user_name, age)
  VALUES('steven', 32) 
  ON CONFLICT(user_name) 
  DO UPDATE SET age=excluded.age;

Note: For those having to use a version of SQLite earlier than 3.24.0, please reference this answer below (posted by me, @MarqueIV). However if you do have the option to upgrade, you are strongly encouraged to do so as unlike my solution, the one posted here achieves the desired behavior in a single statement. Plus you get all the other features, improvements and bug fixes that usually come with a more recent release.
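
Since the clause depends on the SQLite library version rather than on your own code, it can help to check the version at runtime before relying on it. Here is a minimal sketch with Python's sqlite3 module; the table definition and the user 'maria' are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT NOT NULL UNIQUE,"
    " age INTEGER NOT NULL)"
)
con.execute("INSERT INTO players (id, user_name, age) VALUES (1983, 'steven', 29)")

UPSERT = """
INSERT INTO players (user_name, age)
VALUES (?, ?)
ON CONFLICT(user_name) DO UPDATE SET age = excluded.age
"""

ages = None
# sqlite_version_info is the version of the SQLite library in use,
# not the version of the Python module.
if sqlite3.sqlite_version_info >= (3, 24, 0):
    con.execute(UPSERT, ("steven", 32))  # conflict: row is updated in place
    con.execute(UPSERT, ("maria", 51))   # hypothetical new user: row is inserted
    ages = dict(con.execute("SELECT user_name, age FROM players"))
    steven_id = con.execute(
        "SELECT id FROM players WHERE user_name = 'steven'"
    ).fetchone()[0]
    print(ages, steven_id)
else:
    print("SQLite", sqlite3.sqlite_version, "is too old for ON CONFLICT ... DO UPDATE")
```

Because the existing row is updated rather than deleted and re-created, steven keeps his original id.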


For now, this release is not in the Ubuntu repository yet.
Why can't I use this on Android? I tried db.execSQL("insert into bla(id,name) values (?,?) on conflict(id) do update set name=?"). It gives me a syntax error on the word "on".
@BastianVoigt Because the SQLite3 libraries installed on various versions of Android are older than 3.24.0. See: developer.android.com/reference/android/database/sqlite/… Sadly, if you need a new feature of SQLite3 (or any other system library) on Android or iOS, you need to bundle a specific version of SQLite in your application instead of relying on the system-installed one.
Rather than UPSERT, isn't this more of an INDATE since it tries the insert first? ;)
Is there a way to automatically update all columns with the new data? Or do I need to specify them one by one, i.e. SET age=excluded.age, gender=excluded.gender etc?
Mark A. Donohoe

Here's an approach that doesn't require the brute-force 'ignore' which would only work if there was a key violation. This way works based on any conditions you specify in the update.

Try this...

-- Try to update any existing row
UPDATE players
SET age=32
WHERE user_name='steven';

-- If no update happened (i.e. the row didn't exist) then insert one
INSERT INTO players (user_name, age)
SELECT 'steven', 32
WHERE (Select Changes() = 0);

How It Works

The 'magic sauce' here is using Changes() in the Where clause. Changes() represents the number of rows affected by the last operation, which in this case is the update.

In the above example, if there are no changes from the update (i.e. the record doesn't exist) then Changes() = 0 so the Where clause in the Insert statement evaluates to true and a new row is inserted with the specified data.

If the Update did update an existing row, then Changes() = 1 (or more accurately, not zero if more than one row was updated), so the 'Where' clause in the Insert now evaluates to false and thus no insert will take place.

The beauty of this is there's no brute-force needed, nor unnecessarily deleting, then re-inserting data which may result in messing up downstream keys in foreign-key relationships.

Additionally, since it's just a standard Where clause, it can be based on anything you define, not just key violations. Likewise, you can use Changes() in combination with anything else you want/need anywhere expressions are allowed.
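
For anyone calling this from code, here is a rough sketch of the same two statements using Python's sqlite3 module. The helper name upsert_player, the column types and the user 'maria' are made up for the example, and wrapping both statements in one transaction is an extra precaution of mine, not part of the technique itself.

```python
import sqlite3

def upsert_player(con, user_name, age):
    """Update the row if it exists; otherwise insert it (Changes() approach)."""
    with con:  # one transaction: commit on success, roll back on error
        con.execute(
            "UPDATE players SET age = ? WHERE user_name = ?", (age, user_name)
        )
        # changes() still reports the UPDATE above, because it only counts
        # the most recently completed INSERT, UPDATE or DELETE.
        con.execute(
            "INSERT INTO players (user_name, age) SELECT ?, ? WHERE changes() = 0",
            (user_name, age),
        )

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT NOT NULL UNIQUE,"
    " age INTEGER NOT NULL)"
)
con.execute("INSERT INTO players (id, user_name, age) VALUES (1983, 'steven', 29)")

upsert_player(con, "steven", 32)  # updates, so the insert is skipped
upsert_player(con, "maria", 51)   # hypothetical new user: nothing updated, so inserted
upsert_player(con, "maria", 52)   # now exists: updated, not inserted again

rows = con.execute("SELECT user_name, age FROM players ORDER BY user_name").fetchall()
steven_id = con.execute(
    "SELECT id FROM players WHERE user_name = 'steven'"
).fetchone()[0]
print(rows, steven_id)
```

As far as I know, the Changes() counter is kept per database connection, so writes made through a different connection should not disturb it. Sharing one connection between threads without locking is a different story.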


@MarqueIV And what if there are two items that must be updated or inserted? For example, the first was updated and the second one does not exist. In such a case Changes() = 0 will return false and two rows will do INSERT OR REPLACE.
Usually an UPSERT is supposed to act on one record. If you are saying you know for sure it is acting on more than one record, then change the count check accordingly.
The bad thing is that if the row exists, the update must be executed regardless of whether the row has changed or not.
Why is that a bad thing? And if the data hasn't changed, why are you calling UPSERT in the first place? But even so, it's a good thing the update happens, setting Changes=1 or else the INSERT statement would incorrectly fire, which you don't want it to.
If there are asynchronous applications/functions running that can perform inserts/updates, could this lead to unwanted consequences? Meaning, you perform your first update here, it doesn't go through, then before the select changes() code is run, the other application performs an insert that goes through, then select changes() will return 1, rather than 0. I actually don't know if this is true - I'm moreso asking if it's true, as if it is, it would be a problem here.
Andy

The problem with all the presented answers is that they completely fail to take triggers (and probably other side effects) into account. A solution like

INSERT OR IGNORE ...
UPDATE ...

leads to both triggers being executed (for the insert and then for the update) when the row does not exist.

The proper solution is

UPDATE OR IGNORE ...
INSERT OR IGNORE ...

In that case only one statement takes effect, whether the row exists or not.
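
The trigger difference between the two orderings can be checked with a small experiment. The sketch below uses Python's sqlite3 module; the trigger_log table, the trigger names and the column types are invented for the test, and it only covers AFTER triggers for the case where the row does not exist yet.

```python
import sqlite3

def make_db():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE players (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            user_name TEXT    NOT NULL UNIQUE,
            age       INTEGER NOT NULL);
        CREATE TABLE trigger_log (event TEXT);
        CREATE TRIGGER players_ai AFTER INSERT ON players
            BEGIN INSERT INTO trigger_log VALUES ('insert'); END;
        CREATE TRIGGER players_au AFTER UPDATE ON players
            BEGIN INSERT INTO trigger_log VALUES ('update'); END;
    """)
    return con

def fired(con):
    """Return the trigger events in the order they were logged."""
    return [r[0] for r in con.execute("SELECT event FROM trigger_log ORDER BY rowid")]

# Ordering 1: INSERT OR IGNORE, then UPDATE (row does not exist yet)
con1 = make_db()
con1.execute("INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32)")
con1.execute("UPDATE players SET age = 32 WHERE user_name = 'steven'")
insert_first = fired(con1)

# Ordering 2: UPDATE OR IGNORE, then INSERT OR IGNORE (row does not exist yet)
con2 = make_db()
con2.execute("UPDATE OR IGNORE players SET age = 32 WHERE user_name = 'steven'")
con2.execute("INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32)")
update_first = fired(con2)

print(insert_first, update_first)
```

With the insert-first ordering both triggers are logged for a brand-new row, while the update-first ordering logs only the insert.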


Gilco

To have a pure UPSERT with no holes (for programmers) that doesn't rely on unique and other keys:

UPDATE players SET user_name='gil', age=32 WHERE user_name='george'; 
SELECT changes();

SELECT changes() will return the number of rows updated by the last query. Then check if the return value from changes() is 0, and if so execute:

INSERT INTO players (user_name, age) VALUES ('gil', 32); 
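
In application code the check on changes() might look like the sketch below, written with Python's sqlite3 module. The function name update_or_insert and the column types are made up, and for simplicity it matches on the same user_name it writes. Running everything in one transaction is my own assumption.

```python
import sqlite3

def update_or_insert(con, user_name, age):
    """Run the UPDATE, ask changes() how many rows it touched, INSERT if none."""
    with con:  # keep the check and the insert in one transaction
        con.execute(
            "UPDATE players SET age = ? WHERE user_name = ?", (age, user_name)
        )
        changed = con.execute("SELECT changes()").fetchone()[0]
        if changed == 0:
            con.execute(
                "INSERT INTO players (user_name, age) VALUES (?, ?)",
                (user_name, age),
            )
    return changed

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT NOT NULL,"
    " age INTEGER NOT NULL)"
)
con.execute("INSERT INTO players (user_name, age) VALUES ('steven', 29)")

changed_existing = update_or_insert(con, "steven", 32)  # row exists: updated
changed_new = update_or_insert(con, "gil", 32)          # no row: inserted

rows = con.execute("SELECT user_name, age FROM players ORDER BY user_name").fetchall()
print(changed_existing, changed_new, rows)
```

In Python the rowcount attribute of the cursor returned by the UPDATE should give the same number without the extra SELECT.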

This is equivalent to what @fiznool proposed in his comment (although I would go for his solution). It's all right and actually works fine, but you don't have a single SQL statement. An UPSERT not based on the PK or other unique keys makes little to no sense to me.
itsho

Option 1: Insert -> Update

If you'd like to avoid both changes()=0 and INSERT OR IGNORE, even if you cannot afford deleting the row, you can use this logic:

First, insert (if it does not exist), and then update by filtering with the unique key.

Example

-- Table structure
CREATE TABLE players (
    id        INTEGER       PRIMARY KEY AUTOINCREMENT,
    user_name VARCHAR (255) NOT NULL
                            UNIQUE,
    age       INTEGER       NOT NULL
);

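-- Insert if NOT exists
-- (check only the unique key; also filtering on age would attempt a
--  duplicate insert when 'johnny' already exists with a different age)
INSERT INTO players (user_name, age)
SELECT 'johnny', 20
WHERE NOT EXISTS (SELECT 1 FROM players WHERE user_name='johnny');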

-- Update (will affect row, only if found)
-- no point to update user_name to 'johnny' since it's unique, and we filter by it as well
UPDATE players 
SET age=20 
WHERE user_name='johnny';

Regarding Triggers

Notice: I haven't tested it to see which triggers are being called, but I assume the following:

if row does not exist

BEFORE INSERT

INSERT using INSTEAD OF

AFTER INSERT

BEFORE UPDATE

UPDATE using INSTEAD OF

AFTER UPDATE

if row does exist

BEFORE UPDATE

UPDATE using INSTEAD OF

AFTER UPDATE

Option 2: Insert or replace - keep your own ID

This way you can have a single SQL command.

-- Table structure
CREATE TABLE players (
    id        INTEGER       PRIMARY KEY AUTOINCREMENT,
    user_name VARCHAR (255) NOT NULL
                            UNIQUE,
    age       INTEGER       NOT NULL
);

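-- Single command to insert or update
-- (look the id up by the unique key only, so the existing id is reused
--  even when the stored age differs from the new one)
INSERT OR REPLACE INTO players 
(id, user_name, age) 
VALUES ((SELECT id from players WHERE user_name='johnny'),
        'johnny',
        20);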

Edit: added option 2.


Maximilian Tyrtania

You can also just add an ON CONFLICT REPLACE clause to your user_name unique constraint and then just INSERT away, leaving it to SQLite to figure out what to do in case of a conflict. See: https://sqlite.org/lang_conflict.html.

Also note the sentence regarding delete triggers: When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, delete triggers fire if and only if recursive triggers are enabled.
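
As a rough illustration of what this looks like in practice, here is a sketch using Python's sqlite3 module. The table definition is an assumption based on the players table from the question, with the conflict clause attached to the UNIQUE constraint.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT NOT NULL UNIQUE ON CONFLICT REPLACE,"
    " age INTEGER NOT NULL)"
)

con.execute("INSERT INTO players (user_name, age) VALUES ('steven', 29)")
first_id = con.execute(
    "SELECT id FROM players WHERE user_name = 'steven'"
).fetchone()[0]

# A plain INSERT: the constraint's own conflict clause replaces the old row.
con.execute("INSERT INTO players (user_name, age) VALUES ('steven', 32)")

rows = con.execute("SELECT id, user_name, age FROM players").fetchall()
print(first_id, rows)
```

Keep in mind that this is still a replace: there is only one 'steven' row afterwards, but it carries a new id, so the foreign key concern from the question applies here too.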


CashCow

For those who have the latest version of sqlite available, you can still do it in a single statement using INSERT OR REPLACE but beware you need to set all the values. However this "clever" SQL works by use of a left-join on the table into which you are inserting / updating and ifnull:

import sqlite3

con = sqlite3.connect( ":memory:" )

cur = con.cursor()
cur.execute("create table test( id varchar(20) PRIMARY KEY, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
        [ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())

con.commit()
cur = con.cursor()

# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
   insert or replace into test
        select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d  
           left join test t on d.id = t.id
    '''           


upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
                { 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
       
cur.executemany( upsert, upsert_data )

cur.execute('select * from test')
print( cur.fetchall())

The first few lines of that code set up the table, with a single ID primary key column and two values. It then enters data with IDs 'A' and 'B'.

The second section creates the 'upsert' text and calls it for 2 rows of data, one with the ID of 'B', which is found, and one with 'C', which is not found.

When you run it, you'll find it produces:

$python3 main.py
[('A', 1, 8), ('B', 3, 10)]
[('A', 1, 8), ('B', 4, 10), ('C', 3, 12)]

B 'updated' value to 4 but value2 (5) was ignored, C inserted.

Note: this does not work if your table has an auto-incremented primary key as INSERT OR REPLACE will replace the number with a new one.

A slight modification to add such a column:

import sqlite3

con = sqlite3.connect( ":memory:" )

cur = con.cursor()
cur.execute("create table test( pkey integer primary key autoincrement not null, id varchar(20) UNIQUE not null, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
        [ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())

con.commit()
cur = con.cursor()

# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
   insert or replace into test (id, value, value2)
        select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d  
           left join test t on d.id = t.id
    '''           


upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
                { 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
       
cur.executemany( upsert, upsert_data )

cur.execute('select * from test')
print( cur.fetchall())

The output is now:

$python3 main.py
[(1, 'A', 1, 8), (2, 'B', 3, 10)]
[(1, 'A', 1, 8), (3, 'B', 4, 10), (4, 'C', 3, 12)]

Note that pkey 2 is replaced with 3 for id 'B'.

This is therefore not ideal but is a good solution when:

You don't have an auto-generated primary key

You want to create an 'upsert' query with bound parameters

You want to use executemany() to merge in multiple rows of data in one go.
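
For comparison, if your SQLite is 3.24.0 or newer, the same "update value, keep value2" merge can also be expressed with ON CONFLICT ... DO UPDATE, which keeps the auto-generated pkey of an existing row. This is only a sketch of that alternative, not the INSERT OR REPLACE technique above, and it reuses the test table and data from this answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "create table test( pkey integer primary key autoincrement not null,"
    " id varchar(20) UNIQUE not null, value int, value2 int )"
)
con.executemany(
    "insert into test (id, value, value2) values (:id, :value, :value2)",
    [{"id": "A", "value": 1, "value2": 8}, {"id": "B", "value": 3, "value2": 10}],
)

# when id is found, only value is modified (value2 is left alone)
# when id is not found, a record with value and value2 is entered
upsert = """
    insert into test (id, value, value2)
    values (:id, :value, :value2)
    on conflict(id) do update set value = excluded.value
"""
upsert_data = [{"id": "B", "value": 4, "value2": 5},
               {"id": "C", "value": 3, "value2": 12}]

by_id = None
if sqlite3.sqlite_version_info >= (3, 24, 0):
    con.executemany(upsert, upsert_data)
    by_id = {row[1]: row for row in con.execute("select * from test")}
    print(by_id["B"], by_id["C"][1:])
```

Here 'B' keeps pkey 2 because the row is updated in place instead of being deleted and re-inserted.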