I need to perform UPSERT / INSERT OR UPDATE against a SQLite Database.
There is the command INSERT OR REPLACE, which in many cases can be useful. But if you want to keep your autoincremented IDs in place because of foreign keys, it does not work: it deletes the row and creates a new one, so the new row gets a new ID.
This would be the table:
players - (primary key on id, user_name unique)
| id | user_name | age |
------------------------------
| 1982 | johnny | 23 |
| 1983 | steven | 29 |
| 1984 | pepee | 40 |
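To make the ID problem concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column types are assumptions, since the question only gives the column names and keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question only states the keys, not the column types.
conn.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT UNIQUE,"
    " age INTEGER)"
)
conn.executemany(
    "INSERT INTO players (id, user_name, age) VALUES (?, ?, ?)",
    [(1982, "johnny", 23), (1983, "steven", 29), (1984, "pepee", 40)],
)

# REPLACE resolves the conflict on user_name by deleting the old row first,
# then inserting a brand new one.
conn.execute("INSERT OR REPLACE INTO players (user_name, age) VALUES ('steven', 32)")

new_id, new_age = conn.execute(
    "SELECT id, age FROM players WHERE user_name = 'steven'"
).fetchone()
print(new_id, new_age)  # the id is no longer 1983
```

Any row in another table that still points at id 1983 is now orphaned, which is exactly what the question wants to avoid.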
Q&A Style
Well, after researching and fighting with the problem for hours, I found out that there are two ways to accomplish this, depending on the structure of your table and on whether you have foreign key restrictions activated to maintain integrity. I'd like to share this in a clean format to save some time for people who may be in my situation.
Option 1: You can afford deleting the row
In other words, you don't have foreign keys, or if you have them, your SQLite engine is configured so that there are no integrity exceptions. The way to go is INSERT OR REPLACE. If you are trying to insert/update a player whose ID already exists, the SQLite engine will delete that row and insert the data you are providing. Now the question is: how do you keep the old ID associated?
Let's say we want to UPSERT with the data user_name='steven' and age=32.
Look at this code:
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
    coalesce((select id from players where user_name='steven'),
             (select max(id) from players) + 1),
    'steven',
    32);
The trick is in coalesce. It returns the id of the user 'steven' if any, and otherwise, it returns a new fresh id.
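As a rough check of the coalesce trick, the sketch below runs it from Python against an in-memory copy of the table. It assumes the statement is an INSERT OR REPLACE on the players table from the question; the column types and the second user 'maria' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_name TEXT UNIQUE,"
    " age INTEGER)"
)
conn.executemany(
    "INSERT INTO players (id, user_name, age) VALUES (?, ?, ?)",
    [(1982, "johnny", 23), (1983, "steven", 29), (1984, "pepee", 40)],
)

# coalesce() picks the existing id when the user is found, else max(id) + 1.
UPSERT_SQL = """
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
    coalesce((SELECT id FROM players WHERE user_name = :name),
             (SELECT max(id) FROM players) + 1),
    :name,
    :age)
"""

conn.execute(UPSERT_SQL, {"name": "steven", "age": 32})  # existing user
conn.execute(UPSERT_SQL, {"name": "maria", "age": 19})   # hypothetical new user

rows = conn.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```

The row is still deleted and re-created behind the scenes; it just comes back with the same id.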
Option 2: You cannot afford deleting the row
After monkeying around with the previous solution, I realized that in my case it could end up destroying data, since this ID works as a foreign key for another table. Besides, I created the table with the clause ON DELETE CASCADE, which would mean that it'd delete data silently. Dangerous.
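The silent data loss can be reproduced with a small sketch. The child table scores and its columns are hypothetical, and foreign key enforcement has to be switched on explicitly because it is off by default in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT UNIQUE, age INTEGER)"
)
# Hypothetical child table that references players with ON DELETE CASCADE.
conn.execute(
    "CREATE TABLE scores ("
    " player_id INTEGER REFERENCES players(id) ON DELETE CASCADE,"
    " points INTEGER)"
)
conn.execute("INSERT INTO players VALUES (1983, 'steven', 29)")
conn.execute("INSERT INTO scores VALUES (1983, 100)")

# Even though the id is kept, REPLACE deletes the old parent row first,
# and that delete cascades to the child rows.
conn.execute(
    "INSERT OR REPLACE INTO players (id, user_name, age) VALUES (1983, 'steven', 32)"
)

remaining = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
print(remaining)  # the score row is gone
```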
So, I first thought of an IF clause, but SQLite only has CASE. And this CASE can't be used (or at least I did not manage it) to perform an UPDATE query if EXISTS(select id from players where user_name='steven'), and an INSERT if it didn't. No go.
And then, finally, I used brute force, with success. The logic is: for each UPSERT that you want to perform, first execute an INSERT OR IGNORE to make sure there is a row with our user, and then execute an UPDATE query with exactly the same data you tried to insert.
Same data as before: user_name='steven' and age=32.
-- make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);
-- make sure it has the right data
UPDATE players SET user_name='steven', age=32 WHERE user_name='steven';
And that's all!
EDIT
As Andy has commented, trying to insert first and then update may lead to firing triggers more often than expected. This is not, in my opinion, a data safety issue, but it is true that firing unnecessary events makes little sense. Therefore, an improved solution would be:
-- Try to update any existing row
UPDATE players SET age=32 WHERE user_name='steven';
-- Make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);
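Wrapped in a small helper, the update-then-insert pair could look like the sketch below. The function name upsert_player and the user 'maria' are made up for illustration; the only thing the approach relies on is the unique constraint on user_name.

```python
import sqlite3

def upsert_player(conn, user_name, age):
    """Update-then-insert upsert; relies on the UNIQUE constraint on user_name."""
    with conn:  # commit both statements together, roll back on error
        # Try to update any existing row.
        conn.execute(
            "UPDATE players SET age = ? WHERE user_name = ?", (age, user_name)
        )
        # Make sure it exists; ignored when the row is already there.
        conn.execute(
            "INSERT OR IGNORE INTO players (user_name, age) VALUES (?, ?)",
            (user_name, age),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT UNIQUE, age INTEGER)"
)
conn.execute("INSERT INTO players (id, user_name, age) VALUES (1983, 'steven', 29)")

upsert_player(conn, "steven", 32)  # existing row: updated in place
upsert_player(conn, "maria", 19)   # hypothetical new user: inserted

rows = conn.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```

Running both statements in one transaction keeps another writer from slipping in between them.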
This is a late answer. Starting with SQLite 3.24.0, released on June 4, 2018, there is finally support for an UPSERT clause that follows PostgreSQL syntax.
INSERT INTO players (user_name, age)
VALUES('steven', 32)
ON CONFLICT(user_name)
DO UPDATE SET age=excluded.age;
Note: For those having to use a version of SQLite earlier than 3.24.0, please reference this answer below (posted by me, @MarqueIV). However if you do have the option to upgrade, you are strongly encouraged to do so as unlike my solution, the one posted here achieves the desired behavior in a single statement. Plus you get all the other features, improvements and bug fixes that usually come with a more recent release.
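For a quick test from Python, a sketch like the following should do. It assumes the SQLite library bundled with your Python build is 3.24.0 or newer, which is worth printing first because that version is independent of the Python version; the user 'maria' is made up.

```python
import sqlite3

# Version of the underlying SQLite library, not of the sqlite3 module.
print(sqlite3.sqlite_version)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT UNIQUE, age INTEGER)"
)
conn.execute("INSERT INTO players (id, user_name, age) VALUES (1983, 'steven', 29)")

# "excluded" refers to the row that would have been inserted.
UPSERT_SQL = """
INSERT INTO players (user_name, age)
VALUES (?, ?)
ON CONFLICT(user_name)
DO UPDATE SET age = excluded.age
"""

conn.execute(UPSERT_SQL, ("steven", 32))  # conflict: existing row is updated
conn.execute(UPSERT_SQL, ("maria", 19))   # no conflict: plain insert

rows = conn.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```

On an older library the ON CONFLICT clause is rejected as a syntax error.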
db.execSQL("insert into bla(id,name) values (?,?) on conflict(id) do update set name=?") gives me a syntax error on the word "on".
Here's an approach that doesn't require the brute-force 'ignore' which would only work if there was a key violation. This way works based on any conditions you specify in the update.
Try this...
-- Try to update any existing row
UPDATE players
SET age=32
WHERE user_name='steven';
-- If no update happened (i.e. the row didn't exist) then insert one
INSERT INTO players (user_name, age)
SELECT 'steven', 32
WHERE (Select Changes() = 0);
How It Works
The 'magic sauce' here is using Changes() in the Where clause. Changes() represents the number of rows affected by the last operation, which in this case is the update.
In the above example, if there are no changes from the update (i.e. the record doesn't exist) then Changes() = 0, so the Where clause in the Insert statement evaluates to true and a new row is inserted with the specified data.
If the Update did update an existing row, then Changes() = 1 (or more accurately, not zero if more than one row was updated), so the Where clause in the Insert now evaluates to false and thus no insert will take place.
The beauty of this is there's no brute-force needed, nor unnecessarily deleting, then re-inserting data which may result in messing up downstream keys in foreign-key relationships.
Additionally, since it's just a standard Where clause, it can be based on anything you define, not just key violations. Likewise, you can use Changes() in combination with anything else you want/need anywhere expressions are allowed.
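Here is a minimal Python sketch of the Changes() approach. The table deliberately has no unique constraint on user_name, to show that the technique does not depend on a key violation; the helper name upsert_player and the user 'maria' are made up.

```python
import sqlite3

def upsert_player(conn, user_name, age):
    """Update, then insert only if the update touched no rows."""
    with conn:
        conn.execute(
            "UPDATE players SET age = ? WHERE user_name = ?", (age, user_name)
        )
        # changes() still reports the row count of the UPDATE above,
        # because both statements run on the same connection.
        conn.execute(
            "INSERT INTO players (user_name, age) SELECT ?, ? WHERE changes() = 0",
            (user_name, age),
        )

conn = sqlite3.connect(":memory:")
# Note: no UNIQUE constraint on user_name in this sketch.
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT, age INTEGER)"
)
conn.execute("INSERT INTO players (id, user_name, age) VALUES (1983, 'steven', 29)")

upsert_player(conn, "steven", 32)  # updated, nothing inserted
upsert_player(conn, "maria", 19)   # nothing updated, so inserted
upsert_player(conn, "steven", 33)  # updated again, still no duplicate

rows = conn.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```

Because changes() is tracked per connection, the two statements must be issued back to back on the same connection with nothing modifying data in between.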
The problem with all the presented answers is that they completely fail to take triggers (and probably other side effects) into account. A solution like
INSERT OR IGNORE ...
UPDATE ...
leads to both triggers being executed (for the insert and then for the update) when the row does not exist.
The proper solution is
UPDATE OR IGNORE ...
INSERT OR IGNORE ...
In that case only one statement takes effect (whether the row exists or not).
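The difference between the two orders can be observed with a small experiment. The sketch below logs AFTER INSERT and AFTER UPDATE triggers into a hypothetical trigger_log table and starts from an empty players table, so it only covers the "row does not exist" case and says nothing about BEFORE triggers.

```python
import sqlite3

# trigger_log, players_ai and players_au are names made up for this sketch.
SCHEMA = """
CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT UNIQUE, age INTEGER);
CREATE TABLE trigger_log (event TEXT);
CREATE TRIGGER players_ai AFTER INSERT ON players
BEGIN INSERT INTO trigger_log VALUES ('insert'); END;
CREATE TRIGGER players_au AFTER UPDATE ON players
BEGIN INSERT INTO trigger_log VALUES ('update'); END;
"""

def fired_triggers(statements):
    """Run the statements on a fresh, empty table and return the logged events."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for sql in statements:
        conn.execute(sql)
    return [row[0] for row in conn.execute("SELECT event FROM trigger_log ORDER BY rowid")]

INSERT = "INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32)"
UPDATE = "UPDATE OR IGNORE players SET age = 32 WHERE user_name = 'steven'"

insert_first = fired_triggers([INSERT, UPDATE])
update_first = fired_triggers([UPDATE, INSERT])
print(insert_first)  # both triggers fire for a single logical upsert
print(update_first)  # only the insert trigger fires
```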
To have a pure UPSERT with no holes (for programmers) that doesn't rely on unique and other keys:
UPDATE players SET user_name='gil', age=32 WHERE user_name='george';
SELECT changes();
SELECT changes() returns the number of rows updated by the last statement. Check whether the return value of changes() is 0, and if so execute:
INSERT INTO players (user_name, age) VALUES ('gil', 32);
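In Python the same check can be done in application code without a separate SELECT changes(), because the cursor returned by an UPDATE exposes the affected row count as rowcount. This is a simplified sketch that only updates age; the helper name upsert_player is made up.

```python
import sqlite3

def upsert_player(conn, user_name, age):
    """Update first; insert only when the update matched no rows."""
    with conn:
        cur = conn.execute(
            "UPDATE players SET age = ? WHERE user_name = ?", (age, user_name)
        )
        if cur.rowcount == 0:  # nothing matched, so the row does not exist yet
            conn.execute(
                "INSERT INTO players (user_name, age) VALUES (?, ?)",
                (user_name, age),
            )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT, age INTEGER)"
)
conn.execute("INSERT INTO players (id, user_name, age) VALUES (1983, 'george', 29)")

upsert_player(conn, "george", 32)  # existing row is updated
upsert_player(conn, "gil", 32)     # missing row is inserted

rows = conn.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```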
Option 1: Insert -> Update
If you'd like to avoid both changes()=0 and INSERT OR IGNORE, even if you cannot afford deleting the row, you can use this logic:
First, insert (if the row does not exist), and then update by filtering on the unique key.
Example
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Insert if NOT exists
INSERT INTO players (user_name, age)
SELECT 'johnny', 20
WHERE NOT EXISTS (SELECT 1 FROM players WHERE user_name='johnny');
-- Update (will affect row, only if found)
-- no point to update user_name to 'johnny' since it's unique, and we filter by it as well
UPDATE players
SET age=20
WHERE user_name='johnny';
Regarding Triggers
Notice: I haven't tested it to see which triggers are being called, but I assume the following:
If the row does not exist:
  BEFORE INSERT
  INSERT using INSTEAD OF
  AFTER INSERT
  BEFORE UPDATE
  UPDATE using INSTEAD OF
  AFTER UPDATE
If the row does exist:
  BEFORE UPDATE
  UPDATE using INSTEAD OF
  AFTER UPDATE
Option 2: Insert or replace - keep your own ID
In this way you can have a single SQL command.
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Single command to insert or update
INSERT OR REPLACE INTO players
(id, user_name, age)
VALUES ((SELECT id from players WHERE user_name='johnny'),
'johnny',
20);
Edit: added option 2.
You can also just add an ON CONFLICT REPLACE clause to your user_name unique constraint and then just INSERT away, leaving it to SQLite to figure out what to do in case of a conflict. See: https://sqlite.org/lang_conflict.html.
Also note the sentence regarding delete triggers: When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, delete triggers fire if and only if recursive triggers are enabled.
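A small sketch of what this looks like in practice, with assumed column types. Keep in mind that REPLACE still deletes the conflicting row, so the id is not preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The conflict policy is attached to the UNIQUE constraint itself.
conn.execute(
    "CREATE TABLE players ("
    " id INTEGER PRIMARY KEY,"
    " user_name TEXT UNIQUE ON CONFLICT REPLACE,"
    " age INTEGER)"
)
conn.executemany(
    "INSERT INTO players (id, user_name, age) VALUES (?, ?, ?)",
    [(1982, "johnny", 23), (1983, "steven", 29)],
)

# A plain INSERT is enough; the constraint decides how the conflict is resolved.
conn.execute("INSERT INTO players (user_name, age) VALUES ('steven', 32)")

rows = conn.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
steven_id, steven_age = conn.execute(
    "SELECT id, age FROM players WHERE user_name = 'steven'"
).fetchone()
print(rows)
```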
For those who have the latest version of SQLite available, you can still do it in a single statement using INSERT OR REPLACE, but beware that you need to set all the values. This "clever" SQL works by left-joining the new values against the table you are inserting into / updating, combined with ifnull:
import sqlite3
con = sqlite3.connect( ":memory:" )
cur = con.cursor()
cur.execute("create table test( id varchar(20) PRIMARY KEY, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
[ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())
con.commit()
cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test
select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
{ 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
cur.executemany( upsert, upsert_data )
cur.execute('select * from test')
print( cur.fetchall())
The first few lines of that code set up the table, with a single ID primary key column and two values. It then enters data with IDs 'A' and 'B'.
The second section creates the 'upsert' text, and calls it for 2 rows of data, one with the ID of 'B' which is found and one with 'C' which is not found.
When you run it, you'll find the data at the end produces
$python3 main.py
[('A', 1, 8), ('B', 3, 10)]
[('A', 1, 8), ('B', 4, 10), ('C', 3, 12)]
B 'updated' value to 4 but value2 (5) was ignored, C inserted.
Note: this does not work if your table has an auto-incremented primary key as INSERT OR REPLACE will replace the number with a new one.
A slight modification to add such a column:
import sqlite3
con = sqlite3.connect( ":memory:" )
cur = con.cursor()
cur.execute("create table test( pkey integer primary key autoincrement not null, id varchar(20) UNIQUE not null, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
[ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())
con.commit()
cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test (id, value, value2)
select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
{ 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
cur.executemany( upsert, upsert_data )
cur.execute('select * from test')
print( cur.fetchall())
Output is now:
$python3 main.py
[(1, 'A', 1, 8), (2, 'B', 3, 10)]
[(1, 'A', 1, 8), (3, 'B', 4, 10), (4, 'C', 3, 12)]
Note pkey 2 is replaced with 3 for id 'B'
This is therefore not ideal but is a good solution when:
You don't have an auto-generated primary key
You want to create an 'upsert' query with bound parameters
You want to use executemany() to merge in multiple rows of data in one go.