upsert - PostgreSQL 9.4 pending patch : INSERT…ON DUPLICATE KEY IGNORE
背景
经常有使用MySQL的开发人员问我PostgreSQL里面有没有replace对应的用法.
http://dev.mysql.com/doc/refman/5.7/en/replace.html
这个在PostgreSQL中目前可以使用函数语言来代替, 使用函数或者online code, 例如
do language plpgsql $$
declare
begin
update xxx set xxx where xxx;
if not found then
insert into xxx(xxx) values (xxx);
end if;
end;
$$;
还有一种MySQL中用得较多的是insert ignore语法. 即插入错误不报错, 忽略该行的插入.
MySQL中的用法详见.
http://dev.mysql.com/doc/refman/5.7/en/insert.html
在说本文提出的补丁前, 我们先看看PostgreSQL可以通过哪几种手段来实现.
digoal=# create table i_test(id int primary key, info text);
CREATE TABLE
正文
1. 函数
digoal=# create or replace function f_i_test(i_id int, i_info text) returns void as $$
declare
begin
insert into i_test (id,info) values (i_id,i_info);
exception
when SQLSTATE '23505' then
return;
end;
$$ language plpgsql strict;
CREATE FUNCTION
digoal=# select f_i_test(1,'test');
f_i_test
----------
(1 row)
digoal=# select ctid,* from i_test;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(1 row)
digoal=# select f_i_test(1,'test');
f_i_test
----------
(1 row)
digoal=# select ctid,* from i_test;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(1 row)
digoal=# select f_i_test(2,'test');
f_i_test
----------
(1 row)
digoal=# select ctid,* from i_test;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(0,3) | 2 | test
(2 rows)
2. do
digoal=# do language plpgsql $$
declare
begin
insert into i_test(id,info) values (1,'test');
exception when SQLSTATE '23505' then
end;
$$;
DO
digoal=# do language plpgsql $$
declare
begin
insert into i_test(id,info) values (3,'test');
exception when SQLSTATE '23505' then
end;
$$;
DO
digoal=# select ctid,* from i_test;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(0,3) | 2 | test
(0,5) | 3 | test
(3 rows)
3. 触发器
digoal=# create or replace function tg_i_test () returns trigger as $$
declare
begin
perform 1 from i_test where id=NEW.id limit 1;
if found then
return null;
else
return new;
end if;
end;
$$ language plpgsql strict;
digoal=# create trigger tg_i before insert on i_test for each row execute procedure tg_i_test();
CREATE TRIGGER
digoal=# insert into i_test values (1,'test');
INSERT 0 0
digoal=# insert into i_test values (4,'test');
INSERT 0 1
digoal=# select ctid,* from i_test;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(0,3) | 2 | test
(0,5) | 3 | test
(0,6) | 4 | test
(4 rows)
因为是before触发器, 所以使用这个方法不会产生垃圾数据, 推荐使用.
4. 规则
digoal=# drop trigger tg_i on i_test;
DROP TRIGGER
digoal=# create rule r_i_test as on insert to i_test where exists (select 1 from i_test where id = new.id) do instead nothing;
CREATE RULE
digoal=# insert into i_test values (4,'test');
INSERT 0 0
digoal=# insert into i_test values (5,'test');
INSERT 0 1
digoal=# select ctid,* from i_test;
ctid | id | info
-------+----+------
(0,1) | 1 | test
(0,3) | 2 | test
(0,5) | 3 | test
(0,6) | 4 | test
(0,7) | 5 | test
(5 rows)
注意1, 规则对于copy不响应, 所以无法规避copy带来的问题. 如果逻辑备份为copy格式的话, 还原的时候也无法规避这个问题. 触发器不存在这个问题.
digoal=# copy i_test from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1 'test'
>> 2 'test'
>> 100 't'
>> \.
ERROR: duplicate key value violates unique constraint "i_test_pkey"
DETAIL: Key (id)=(1) already exists.
CONTEXT: COPY i_test, line 1
注意2, rule 中使用exists,用在这里并不完美,仅仅适合单条insert的语句(使用volatile 函数解决这个问题),否则需要约束来保证唯一性。
postgres=# create table e(id int, info text);
CREATE TABLE
postgres=# create rule r1 as on insert to e where exists (select 1 from e t1 where t1.id=NEW.id limit 1) do instead nothing;
CREATE RULE
在一个语句中插入多条,如果多条中有重复,则在RULE中判断条件时仅判断一次(类似immutable函数)。
postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 2
postgres=# select * from e;
id | info
----+------
1 | test
1 | test
(2 rows)
解决方法,使用volatile函数
CREATE OR REPLACE FUNCTION public.ff(integer)
RETURNS boolean
LANGUAGE sql
STRICT
volatile
AS $function$
select true from e where id=$1 limit 1;
$function$;
postgres=# drop rule r1 on e;
DROP RULE
postgres=# create rule r1 as on insert to e where ff(NEW.id) do instead nothing;
CREATE RULE
完美了
postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 0
postgres=# insert into e values (2,'test'),(2,'test');
INSERT 0 1
postgres=# insert into e values (3,'test'),(3,'test');
INSERT 0 1
postgres=# truncate e;
TRUNCATE TABLE
postgres=# select * from b;
id | info
----+------
1 | a
1 | b
(2 rows)
postgres=# insert into e select * from b;
INSERT 0 1
postgres=# \d+ b
Table "public.b"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
不管怎么样,我始终建议需要UPSERT的表,必须有PK。
5. with语法
digoal=# insert into i_test with tmp(c1,c2) as ( values(1,'test'),(2,'test'),(1000,'test') )
digoal-# select tmp.* from tmp where tmp.c1 not in (select id from i_test);
INSERT 0 1
digoal=# insert into i_test with tmp(c1,c2) as ( values(1,'test'),(2,'test'),(1000,'test') )
select tmp.* from tmp where tmp.c1 not in (select id from i_test);
INSERT 0 0
接下来进入正题, 本文要说的这个PostgreSQL补丁就是达到这个目的的, 如果违反唯一约束, 可以忽略插入不报错.
语法 :
insert into t1 values (5,'d',now()) on duplicate key ignore;
insert into t1 values (5,'d',now()) on duplicate key ignore returning rejects *;
补丁:
请使用8月28日的postgresql snapshot.
wget http://www.postgresql.org/message-id/attachment/29989/insert_on_dup.v1.2013_08_30.patch.gz
gunzip insert_on_dup.v1.2013_08_30.patch.gz
[root@db-172-16-3-33 postgresql-090d0f2]# patch -p1 < insert_on_dup.v1.2013_08_30.patch
patching file contrib/pg_stat_statements/pg_stat_statements.c
patching file src/backend/access/heap/tuptoaster.c
patching file src/backend/access/index/indexam.c
patching file src/backend/access/nbtree/nbtinsert.c
patching file src/backend/access/nbtree/nbtree.c
patching file src/backend/access/nbtree/nbtsearch.c
patching file src/backend/catalog/index.c
patching file src/backend/catalog/indexing.c
patching file src/backend/commands/constraint.c
patching file src/backend/commands/copy.c
patching file src/backend/executor/execUtils.c
patching file src/backend/executor/nodeModifyTable.c
patching file src/backend/nodes/copyfuncs.c
patching file src/backend/nodes/equalfuncs.c
patching file src/backend/nodes/nodeFuncs.c
patching file src/backend/nodes/outfuncs.c
patching file src/backend/nodes/readfuncs.c
patching file src/backend/optimizer/plan/createplan.c
patching file src/backend/optimizer/plan/planner.c
patching file src/backend/parser/analyze.c
patching file src/backend/parser/gram.y
patching file src/include/access/genam.h
patching file src/include/access/nbtree.h
patching file src/include/catalog/catversion.h
patching file src/include/catalog/pg_am.h
patching file src/include/catalog/pg_proc.h
patching file src/include/executor/executor.h
patching file src/include/nodes/execnodes.h
patching file src/include/nodes/nodes.h
patching file src/include/nodes/parsenodes.h
patching file src/include/nodes/plannodes.h
patching file src/include/optimizer/planmain.h
patching file src/include/parser/kwlist.h
patching file src/include/utils/rel.h
patching file src/test/isolation/expected/insert-duplicate-key.out
patching file src/test/isolation/isolation_schedule
patching file src/test/isolation/specs/insert-duplicate-key.spec
gmake
gmake install
测试 :
digoal=# create table t(id int primary key, info text unique, crt_time timestamp);
CREATE TABLE
digoal=# \d+ t
Table "public.t"
Column | Type | Modifiers | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+--------------+-------------
id | integer | not null | plain | |
info | text | | extended | |
crt_time | timestamp without time zone | | plain | |
Indexes:
"t_pkey" PRIMARY KEY, btree (id)
"t_info_key" UNIQUE CONSTRAINT, btree (info)
Has OIDs: no
digoal=# insert into t values (1,'a',now());
INSERT 0 1
digoal=# insert into t values (1,'a',now()) on duplicate key ignore;
INSERT 0 0
digoal=# insert into t values (2,'b',now());
INSERT 0 1
使用returning rejects *可以返回未被插入的行.
digoal=# insert into t values (1,'a',now()) on duplicate key ignore returning rejects *;
id | info | crt_time
----+------+----------------------------
1 | a | 2013-09-06 09:03:18.962054
(1 row)
INSERT 0 0
digoal=# select ctid,* from t;
ctid | id | info | crt_time
-------+----+------+----------------------------
(0,1) | 1 | a | 2013-09-06 09:01:41.58451
(0,2) | 2 | b | 2013-09-06 09:02:10.369413
(2 rows)
使用on duplicate key ignore 后, 插入违反唯一约束的话, 记录不会被插入, 同时也不会消耗实际的item.
如果不使用on duplicate key ignore, 会消耗item. 例如.
digoal=# insert into t values (1,'a',now());
ERROR: duplicate key value violates unique constraint "t_pkey"
DETAIL: Key (id)=(1) already exists.
digoal=# insert into t values (1,'a',now());
ERROR: duplicate key value violates unique constraint "t_pkey"
DETAIL: Key (id)=(1) already exists.
digoal=# insert into t values (1,'a',now());
ERROR: duplicate key value violates unique constraint "t_pkey"
DETAIL: Key (id)=(1) already exists.
以上SQL插入报错了, 同时将消耗3条item.
digoal=# insert into t values (3,'c',now());
INSERT 0 1
digoal=# select ctid,* from t;
ctid | id | info | crt_time
-------+----+------+----------------------------
(0,1) | 1 | a | 2013-09-06 09:01:41.58451
(0,2) | 2 | b | 2013-09-06 09:02:10.369413
(0,6) | 3 | c | 2013-09-06 09:04:13.720437
(3 rows)
使用on duplicate key, 插入时如果遇到已有的key, 锁和不使用on duplicate key情形一样.
不使用on duplicate key ignore
session a:
digoal=# begin;
BEGIN
digoal=# insert into t values (4,'d',now());
INSERT 0 1
session b:
digoal=# begin;
BEGIN
digoal=# insert into t values (4,'d',now());
-- 等待中
使用on duplicate key ignore, 同样处于等待.
session b:
digoal=# begin;
BEGIN
digoal=# insert into t values (4,'d',now()) on duplicate key ignore;
从锁信息中可以看到, session b 在等待session a的事务锁.
digoal=# select query,waiting from pg_stat_activity;
query | waiting
-------------------------------------------------------------+---------
insert into t values (4,'d',now()); | f
insert into t values (4,'d',now()) on duplicate key ignore; | t
select query,waiting from pg_stat_activity; | f
(3 rows)
digoal=# select query,waiting,pid from pg_stat_activity;
query | waiting | pid
-------------------------------------------------------------+---------+-------
insert into t values (4,'d',now()); | f | 30999
insert into t values (4,'d',now()) on duplicate key ignore; | t | 24139
select query,waiting,pid from pg_stat_activity; | f | 23361
(3 rows)
digoal=# select * from pg_locks where pid=24139;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |
pid | mode | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+
-------+------------------+---------+----------
relation | 16384 | 16426 | | | | | | | | 3/3864 |
24139 | RowExclusiveLock | t | t
relation | 16384 | 16424 | | | | | | | | 3/3864 |
24139 | RowExclusiveLock | t | t
relation | 16384 | 16418 | | | | | | | | 3/3864 |
24139 | RowExclusiveLock | t | t
virtualxid | | | | | 3/3864 | | | | | 3/3864 |
24139 | ExclusiveLock | t | t
transactionid | | | | | | 1712 | | | | 3/3864 |
24139 | ShareLock | f | f
(5 rows)
digoal=# select * from pg_locks where pid=30999;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |
pid | mode | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+
-------+------------------+---------+----------
relation | 16384 | 16418 | | | | | | | | 2/1217 |
30999 | RowExclusiveLock | t | t
virtualxid | | | | | 2/1217 | | | | | 2/1217 |
30999 | ExclusiveLock | t | t
transactionid | | | | | | 1712 | | | | 2/1217 |
30999 | ExclusiveLock | t | f
(3 rows)
参考
1. http://dev.mysql.com/doc/refman/5.5/en/replace.html
2. https://commitfest.postgresql.org/action/patch_view?id=1201
3. http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
4. https://commitfest.postgresql.org/3/35/