upsert - PostgreSQL 9.4 pending patch : INSERT…ON DUPLICATE KEY IGNORE

8 minute read

背景

经常有使用MySQL的开发人员问我PostgreSQL里面有没有replace对应的用法.

http://dev.mysql.com/doc/refman/5.7/en/replace.html

这个在PostgreSQL中目前可以使用函数语言来代替, 使用函数或者online code, 例如

do language plpgsql $$  
declare  
begin  
    update xxx set xxx where xxx;  
    if not found then  
      insert into xxx(xxx) values (xxx);  
    end if;  
end;  
$$;  

还有一种MySQL中用得较多的是insert ignore语法. 即插入错误不报错, 忽略该行的插入.

MySQL中的用法详见.

http://dev.mysql.com/doc/refman/5.7/en/insert.html

在说本文提出的补丁前, 我们先看看PostgreSQL可以通过哪几种手段来实现.

digoal=# create table i_test(id int primary key, info text);  
CREATE TABLE  

正文

1. 函数

digoal=# create or replace function f_i_test(i_id int, i_info text) returns void as $$  
declare  
begin  
  insert into i_test (id,info) values (i_id,i_info);  
  exception  
  when SQLSTATE '23505' then  
    return;  
end;  
$$ language plpgsql strict;  
CREATE FUNCTION  
digoal=# select f_i_test(1,'test');  
 f_i_test   
----------  
   
(1 row)  
digoal=# select ctid,* from i_test;  
 ctid  | id | info   
-------+----+------  
 (0,1) |  1 | test  
(1 row)  
digoal=# select f_i_test(1,'test');  
 f_i_test   
----------  
   
(1 row)  
digoal=# select ctid,* from i_test;  
 ctid  | id | info   
-------+----+------  
 (0,1) |  1 | test  
(1 row)  
digoal=# select f_i_test(2,'test');  
 f_i_test   
----------  
   
(1 row)  
digoal=# select ctid,* from i_test;  
 ctid  | id | info   
-------+----+------  
 (0,1) |  1 | test  
 (0,3) |  2 | test  
(2 rows)  

2. do

digoal=# do language plpgsql $$     
declare  
begin  
  insert into i_test(id,info) values (1,'test');  
  exception when SQLSTATE '23505' then  
end;  
$$;  
DO  
digoal=# do language plpgsql $$  
declare  
begin  
  insert into i_test(id,info) values (3,'test');  
  exception when SQLSTATE '23505' then  
end;  
$$;  
DO  
digoal=# select ctid,* from i_test;  
 ctid  | id | info   
-------+----+------  
 (0,1) |  1 | test  
 (0,3) |  2 | test  
 (0,5) |  3 | test  
(3 rows)  

3. 触发器

digoal=# create or replace function tg_i_test () returns trigger as $$  
declare  
begin  
  perform 1 from i_test where id=NEW.id limit 1;  
  if found then  
    return null;  
  else  
    return new;  
  end if;  
end;  
$$ language plpgsql strict;  
digoal=# create trigger tg_i before insert on i_test for each row execute procedure tg_i_test();  
CREATE TRIGGER  
digoal=# insert into i_test values (1,'test');  
INSERT 0 0  
digoal=# insert into i_test values (4,'test');  
INSERT 0 1  
digoal=# select ctid,* from i_test;  
 ctid  | id | info   
-------+----+------  
 (0,1) |  1 | test  
 (0,3) |  2 | test  
 (0,5) |  3 | test  
 (0,6) |  4 | test  
(4 rows)  

因为是before触发器, 所以使用这个方法不会产生垃圾数据, 推荐使用.

4. 规则

digoal=# drop trigger tg_i on i_test;  
DROP TRIGGER  
digoal=# create rule r_i_test as on insert to i_test where exists (select 1 from i_test where id = new.id) do instead nothing;  
CREATE RULE  
digoal=# insert into i_test values (4,'test');  
INSERT 0 0  
digoal=# insert into i_test values (5,'test');  
INSERT 0 1  
digoal=# select ctid,* from i_test;  
 ctid  | id | info   
-------+----+------  
 (0,1) |  1 | test  
 (0,3) |  2 | test  
 (0,5) |  3 | test  
 (0,6) |  4 | test  
 (0,7) |  5 | test  
(5 rows)  

注意1, 规则对于copy不响应, 所以无法规避copy带来的问题. 如果逻辑备份为copy格式的话, 还原的时候也无法规避这个问题. 触发器不存在这个问题.

digoal=# copy i_test from stdin;  
Enter data to be copied followed by a newline.  
End with a backslash and a period on a line by itself.  
>> 1    'test'  
>> 2    'test'  
>> 100  't'  
>> \.  
ERROR:  duplicate key value violates unique constraint "i_test_pkey"  
DETAIL:  Key (id)=(1) already exists.  
CONTEXT:  COPY i_test, line 1  

注意2, rule 中使用exists,用在这里并不完美,仅仅适合单条insert的语句(使用volatile 函数解决这个问题),否则需要约束来保证唯一性。

postgres=# create table e(id int, info text);  
CREATE TABLE  
postgres=# create rule r1 as on insert to e where exists (select 1 from e t1 where t1.id=NEW.id limit 1) do instead nothing;  
CREATE RULE  
    
在一个语句中插入多条,如果多条中有重复,则在RULE中判断条件时仅判断一次(类似immutable函数)。    
    
postgres=# insert into e values (1,'test'),(1,'test');  
INSERT 0 2  
postgres=# select * from e;  
 id | info   
----+------  
  1 | test  
  1 | test  
(2 rows)  
    
解决方法,使用volatile函数  
  
CREATE OR REPLACE FUNCTION public.ff(integer)  
 RETURNS boolean  
 LANGUAGE sql  
 STRICT  
 volatile  
AS $function$  
  select true from e where id=$1 limit 1;  
$function$;  
  
postgres=# drop rule r1 on e;  
DROP RULE  
postgres=# create rule r1 as on insert to e where ff(NEW.id) do instead nothing;  
CREATE RULE  
  
完美了  
postgres=# insert into e values (1,'test'),(1,'test');  
INSERT 0 0  
postgres=# insert into e values (2,'test'),(2,'test');  
INSERT 0 1  
postgres=# insert into e values (3,'test'),(3,'test');  
INSERT 0 1  
postgres=# truncate e;  
TRUNCATE TABLE  
postgres=# select * from b;  
 id | info   
----+------  
  1 | a  
  1 | b  
(2 rows)  
  
postgres=# insert into e select * from b;  
INSERT 0 1  
postgres=# \d+ b  
                                     Table "public.b"  
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description   
--------+---------+-----------+----------+---------+----------+--------------+-------------  
 id     | integer |           |          |         | plain    |              |   
 info   | text    |           |          |         | extended |              |   

不管怎么样,我始终建议需要UPSERT的表,必须有PK。

5. with语法

digoal=# insert into i_test with tmp(c1,c2) as ( values(1,'test'),(2,'test'),(1000,'test') )   
digoal-# select tmp.* from tmp where tmp.c1 not in (select id from i_test);  
INSERT 0 1  
digoal=# insert into i_test with tmp(c1,c2) as ( values(1,'test'),(2,'test'),(1000,'test') )   
select tmp.* from tmp where tmp.c1 not in (select id from i_test);  
INSERT 0 0  

接下来进入正题, 本文要说的这个PostgreSQL补丁就是达到这个目的的, 如果违反唯一约束, 可以忽略插入不报错.

语法 :

insert into t1 values (5,'d',now()) on duplicate key ignore;  
insert into t1 values (5,'d',now()) on duplicate key ignore returning rejects *;  

补丁:

请使用8月28日的postgresql snapshot.

wget http://www.postgresql.org/message-id/attachment/29989/insert_on_dup.v1.2013_08_30.patch.gz  
gunzip insert_on_dup.v1.2013_08_30.patch.gz  
  
[root@db-172-16-3-33 postgresql-090d0f2]# patch -p1 < insert_on_dup.v1.2013_08_30.patch   
patching file contrib/pg_stat_statements/pg_stat_statements.c  
patching file src/backend/access/heap/tuptoaster.c  
patching file src/backend/access/index/indexam.c  
patching file src/backend/access/nbtree/nbtinsert.c  
patching file src/backend/access/nbtree/nbtree.c  
patching file src/backend/access/nbtree/nbtsearch.c  
patching file src/backend/catalog/index.c  
patching file src/backend/catalog/indexing.c  
patching file src/backend/commands/constraint.c  
patching file src/backend/commands/copy.c  
patching file src/backend/executor/execUtils.c  
patching file src/backend/executor/nodeModifyTable.c  
patching file src/backend/nodes/copyfuncs.c  
patching file src/backend/nodes/equalfuncs.c  
patching file src/backend/nodes/nodeFuncs.c  
patching file src/backend/nodes/outfuncs.c  
patching file src/backend/nodes/readfuncs.c  
patching file src/backend/optimizer/plan/createplan.c  
patching file src/backend/optimizer/plan/planner.c  
patching file src/backend/parser/analyze.c  
patching file src/backend/parser/gram.y  
patching file src/include/access/genam.h  
patching file src/include/access/nbtree.h  
patching file src/include/catalog/catversion.h  
patching file src/include/catalog/pg_am.h  
patching file src/include/catalog/pg_proc.h  
patching file src/include/executor/executor.h  
patching file src/include/nodes/execnodes.h  
patching file src/include/nodes/nodes.h  
patching file src/include/nodes/parsenodes.h  
patching file src/include/nodes/plannodes.h  
patching file src/include/optimizer/planmain.h  
patching file src/include/parser/kwlist.h  
patching file src/include/utils/rel.h  
patching file src/test/isolation/expected/insert-duplicate-key.out  
patching file src/test/isolation/isolation_schedule  
patching file src/test/isolation/specs/insert-duplicate-key.spec  
  
gmake  
gmake install  

测试 :

digoal=# create table t(id int primary key, info text unique, crt_time timestamp);  
CREATE TABLE  
digoal=# \d+ t  
                                      Table "public.t"  
  Column  |            Type             | Modifiers | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+--------------+-------------  
 id       | integer                     | not null  | plain    |              |   
 info     | text                        |           | extended |              |   
 crt_time | timestamp without time zone |           | plain    |              |   
Indexes:  
    "t_pkey" PRIMARY KEY, btree (id)  
    "t_info_key" UNIQUE CONSTRAINT, btree (info)  
Has OIDs: no  
  
digoal=# insert into t values (1,'a',now());  
INSERT 0 1  
digoal=# insert into t values (1,'a',now()) on duplicate key ignore;  
INSERT 0 0  
digoal=# insert into t values (2,'b',now());  
INSERT 0 1  

使用returning rejects *可以返回未被插入的行.

digoal=# insert into t values (1,'a',now()) on duplicate key ignore returning rejects *;  
 id | info |          crt_time            
----+------+----------------------------  
  1 | a    | 2013-09-06 09:03:18.962054  
(1 row)  
INSERT 0 0  
digoal=# select ctid,* from t;  
 ctid  | id | info |          crt_time            
-------+----+------+----------------------------  
 (0,1) |  1 | a    | 2013-09-06 09:01:41.58451  
 (0,2) |  2 | b    | 2013-09-06 09:02:10.369413  
(2 rows)  

使用on duplicate key ignore 后, 插入违反唯一约束的话, 记录不会被插入, 同时也不会消耗实际的item.

如果不使用on duplicate key ignore, 会消耗item. 例如.

digoal=# insert into t values (1,'a',now());  
ERROR:  duplicate key value violates unique constraint "t_pkey"  
DETAIL:  Key (id)=(1) already exists.  
digoal=# insert into t values (1,'a',now());  
ERROR:  duplicate key value violates unique constraint "t_pkey"  
DETAIL:  Key (id)=(1) already exists.  
digoal=# insert into t values (1,'a',now());  
ERROR:  duplicate key value violates unique constraint "t_pkey"  
DETAIL:  Key (id)=(1) already exists.  

以上SQL插入报错了, 同时将消耗3条item.

digoal=# insert into t values (3,'c',now());  
INSERT 0 1  
digoal=# select ctid,* from t;  
 ctid  | id | info |          crt_time            
-------+----+------+----------------------------  
 (0,1) |  1 | a    | 2013-09-06 09:01:41.58451  
 (0,2) |  2 | b    | 2013-09-06 09:02:10.369413  
 (0,6) |  3 | c    | 2013-09-06 09:04:13.720437  
(3 rows)  

使用on duplicate key, 插入时如果遇到已有的key, 锁和不使用on duplicate key情形一样.

不使用on duplicate key ignore

session a:  
digoal=# begin;  
BEGIN  
digoal=# insert into t values (4,'d',now());  
INSERT 0 1  
session b:  
digoal=# begin;  
BEGIN  
digoal=# insert into t values (4,'d',now());  
  
-- 等待中  
  
使用on duplicate key ignore, 同样处于等待.  
  
session b:  
digoal=# begin;  
BEGIN  
digoal=# insert into t values (4,'d',now()) on duplicate key ignore;  

从锁信息中可以看到, session b 在等待session a的事务锁.

digoal=# select query,waiting from pg_stat_activity;  
                            query                            | waiting   
-------------------------------------------------------------+---------  
 insert into t values (4,'d',now());                         | f  
 insert into t values (4,'d',now()) on duplicate key ignore; | t  
 select query,waiting from pg_stat_activity;                 | f  
(3 rows)  
  
digoal=# select query,waiting,pid from pg_stat_activity;  
                            query                            | waiting |  pid    
-------------------------------------------------------------+---------+-------  
 insert into t values (4,'d',now());                         | f       | 30999  
 insert into t values (4,'d',now()) on duplicate key ignore; | t       | 24139  
 select query,waiting,pid from pg_stat_activity;             | f       | 23361  
(3 rows)  
  
digoal=# select * from pg_locks where pid=24139;  
   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  
  pid  |       mode       | granted | fastpath   
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+  
-------+------------------+---------+----------  
 relation      |    16384 |    16426 |      |       |            |               |         |       |          | 3/3864             |  
 24139 | RowExclusiveLock | t       | t  
 relation      |    16384 |    16424 |      |       |            |               |         |       |          | 3/3864             |  
 24139 | RowExclusiveLock | t       | t  
 relation      |    16384 |    16418 |      |       |            |               |         |       |          | 3/3864             |  
 24139 | RowExclusiveLock | t       | t  
 virtualxid    |          |          |      |       | 3/3864     |               |         |       |          | 3/3864             |  
 24139 | ExclusiveLock    | t       | t  
 transactionid |          |          |      |       |            |          1712 |         |       |          | 3/3864             |  
 24139 | ShareLock        | f       | f  
(5 rows)  
  
digoal=# select * from pg_locks where pid=30999;  
   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  
  pid  |       mode       | granted | fastpath   
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+  
-------+------------------+---------+----------  
 relation      |    16384 |    16418 |      |       |            |               |         |       |          | 2/1217             |  
 30999 | RowExclusiveLock | t       | t  
 virtualxid    |          |          |      |       | 2/1217     |               |         |       |          | 2/1217             |  
 30999 | ExclusiveLock    | t       | t  
 transactionid |          |          |      |       |            |          1712 |         |       |          | 2/1217             |  
 30999 | ExclusiveLock    | t       | f  
(3 rows)  

参考

1. http://dev.mysql.com/doc/refman/5.5/en/replace.html

2. https://commitfest.postgresql.org/action/patch_view?id=1201

3. http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html

4. https://commitfest.postgresql.org/3/35/

Flag Counter

digoal’s 大量PostgreSQL文章入口