[转载]postgresql 9.5版本之前实现upsert功能

4 minute read

背景

有没有办法实现在pg9.5版本之前实现upsert功能,整理如下

原文地址

https://yq.aliyun.com/articles/36103

正文

最近有开发人员问,有没有办法实现在pg9.5版本之前实现upsert功能,现整理如下

创建测试表,注意此处先不要主键或唯一约束

create table t2 (id int,name varchar(100));  

pg 在9.5之前实现不存在则插入

现在需要实现,当id字段的值存在时,则更新name字段的值,如果id字段的值不存在,则执行插入

with upsert as (update t2 set name='rudy1' where id=5 returning *) insert into t2 select 5,'rudy' where not exists (select 1 from upsert where id=5);  

注意使用此种方法并不能保证两个事务同时插入一条数据

session1执行该语句,成功

postgres=# begin;  
BEGIN  
postgres=# with upsert as (update t2 set name='rudy2' where id=5 returning *) insert into t2 select 5,'rudy' where not exists (select 1 from upsert where id=5);  
INSERT 0 1  

session2执行该语句,也成功

postgres=# begin;  
BEGIN  
postgres=# with upsert as (update t2 set name='rudy2' where id=5 returning *) insert into t2 select 5,'rudy' where not exists (select 1 from upsert where id=5);  
INSERT 0 1  

两者都提交后发现id=5的数据有两条

postgres=# select * from t2;  
 id | name   
----+------  
  5 | rudy  
  5 | rudy  

为了保证并发,此时可以给表加上主键或唯一键

postgres=# alter table t2 add primary key(id);  
ALTER TABLE  

此时session2再提交语句则会报错

postgres=# begin;  
BEGIN  
postgres=# with upsert as (update t2 set name='rudy3' where id=5 returning *) insert into t2 select 5,'rudy' where not exists (select 1 from upsert where id=5);   
ERROR:  duplicate key value violates unique constraint "t2_pkey"  
DETAIL:  Key (id)=(5) already exists.  

那有没有办法实现在表上没有主键或唯一约束时,也能保证并发呢?

有,此时需要使用pg_try_advisory_xact_lock(其是一个轻量级的锁,在事务回滚或提交后,会自动释放锁),但其接受的参数是整数,为了保证尽量唯一,可以使用md5函数

借助于lock实现upsert,注意此sql对于记录不存在,可以保证只有一个session插入数据,对于同一条数据更新,先来的session会lock着记录,后来的session会wait

with   
  w1 as(select ('x'||substr(md5('6'),1,16))::bit(64)::bigint as tra_id),  
  upsert as (update t2 set name='rudy2' where id=6 returning *)  
insert into t2 select 6,'rudy'   
  from w1   
where pg_try_advisory_xact_lock(tra_id) and not exists (select 1 from upsert where id=6);  

借助于lock实现upsert,注意此sql对于记录不存在,可以保证只有一个session插入数据,对于同一条数据更新,先来的session会更新数据,后来的session 失败

with w1 as(select ('x'||substr(md5('6'),1,16))::bit(64)::bigint as tra_id),  
upsert as (update t2 set name='rudy2' from w1 where pg_try_advisory_xact_lock(tra_id) and id=6 returning *)  
insert into t2 select 6,'rudy'   
from w1 where pg_try_advisory_xact_lock(tra_id) and not exists (select 1 from upsert where id=6);   

后记

为了保证性能,id字段最好有索引(但不一定是主键或唯一约束时)

如果校验字段是否字段不为id,把相应字段的替换掉id字段则可

由md5虚拟的tra_id并不保证一定是唯一的,但重复的概率极低

在pg9.5中可以直接使用upsert,注意此时要求表上有主键或唯一约束

insert into t2 values(5,'rudy1') ON CONFLICT(id) do update set name=EXCLUDED.name ;  

对于mysql可以使用insert into on duplicate key实现类似功能(其也要求有主键或唯一约束)在此不详细举例

对于oracle可以使用merge into,想想还是这个更强大,嘿嘿

rule方法

实际上PostgreSQL很早就支持RULE语法,可以在RULE中创建规则,存在则更新,不存在则插入。

但是使用时,务必参考后面的注意事项,使用volatile函数,不要直接使用exists(会当成immutable函数风格处理)。

postgres=# create table d(id int primary key, info text, crt_time timestamp);

postgres=# create rule r1 as on insert to d where (exists (select 1 from d where d.id=NEW.id)) do instead update d set info=NEW.info,crt_time=NEW.crt_time where id=NEW.id;

postgres=# insert into d values (1,'test',now());
INSERT 0 1
postgres=# select * from d;
 id | info |          crt_time          
----+------+----------------------------
  1 | test | 2017-08-10 14:12:20.053353
(1 row)

postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# select * from d;
 id |  info   |          crt_time          
----+---------+----------------------------
  1 | test123 | 2017-08-10 14:12:26.964074
(1 row)

存在则不插入(忽略),不存在则更新。实现幂等写入(断点续传写入不出问题)。

postgres=# create table d(id int primary key, info text, crt_time timestamp);

postgres=# create rule r1 as on insert to d where (exists (select 1 from d where d.id=NEW.id)) do instead nothing;
CREATE RULE
postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# insert into d values (0,'test123',now());
INSERT 0 1

性能压测

vi test.sql

\set id random(1,1000000)
insert into d values (:id, md5(random()::text), now());

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1000

progress: 90.0 s, 132056.5 tps, lat 0.220 ms stddev 0.055
progress: 91.0 s, 131656.9 tps, lat 0.220 ms stddev 0.040
progress: 92.0 s, 134941.0 tps, lat 0.215 ms stddev 0.090
progress: 93.0 s, 134324.5 tps, lat 0.216 ms stddev 0.076
progress: 94.0 s, 136699.4 tps, lat 0.212 ms stddev 0.070
progress: 95.0 s, 139291.4 tps, lat 0.208 ms stddev 0.067
progress: 96.0 s, 136073.2 tps, lat 0.213 ms stddev 0.076
progress: 97.0 s, 135804.6 tps, lat 0.214 ms stddev 0.076
progress: 98.0 s, 146037.6 tps, lat 0.199 ms stddev 0.069
progress: 99.0 s, 129619.5 tps, lat 0.224 ms stddev 0.049
progress: 100.0 s, 129230.0 tps, lat 0.224 ms stddev 0.047
progress: 101.0 s, 131048.4 tps, lat 0.221 ms stddev 0.055
progress: 102.0 s, 128808.0 tps, lat 0.225 ms stddev 0.048
progress: 103.0 s, 128954.6 tps, lat 0.225 ms stddev 0.048
progress: 104.0 s, 131227.9 tps, lat 0.221 ms stddev 0.042
progress: 105.0 s, 129604.0 tps, lat 0.224 ms stddev 0.057

rule法使用注意事项:

1、rule 中使用exists,用在这里并不完美,仅仅适合单条insert的语句(使用volatile 函数解决这个问题),否则需要约束来保证唯一性。

postgres=# create table e(id int, info text);
CREATE TABLE
postgres=# create rule r1 as on insert to e where exists (select 1 from e t1 where t1.id=NEW.id limit 1) do instead nothing;
CREATE RULE

在一个语句中插入多条,如果多条中有重复,则在RULE中判断条件时仅判断一次(类似immutable函数)。

postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 2
postgres=# select * from e;
 id | info
----+------
  1 | test
  1 | test
(2 rows)

解决方法,使用volatile函数
postgres=# drop rule r1 on e;
DROP RULE
CREATE OR REPLACE FUNCTION public.ff(integer)
 RETURNS boolean
 LANGUAGE sql
 STRICT
 volatile
AS $function$
  select true from e where id=$1 limit 1;
$function$;

postgres=# create rule r1 as on insert to e where ff(NEW.id) do instead nothing;
CREATE RULE
postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 0
postgres=# insert into e values (2,'test'),(2,'test');
INSERT 0 1
postgres=# insert into e values (3,'test'),(3,'test');
INSERT 0 1
postgres=# truncate e;
TRUNCATE TABLE
postgres=# select * from b;
 id | info
----+------
  1 | a
  1 | b
(2 rows)

postgres=# insert into e select * from b;
INSERT 0 1
postgres=# \d+ b
                                     Table "public.b"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              |
 info   | text    |           |          |         | extended |              |

2、rule不支持COPY语句,也就是说COPY如果有重复,一样会导致问题。

不管怎么样,我始终建议需要UPSERT的表,必须有PK。

Flag Counter

digoal’s 大量PostgreSQL文章入口