PostgreSQL merge insert(upsert/insert into on conflict) 如何区分数据是INSERT还是UPDATE

3 minute read

背景

使用insert into on conflict update语法,可以支持UPSERT的功能,但是到底这条SQL是插入的还是更新的呢?如何判断

通过xmax字段的值是否不为0,可以判断,如果是UPDATE,XMAX里面会填充更新事务号。

注意直接用UPDATE语句更新的话,XMAX会写入0,因为是新版本,而老版本上XMAX会填入更新事务号。

例子

1 insert on conflict

postgres=# create table t(id int primary key, info text, crt_time timestamp);  
CREATE TABLE  
postgres=# insert into t values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;  
 xmax   
------  
    0  
(1 row)  
  
INSERT 0 1  
postgres=# insert into t values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;  
   xmax      
-----------  
 160369640  
(1 row)  
  
INSERT 0 1  
postgres=# select xmin,xmax,* from t;  
   xmin    |   xmax    | id | info |          crt_time            
-----------+-----------+----+------+----------------------------  
 160369640 | 160369640 |  1 | test | 2018-10-17 12:09:38.760926  
(1 row)  
  
postgres=# insert into t values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;  
   xmax      
-----------  
 160369641  
(1 row)  
  
INSERT 0 1  
postgres=# select xmin,xmax,* from t;  
   xmin    |   xmax    | id | info |          crt_time            
-----------+-----------+----+------+----------------------------  
 160369641 | 160369641 |  1 | test | 2018-10-17 12:10:11.738691  
(1 row)  
  
postgres=# insert into t values (2,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;  
 xmax   
------  
    0  
(1 row)  
  
INSERT 0 1  
postgres=# select xmin,xmax,* from t;  
   xmin    |   xmax    | id | info |          crt_time            
-----------+-----------+----+------+----------------------------  
 160369641 | 160369641 |  1 | test | 2018-10-17 12:10:11.738691  
 160369642 |         0 |  2 | test | 2018-10-17 12:10:24.758745  
(2 rows)  
  
postgres=# select ctid,xmin,xmax,* from t;  
 ctid  |   xmin    |   xmax    | id | info |          crt_time            
-------+-----------+-----------+----+------+----------------------------  
 (0,3) | 160369641 | 160369641 |  1 | test | 2018-10-17 12:10:11.738691  
 (0,4) | 160369642 |         0 |  2 | test | 2018-10-17 12:10:24.758745  
(2 rows)  
  
postgres=# insert into t values (2,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning ctid,xmin,xmax,*;  
 ctid  |   xmin    |   xmax    | id | info |          crt_time            
-------+-----------+-----------+----+------+----------------------------  
 (0,5) | 160369643 | 160369643 |  2 | test | 2018-10-17 12:10:45.951351  
(1 row)  
  
INSERT 0 1  
postgres=# select ctid,xmin,xmax,* from t;  
 ctid  |   xmin    |   xmax    | id | info |          crt_time            
-------+-----------+-----------+----+------+----------------------------  
 (0,3) | 160369641 | 160369641 |  1 | test | 2018-10-17 12:10:11.738691  
 (0,5) | 160369643 | 160369643 |  2 | test | 2018-10-17 12:10:45.951351  
(2 rows)  

2 直接update

postgres=# update t set info='a' returning xmin,xmax,ctid,*;  
   xmin    | xmax | ctid  | id | info |          crt_time            
-----------+------+-------+----+------+----------------------------  
 160369644 |    0 | (0,6) |  1 | a    | 2018-10-17 12:10:11.738691  
 160369644 |    0 | (0,7) |  2 | a    | 2018-10-17 12:10:45.951351  
(2 rows)  
  
UPDATE 2  

3 update 回滚

postgres=# begin;  
BEGIN  
postgres=# update t set info='a' returning xmin,xmax,ctid,*;  
   xmin    | xmax | ctid  | id | info |          crt_time            
-----------+------+-------+----+------+----------------------------  
 160369645 |    0 | (0,8) |  1 | a    | 2018-10-17 12:10:11.738691  
 160369645 |    0 | (0,9) |  2 | a    | 2018-10-17 12:10:45.951351  
(2 rows)  
  
UPDATE 2  
postgres=# rollback;  
ROLLBACK  
postgres=# select ctid,xmin,xmax,* from t;  
 ctid  |   xmin    |   xmax    | id | info |          crt_time            
-------+-----------+-----------+----+------+----------------------------  
 (0,6) | 160369644 | 160369645 |  1 | a    | 2018-10-17 12:10:11.738691  
 (0,7) | 160369644 | 160369645 |  2 | a    | 2018-10-17 12:10:45.951351  
(2 rows)  

4 delete 回滚

postgres=# begin;  
BEGIN  
postgres=# delete from t returning ctid,xmin,xmax,*;  
 ctid  |   xmin    |   xmax    | id | info |          crt_time            
-------+-----------+-----------+----+------+----------------------------  
 (0,6) | 160369644 | 160369646 |  1 | a    | 2018-10-17 12:10:11.738691  
 (0,7) | 160369644 | 160369646 |  2 | a    | 2018-10-17 12:10:45.951351  
(2 rows)  
  
DELETE 2  
postgres=# rollback;  
ROLLBACK  
postgres=# select ctid,xmin,xmax,* from t;  
 ctid  |   xmin    |   xmax    | id | info |          crt_time            
-------+-----------+-----------+----+------+----------------------------  
 (0,6) | 160369644 | 160369646 |  1 | a    | 2018-10-17 12:10:11.738691  
 (0,7) | 160369644 | 160369646 |  2 | a    | 2018-10-17 12:10:45.951351  
(2 rows)  

小结

1、insert into on conflict do update,返回xmax不等于0,表示update,等于0表示insert。

2、直接update,并提交,提交的记录上xmax为0。

3、直接update,并回滚,老版本上的XMAX不为0,表示更新该行的事务号。

4、直接DELETE,并回滚,老版本上的XMAX不为0,表示删除该行的事务号。

ctid表示行号, xmin表示INSERT该记录的事务号,xmax表示删除该记录(update实际上是删除老版本新增新版本,所以老版本上xmax有值)的事务号。

参考

《Greenplum & PostgreSQL UPSERT udf 实现 - 2 batch批量模式》

《Greenplum & PostgreSQL UPSERT udf 实现 - 1 单行模式》

《PostgreSQL 多重含义数组检索与条件过滤 (标签1:属性, 标签n:属性) - 包括UPSERT操作如何修改数组、追加数组元素》

[《HTAP数据库 PostgreSQL 场景与性能测试之 22 - (OLTP) merge insert upsert insert on conflict 合并写入》](../201711/20171107_23.md)

《PostgreSQL upsert功能(insert on conflict do)的用法》

《PostgreSQL 如何实现upsert与新旧数据自动分离》

《[转载]postgresql 9.5版本之前实现upsert功能》

《upsert - PostgreSQL 9.4 pending patch : INSERT…ON DUPLICATE KEY IGNORE》

Flag Counter

digoal’s 大量PostgreSQL文章入口