PostgreSQL merge insert(upsert/insert into on conflict) 如何区分数据是INSERT还是UPDATE
背景
使用insert into on conflict update语法,可以支持UPSERT的功能,但是到底这条SQL是插入的还是更新的呢?如何判断
通过xmax字段的值是否不为0,可以判断,如果是UPDATE,XMAX里面会填充更新事务号。
注意直接用UPDATE语句更新的话,XMAX会写入0,因为是新版本,而老版本上XMAX会填入更新事务号。
例子
1 insert on conflict
postgres=# create table t(id int primary key, info text, crt_time timestamp);
CREATE TABLE
postgres=# insert into t values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;
xmax
------
0
(1 row)
INSERT 0 1
postgres=# insert into t values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;
xmax
-----------
160369640
(1 row)
INSERT 0 1
postgres=# select xmin,xmax,* from t;
xmin | xmax | id | info | crt_time
-----------+-----------+----+------+----------------------------
160369640 | 160369640 | 1 | test | 2018-10-17 12:09:38.760926
(1 row)
postgres=# insert into t values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;
xmax
-----------
160369641
(1 row)
INSERT 0 1
postgres=# select xmin,xmax,* from t;
xmin | xmax | id | info | crt_time
-----------+-----------+----+------+----------------------------
160369641 | 160369641 | 1 | test | 2018-10-17 12:10:11.738691
(1 row)
postgres=# insert into t values (2,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning xmax;
xmax
------
0
(1 row)
INSERT 0 1
postgres=# select xmin,xmax,* from t;
xmin | xmax | id | info | crt_time
-----------+-----------+----+------+----------------------------
160369641 | 160369641 | 1 | test | 2018-10-17 12:10:11.738691
160369642 | 0 | 2 | test | 2018-10-17 12:10:24.758745
(2 rows)
postgres=# select ctid,xmin,xmax,* from t;
ctid | xmin | xmax | id | info | crt_time
-------+-----------+-----------+----+------+----------------------------
(0,3) | 160369641 | 160369641 | 1 | test | 2018-10-17 12:10:11.738691
(0,4) | 160369642 | 0 | 2 | test | 2018-10-17 12:10:24.758745
(2 rows)
postgres=# insert into t values (2,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time returning ctid,xmin,xmax,*;
ctid | xmin | xmax | id | info | crt_time
-------+-----------+-----------+----+------+----------------------------
(0,5) | 160369643 | 160369643 | 2 | test | 2018-10-17 12:10:45.951351
(1 row)
INSERT 0 1
postgres=# select ctid,xmin,xmax,* from t;
ctid | xmin | xmax | id | info | crt_time
-------+-----------+-----------+----+------+----------------------------
(0,3) | 160369641 | 160369641 | 1 | test | 2018-10-17 12:10:11.738691
(0,5) | 160369643 | 160369643 | 2 | test | 2018-10-17 12:10:45.951351
(2 rows)
2 直接update
postgres=# update t set info='a' returning xmin,xmax,ctid,*;
xmin | xmax | ctid | id | info | crt_time
-----------+------+-------+----+------+----------------------------
160369644 | 0 | (0,6) | 1 | a | 2018-10-17 12:10:11.738691
160369644 | 0 | (0,7) | 2 | a | 2018-10-17 12:10:45.951351
(2 rows)
UPDATE 2
3 update 回滚
postgres=# begin;
BEGIN
postgres=# update t set info='a' returning xmin,xmax,ctid,*;
xmin | xmax | ctid | id | info | crt_time
-----------+------+-------+----+------+----------------------------
160369645 | 0 | (0,8) | 1 | a | 2018-10-17 12:10:11.738691
160369645 | 0 | (0,9) | 2 | a | 2018-10-17 12:10:45.951351
(2 rows)
UPDATE 2
postgres=# rollback;
ROLLBACK
postgres=# select ctid,xmin,xmax,* from t;
ctid | xmin | xmax | id | info | crt_time
-------+-----------+-----------+----+------+----------------------------
(0,6) | 160369644 | 160369645 | 1 | a | 2018-10-17 12:10:11.738691
(0,7) | 160369644 | 160369645 | 2 | a | 2018-10-17 12:10:45.951351
(2 rows)
4 delete 回滚
postgres=# begin;
BEGIN
postgres=# delete from t returning ctid,xmin,xmax,*;
ctid | xmin | xmax | id | info | crt_time
-------+-----------+-----------+----+------+----------------------------
(0,6) | 160369644 | 160369646 | 1 | a | 2018-10-17 12:10:11.738691
(0,7) | 160369644 | 160369646 | 2 | a | 2018-10-17 12:10:45.951351
(2 rows)
DELETE 2
postgres=# rollback;
ROLLBACK
postgres=# select ctid,xmin,xmax,* from t;
ctid | xmin | xmax | id | info | crt_time
-------+-----------+-----------+----+------+----------------------------
(0,6) | 160369644 | 160369646 | 1 | a | 2018-10-17 12:10:11.738691
(0,7) | 160369644 | 160369646 | 2 | a | 2018-10-17 12:10:45.951351
(2 rows)
小结
1、insert into on conflict do update,返回xmax不等于0,表示update,等于0表示insert。
2、直接update,并提交,提交的记录上xmax为0。
3、直接update,并回滚,老版本上的XMAX不为0,表示更新该行的事务号。
4、直接DELETE,并回滚,老版本上的XMAX不为0,表示删除该行的事务号。
ctid表示行号, xmin表示INSERT该记录的事务号,xmax表示删除该记录(update实际上是删除老版本新增新版本,所以老版本上xmax有值)的事务号。
参考
《Greenplum & PostgreSQL UPSERT udf 实现 - 2 batch批量模式》
《Greenplum & PostgreSQL UPSERT udf 实现 - 1 单行模式》
《PostgreSQL 多重含义数组检索与条件过滤 (标签1:属性, 标签n:属性) - 包括UPSERT操作如何修改数组、追加数组元素》
[《HTAP数据库 PostgreSQL 场景与性能测试之 22 - (OLTP) merge insert | upsert | insert on conflict | 合并写入》](../201711/20171107_23.md) |
《PostgreSQL upsert功能(insert on conflict do)的用法》
《PostgreSQL 如何实现upsert与新旧数据自动分离》
《[转载]postgresql 9.5版本之前实现upsert功能》
《upsert - PostgreSQL 9.4 pending patch : INSERT…ON DUPLICATE KEY IGNORE》