PostgreSQL OID 源码分析
背景
PostgreSQL的许多系统表都用到了OID这个数据类型,OID是一个uint类型,最大可以存储的值为2^32-1
/*
* Object ID is a fundamental type in Postgres.
*/
typedef unsigned int Oid;
很多系统表使用OID作为主键,PostgreSQL 使用GetNewOid函数获取下一个可用的OID,那么怎么保证获取到的OID是可用的呢,并且确保不重复即不违反唯一约束?
例如,我们使用pg_resetxlog可以重置NEXTOid,如果被重置后,又是如何获取一个不会违反唯一约束的OID呢?
看代码最直接,获取OID的底层函数如下,第一次时,一次性获取VAR_OID_PREFETCH = 8192个作为缓存,
access/transam/varsup.c:#define VAR_OID_PREFETCH 8192
src/backend/access/transam/varsup.c
/*
* GetNewObjectId -- allocate a new OID
*
* OIDs are generated by a cluster-wide counter. Since they are only 32 bits
* wide, counter wraparound will occur eventually, and therefore it is unwise
* to assume they are unique unless precautions are taken to make them so.
* Hence, this routine should generally not be used directly. The only
* direct callers should be GetNewOid() and GetNewRelFileNode() in
* catalog/catalog.c.
*/
Oid
GetNewObjectId(void)
{
Oid result;
/* safety check, we should never get this far in a HS slave */
if (RecoveryInProgress())
elog(ERROR, "cannot assign OIDs during recovery");
LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
/*
* Check for wraparound of the OID counter. We *must* not return 0
* (InvalidOid); and as long as we have to check that, it seems a good
* idea to skip over everything below FirstNormalObjectId too. (This
* basically just avoids lots of collisions with bootstrap-assigned OIDs
* right after a wrap occurs, so as to avoid a possibly large number of
* iterations in GetNewOid.) Note we are relying on unsigned comparison.
*
* During initdb, we start the OID generator at FirstBootstrapObjectId, so
* we only wrap if before that point when in bootstrap or standalone mode.
* The first time through this routine after normal postmaster start, the
* counter will be forced up to FirstNormalObjectId. This mechanism
* leaves the OIDs between FirstBootstrapObjectId and FirstNormalObjectId
* available for automatic assignment during initdb, while ensuring they
* will never conflict with user-assigned OIDs.
*/
if (ShmemVariableCache->nextOid < ((Oid) FirstNormalObjectId)) // 处理非常规OID,改为常规的OID,起到保护作用
{
if (IsPostmasterEnvironment)
{
/* wraparound, or first post-initdb assignment, in normal mode */
ShmemVariableCache->nextOid = FirstNormalObjectId;
ShmemVariableCache->oidCount = 0;
}
else
{
/* we may be bootstrapping, so don't enforce the full range */
if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId))
{
/* wraparound in standalone mode (unlikely but possible) */
ShmemVariableCache->nextOid = FirstNormalObjectId;
ShmemVariableCache->oidCount = 0;
}
}
}
/* If we run out of logged for use oids then we must log more */
if (ShmemVariableCache->oidCount == 0) // 当一次获取的缓存用完后,重新获取8192个oid
{
XLogPutNextOid(ShmemVariableCache->nextOid + VAR_OID_PREFETCH); // 一次获取多个
ShmemVariableCache->oidCount = VAR_OID_PREFETCH; // 将计数器置为8192
}
result = ShmemVariableCache->nextOid;
(ShmemVariableCache->nextOid)++; // 每次使用一个oid后nextOid加1
(ShmemVariableCache->oidCount)--; // 对应的count减1
LWLockRelease(OidGenLock);
return result;
}
16384是第一个数据库启动后,用户可以使用的OID。
src/include/access/transam.h
/* ----------
* Object ID (OID) zero is InvalidOid.
*
* OIDs 1-9999 are reserved for manual assignment (see the files
* in src/include/catalog/).
*
* OIDS 10000-16383 are reserved for assignment during initdb
* using the OID generator. (We start the generator at 10000.)
*
* OIDs beginning at 16384 are assigned from the OID generator
* during normal multiuser operation. (We force the generator up to
* 16384 as soon as we are in normal operation.)
*
* The choices of 10000 and 16384 are completely arbitrary, and can be moved
* if we run low on OIDs in either category. Changing the macros below
* should be sufficient to do this.
*
* NOTE: if the OID generator wraps around, we skip over OIDs 0-16383
* and resume with 16384. This minimizes the odds of OID conflict, by not
* reassigning OIDs that might have been assigned during initdb.
* ----------
*/
#define FirstBootstrapObjectId 10000
#define FirstNormalObjectId 16384
下面是获取OID的面向上层的接口,这里有保护作用。
例如我们前面的疑问,如何确保获取的OID不违反唯一约束。
src/backend/catalog/catalog.c
/*
* GetNewOid
* Generate a new OID that is unique within the given relation.
*
* Caller must have a suitable lock on the relation.
*
* Uniqueness is promised only if the relation has a unique index on OID.
* This is true for all system catalogs that have OIDs, but might not be
* true for user tables. Note that we are effectively assuming that the
* table has a relatively small number of entries (much less than 2^32)
* and there aren't very long runs of consecutive existing OIDs. Again,
* this is reasonable for system catalogs but less so for user tables.
*
* Since the OID is not immediately inserted into the table, there is a
* race condition here; but a problem could occur only if someone else
* managed to cycle through 2^32 OIDs and generate the same OID before we
* finish inserting our row. This seems unlikely to be a problem. Note
* that if we had to *commit* the row to end the race condition, the risk
* would be rather higher; therefore we use SnapshotDirty in the test,
* so that we will see uncommitted rows.
*/
Oid
GetNewOid(Relation relation)
{
Oid oidIndex;
/* If relation doesn't have OIDs at all, caller is confused */
Assert(relation->rd_rel->relhasoids);
/* In bootstrap mode, we don't have any indexes to use */
if (IsBootstrapProcessingMode())
return GetNewObjectId();
/* The relcache will cache the identity of the OID index for us */
oidIndex = RelationGetOidIndex(relation); // 获取OID的对象的OID列的唯一索引的pg_class.oid
/* If no OID index, just hand back the next OID counter value */
if (!OidIsValid(oidIndex)) // 如果OID不为InvalidOid ,说明这个需要获取oid的对象的OID列上面没有唯一索引
{
/*
* System catalogs that have OIDs should *always* have a unique OID
* index; we should only take this path for user tables. Give a
* warning if it looks like somebody forgot an index.
*/
if (IsSystemRelation(relation))
elog(WARNING, "generating possibly-non-unique OID for \"%s\"",
RelationGetRelationName(relation));
return GetNewObjectId(); // 不考虑重复问题,因为OID列没有唯一索引
}
/* Otherwise, use the index to find a nonconflicting OID */
return GetNewOidWithIndex(relation, oidIndex, ObjectIdAttributeNumber); // 考虑重复
}
获取OID的对象的OID列的唯一索引的pg_class.oid,有唯一索引,返回pg_class.oid,没有唯一索引,返回InvalidOid 。
/*
* RelationGetOidIndex -- get the pg_class OID of the relation's OID index
*
* Returns InvalidOid if there is no such index.
*/
Oid
RelationGetOidIndex(Relation relation)
{
List *ilist;
/*
* If relation doesn't have OIDs at all, caller is probably confused. (We
* could just silently return InvalidOid, but it seems better to throw an
* assertion.)
*/
Assert(relation->rd_rel->relhasoids);
if (relation->rd_indexvalid == 0)
{
/* RelationGetIndexList does the heavy lifting. */
ilist = RelationGetIndexList(relation);
list_free(ilist);
Assert(relation->rd_indexvalid != 0);
}
return relation->rd_oidindex; // 返回OID列对应的唯一索引的pg_class.OID
}
typedef struct RelationData
{
......
Oid rd_oidindex; /* OID of unique index on OID, if any */
......
}
对于OID列有唯一索引的表,需要调用这个函数来获取OID,确保不会获得违反唯一约束的OID。
例如pg_class.oid列有唯一索引,那么在向pg_class插入数据时,需要使用GetNewOidWithIndex来获得next oid.
/*
* GetNewOidWithIndex
* Guts of GetNewOid: use the supplied index
*
* This is exported separately because there are cases where we want to use
* an index that will not be recognized by RelationGetOidIndex: TOAST tables
* have indexes that are usable, but have multiple columns and are on
* ordinary columns rather than a true OID column. This code will work
* anyway, so long as the OID is the index's first column. The caller must
* pass in the actual heap attnum of the OID column, however.
*
* Caller must have a suitable lock on the relation.
*/
Oid
GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
{
Oid newOid;
SnapshotData SnapshotDirty;
SysScanDesc scan;
ScanKeyData key;
bool collides;
InitDirtySnapshot(SnapshotDirty);
/* Generate new OIDs until we find one not in the table */ // 获取一个不会违反唯一约束的OID
do
{
CHECK_FOR_INTERRUPTS();
newOid = GetNewObjectId();
ScanKeyInit(&key,
oidcolumn,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(newOid));
/* see notes above about using SnapshotDirty */
scan = systable_beginscan(relation, indexId, true,
&SnapshotDirty, 1, &key);
collides = HeapTupleIsValid(systable_getnext(scan));
systable_endscan(scan);
} while (collides);
return newOid;
}
例子:
postgres=# select max(oid) from pg_class;
max
--------
186766
(1 row)
从控制文件获得 NextOID = 186769
postgres@digoal-> pg_controldata
Latest checkpoint's NextOID: 186769
创建一个包含oid的表,并且oid列上面没有建唯一索引。
postgres=# create table t1(id int) with oids;
CREATE TABLE
postgres=# insert into t1 values (1);
INSERT 186772 1
postgres=# select oid,* from t1;
oid | id
--------+----
186772 | 1
(1 row)
postgres=# insert into t1 values (1);
INSERT 186773 1
postgres=# select oid,* from t1;
oid | id
--------+----
186772 | 1
186773 | 1
(2 rows)
修改控制文件,把nextoid改为100,小于normaloid的一个值。
postgres@digoal-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
postgres@digoal-> pg_resetxlog --help
pg_resetxlog resets the PostgreSQL transaction log.
Usage:
pg_resetxlog [OPTION]... DATADIR
Options:
-e XIDEPOCH set next transaction ID epoch
-f force update to be done
-l XLOGFILE force minimum WAL starting location for new transaction log
-m MXID,MXID set next and oldest multitransaction ID
-n no update, just show what would be done (for testing)
-o OID set next OID
-O OFFSET set next multitransaction offset
-V, --version output version information, then exit
-x XID set next transaction ID
-?, --help show this help, then exit
Report bugs to <pgsql-bugs@postgresql.org>.
postgres@digoal-> pg_resetxlog -o 100 $PGDATA
Transaction log reset
启动数据库后,插入T1表,我们看到获取到的oid是16384即从FirstNormalObjectId 开始的第一个值。
postgres=# insert into t1 values (1);
INSERT 16384 1
postgres=# insert into t1 values (1);
INSERT 16385 1
postgres=# insert into t1 values (1);
INSERT 16386 1
控制文件在做完检查点后会被修改。
postgres=# \q
postgres@digoal-> pg_controldata |grep OID
Latest checkpoint's NextOID: 100
postgres=# checkpoint;
CHECKPOINT
修改后的值=24576
postgres@digoal-> pg_controldata |grep OID
Latest checkpoint's NextOID: 24576
这个值刚好等于16384+8192,验证了前面的代码,一次获取8192个值。
正常关闭数据库的时候,这个值又会被修改到真实的值,即16387。
如果是非正常关闭,那么预分配的OID会跳掉,不过没有关系,因为OID是循环使用的,以后还有可能被用到跳过的OID。
postgres=# select 16384+8192;
?column?
----------
24576
(1 row)
再次创建一个T2表,查看pg_class.oid
postgres=# create table t2(id int);
CREATE TABLE
postgres=# select oid from pg_class where relname='t2';
oid
-------
16387
(1 row)
因为建表时,默认会自动创建表结构对应的复合类型以及复合类型的数组类型。
postgres=# select * from pg_type where oid=16389;
typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput | typoutput | typreceive | typsend | typmodin | typmodout | typan
alyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl
---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+-----------+------------+-------------+-------------+----------+-----------+------
------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------
t2 | 2200 | 10 | -1 | f | c | C | f | t | , | 16387 | 0 | 16388 | record_in | record_out | record_recv | record_send | - | - | -
| d | x | f | 0 | -1 | 0 | 0 | | |
(1 row)
postgres=# select * from pg_type where oid=16388;
typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput | typoutput | typreceive | typsend | typmodin | typmodout | typana
lyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl
---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+----------+-----------+------------+------------+----------+-----------+----------
--------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------
_t2 | 2200 | 10 | -1 | f | b | A | f | t | , | 0 | 16389 | 0 | array_in | array_out | array_recv | array_send | - | - | array_typ
analyze | d | x | f | 0 | -1 | 0 | 0 | | |
(1 row)
所以两个类型又消耗了2个OID。
再次创建T3表,你会发现这个表的OID和T2表的pg_class.oid不是连续的,原因就在此。
postgres=# create table t3(id int);
CREATE TABLE
postgres=# select oid from pg_class where relname='t3';
oid
-------
16390
(1 row)
再次修改控制文件的OID
postgres@digoal-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
postgres@digoal-> pg_resetxlog -o 100 $PGDATA
Transaction log reset
postgres@digoal-> pg_ctl start
server starting
postgres@digoal-> 0LOG: 00000: redirecting log output to logging collector process
0HINT: Future log output will appear in directory "/data03/pg_log_1921".
0LOCATION: SysLogger_Start, syslogger.c:645
创建表时,用了16384这个OID,这个OID在pg_class中一定是唯一的,因为pg_class.oid有唯一约束,
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
前面分析过代码了,PG会处理唯一约束。
postgres@digoal-> psql
psql (9.4.4)
Type "help" for help.
postgres=# create table t4(id int);
CREATE TABLE
postgres=# select oid from pg_class where relname='t4';
oid
-------
16384
(1 row)
在来看一个例子,就是如果OID上没有唯一约束,是可能存进去重复的OID的,如下:
postgres@digoal-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
postgres@digoal-> pg_resetxlog -o 100 $PGDATA
Transaction log reset
postgres@digoal-> pg_ctl start
server starting
postgres@digoal-> 0LOG: 00000: redirecting log output to logging collector process
0HINT: Future log output will appear in directory "/data03/pg_log_1921".
0LOCATION: SysLogger_Start, syslogger.c:645
postgres@digoal-> psql
psql (9.4.4)
Type "help" for help.
postgres=# \d+ t1
Table "public.t1"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
id | integer | | plain | |
Has OIDs: yes
postgres=# insert into t1 values (100);
INSERT 16384 1
postgres=# select oid,* from t1;
oid | id
--------+-----
186772 | 1
186773 | 1
16384 | 1
16385 | 1
16386 | 1
16384 | 100
(6 rows)
如果t1表的oid上面建了唯一约束,就不会存进去重复值,获取OID也不会报错,因为PG已经帮你处理了。
postgres=# \d+ t6
Table "public.t6"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
id | integer | | plain | |
Indexes:
"i" UNIQUE, btree (oid)
Has OIDs: yes
INSERT 16397 1
postgres=# insert into t6 values (1);
INSERT 16398 1
OID 16399 被跳过.
postgres=# insert into t6 values (1);
INSERT 16400 1
postgres=# insert into t6 values (1);
INSERT 16401 1
小结
1. 每个系统表都可以使用约2^32-1个OID(排除16384之前的OID),PostgreSQL 在获取OID时,会判断需要OID的表是否为
表的oid允许重复,
我们在PostgreSQL中最多可以建多少个表呢?2^32-1吗?
否。
因为pg_type里面每个表对应了2个类型,所以最多只能创建2^31-1个表。
2. 凡是with oids的表,只要这个表的oid字段上面包含了唯一约束,PG就会帮你处理nextoid请求,确保不出现违反唯一约束的oid。但是如果oid上面没有唯一索引或唯一约束,oid就有可能重复哦。