PostgreSQL OID 源码分析

10 minute read

背景

PostgreSQL的许多系统表都用到了OID这个数据类型,OID是一个uint类型,最大可以存储的值为2^32-1

/*  
 * Object ID is a fundamental type in Postgres.  
 */  
typedef unsigned int Oid;  

很多系统表使用OID作为主键,PostgreSQL 使用GetNewOid函数获取下一个可用的OID,那么怎么保证获取到的OID是可用的呢,并且确保不重复即不违反唯一约束?

例如,我们使用pg_resetxlog可以重置NEXTOid,如果被重置后,又是如何获取一个不会违反唯一约束的OID呢?

看代码最直接,获取OID的底层函数如下,第一次时,一次性获取VAR_OID_PREFETCH = 8192个作为缓存,

access/transam/varsup.c:#define VAR_OID_PREFETCH                8192  

src/backend/access/transam/varsup.c

/*  
 * GetNewObjectId -- allocate a new OID  
 *  
 * OIDs are generated by a cluster-wide counter.  Since they are only 32 bits  
 * wide, counter wraparound will occur eventually, and therefore it is unwise  
 * to assume they are unique unless precautions are taken to make them so.  
 * Hence, this routine should generally not be used directly.  The only  
 * direct callers should be GetNewOid() and GetNewRelFileNode() in  
 * catalog/catalog.c.  
 */  
Oid  
GetNewObjectId(void)  
{  
        Oid                     result;  
  
        /* safety check, we should never get this far in a HS slave */  
        if (RecoveryInProgress())  
                elog(ERROR, "cannot assign OIDs during recovery");  
  
        LWLockAcquire(OidGenLock, LW_EXCLUSIVE);  
  
        /*  
         * Check for wraparound of the OID counter.  We *must* not return 0  
         * (InvalidOid); and as long as we have to check that, it seems a good  
         * idea to skip over everything below FirstNormalObjectId too. (This  
         * basically just avoids lots of collisions with bootstrap-assigned OIDs  
         * right after a wrap occurs, so as to avoid a possibly large number of  
         * iterations in GetNewOid.)  Note we are relying on unsigned comparison.  
         *  
         * During initdb, we start the OID generator at FirstBootstrapObjectId, so  
         * we only wrap if before that point when in bootstrap or standalone mode.  
         * The first time through this routine after normal postmaster start, the  
         * counter will be forced up to FirstNormalObjectId.  This mechanism  
         * leaves the OIDs between FirstBootstrapObjectId and FirstNormalObjectId  
         * available for automatic assignment during initdb, while ensuring they  
         * will never conflict with user-assigned OIDs.  
         */  
        if (ShmemVariableCache->nextOid < ((Oid) FirstNormalObjectId))    // 处理非常规OID,改为常规的OID,起到保护作用  
        {  
                if (IsPostmasterEnvironment)      
                {  
                        /* wraparound, or first post-initdb assignment, in normal mode */  
                        ShmemVariableCache->nextOid = FirstNormalObjectId;  
                        ShmemVariableCache->oidCount = 0;  
                }  
                else  
                {  
                        /* we may be bootstrapping, so don't enforce the full range */  
                        if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId))  
                        {  
                                /* wraparound in standalone mode (unlikely but possible) */  
                                ShmemVariableCache->nextOid = FirstNormalObjectId;  
                                ShmemVariableCache->oidCount = 0;  
                        }  
                }  
        }  
  
        /* If we run out of logged for use oids then we must log more */  
        if (ShmemVariableCache->oidCount == 0)  // 当一次获取的缓存用完后,重新获取8192个oid  
        {  
                XLogPutNextOid(ShmemVariableCache->nextOid + VAR_OID_PREFETCH);  // 一次获取多个  
                ShmemVariableCache->oidCount = VAR_OID_PREFETCH;  // 将计数器置为8192  
        }  
  
        result = ShmemVariableCache->nextOid;  
  
        (ShmemVariableCache->nextOid)++;  // 每次使用一个oid后nextOid加1  
        (ShmemVariableCache->oidCount)--;  // 对应的count减1  
  
        LWLockRelease(OidGenLock);  
  
        return result;  
}  

16384是第一个数据库启动后,用户可以使用的OID。

src/include/access/transam.h

/* ----------  
 *              Object ID (OID) zero is InvalidOid.  
 *  
 *              OIDs 1-9999 are reserved for manual assignment (see the files  
 *              in src/include/catalog/).  
 *  
 *              OIDS 10000-16383 are reserved for assignment during initdb  
 *              using the OID generator.  (We start the generator at 10000.)  
 *  
 *              OIDs beginning at 16384 are assigned from the OID generator  
 *              during normal multiuser operation.  (We force the generator up to  
 *              16384 as soon as we are in normal operation.)  
 *  
 * The choices of 10000 and 16384 are completely arbitrary, and can be moved  
 * if we run low on OIDs in either category.  Changing the macros below  
 * should be sufficient to do this.  
 *  
 * NOTE: if the OID generator wraps around, we skip over OIDs 0-16383  
 * and resume with 16384.  This minimizes the odds of OID conflict, by not  
 * reassigning OIDs that might have been assigned during initdb.  
 * ----------  
 */  
#define FirstBootstrapObjectId  10000  
#define FirstNormalObjectId             16384  

下面是获取OID的面向上层的接口,这里有保护作用。

例如我们前面的疑问,如何确保获取的OID不违反唯一约束。

src/backend/catalog/catalog.c

/*  
 * GetNewOid  
 *              Generate a new OID that is unique within the given relation.  
 *  
 * Caller must have a suitable lock on the relation.  
 *  
 * Uniqueness is promised only if the relation has a unique index on OID.  
 * This is true for all system catalogs that have OIDs, but might not be  
 * true for user tables.  Note that we are effectively assuming that the  
 * table has a relatively small number of entries (much less than 2^32)  
 * and there aren't very long runs of consecutive existing OIDs.  Again,  
 * this is reasonable for system catalogs but less so for user tables.  
 *  
 * Since the OID is not immediately inserted into the table, there is a  
 * race condition here; but a problem could occur only if someone else  
 * managed to cycle through 2^32 OIDs and generate the same OID before we  
 * finish inserting our row.  This seems unlikely to be a problem.  Note  
 * that if we had to *commit* the row to end the race condition, the risk  
 * would be rather higher; therefore we use SnapshotDirty in the test,  
 * so that we will see uncommitted rows.  
 */  
Oid  
GetNewOid(Relation relation)  
{  
        Oid                     oidIndex;  
  
        /* If relation doesn't have OIDs at all, caller is confused */  
        Assert(relation->rd_rel->relhasoids);  
  
        /* In bootstrap mode, we don't have any indexes to use */  
        if (IsBootstrapProcessingMode())  
                return GetNewObjectId();  
  
        /* The relcache will cache the identity of the OID index for us */  
        oidIndex = RelationGetOidIndex(relation);     // 获取OID的对象的OID列的唯一索引的pg_class.oid  
  
        /* If no OID index, just hand back the next OID counter value */  
        if (!OidIsValid(oidIndex))   // 如果OID不为InvalidOid ,说明这个需要获取oid的对象的OID列上面没有唯一索引  
        {  
                /*  
                 * System catalogs that have OIDs should *always* have a unique OID  
                 * index; we should only take this path for user tables. Give a  
                 * warning if it looks like somebody forgot an index.  
                 */  
                if (IsSystemRelation(relation))  
                        elog(WARNING, "generating possibly-non-unique OID for \"%s\"",  
                                 RelationGetRelationName(relation));  
  
                return GetNewObjectId();  // 不考虑重复问题,因为OID列没有唯一索引  
        }  
  
        /* Otherwise, use the index to find a nonconflicting OID */  
        return GetNewOidWithIndex(relation, oidIndex, ObjectIdAttributeNumber);  // 考虑重复  
}  

获取OID的对象的OID列的唯一索引的pg_class.oid,有唯一索引,返回pg_class.oid,没有唯一索引,返回InvalidOid 。

/*  
 * RelationGetOidIndex -- get the pg_class OID of the relation's OID index  
 *  
 * Returns InvalidOid if there is no such index.  
 */  
Oid  
RelationGetOidIndex(Relation relation)  
{  
	List	   *ilist;  
  
	/*  
	 * If relation doesn't have OIDs at all, caller is probably confused. (We  
	 * could just silently return InvalidOid, but it seems better to throw an  
	 * assertion.)  
	 */  
	Assert(relation->rd_rel->relhasoids);  
  
	if (relation->rd_indexvalid == 0)  
	{  
		/* RelationGetIndexList does the heavy lifting. */  
		ilist = RelationGetIndexList(relation);  
		list_free(ilist);  
		Assert(relation->rd_indexvalid != 0);  
	}  
  
	return relation->rd_oidindex;   // 返回OID列对应的唯一索引的pg_class.OID  
}  
  
typedef struct RelationData  
{  
......  
	Oid			rd_oidindex;	/* OID of unique index on OID, if any */    
......  
}  

对于OID列有唯一索引的表,需要调用这个函数来获取OID,确保不会获得违反唯一约束的OID。

例如pg_class.oid列有唯一索引,那么在向pg_class插入数据时,需要使用GetNewOidWithIndex来获得next oid.

/*  
 * GetNewOidWithIndex  
 *              Guts of GetNewOid: use the supplied index  
 *  
 * This is exported separately because there are cases where we want to use  
 * an index that will not be recognized by RelationGetOidIndex: TOAST tables  
 * have indexes that are usable, but have multiple columns and are on  
 * ordinary columns rather than a true OID column.  This code will work  
 * anyway, so long as the OID is the index's first column.  The caller must  
 * pass in the actual heap attnum of the OID column, however.  
 *  
 * Caller must have a suitable lock on the relation.  
 */  
Oid  
GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)  
{  
        Oid                     newOid;  
        SnapshotData SnapshotDirty;  
        SysScanDesc scan;  
        ScanKeyData key;  
        bool            collides;  
  
        InitDirtySnapshot(SnapshotDirty);  
  
        /* Generate new OIDs until we find one not in the table */  // 获取一个不会违反唯一约束的OID  
        do  
        {  
                CHECK_FOR_INTERRUPTS();  
  
                newOid = GetNewObjectId();  
  
                ScanKeyInit(&key,  
                                        oidcolumn,  
                                        BTEqualStrategyNumber, F_OIDEQ,  
                                        ObjectIdGetDatum(newOid));  
  
                /* see notes above about using SnapshotDirty */  
                scan = systable_beginscan(relation, indexId, true,  
                                                                  &SnapshotDirty, 1, &key);  
  
                collides = HeapTupleIsValid(systable_getnext(scan));  
  
                systable_endscan(scan);  
        } while (collides);  
  
        return newOid;  
}  

例子:

postgres=# select max(oid) from pg_class;  
  max     
--------  
 186766  
(1 row)  

从控制文件获得 NextOID = 186769

postgres@digoal-> pg_controldata   
Latest checkpoint's NextOID:          186769  

创建一个包含oid的表,并且oid列上面没有建唯一索引。

postgres=# create table t1(id int) with oids;  
CREATE TABLE  
postgres=# insert into t1 values (1);  
INSERT 186772 1  
postgres=# select oid,* from t1;  
  oid   | id   
--------+----  
 186772 |  1  
(1 row)  
postgres=# insert into t1 values (1);  
INSERT 186773 1  
postgres=# select oid,* from t1;  
  oid   | id   
--------+----  
 186772 |  1  
 186773 |  1  
(2 rows)  

修改控制文件,把nextoid改为100,小于normaloid的一个值。

postgres@digoal-> pg_ctl stop -m fast  
waiting for server to shut down.... done  
server stopped  
  
postgres@digoal-> pg_resetxlog --help  
pg_resetxlog resets the PostgreSQL transaction log.  
  
Usage:  
  pg_resetxlog [OPTION]... DATADIR  
  
Options:  
  -e XIDEPOCH      set next transaction ID epoch  
  -f               force update to be done  
  -l XLOGFILE      force minimum WAL starting location for new transaction log  
  -m MXID,MXID     set next and oldest multitransaction ID  
  -n               no update, just show what would be done (for testing)  
  -o OID           set next OID  
  -O OFFSET        set next multitransaction offset  
  -V, --version    output version information, then exit  
  -x XID           set next transaction ID  
  -?, --help       show this help, then exit  
  
Report bugs to <pgsql-bugs@postgresql.org>.  
postgres@digoal-> pg_resetxlog -o 100 $PGDATA  
Transaction log reset  

启动数据库后,插入T1表,我们看到获取到的oid是16384即从FirstNormalObjectId 开始的第一个值。

postgres=# insert into t1 values (1);  
INSERT 16384 1  
postgres=# insert into t1 values (1);  
INSERT 16385 1  
postgres=# insert into t1 values (1);  
INSERT 16386 1  

控制文件在做完检查点后会被修改。

postgres=# \q  
postgres@digoal-> pg_controldata |grep OID  
Latest checkpoint's NextOID:          100  
  
postgres=# checkpoint;  
CHECKPOINT  

修改后的值=24576

postgres@digoal-> pg_controldata |grep OID  
Latest checkpoint's NextOID:          24576  

这个值刚好等于16384+8192,验证了前面的代码,一次获取8192个值。

正常关闭数据库的时候,这个值又会被修改到真实的值,即16387。

如果是非正常关闭,那么预分配的OID会跳掉,不过没有关系,因为OID是循环使用的,以后还有可能被用到跳过的OID。

postgres=# select 16384+8192;  
 ?column?   
----------  
    24576  
(1 row)  

再次创建一个T2表,查看pg_class.oid

postgres=# create table t2(id int);  
CREATE TABLE  
postgres=# select oid from pg_class where relname='t2';  
  oid    
-------  
 16387  
(1 row)  

因为建表时,默认会自动创建表结构对应的复合类型以及复合类型的数组类型。

postgres=# select * from pg_type where oid=16389;  
 typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput  | typoutput  | typreceive  |   typsend   | typmodin | typmodout | typan  
alyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl   
---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+-----------+------------+-------------+-------------+----------+-----------+------  
------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------  
 t2      |         2200 |       10 |     -1 | f        | c       | C           | f              | t            | ,        |    16387 |       0 |    16388 | record_in | record_out | record_recv | record_send | -        | -         | -      
      | d        | x          | f          |           0 |        -1 |        0 |            0 |               |            |   
(1 row)  
postgres=# select * from pg_type where oid=16388;  
 typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput | typoutput | typreceive |  typsend   | typmodin | typmodout |    typana  
lyze    | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl   
---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+----------+-----------+------------+------------+----------+-----------+----------  
--------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------  
 _t2     |         2200 |       10 |     -1 | f        | b       | A           | f              | t            | ,        |        0 |   16389 |        0 | array_in | array_out | array_recv | array_send | -        | -         | array_typ  
analyze | d        | x          | f          |           0 |        -1 |        0 |            0 |               |            |   
(1 row)  

所以两个类型又消耗了2个OID。

再次创建T3表,你会发现这个表的OID和T2表的pg_class.oid不是连续的,原因就在此。

postgres=# create table t3(id int);  
CREATE TABLE  
postgres=# select oid from pg_class where relname='t3';  
  oid    
-------  
 16390  
(1 row)  

再次修改控制文件的OID

postgres@digoal-> pg_ctl stop -m fast  
waiting for server to shut down.... done  
server stopped  
postgres@digoal-> pg_resetxlog -o 100 $PGDATA  
Transaction log reset  
postgres@digoal-> pg_ctl start  
server starting  
postgres@digoal->  0LOG:  00000: redirecting log output to logging collector process  
 0HINT:  Future log output will appear in directory "/data03/pg_log_1921".  
 0LOCATION:  SysLogger_Start, syslogger.c:645  

创建表时,用了16384这个OID,这个OID在pg_class中一定是唯一的,因为pg_class.oid有唯一约束,

Indexes:  
    "pg_class_oid_index" UNIQUE, btree (oid)  

前面分析过代码了,PG会处理唯一约束。

postgres@digoal-> psql  
psql (9.4.4)  
Type "help" for help.  
postgres=# create table t4(id int);  
CREATE TABLE  
postgres=# select oid from pg_class where relname='t4';  
  oid    
-------  
 16384  
(1 row)  

在来看一个例子,就是如果OID上没有唯一约束,是可能存进去重复的OID的,如下:

postgres@digoal-> pg_ctl stop -m fast  
waiting for server to shut down.... done  
server stopped  
postgres@digoal-> pg_resetxlog -o 100 $PGDATA  
Transaction log reset  
postgres@digoal-> pg_ctl start  
server starting  
postgres@digoal->  0LOG:  00000: redirecting log output to logging collector process  
 0HINT:  Future log output will appear in directory "/data03/pg_log_1921".  
 0LOCATION:  SysLogger_Start, syslogger.c:645  
  
postgres@digoal-> psql  
psql (9.4.4)  
Type "help" for help.  
postgres=# \d+ t1  
                          Table "public.t1"  
 Column |  Type   | Modifiers | Storage | Stats target | Description   
--------+---------+-----------+---------+--------------+-------------  
 id     | integer |           | plain   |              |   
Has OIDs: yes  
postgres=# insert into t1 values (100);  
INSERT 16384 1  
postgres=# select oid,* from t1;  
  oid   | id    
--------+-----  
 186772 |   1  
 186773 |   1  
  16384 |   1  
  16385 |   1  
  16386 |   1  
  16384 | 100  
(6 rows)  

如果t1表的oid上面建了唯一约束,就不会存进去重复值,获取OID也不会报错,因为PG已经帮你处理了。

postgres=# \d+ t6  
                          Table "public.t6"  
 Column |  Type   | Modifiers | Storage | Stats target | Description   
--------+---------+-----------+---------+--------------+-------------  
 id     | integer |           | plain   |              |   
Indexes:  
    "i" UNIQUE, btree (oid)  
Has OIDs: yes  
  
INSERT 16397 1  
postgres=# insert into t6 values (1);  
INSERT 16398 1  

OID 16399 被跳过.

postgres=# insert into t6 values (1);  
INSERT 16400 1  
postgres=# insert into t6 values (1);  
INSERT 16401 1  

小结

1. 每个系统表都可以使用约2^32-1个OID(排除16384之前的OID),PostgreSQL 在获取OID时,会判断需要OID的表是否为

表的oid允许重复,

我们在PostgreSQL中最多可以建多少个表呢?2^32-1吗?

否。

因为pg_type里面每个表对应了2个类型,所以最多只能创建2^31-1个表。

2. 凡是with oids的表,只要这个表的oid字段上面包含了唯一约束,PG就会帮你处理nextoid请求,确保不出现违反唯一约束的oid。但是如果oid上面没有唯一索引或唯一约束,oid就有可能重复哦。

Flag Counter

digoal’s 大量PostgreSQL文章入口