为什么PostgreSQL UUID-OSSP可能产生重复值?

5 minute read

背景

PostgreSQL提供了一个插件来产生UUID值,但是,某些情况下是否有可能产生重复的值呢?

答案是有可能,从代码分析就知道原因了。

这里要分BSD平台和Linux平台来讲。

首先我们看看用来产生UUID的函数,其中3个是不带参数的。

postgres=# \df+ uuid_generate_v1  
                                                                     List of functions  
 Schema |       Name       | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language |   Source code    | Description   
--------+------------------+------------------+---------------------+--------+----------+------------+----------+----------+------------------+-------------  
 public | uuid_generate_v1 | uuid             |                     | normal | invoker  | volatile   | postgres | c        | uuid_generate_v1 |   
(1 row)  
  
postgres=# \df+ uuid_generate_v1mc  
                                                                       List of functions  
 Schema |        Name        | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language |    Source code     | Description   
--------+--------------------+------------------+---------------------+--------+----------+------------+----------+----------+--------------------+-------------  
 public | uuid_generate_v1mc | uuid             |                     | normal | invoker  | volatile   | postgres | c        | uuid_generate_v1mc |   
(1 row)  
  
postgres=# \df+ uuid_generate_v4  
                                                                     List of functions  
 Schema |       Name       | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language |   Source code    | Description   
--------+------------------+------------------+---------------------+--------+----------+------------+----------+----------+------------------+-------------  
 public | uuid_generate_v4 | uuid             |                     | normal | invoker  | volatile   | postgres | c        | uuid_generate_v4 |   
(1 row)  

还有2个带参数的不列举。我们只看以上这三个。对应的函数分别是:

uuid_generate_v1   
uuid_generate_v1mc  
uuid_generate_v4  

在uuid-ossp.c中可以看到他们分别调用了

	return uuid_generate_internal(UUID_MAKE_V1, NULL, NULL, 0);  
        return uuid_generate_internal(UUID_MAKE_V1 | UUID_MAKE_MC, NULL,  
                                                                  buf, 13);  
        return uuid_generate_internal(UUID_MAKE_V4, NULL, NULL, 0);  
  
  
/* Define some constants like OSSP's, to make the code more readable */  
#ifndef HAVE_UUID_OSSP  
#define UUID_MAKE_MC 0  
#define UUID_MAKE_V1 1  
#define UUID_MAKE_V2 2  
#define UUID_MAKE_V3 3  
#define UUID_MAKE_V4 4  
#define UUID_MAKE_V5 5  
#endif  

对于uuid_generate_internal来说,分两种情况产生UUID,是否定义了HAVE_UUID_OSSP

if def HAVE_UUID_OSSP  
    get_cached_uuid_t  
else  
    uuid_generate_time  
    uuid_generate_random  

你可以在PostgreSQL编译产生的日志config.log中看到,我这里是定义了的。

所以调用了get_cached_uuid_t,看看这个函数的介绍吧,实际上会调用uuid-ossp的uuid_create来创建uuid。

需要翻看uuid.c(包含在uuid-1.6.2的代码中)

如下:

它实际上包含了rpng,md5,sha-1,mac,时间以及序列的信息。最后封装成UUID格式。

/* UUID binary representation according to UUID standards */  
typedef struct {  
    uuid_uint32_t  time_low;                  /* bits  0-31 of time field */  
    uuid_uint16_t  time_mid;                  /* bits 32-47 of time field */  
    uuid_uint16_t  time_hi_and_version;       /* bits 48-59 of time field plus 4 bit version */  
    uuid_uint8_t   clock_seq_hi_and_reserved; /* bits  8-13 of clock sequence field plus 2 bit variant */  
    uuid_uint8_t   clock_seq_low;             /* bits  0-7  of clock sequence field */  
    uuid_uint8_t   node[IEEE_MAC_OCTETS];     /* bits  0-47 of node MAC address */  
} uuid_obj_t;  
  
/* abstract data type (ADT) of API */  
struct uuid_st {  
    uuid_obj_t     obj;                       /* inlined UUID object */  
    prng_t        *prng;                      /* RPNG sub-object */  
    md5_t         *md5;                       /* MD5 sub-object */  
    sha1_t        *sha1;                      /* SHA-1 sub-object */  
    uuid_uint8_t   mac[IEEE_MAC_OCTETS];      /* pre-determined MAC address */   / 以上用来保证全球唯一 /  
    struct timeval time_last;                 /* last retrieved timestamp */  本机可能产生重复的地方  
    unsigned long  time_seq;                  /* last timestamp sequence counter */  本机可能产生重复的位置,例如同一时间可以产生多个UUID时。  
};  
  
/* create UUID object */  
uuid_rc_t uuid_create(uuid_t **uuid)  
{  
    uuid_t *obj;  
  
    /* argument sanity check */  
    if (uuid == NULL)  
        return UUID_RC_ARG;  
  
    /* allocate UUID object */  
    if ((obj = (uuid_t *)malloc(sizeof(uuid_t))) == NULL)  
        return UUID_RC_MEM;  
  
    /* create PRNG, MD5 and SHA1 sub-objects */  
    if (prng_create(&obj->prng) != PRNG_RC_OK) {  
        free(obj);  
        return UUID_RC_INT;  
    }  
    if (md5_create(&obj->md5) != MD5_RC_OK) {  
        (void)prng_destroy(obj->prng);  
        free(obj);  
        return UUID_RC_INT;  
    }  
    if (sha1_create(&obj->sha1) != SHA1_RC_OK) {  
        (void)md5_destroy(obj->md5);  
        (void)prng_destroy(obj->prng);  
        free(obj);  
        return UUID_RC_INT;  
    }  
  
    /* set UUID object initially to "Nil UUID" */  
    if (uuid_load(obj, "nil") != UUID_RC_OK) {  
        (void)sha1_destroy(obj->sha1);  
        (void)md5_destroy(obj->md5);  
        (void)prng_destroy(obj->prng);  
        free(obj);  
        return UUID_RC_INT;  
    }  
  
    /* resolve MAC address for insertion into node field of UUIDs */  
    if (!mac_address((unsigned char *)(obj->mac), sizeof(obj->mac))) {  
        memset(obj->mac, 0, sizeof(obj->mac));  
        obj->mac[0] = BM_OCTET(1,0,0,0,0,0,0,0);  
    }  
  
    /* initialize time attributes */  
    obj->time_last.tv_sec  = 0;  
    obj->time_last.tv_usec = 0;  
    obj->time_seq = 0;  
  
    /* store result object */  
    *uuid = obj;  
  
    return UUID_RC_OK;  
}  

uuid_time.c

struct timeval { long tv_sec; long tv_usec; };  

get_cached_uuid_t代码如下

uuid-ossp.c

 * We create a uuid_t object just once per session and re-use it for all  
 * operations in this module.  OSSP UUID caches the system MAC address and  
 * other state in this object.  Reusing the object has a number of benefits:  
 * saving the cycles needed to fetch the system MAC address over and over,  
 * reducing the amount of entropy we draw from /dev/urandom, and providing a  
 * positive guarantee that successive generated V1-style UUIDs don't collide.  
 * (On a machine fast enough to generate multiple UUIDs per microsecond,  
 * or whatever the system's wall-clock resolution is, we'd otherwise risk  
 * collisions whenever random initialization of the uuid_t's clock sequence  
 * value chanced to produce duplicates.)  

这里有提到,当机器每微秒可以产生多个UUID时,在多个进程中有可能产生重复值。

原因就是前面对uuid.c的分析。因为本机唯一码必须确保同一个微秒内不能产生多个UUID,如果可以的话,就不要并行产生。

static uuid_t *  
get_cached_uuid_t(int which)  
{  
        static uuid_t *cached_uuid[2] = {NULL, NULL};  
  
        if (cached_uuid[which] == NULL)  
        {  
                uuid_rc_t       rc;  
  
                rc = uuid_create(&cached_uuid[which]);  
                if (rc != UUID_RC_OK)  
                {  
                        cached_uuid[which] = NULL;  
                        pguuid_complain(rc);  
                }  
        }  
        return cached_uuid[which];  
}  

另一种情况是没有定义HAVE_UUID_OSSP,则需要调用操作系统的uuid_generate_time或uuid_generate_random来产生UUID。这两个函数在并发场景中同样有可能产生重复值,原因见参考部分的分析。

参考

1. man uuid_generate_random

UUID_GENERATE(3)                  Libuuid API                 UUID_GENERATE(3)  
  
NAME  
       uuid_generate, uuid_generate_random, uuid_generate_time, uuid_generate_time_safe - create a new unique UUID value  
  
SYNOPSIS  
       #include <uuid/uuid.h>  
  
       void uuid_generate(uuid_t out);  
       void uuid_generate_random(uuid_t out);  
       void uuid_generate_time(uuid_t out);  
       int uuid_generate_time_safe(uuid_t out);  
  
DESCRIPTION  
       The  uuid_generate  function  creates  a new universally unique identifier (UUID).  The uuid will be generated based on high-quality randomness from /dev/urandom, if available.  If it is not available, then  
       uuid_generate will use an alternative algorithm which uses the current time, the local ethernet MAC address (if available), and random data generated using a pseudo-random generator.  
  
       The uuid_generate_random function forces the use of the all-random UUID format, even if a high-quality random number generator (i.e., /dev/urandom) is not available, in which case a pseudo-random  generator  will be substituted.    // 如果没有高品质的随机值产生器,则会使用模拟的随机值产生器替代。  
        Note that the use of a pseudo-random generator may compromise the uniqueness of UUIDs generated in this fashion.  // 这段话表明uuid_generate_random 函数如果不使用高品质的random随机值产生器,而使用模拟的random值产生器,则可能导致产生重复的值。    
  
       The  uuid_generate_time function forces the use of the alternative algorithm which uses the current time and the local ethernet MAC address (if available).    
        This algorithm used to be the default one used to generate UUID, but because of the use of the ethernet MAC address, it can leak information about when and where the UUID was generated.    
       This  can  cause  privacy  problems  in  some  applications,  so  the uuid_generate  function only uses this algorithm if a high-quality source of randomness is not available.    
        To guarantee uniqueness of UUIDs generated by concurrently running processes, the uuid library uses global clock state counter (if the process has permissions to gain exclusive access to this file) and/or the uuidd daemon,(如果要并发产生唯一的UUID,必须使用全局的时钟计数器,或者是同一个UUIDD后台进程) if it is running already or can be be spawned by the process (if installed and  the process  has enough permissions to run it).  If neither of these two synchronization mechanisms can be used, it is theoretically possible that two concurrently running processes obtain the same UUID(s). (如果两只都不能满足,则并发产生UUID可能会出现重复值)   
       To tell whether the UUID has been generated in a safe manner, use uuid_generate_time_safe.  
  
       The uuid_generate_time_safe is similar to uuid_generate_time, except that it returns a value which denotes whether any of the synchronization mechanisms (see above) has been used. (使用uuid_generate_time_safe 接口可以通过返回值知道并发的产生UUID是否不会有重复值,但是不能保证并发不重复),PG可以利用这点给用户发警告。  
  
       The UUID is 16 bytes (128 bits) long, which gives approximately 3.4x10^38 unique values (there are approximately 10^80 elementary particles in the universe according to Carl Sagan’s Cosmos).  The  new  UUID  
       can reasonably be considered unique among all UUIDs created on the local system, and among UUIDs created on other systems in the past and in the future.  
  
RETURN VALUE  
       The newly created UUID is returned in the memory location pointed to by out.  uuid_generate_time_safe returns zero if the UUID has been generated in a safe manner, -1 otherwise.  
  
CONFORMING TO  
       OSF DCE 1.1  
  
AUTHOR  
       Theodore Y. Ts’o  
  
AVAILABILITY  
       libuuid is part of the util-linux-ng package since version 2.15.1 and is available from ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/.  
  
SEE ALSO  
       uuid(3), uuidgen(1), uuidd(8), uuid_clear(3), uuid_compare(3), uuid_copy(3), uuid_is_null(3), uuid_parse(3), uuid_time(3), uuid_unparse(3)  
  
util-linux-ng                      May 2009                   UUID_GENERATE(3)  

Oracle貌似也有重复的问题。

http://sqlfascination.com/2012/01/22/oracle-duplicate-guid-values-being-returned-from-sys_guid-when-run-in-parallel/

Flag Counter

digoal’s 大量PostgreSQL文章入口