PostgreSQL 10.0 preview feature enhancements - internationalization improvements: ICU (International Components for Unicode) support




ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications.   
ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.  
ICU is released under a nonrestrictive open source license that is suitable for use with both commercial software and with other open source or free software.  



Code Page Conversion: Convert text data to or from Unicode and nearly any other character set or encoding. ICU's conversion tables are based on charset data collected by IBM over the course of many decades, and are the most complete available anywhere.  
Collation: Compare strings according to the conventions and standards of a particular language, region or country. ICU's collation is based on the Unicode Collation Algorithm plus locale-specific comparison rules from the Common Locale Data Repository, a comprehensive source for this type of data.  
Formatting: Format numbers, dates, times and currency amounts according to the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc. This data also comes from the Common Locale Data Repository.  
Time Calculations: Multiple types of calendars are provided beyond the traditional Gregorian calendar. A thorough set of time zone calculation APIs is provided.  
Unicode Support: ICU closely tracks the Unicode standard, providing easy access to all of the many Unicode character properties, Unicode Normalization, Case Folding and other fundamental operations as specified by the Unicode Standard.  
Regular Expression: ICU's regular expressions fully support Unicode while providing very competitive performance.  
Bidi: Support for handling text containing a mixture of left-to-right (e.g. English) and right-to-left (e.g. Arabic or Hebrew) data.  
Text Boundaries: Locate the positions of words, sentences, paragraphs within a range of text, or identify locations that would be suitable for line wrapping when displaying the text.  

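Previously, PostgreSQL relied on the platform's C library (glibc on Linux) for internationalization, so its behavior depended on that library's version: after moving to another platform (for example, when the same setup is used across Windows, Linux and FreeBSD), sort order or localized output could change.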


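pg_collation gains a new column, collprovider, indicating whether a collation is provided by libc or by ICU. A second new column, collversion, records the collation's version when the collation is created; it is checked at run time to make sure the version has not changed.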
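As a hedged sketch of how the two new columns can be inspected (assuming PostgreSQL 10 or later built with `--with-icu`), the query below lists some ICU collations and compares the recorded `collversion` with the version the installed ICU library reports now via `pg_collation_actual_version()`. The index name in the commented-out follow-up is hypothetical.

```sql
-- Assumes PostgreSQL 10+ built with --with-icu.
-- collprovider codes: 'd' = database default, 'c' = libc, 'i' = ICU.
-- recorded_version: stored in pg_collation when the collation was created.
-- current_version: what the currently loaded ICU library reports.
SELECT collname,
       collprovider,
       collversion                      AS recorded_version,
       pg_collation_actual_version(oid) AS current_version
FROM pg_collation
WHERE collprovider = 'i'
ORDER BY collname
LIMIT 10;

-- If the two versions differ after a library upgrade, rebuild the indexes
-- that use the collation first, then record the new version
-- (some_index_using_en_icu is a hypothetical index name):
-- REINDEX INDEX some_index_using_en_icu;
-- ALTER COLLATION "en-x-icu" REFRESH VERSION;
```

Refreshing the version only silences the run-time mismatch warning; it does not repair indexes, which is why the rebuild comes first.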

ICU support  
Add a column collprovider to pg_collation that determines which library  
provides the collation data.  The existing choices are default and libc,  
and this adds an icu choice, which uses the ICU4C library.  
The pg_locale_t type is changed to a union that contains the  
provider-specific locale handles.  Users of locale information are  
changed to look into that struct for the appropriate handle to use.  
Also add a collversion column that records the version of the collation  
when it is created, and check at run time whether it is still the same.  
This detects potentially incompatible library upgrades that can corrupt  
indexes and other structures.  This is currently only supported by  
ICU-provided collations.  
initdb initializes the default collation set as before from the  
`locale -a` output but also adds all available ICU locales with a "-x-icu"  
suffix appended.  
Currently, ICU-provided collations can only be explicitly named  
collations.  The global database locales are still always libc-provided.  
ICU support is enabled by configure --with-icu.  
Reviewed-by: Thomas Munro <>  
Reviewed-by: Andreas Karlsson <>  
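Since ICU collations can only be used as explicitly named collations in this release (the database default is still libc), they are attached per column or per expression. A minimal sketch, assuming a UTF8 database on a server built with `--with-icu`; `german_phonebook` and `people` are hypothetical names, and `de-u-co-phonebk` is an ICU locale identifier that requests the German phone-book collation type.

```sql
-- Assumes PostgreSQL 10+ built with --with-icu and a UTF8 database.
-- german_phonebook and people are hypothetical names.
CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');

-- Per column: comparisons and indexes on last_name use the ICU collation.
CREATE TABLE people (
    id        int PRIMARY KEY,
    last_name text COLLATE german_phonebook
);

-- Per expression: override the column's collation for a single query,
-- here with one of the "-x-icu" collations created by initdb.
SELECT last_name
FROM people
ORDER BY last_name COLLATE "de-x-icu";
```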


```sql
CREATE TABLE collate_test1 (
    a int,
    b text COLLATE "en-x-icu" NOT NULL
);
\d collate_test1

-- expected to fail: no such ICU collation exists
CREATE TABLE collate_test_fail (
    a int,
    b text COLLATE "ja_JP.eucjp-x-icu"
);
-- expected to fail: unknown collation name
CREATE TABLE collate_test_fail (
    a int,
    b text COLLATE "foo-x-icu"
);
-- expected to fail: int is not a collatable type
CREATE TABLE collate_test_fail (
    a int COLLATE "en-x-icu",
    b text
);
-- LIKE copies the column definitions, including b's collation
CREATE TABLE collate_test_like (
    LIKE collate_test1
);

-- constant expression folding
-- English: "ä" sorts with "a", so 'bbc' is greater
SELECT 'bbc' COLLATE "en-x-icu" > 'äbc' COLLATE "en-x-icu" AS "true";
-- Swedish: "ä" sorts after "z", so 'bbc' is smaller
SELECT 'bbc' COLLATE "sv-x-icu" > 'äbc' COLLATE "sv-x-icu" AS "false";

-- upper/lower
CREATE TABLE collate_test10 (
    a int,
    x text COLLATE "en-x-icu",
    y text COLLATE "tr-x-icu"
);
```
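The upper/lower case above exercises locale-sensitive case mapping: with an ICU collation, `upper()` and `lower()` follow the collation's locale. The following is a standalone illustration (not the regression test's omitted lines), assuming a UTF8 database on a server built with `--with-icu`. Under Turkish rules, dotted lowercase "i" maps to dotted capital "İ" (U+0130), and plain capital "I" maps to dotless "ı" (U+0131).

```sql
-- Assumes a UTF8 database and a server built with --with-icu.
SELECT upper('hij' COLLATE "en-x-icu") AS en_upper,  -- HIJ
       upper('hij' COLLATE "tr-x-icu") AS tr_upper,  -- HİJ
       lower('HIJ' COLLATE "en-x-icu") AS en_lower,  -- hij
       lower('HIJ' COLLATE "tr-x-icu") AS tr_lower;  -- hıj
```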





digoal's index of PostgreSQL articles