PostgreSQL 中生成随机汉字
背景
汉字unicode编码范围
php中utf-8编码下用正则表达式匹配汉字的最终正确表达式 ——   
/[\x{4e00}-\x{9fa5}]/u  
\u4e00-\u9fa5 (中文)  
\x3130-\x318F (韩文)  
\xAC00-\xD7A3 (韩文)  
\u0800-\u4e00 (日文)  
4e00 对应 19968
9fa5 对应 40869
一共 20901 个汉字
例子1
- 注意,为了保证能输出所有的汉字,确保数据库的编码为UTF8,否则会报类似错误。
db=# select '\u9fa5'::text;  
ERROR:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8 at or near "'\u9fa5"  
LINE 1: select '\u9fa5'::text;  
               ^  
例子
set standard_conforming_strings =off;  
set escape_string_warning=off;  
  
postgres=# select '\u9fa5'::text;  
 text   
------  
 龥  
(1 row)  
例子2
除了unicode的写法,PostgreSQL还提供了两个函数,支持数值的写法。
| Function | Return Type | Description | Example | Result | 
|---|---|---|---|---|
| ascii(string) | int | ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character. | ascii(‘x’) | 120 | 
| chr(int) | text | Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. | chr(65) | A | 
postgres=# select chr(19968);  
 chr   
-----  
 一  
(1 row)  
输出所有汉字
例子
do language plpgsql $$  
declare  
  res text := '';  
begin  
  for i in 19968..40869 loop  
    res := res||chr(i);  
  end loop;  
  raise notice '%', res;  
end;  
$$;  
  
NOTICE:  一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝.............................  
输出随机汉字
例子
创建函数
create or replace function gen_hanzi(int) returns text as $$  
declare  
  res text;  
begin  
  if $1 >=1 then  
    select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1);  
    return res;  
  end if;  
  return null;  
end;  
$$ language plpgsql strict;  
使用函数,生成随机汉字
postgres=# select gen_hanzi(10);  
      gen_hanzi         
----------------------  
 析埲錀噝穎灯嬪閸醿厧  
(1 row)  
  
postgres=# select gen_hanzi(10);  
      gen_hanzi         
----------------------  
 仫哸擡襖批梹繜嚪隶別  
(1 row)  
参考
http://baike.baidu.com/item/Unicode
http://unicode.org/charts/
https://en.wikipedia.org/wiki/Unicode
《PostgreSQL 转义、UNICODE、与SQL注入》
http://stackoverflow.com/questions/3970795/how-do-you-create-a-random-string-thats-suitable-for-a-session-id-in-postgresql
https://www.postgresql.org/docs/9.6/static/functions-string.html
