如何从PostgreSQL json中提取数组
背景
在PostgreSQL中使用JSON类型时,除了能存,大家更关心的是JSON中的数据的使用,例如
1. VALUE支持哪些类型,
通过以下方法可以提取JSON最外层的VALUE的数据类型
json_typeof(json)
jsonb_typeof(jsonb)
目前支持的类型如下
object, array, string, number, boolean, and null
2. 包含哪些KEY或VALUE,通过路径提取KEY或VALUE等。
3. 通过制定KEY提取VALUE,
通常通过KEY提取的VALUE还是JSON或JSONB类型,因为JSON本身就是嵌套的,但是可以通过json typeof得知它的类型。
json_typeof(json)
jsonb_typeof(jsonb)
但是SQL还是认为他是个JSON,要么就是转换为TEXT。
postgres=# create table t3(c1 jsonb);
CREATE TABLE
postgres=# insert into t3 values ('{"a":"v","b":12,"c":{"ab":"hello"},"d":12.3,"e":true,"f":[1,2,3,4],"g":["a","b"]}');
INSERT 0 1
postgres=# select pg_typeof(col), jsonb_typeof(col),col from (select c1->'a' col from t3) t;
pg_typeof | jsonb_typeof | col
-----------+--------------+-----
jsonb | string | "v"
(1 row)
postgres=# select pg_typeof(col), jsonb_typeof(col),col from (select c1->'b' col from t3) t;
pg_typeof | jsonb_typeof | col
-----------+--------------+-----
jsonb | number | 12
(1 row)
postgres=# select pg_typeof(col), jsonb_typeof(col),col from (select c1->'c' col from t3) t;
pg_typeof | jsonb_typeof | col
-----------+--------------+-----------------
jsonb | object | {"ab": "hello"}
(1 row)
postgres=# select pg_typeof(col), jsonb_typeof(col),col from (select c1->'d' col from t3) t;
pg_typeof | jsonb_typeof | col
-----------+--------------+------
jsonb | number | 12.3
(1 row)
postgres=# select pg_typeof(col), jsonb_typeof(col),col from (select c1->'e' col from t3) t;
pg_typeof | jsonb_typeof | col
-----------+--------------+------
jsonb | boolean | true
(1 row)
postgres=# select pg_typeof(col), jsonb_typeof(col),col from (select c1->'f' col from t3) t;
pg_typeof | jsonb_typeof | col
-----------+--------------+--------------
jsonb | array | [1, 2, 3, 4]
(1 row)
postgres=# select pg_typeof(col), jsonb_typeof(col),col from (select c1->'g' col from t3) t;
pg_typeof | jsonb_typeof | col
-----------+--------------+------------
jsonb | array | ["a", "b"]
(1 row)
4. 特定VALUE类型的处理,例如数组。
当VALUE是ARRAY时,如果需要返回ARRAY给SQL层,暂时还没有内置这样的操作符,需要自定义,本文将介绍。
如何返回JSON中存储的数组
大多数JSON操作符,返回的要么是TEXT要么是JSON.
例如以下查询
操作符->返回数组
postgres=# select pg_typeof('{"a":[1,2,3],"b":[4,5,6]}'::json->'a'), '{"a":[1,2,3],"b":[4,5,6]}'::json->'a';
pg_typeof | ?column?
-----------+----------
json | [1,2,3]
(1 row)
操作符-»返回text
postgres=# select pg_typeof('{"a":[1,2,3],"b":[4,5,6]}'::json->>'a'), '{"a":[1,2,3],"b":[4,5,6]}'::json->>'a';
pg_typeof | ?column?
-----------+----------
text | [1,2,3]
(1 row)
如果明确json_typeof或jsonb_typeof是array,则可以通过以下function将数组转换为行
不管是什么数组,都返回text行
json_array_elements_text(json)
jsonb_array_elements_text(jsonb)
postgres=# select pg_typeof(col),col from (select json_array_elements_text('{"a":"B","b":[1,2,3,4,5,6]}'::json->'b') col) t;
pg_typeof | col
-----------+-----
text | 1
text | 2
text | 3
text | 4
text | 5
text | 6
(6 rows)
使用数组构造器,可以将多条记录构造为数组。
postgres=# SELECT array(select json_array_elements_text('{"a":"B","b":[1,2,3,4,5,6]}'::json->'b'));
array
---------------
{1,2,3,4,5,6}
(1 row)
postgres=# SELECT pg_typeof(array(select json_array_elements_text('{"a":"B","b":[1,2,3,4,5,6]}'::json->'b')));
pg_typeof
-----------
text[]
(1 row)
如何转换JSONB数组的类型
如果想构造int[],在构造前,可以将行的输出转换为对应的格式,
postgres=# SELECT array(select (json_array_elements_text('{"a":"B","b":[1,2,3,4,5,6]}'::json->'b'))::int );
array
---------------
{1,2,3,4,5,6}
(1 row)
postgres=# SELECT pg_typeof(array(select (json_array_elements_text('{"a":"B","b":[1,2,3,4,5,6]}'::json->'b'))::int ));
pg_typeof
-----------
integer[]
(1 row)
编写JSON数组转换为SQL数组的函数
可以将以上方法转换为函数来使用,输入json typeof == array的json或jsonb对象,输出text数组。
转换jsonb, json array to text array
CREATE OR REPLACE FUNCTION json_arr2text_arr(_js jsonb)
RETURNS text[] AS
$func$
SELECT ARRAY(SELECT jsonb_array_elements_text(_js))
$func$
LANGUAGE sql IMMUTABLE;
CREATE OR REPLACE FUNCTION json_arr2text_arr(_js json)
RETURNS text[] AS
$func$
SELECT ARRAY(SELECT json_array_elements_text(_js))
$func$
LANGUAGE sql IMMUTABLE;
转换jsonb, json array to int array
CREATE OR REPLACE FUNCTION json_arr2int_arr(_js jsonb)
RETURNS int[] AS
$func$
SELECT ARRAY( SELECT (jsonb_array_elements_text(_js))::int )
$func$
LANGUAGE sql IMMUTABLE;
CREATE OR REPLACE FUNCTION json_arr2int_arr(_js json)
RETURNS int[] AS
$func$
SELECT ARRAY( SELECT (json_array_elements_text(_js))::int )
$func$
LANGUAGE sql IMMUTABLE;
例子
postgres=# select col, pg_typeof(col) from (select json_arr2text_arr(c1->'f') col from t3) t;
col | pg_typeof
-----------+-----------
{1,2,3,4} | text[]
(1 row)
postgres=# select col, pg_typeof(col) from (select json_arr2int_arr(c1->'f') col from t3) t;
col | pg_typeof
-----------+-----------
{1,2,3,4} | integer[]
(1 row)
postgres=# select col, pg_typeof(col) from (select json_arr2text_arr(c1->'g') col from t3) t;
col | pg_typeof
-------+-----------
{a,b} | text[]
(1 row)
应用场景
例如业务系统在JSON中存储了一些ARRAY,这些ARRAY通常会包括一些全包含,全不包含,相交等查询,达到查询的目的。
这里就需要用到数组的包含,相交,不相干操作符来判断,那就会用到前面提到的数组的转换需求。
例子
例如JSON存储的是乐高积木的各属性。
JSON某个KEY存储的数组代表乐高积木对应型号的小零件部件号清单,然后用户需要查询哪些积木型号包含了指定零件编号,这里涉及的就是全包含的查询。
包含’a’零件
postgres=# select * from t3 where json_arr2text_arr(c1->'g') @> array['a'];
c1
-----------------------------------------------------------------------------------------------------
{"a": "v", "b": 12, "c": {"ab": "hello"}, "d": 12.3, "e": true, "f": [1, 2, 3, 4], "g": ["a", "b"]}
(1 row)
同时包含’a’,’c’零件
postgres=# select * from t3 where json_arr2text_arr(c1->'g') @> array['a','c'];
c1
----
(0 rows)
同时包含1,2零件
postgres=# select * from t3 where json_arr2int_arr(c1->'f') @> array[1,2];
c1
-----------------------------------------------------------------------------------------------------
{"a": "v", "b": 12, "c": {"ab": "hello"}, "d": 12.3, "e": true, "f": [1, 2, 3, 4], "g": ["a", "b"]}
(1 row)
同时包含1,6零件
postgres=# select * from t3 where json_arr2int_arr(c1->'f') @> array[1,6];
c1
----
(0 rows)
包含1或6零件
postgres=# select * from t3 where json_arr2int_arr(c1->'f') && array[1,6];
c1
-----------------------------------------------------------------------------------------------------
{"a": "v", "b": 12, "c": {"ab": "hello"}, "d": 12.3, "e": true, "f": [1, 2, 3, 4], "g": ["a", "b"]}
(1 row)
即不包含1 也不包含6零件
postgres=# select * from t3 where not(json_arr2int_arr(c1->'f') && array[1,6]);
c1
----
(0 rows)
不包含6零件
postgres=# select * from t3 where not(json_arr2int_arr(c1->'f') && array[6]);
c1
-----------------------------------------------------------------------------------------------------
{"a": "v", "b": 12, "c": {"ab": "hello"}, "d": 12.3, "e": true, "f": [1, 2, 3, 4], "g": ["a", "b"]}
(1 row)
如果你不想转换为INT,那么在条件中也不要使用INT数组
postgres=# select * from t3 where not(json_arr2text_arr(c1->'f') && array['6']);
c1
-----------------------------------------------------------------------------------------------------
{"a": "v", "b": 12, "c": {"ab": "hello"}, "d": 12.3, "e": true, "f": [1, 2, 3, 4], "g": ["a", "b"]}
(1 row)
索引
为了提升这种查询的速度,我们可以这样建立索引。
postgres=# create index idx_t3_1 on t3 using gin (json_arr2text_arr(c1->'f'));
CREATE INDEX
postgres=# set enable_seqscan=off;
SET
postgres=# explain select * from t3 where json_arr2text_arr(c1->'f') && array['1','6'];
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on t3 (cost=12.25..16.52 rows=1 width=32)
Recheck Cond: (json_arr2text_arr((c1 -> 'f'::text)) && '{1,6}'::text[])
-> Bitmap Index Scan on idx_t3_1 (cost=0.00..12.25 rows=1 width=0)
Index Cond: (json_arr2text_arr((c1 -> 'f'::text)) && '{1,6}'::text[])
(4 rows)
postgres=# explain select * from t3 where json_arr2text_arr(c1->'f') @> array['1','6'];
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on t3 (cost=12.25..16.52 rows=1 width=32)
Recheck Cond: (json_arr2text_arr((c1 -> 'f'::text)) @> '{1,6}'::text[])
-> Bitmap Index Scan on idx_t3_1 (cost=0.00..12.25 rows=1 width=0)
Index Cond: (json_arr2text_arr((c1 -> 'f'::text)) @> '{1,6}'::text[])
(4 rows)
参考
http://dba.stackexchange.com/questions/54283/how-to-turn-json-array-into-postgres-array
https://www.postgresql.org/docs/9.6/static/functions-json.html
https://www.postgresql.org/docs/9.6/static/functions-array.html