PostgreSQL FDW mongo_fdw usage
背景
https://wiki.postgresql.org/wiki/Foreign_data_wrappers
mongo_fdw是PostgreSQL众多FDW中的一个, 显而易见是连接mongoDB的.
最近在看CitusDB的时候, 文档中提到了mongo_fdw, 实际上在CitusDB中并没有发挥这个extension的distribute query特性.
用在PostgreSQL单节点的环境当然也是可以的.
目前mongo_fdw由EDB在维护,建议用户使用EDB的版本。本文成文太早,还没有EDB版本的MONGO_FDW。
https://github.com/EnterpriseDB/mongo_fdw
下面介绍一下mongo_fdw在PostgreSQL数据库环境中的安装和用法.
1. 安装
下载
ocz@db-172-16-3-150-> wget --no-check-certificate https://github.com/citusdata/mongo_fdw/archive/master.zip
ocz@db-172-16-3-150-> unzip master
ocz@db-172-16-3-150-> mv mongo_fdw-master postgresql-9.2.1/contrib/
编译
su - root
[root@db-172-16-3-150 ~]# . /home/ocz/.bash_profile
root@db-172-16-3-150-> which pg_config
/home/ocz/pgsql9.2.1/bin/pg_config
root@db-172-16-3-150-> cd /home/ocz/postgresql-9.2.1/contrib/mongo_fdw-master/
root@db-172-16-3-150-> make clean
root@db-172-16-3-150-> make
root@db-172-16-3-150-> make install
2. 创建extension
root@db-172-16-3-150-> su - ocz
ocz@db-172-16-3-150-> psql
psql (9.2.1)
Type "help" for help.
postgres=# \c digoal postgres
digoal=# create extension mongo_fdw;
CREATE EXTENSION
3. 创建mongoDB测试环境
[root@db-172-16-3-150 ~]# wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.2.3.tgz
[root@db-172-16-3-150 ~]# tar -zxvf mongodb-linux-x86_64-2.2.3.tgz
[root@db-172-16-3-150 ~]# mv mongodb-linux-x86_64-2.2.3 /opt/
[root@db-172-16-3-150 ~]# useradd mongo
[root@db-172-16-3-150 ~]# chown -R mongo:mongo /opt/mongodb-linux-x86_64-2.2.3
[root@db-172-16-3-150 ~]# su - mongo
mongo@db-172-16-3-150-> vi .bash_profile
export MONGO_HOME=/opt/mongodb-linux-x86_64-2.2.3
export PATH=$MONGO_HOME/bin:$PATH:.
umask 022
alias rm='rm -i'
alias ll='ls -lh'
su - root
[root@db-172-16-3-150 data03]# mkdir -p /data03/mongodata
[root@db-172-16-3-150 data03]# chown -R mongo:mongo /data03/mongodata
[root@db-172-16-3-150 data03]# vi /opt/mongodb-linux-x86_64-2.2.3/mongo11111.conf
logpath = /data03/mongodata/mongod_5281.log
logappend = true
fork = true
port = 11111
noauth = true
bind_ip = 0.0.0.0
dbpath = /data03/mongodata/
maxConns = 20000
pidfilepath = /data03/mongodata/mongod_11111.pid
nounixsocket = true
directoryperdb = true
journal = true
journalCommitInterval = 40
profile = 0
nohttpinterface = true
nssize = 2000
oplogSize = 31280
[root@db-172-16-3-150 mongodb-linux-x86_64-2.2.3]# chown mongo:mongo /opt/mongodb-linux-x86_64-2.2.3/mongo11111.conf
4. 启动mongodb数据库
su - mongo
mongo@db-172-16-3-150-> mongod -f /opt/mongodb-linux-x86_64-2.2.3/mongo11111.conf
mongo@db-172-16-3-150-> mongo 127.0.0.1:11111/admin
MongoDB shell version: 2.2.3
connecting to: 127.0.0.1:11111/admin
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
> show dbsshow dbs
local (empty)
5. 创建外部表
[root@db-172-16-3-150 ~]# su - ocz
ocz@db-172-16-3-150-> psql digoal digoal
psql (9.2.1)
Type "help" for help.
digoal=> \c digoal postgres
You are now connected to database "digoal" as user "postgres".
digoal=# \dx
List of installed extensions
Name | Version | Schema | Description
-----------+---------+------------+--------------------------------------------------------------
dblink | 1.0 | public | connect to other PostgreSQL databases from within a database
hll | 1.0 | public | type for storing hyperloglog data
mongo_fdw | 1.0 | public | foreign data wrapper for MongoDB access
plcoffee | 1.3.0 | pg_catalog | PL/CoffeeScript (v8) trusted procedural language
plls | 1.3.0 | pg_catalog | PL/LiveScript (v8) trusted procedural language
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
plv8 | 1.3.0 | pg_catalog | PL/JavaScript (v8) trusted procedural language
(7 rows)
digoal=# CREATE SERVER mongo_server FOREIGN DATA WRAPPER mongo_fdw
OPTIONS (address '127.0.0.1', port '11111');
digoal=# CREATE FOREIGN TABLE customer_reviews
(
customer_id TEXT,
review_date TIMESTAMP,
review_rating INTEGER,
product_id CHAR(10),
product_title TEXT,
product_group TEXT,
product_category TEXT,
similar_product_ids CHAR(10)[]
)
SERVER mongo_server
OPTIONS (database 'test', collection 'customer_reviews');
6. 查询外部表
digoal=# select * from customer_reviews ;
customer_id | review_date | review_rating | product_id | product_title | product_group | product_category | similar_product_ids
-------------+-------------+---------------+------------+---------------+---------------+------------------+---------------------
(0 rows)
> show dbsshow dbs
local (empty)
test (empty)
7. 下载测试数据
[root@db-172-16-3-150 ~]# wget http://examples.citusdata.com/customer_reviews_1998.json.gz
[root@db-172-16-3-150 ~]# wget http://examples.citusdata.com/customer_reviews_1999.json.gz
[root@db-172-16-3-150 ~]# gunzip customer_reviews_1998.json.gz
8. 导入测试数据
[root@db-172-16-3-150 ~]# . /home/mongo/.bash_profile
root@db-172-16-3-150-> which mongo
/opt/mongodb-linux-x86_64-2.2.3/bin/mongo
root@db-172-16-3-150-> mongoimport --port 11111 --db test --collection customer_reviews --type json --file customer_reviews_1998.json
connected to: 127.0.0.1:11111
root@db-172-16-3-150-> mongo 127.0.0.1:11111/test
MongoDB shell version: 2.2.3
connecting to: 127.0.0.1:11111/test
> show collections
customer_reviews
system.indexes
> db.customer_reviews.count()
589859
9. PostgreSQL 外部表查询
digoal=# select * from customer_reviews limit 1;
customer_id | review_date | review_rating | product_id | product_title | product_group | p
roduct_category | similar_product_ids
---------------+---------------------+---------------+------------+--------------------------------------------+---------------+----
------------------+----------------------------------------------------------
AE22YDHSBFYIP | 1970-12-30 00:00:00 | 5 | 1551803542 | Start and Run a Coffee Bar (Start & Run a) | Book | Bus
iness & Investing | {0471136174,0910627312,047112138X,0786883561,0201570483}
(1 row)
digoal=# select count(*) from customer_reviews ;
count
--------
589859
(1 row)
digoal=# SELECT
round(avg(review_rating), 2),
width_bucket(length(product_title), 1, 50, 5) as title_length,
count(*)
FROM
customer_reviews
WHERE
product_group='Book' AND
review_date >= '1998-01-01' AND
review_date < date '1998-01-01' + interval '1 year'
GROUP BY
title_length
ORDER BY
title_length;
round | title_length | count
-------+--------------+--------
4.39 | 1 | 40058
4.29 | 2 | 125688
4.41 | 3 | 76425
4.36 | 4 | 51531
4.30 | 5 | 34974
4.39 | 6 | 30349
(6 rows)
digoal=# explain analyze verbose SELECT
round(avg(review_rating), 2),
width_bucket(length(product_title), 1, 50, 5) as title_length,
count(*)
FROM
customer_reviews
WHERE
product_group='Book' AND
review_date >= '1998-01-01' AND
review_date < date '1998-01-01' + interval '1 year'
GROUP BY
title_length
ORDER BY
title_length;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------
Sort (cost=21457.71..21457.71 rows=1 width=36) (actual time=3221.772..3221.772 rows=6 loops=1)
Output: (round(avg(review_rating), 2)), (width_bucket((length(product_title))::double precision, 1::double precision, 50::double
precision, 5)), (count(*))
Sort Key: (width_bucket((length(customer_reviews.product_title))::double precision, 1::double precision, 50::double precision, 5)
)
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=21457.68..21457.70 rows=1 width=36) (actual time=3221.746..3221.754 rows=6 loops=1)
Output: round(avg(review_rating), 2), (width_bucket((length(product_title))::double precision, 1::double precision, 50::dou
ble precision, 5)), count(*)
-> Foreign Scan on public.customer_reviews (cost=5.00..21457.56 rows=15 width=36) (actual time=225.305..3036.528 rows=359
025 loops=1)
Output: width_bucket((length(product_title))::double precision, 1::double precision, 50::double precision, 5), review
_rating
Filter: ((customer_reviews.review_date >= '1998-01-01 00:00:00'::timestamp without time zone) AND (customer_reviews.r
eview_date < '1999-01-01 00:00:00'::timestamp without time zone) AND (customer_reviews.product_group = 'Book'::text))
Foreign Namespace: test.customer_reviews
Total runtime: 3222.184 ms
(11 rows)
参考
1. https://github.com/citusdata/mongo_fdw
2. http://pgxn.org/dist/mongo_fdw
3. https://github.com/EnterpriseDB/mongo_fdw
其他FDW
1. hive
https://github.com/youngwookim/hive-fdw-for-postgresql
2. PostgreSQL Foreign Table - pgsql_fdw
http://blog.163.com/digoal@126/blog/static/163877040201231514057303/
3. PostgreSQL Foreign Table - oracle_fdw 1
http://blog.163.com/digoal@126/blog/static/163877040201181505331588/
4. PostgreSQL Foreign Table - oracle_fdw 2
http://blog.163.com/digoal@126/blog/static/16387704020118151162340/
5. PostgreSQL Foreign Table - oracle_fdw 3
http://blog.163.com/digoal@126/blog/static/16387704020118951953408/
6. PostgreSQL Foreign Table - file_fdw
http://blog.163.com/digoal@126/blog/static/163877040201141641148311/
7. PostgreSQL Foreign Table - redis_fdw
http://blog.163.com/digoal@126/blog/static/16387704020119181188247/
8. PostgreSQL Foreign Table - mysql_fdw 1
http://blog.163.com/digoal@126/blog/static/1638770402011111233524987/
9. PostgreSQL Foreign Table - mysql_fdw 2
http://blog.163.com/digoal@126/blog/static/16387704020121108551698/
10. PostgreSQL file text array fdw used for unpredictable columns of text file
http://blog.163.com/digoal@126/blog/static/163877040201302410511382/