Compiling PostgreSQL: clang vs gcc
Background
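Clang is a good compiler. This article shows how to build clang, introduces its optimization switches, and explains how to compile PostgreSQL with it. It then compares performance against PostgreSQL built with GCC 6.2.0.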
Installing clang
Building clang requires a newer version of gcc, so build gcc first.
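The text does not say how new gcc needs to be. Treat the following as an assumption and verify it against the LLVM 3.9 "Getting Started" page: the documented minimum host compiler for that release is GCC 4.7.0, which would rule out a stock GCC 4.4.x.

Below is a minimal sketch of a version check. It assumes GNU coreutils, because it relies on sort -V. The name version_ge is a hypothetical helper.

```shell
# version_ge A B: succeed if version A >= version B (hypothetical helper).
# Relies on GNU coreutils `sort -V` (natural version ordering).
version_ge() {
    # The smaller version sorts first; A >= B exactly when B comes out first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example: check the gcc currently on PATH against an assumed minimum of 4.7.0.
if command -v gcc >/dev/null 2>&1; then
    v=$(gcc -dumpversion)
    if version_ge "$v" 4.7.0; then
        echo "gcc $v should be new enough"
    else
        echo "gcc $v is probably too old; build a newer gcc first"
    fi
fi
```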
Installing gcc
Pick a fast mirror and download the source tarball:
https://gcc.gnu.org/mirrors.html
ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/
$ wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.4/gcc-4.9.4.tar.bz2
Extract:
$ tar -xvjf gcc-4.9.4.tar.bz2
$ cd gcc-4.9.4
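Run the following script to download the prerequisite packages:
$ ./contrib/download_prerequisites
When it finishes, the packages appear in the root of the gcc source tree, already extracted: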
drwxrwxrwx 5 1114 1114 12K Nov 30 2009 mpfr-2.4.2
drwxrwxrwx 5 gpadmin gpadmin 4.0K Dec 8 2009 mpc-0.8.1
drwxrwxrwx 15 digoal wheel 4.0K Jan 8 2010 gmp-4.3.2
drwxrwxr-x 13 gpadmin gpadmin 4.0K Oct 11 2013 cloog-0.18.1
drwxrwxr-x 7 gpadmin gpadmin 4.0K Jan 12 2014 isl-0.12.2
-rw-r--r-- 1 root root 1.1M Nov 4 15:56 mpfr-2.4.2.tar.bz2
lrwxrwxrwx 1 root root 10 Nov 4 15:56 mpfr -> mpfr-2.4.2
-rw-r--r-- 1 root root 1.9M Nov 4 15:59 gmp-4.3.2.tar.bz2
lrwxrwxrwx 1 root root 9 Nov 4 15:59 gmp -> gmp-4.3.2
-rw-r--r-- 1 root root 533K Nov 4 16:00 mpc-0.8.1.tar.gz
lrwxrwxrwx 1 root root 9 Nov 4 16:00 mpc -> mpc-0.8.1
-rw-r--r-- 1 root root 1.3M Nov 4 16:02 isl-0.12.2.tar.bz2
lrwxrwxrwx 1 root root 10 Nov 4 16:02 isl -> isl-0.12.2
-rw-r--r-- 1 root root 3.7M Nov 4 16:10 cloog-0.18.1.tar.gz
lrwxrwxrwx 1 root root 12 Nov 4 16:10 cloog -> cloog-0.18.1
Build gcc
$ ./configure --prefix=/u02/digoal/gcc4.9.4 --disable-multilib
$ make -j 32
$ make install
Run the following, and also add these environment variables to /etc/profile:
export LD_LIBRARY_PATH=/u02/digoal/gcc4.9.4/lib:/u02/digoal/gcc4.9.4/lib64:$LD_LIBRARY_PATH
export PATH=/u02/digoal/gcc4.9.4/bin:$PATH
Edit /etc/ld.so.conf as root, add the following lines, then run ldconfig:
# vi /etc/ld.so.conf
/u02/digoal/gcc4.9.4/lib
/u02/digoal/gcc4.9.4/lib64
# ldconfig
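Editing /etc/ld.so.conf by hand in vi works fine once. A scripted setup that appends with >>, however, duplicates lines every time it is re-run.

Below is a minimal sketch of an idempotent append. It assumes a POSIX shell and a grep that supports -x and -F. The name append_once is a hypothetical helper. The same pattern works for the export lines added to /etc/profile.

```shell
# append_once FILE LINE: append LINE to FILE unless FILE already contains it
# as a whole line (hypothetical helper). Assumes FILE, if it exists, ends
# with a newline.
append_once() {
    file=$1
    line=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex interpretation)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root): register the gcc library directories, then refresh the cache.
# append_once /etc/ld.so.conf /u02/digoal/gcc4.9.4/lib
# append_once /etc/ld.so.conf /u02/digoal/gcc4.9.4/lib64
# ldconfig
```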
gcc 6.2
If you need gcc 6.2, install it the same way; only the versions of the prerequisite packages differ slightly:
-rw-r--r-- 1 root root 1.1M Nov 5 09:23 mpfr-2.4.2.tar.bz2
lrwxrwxrwx 1 root root 10 Nov 5 09:23 mpfr -> mpfr-2.4.2
-rw-r--r-- 1 root root 1.9M Nov 5 09:27 gmp-4.3.2.tar.bz2
lrwxrwxrwx 1 root root 9 Nov 5 09:27 gmp -> gmp-4.3.2
-rw-r--r-- 1 root root 533K Nov 5 09:28 mpc-0.8.1.tar.gz
lrwxrwxrwx 1 root root 9 Nov 5 09:28 mpc -> mpc-0.8.1
-rw-r--r-- 1 root root 1.6M Nov 5 09:29 isl-0.15.tar.bz2
lrwxrwxrwx 1 root root 8 Nov 5 09:29 isl -> isl-0.15
References
1. https://gcc.gnu.org/install/prerequisites.html
Installing clang
Install gcc as described above.
Install cmake
$ wget https://cmake.org/files/v3.6/cmake-3.6.3.tar.gz
$ tar -zxvf cmake-3.6.3.tar.gz
$ cd cmake-3.6.3
$ ./configure --prefix=/u02/digoal/cmake
$ make -j 32
$ make install
Run the following, and also add it to /etc/profile:
$ export PATH=/u02/digoal/cmake/bin:$PATH
Install Python
$ wget https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tar.xz
$ tar -xvf Python-2.7.12.tar.xz
$ cd Python-2.7.12
$ ./configure --prefix=/u02/digoal/python2.7.12 --enable-shared
$ make -j 32
$ make install
Run the following, and also add these environment variables to /etc/profile:
export PATH=/u02/digoal/python2.7.12/bin:$PATH
export LD_LIBRARY_PATH=/u02/digoal/python2.7.12/lib:$LD_LIBRARY_PATH
Edit /etc/ld.so.conf as root, add the following line, then run ldconfig:
# vi /etc/ld.so.conf
/u02/digoal/python2.7.12/lib
# ldconfig
Install llvm and clang
Download the packages:
$ wget http://llvm.org/releases/3.9.0/llvm-3.9.0.src.tar.xz
$ wget http://llvm.org/releases/3.9.0/cfe-3.9.0.src.tar.xz
$ wget http://llvm.org/releases/3.9.0/clang-tools-extra-3.9.0.src.tar.xz
/******* not needed for this article
$ wget http://llvm.org/releases/3.9.0/compiler-rt-3.9.0.src.tar.xz
$ wget http://llvm.org/releases/3.9.0/libcxx-3.9.0.src.tar.xz
*******/
$ tar -xvf llvm-3.9.0.src.tar.xz
$ tar -xvf cfe-3.9.0.src.tar.xz
$ tar -xvf clang-tools-extra-3.9.0.src.tar.xz
/******* not needed for this article
$ tar -xvf compiler-rt-3.9.0.src.tar.xz
$ tar -xvf libcxx-3.9.0.src.tar.xz
*******/
$ mv llvm-3.9.0.src llvm
$ mv cfe-3.9.0.src clang
$ mv clang llvm/tools/
$ mv clang-tools-extra-3.9.0.src extra
$ mv extra llvm/tools/clang/tools/
/******* not needed for this article
$ mv compiler-rt-3.9.0.src compiler-rt
$ mv compiler-rt llvm/projects/
$ mv libcxx-3.9.0.src libcxx
$ mv libcxx llvm/projects/
*******/
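For an in-tree LLVM 3.9 build, the clang "Getting Started" guide places clang under llvm/tools/clang. It places clang-tools-extra under llvm/tools/clang/tools/extra.

Below is a small sketch that checks this layout before running cmake. The name check_llvm_tree is a hypothetical helper. It covers only the three components used in this article.

```shell
# check_llvm_tree DIR: verify the directory layout expected by an in-tree
# llvm + clang + clang-tools-extra build rooted at DIR (hypothetical helper).
check_llvm_tree() {
    root=$1
    status=0
    for d in llvm llvm/tools/clang llvm/tools/clang/tools/extra; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d" >&2
            status=1
        fi
    done
    return $status
}

# Usage, from the directory where the tarballs were unpacked:
# check_llvm_tree . && echo "layout OK"
```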
Configure with cmake in a separate build directory:
$ mkdir mybuild
$ cd mybuild
$ CC=/u02/digoal/gcc4.9.4/bin/gcc CXX=/u02/digoal/gcc4.9.4/bin/g++ cmake -G "Unix Makefiles" ../llvm
Build (the compiler was already fixed at configure time, so CC is not needed here):
$ cmake --build .
Or use make directly:
$ make -j 32
Install to the target directory:
$ cmake -DCMAKE_INSTALL_PREFIX=/u02/digoal/llvm -P cmake_install.cmake
Run the following, and also add these environment variables to /etc/profile:
export PATH=/u02/digoal/llvm/bin:$PATH
export LD_LIBRARY_PATH=/u02/digoal/llvm/lib:$LD_LIBRARY_PATH
Edit /etc/ld.so.conf as root, add the following line, then run ldconfig:
# vi /etc/ld.so.conf
/u02/digoal/llvm/lib
# ldconfig
References
1. http://btorpey.github.io/blog/2015/01/02/building-clang/
2. http://clang.llvm.org/get_started.html
3. http://llvm.org/docs/CMake.html
4. cmake --help-variable-list lists the variables CMake supports.
5. cmake --help-variable <name> explains a CMake variable. For example, cmake --help-variable PROJECT_SOURCE_DIR prints:
PROJECT_SOURCE_DIR
------------------
Top level source directory for the current project.
This is the source directory of the most recent ``project()`` command.
6. http://www.cnblogs.com/ralphjzhang/archive/2011/12/02/2272671.html
7. http://www.cnblogs.com/Frandy/archive/2012/10/20/llvm_clang_libcxx_cxx11.html
8. http://llvm.1065342.n5.nabble.com/llvm-dev-llvm-build-failed-while-Linking-CXX-shared-library-lib-libc-so-td93393.html
clang and GCC optimization switches
From the clang manual:
-cl-fast-relaxed-math  OpenCL only. Sets -cl-finite-math-only and -cl-unsafe-math-optimizations, and defines __FAST_RELAXED_MATH__.
-cl-finite-math-only  OpenCL only. Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf.
-cl-opt-disable  OpenCL only. This option disables all optimizations. By default optimizations are enabled.
-cl-unsafe-math-optimizations  OpenCL only. Allow unsafe floating-point optimizations. Also implies -cl-no-signed-zeros and -cl-mad-enable.
--cuda-noopt-device-debug  Enable device-side debug info generation. Disables ptxas optimizations.
-ffast-math  Allow aggressive, lossy floating-point optimizations
-fno-profile-instr-use  Disable using instrumentation data for profile-guided optimization
-fno-signed-zeros  Allow optimizations that ignore the sign of floating point zeros
-fprofile-instr-use=<value>  Use instrumentation data for profile-guided optimization
-fprofile-sample-use=<value>  Enable sample-based profile guided optimizations
-fprofile-use=<pathname>  Use instrumentation data for profile-guided optimization. If pathname is a directory, it reads from <pathname>/default.profdata. Otherwise, it reads from file <pathname>.
-fstrict-enums  Enable optimizations based on the strict definition of an enum's value range
-fstrict-vtable-pointers  Enable optimizations based on the strict rules for overwriting polymorphic C++ objects
-fwhole-program-vtables  Enables whole-program vtable optimization. Requires -flto
-Rpass-analysis=<value>  Report transformation analysis from optimization passes whose name matches the given POSIX regular expression
-Rpass-missed=<value>  Report missed transformations by optimization passes whose name matches the given POSIX regular expression
-Rpass=<value>  Report transformations performed by optimization passes whose name matches the given POSIX regular expression
Commonly used clang optimization switches:
-O3 -fstrict-enums -fno-signed-zeros
gcc optimization switches (from the gcc manual):
Optimization Options
-faggressive-loop-optimizations -falign-functions[=n] -falign-jumps[=n] -falign-labels[=n] -falign-loops[=n] -fassociative-math -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize
-fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves -fcheck-data-deps -fcombine-stack-adjustments -fconserve-stack -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks
-fcx-fortran-rules -fcx-limited-range -fdata-sections -fdce -fdelayed-branch -fdelete-null-pointer-checks -fdevirtualize -fdse -fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects -ffast-math
-ffinite-math-only -ffloat-store -fexcess-precision=style -fforward-propagate -ffp-contract=style -ffunction-sections -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity -fgcse-sm -fhoist-adjacent-loads
-fif-conversion -fif-conversion2 -findirect-inlining -finline-functions -finline-functions-called-once -finline-limit=n -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-pta -fipa-profile -fipa-pure-const
-fipa-reference -fira-algorithm=algorithm -fira-region=region -fira-hoist-pressure -fira-loop-pressure -fno-ira-share-save-slots -fno-ira-share-spill-slots -fira-verbose=n -fivopts -fkeep-inline-functions
-fkeep-static-consts -floop-block -floop-interchange -floop-strip-mine -floop-nest-optimize -floop-parallelize-all -flto -flto-compression-level -flto-partition=alg -flto-report -fmerge-all-constants -fmerge-constants
-fmodulo-sched -fmodulo-sched-allow-regmoves -fmove-loop-invariants -fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg -fno-default-inline -fno-defer-pop -fno-function-cse -fno-guess-branch-probability -fno-inline
-fno-math-errno -fno-peephole -fno-peephole2 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss -fomit-frame-pointer -foptimize-register-move
-foptimize-sibling-calls -fpartial-inlining -fpeel-loops -fpredictive-commoning -fprefetch-loop-arrays -fprofile-report -fprofile-correction -fprofile-dir=path -fprofile-generate -fprofile-generate=path -fprofile-use
-fprofile-use=path -fprofile-values -freciprocal-math -free -fregmove -frename-registers -freorder-blocks -freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop -freschedule-modulo-scheduled-loops
-frounding-math -fsched2-use-superblocks -fsched-pressure -fsched-spec-load -fsched-spec-load-dangerous -fsched-stalled-insns-dep[=n] -fsched-stalled-insns[=n] -fsched-group-heuristic -fsched-critical-path-heuristic
-fsched-spec-insn-heuristic -fsched-rank-heuristic -fsched-last-insn-heuristic -fsched-dep-count-heuristic -fschedule-insns -fschedule-insns2 -fsection-anchors -fselective-scheduling -fselective-scheduling2
-fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fshrink-wrap -fsignaling-nans -fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector -fstack-protector-all
-fstack-protector-strong -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop
-ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-if-convert-stores -ftree-loop-im -ftree-phiprop -ftree-loop-distribution
-ftree-loop-distribute-patterns -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize -ftree-parallelize-loops=n -ftree-pre -ftree-partial-pre -ftree-pta -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra
-ftree-switch-conversion -ftree-tail-merge -ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time -funroll-all-loops -funroll-loops -funsafe-loop-optimizations -funsafe-math-optimizations
-funswitch-loops -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb -fwhole-program -fwpa -fuse-ld=linker -fuse-linker-plugin --param name=value -O -O0 -O1 -O2 -O3 -Os -Ofast -Og
Commonly used gcc optimization switches:
-O3 -flto
See https://www.postgresql.org/message-id/7146D19B7FC9D5119F6400D0B7B993080318C991@az33exm22.corp.mot.com
Compiling PostgreSQL with clang
CC=/u02/digoal/llvm/bin/clang CFLAGS="-O2 -fstrict-enums -fno-signed-zeros" ./configure --prefix=/u02/digoal/soft_bak/pgsql9.5
CC=/u02/digoal/llvm/bin/clang make world -j 32
CC=/u02/digoal/llvm/bin/clang make install-world
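When two builds are installed side by side, it is easy to benchmark the wrong one. pg_config --cc and pg_config --cflags report the compiler and flags recorded at configure time, so they offer a quick way to confirm which build you are running.

Below is a sketch. The name compiled_with is a hypothetical helper. The pg_config path in the usage line assumes the --prefix used above.

```shell
# compiled_with PG_CONFIG PATTERN: succeed if the CC recorded by configure,
# as reported by `pg_config --cc`, matches PATTERN (hypothetical helper).
compiled_with() {
    "$1" --cc | grep -q -- "$2"
}

# Usage (path assumes the --prefix used in the configure line above):
# PGC=/u02/digoal/soft_bak/pgsql9.5/bin/pg_config
# compiled_with "$PGC" clang && "$PGC" --cflags
```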
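Performance comparison
PostgreSQL built with clang 3.9.0 vs. PostgreSQL built with gcc 6.2.0.
To avoid an I/O bottleneck, use a host with plenty of memory, and watch the profile during each run.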
select
10 million rows, all data cached in memory, primary-key lookups, 400 connections.
$ psql
create table test(id int primary key, info text, crt_time timestamp);
insert into test select generate_series(1,10000000);
$ vi test.sql
\set id random(1, 10000000)
SELECT * FROM test where id=:id;
$ pgbench -M prepared -n -f ./test.sql -h xxx.xxx.xxx.xxx -p 1921 -c 400 -j 400 -T 120
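When repeating these runs for both builds, it helps to pull the TPS figure out of the pgbench output automatically. This assumes the 9.x-era output format, which prints two lines starting with "tps =" (including and excluding connection setup); newer pgbench releases word the suffix differently.

Below is a sketch. The name pgbench_tps is a hypothetical helper.

```shell
# pgbench_tps: read pgbench output on stdin and print the TPS value from the
# "excluding connections establishing" line (hypothetical helper; assumes the
# 9.x-era output wording).
pgbench_tps() {
    awk '/^tps = / && /excluding/ {print $3; exit}'
}

# Usage (host and port as in the command above):
# pgbench -M prepared -n -f ./test.sql -h xxx.xxx.xxx.xxx -p 1921 -c 400 -j 400 -T 120 | pgbench_tps
```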
Results
-- gcc 6.2.0
tps
1124951
profile
61962.00 10.9% GetSnapshotData /home/digoal/pgsql9.6_gcc/bin/postgres
20189.00 3.6% _bt_compare /home/digoal/pgsql9.6_gcc/bin/postgres
16353.00 2.9% hash_search_with_hash_value /home/digoal/pgsql9.6_gcc/bin/postgres
14725.00 2.6% AllocSetAlloc /home/digoal/pgsql9.6_gcc/bin/postgres
13601.00 2.4% SearchCatCache /home/digoal/pgsql9.6_gcc/bin/postgres
11787.00 2.1% LWLockAttemptLock /home/digoal/pgsql9.6_gcc/bin/postgres
-- clang 3.9.0
tps
1120610
profile
61727.00 10.8% GetSnapshotData /home/digoal/pgsql9.6/bin/postgres
19754.00 3.5% _bt_compare /home/digoal/pgsql9.6/bin/postgres
17741.00 3.1% AllocSetAlloc /home/digoal/pgsql9.6/bin/postgres
15902.00 2.8% hash_search_with_hash_value /home/digoal/pgsql9.6/bin/postgres
13122.00 2.3% LWLockAcquire /home/digoal/pgsql9.6/bin/postgres
insert
One table with an auto-increment sequence and an index; concurrent inserts, asynchronous commit.
$ psql
create table test(id serial primary key, info text, crt_time timestamp) with (autovacuum_enabled=off);
alter sequence test_id_seq cache 100000;
$ vi test.sql
insert into test(info) values (null);
$ pgbench -M prepared -n -P 1 -f ./test.sql -h xxx.xxx.xxx.xxx -p 1921 -c 128 -j 128 -T 120
Results
-- gcc 6.2.0
tps
356761
profile
-- clang 3.9.0
tps
372643
profile
update
10 million rows, all data cached in memory, updates by primary key, asynchronous commit.
$ psql
create table test(id int primary key, info text, crt_time timestamp) with (fillfactor=90);
insert into test select generate_series(1,10000000);
$ vi test.sql
\set id random(1, 10000000)
update test set info=info where id=:id;
$ pgbench -M prepared -n -f ./test.sql -h xxx.xxx.xxx.xxx -p 1921 -c 64 -j 64 -T 120
Results
-- gcc 6.2.0
tps
273016
profile
-- clang 3.9.0
tps
283776
profile
copy bulk
One table with one index; concurrent COPY, asynchronous commit.
$ psql
create table test(id int , info text, crt_time timestamp) with (autovacuum_enabled=off);
create index idx on test(id);
copy (select id,null,null from generate_series(1,100000) t(id)) to '/home/digoal/test.csv';
$ vi test.sql
copy test from '/home/digoal/test.csv';
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 16 -j 16 -h /u01/digoal/pg_root1921 -p 1921 -T 100
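Despite the .csv name, COPY ... TO without a FORMAT option writes PostgreSQL's text format: tab-separated columns, with NULL written as \N. COPY test FROM reads the file back with the same default, so the test is consistent.

Below is a sketch that produces an equivalent file with standard tools, for example when server-side COPY TO a file (superuser only) is not available. The name gen_copy_file is a hypothetical helper. For COPY ... FROM, the file must still be readable by the server process.

```shell
# gen_copy_file N FILE: write rows 1..N in COPY text format (tab-separated,
# NULL as \N), equivalent to
#   copy (select id,null,null from generate_series(1,N) t(id)) to 'FILE';
# (hypothetical helper)
gen_copy_file() {
    seq 1 "$1" | awk '{printf "%s\t\\N\t\\N\n", $1}' > "$2"
}

# Usage: gen_copy_file 100000 /home/digoal/test.csv
```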
Results
-- gcc 6.2.0
tps
17.81
profile
-- clang 3.9.0
tps
18.146376
profile
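To put the four result pairs on a common scale, the sketch below computes the relative change of the clang build versus the gcc build from the TPS numbers reported above. The name pct_diff is a hypothetical helper.

```shell
# pct_diff BASE NEW: print the relative change of NEW versus BASE in percent,
# rounded to two decimals (hypothetical helper; positive = NEW has higher TPS).
pct_diff() {
    awk -v base="$1" -v new="$2" 'BEGIN {printf "%.2f\n", (new - base) / base * 100}'
}

# clang 3.9.0 relative to gcc 6.2.0, using the TPS figures reported above
for pair in "select 1124951 1120610" "insert 356761 372643" \
            "update 273016 283776" "copy 17.81 18.146376"; do
    set -- $pair
    printf '%-7s %s%%\n' "$1" "$(pct_diff "$2" "$3")"
done
```

With these numbers the loop prints -0.39% for select, 4.45% for insert, 3.94% for update, and 1.89% for copy. In these single runs, the two builds are therefore within a few percent of each other.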
How to diagnose bottlenecks
"Greenplum PostgreSQL --enable-profiling: generating gprof performance diagnostics"
"PostgreSQL code performance diagnosis - OProfile & Systemtap"
References
1. http://www.kitware.com/blog/home/post/1016
2. http://grokbase.com/t/postgresql/pgsql-hackers/10bggd42rt/gcc-vs-clang
3. http://llvm.org/releases/download.html
4. http://www.tuicool.com/articles/Yz2Q7nz