PostgreSQL clang vs gcc 编译

6 minute read

背景

CLANG是一个不错的编译器,本文将介绍一下使用CLANG编译以及它的优化开关,如何编译PostgreSQL,同时对比一下GCC 4.4.6版本的性能。

安装clang

安装clang,需要更高版本的gcc来进行编译。

安装gcc

找一个比较快的镜像下载源码包

https://gcc.gnu.org/mirrors.html

ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/

$ wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.4/gcc-4.9.4.tar.bz2  

解压

$ tar -xvzf gcc-4.9.4.tar.gz
$ cd gcc-4.9.4

执行以下脚本, 下载依赖包

./contrib/download_prerequisites

下载完后,在gcc源码根目录可以看到这些下载的包,已自动解压

drwxrwxrwx  5    1114    1114  12K Nov 30  2009 mpfr-2.4.2
drwxrwxrwx  5 gpadmin gpadmin 4.0K Dec  8  2009 mpc-0.8.1
drwxrwxrwx 15 digoal  wheel   4.0K Jan  8  2010 gmp-4.3.2
drwxrwxr-x 13 gpadmin gpadmin 4.0K Oct 11  2013 cloog-0.18.1
drwxrwxr-x  7 gpadmin gpadmin 4.0K Jan 12  2014 isl-0.12.2
-rw-r--r--  1 root    root    1.1M Nov  4 15:56 mpfr-2.4.2.tar.bz2
lrwxrwxrwx  1 root    root      10 Nov  4 15:56 mpfr -> mpfr-2.4.2
-rw-r--r--  1 root    root    1.9M Nov  4 15:59 gmp-4.3.2.tar.bz2
lrwxrwxrwx  1 root    root       9 Nov  4 15:59 gmp -> gmp-4.3.2
-rw-r--r--  1 root    root    533K Nov  4 16:00 mpc-0.8.1.tar.gz
lrwxrwxrwx  1 root    root       9 Nov  4 16:00 mpc -> mpc-0.8.1
-rw-r--r--  1 root    root    1.3M Nov  4 16:02 isl-0.12.2.tar.bz2
lrwxrwxrwx  1 root    root      10 Nov  4 16:02 isl -> isl-0.12.2
-rw-r--r--  1 root    root    3.7M Nov  4 16:10 cloog-0.18.1.tar.gz
lrwxrwxrwx  1 root    root      12 Nov  4 16:10 cloog -> cloog-0.18.1

编译gcc

$ ./configure --prefix=/u02/digoal/gcc4.9.4 --disable-multilib
$ make -j 32
$ make install

执行,同时将环境变量加入 /etc/profile

export LD_LIBRARY_PATH=/u02/digoal/gcc4.9.4/lib:/u02/digoal/gcc4.9.4/lib64:$LD_LIBRARY_PATH
export PATH=/u02/digoal/gcc4.9.4/bin:$PATH

修改ld.so.conf

# vi /etc/ld.so.conf
/u02/digoal/gcc4.9.4/lib
/u02/digoal/gcc4.9.4/lib64

# ldconfig

gcc 6.2

如果你需要用6.2的gcc, 安装方法同上, 只是依赖的包版本有点不一样

-rw-r--r--  1 root    root    1.1M Nov  5 09:23 mpfr-2.4.2.tar.bz2
lrwxrwxrwx  1 root    root      10 Nov  5 09:23 mpfr -> mpfr-2.4.2
-rw-r--r--  1 root    root    1.9M Nov  5 09:27 gmp-4.3.2.tar.bz2
lrwxrwxrwx  1 root    root       9 Nov  5 09:27 gmp -> gmp-4.3.2
-rw-r--r--  1 root    root    533K Nov  5 09:28 mpc-0.8.1.tar.gz
lrwxrwxrwx  1 root    root       9 Nov  5 09:28 mpc -> mpc-0.8.1
-rw-r--r--  1 root    root    1.6M Nov  5 09:29 isl-0.15.tar.bz2
lrwxrwxrwx  1 root    root       8 Nov  5 09:29 isl -> isl-0.15

参考

1. https://gcc.gnu.org/install/prerequisites.html

安装clang

安装gcc, 参照前面

安装cmake

$ wget https://cmake.org/files/v3.6/cmake-3.6.3.tar.gz
$ tar -zxvf cmake-3.6.3.tar.gz
$ cd cmake-3.6.3
$ ./configure --prefix=/u02/digoal/cmake
$ make -j 32
$ make install

执行,同时将环境变量加入 /etc/profile

$ export PATH=/u02/digoal/cmake/bin:$PATH

安装python

$ wget https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tar.xz
$ tar -xvf Python-2.7.12.tar.xz
$ cd Python-2.7.12
$ ./configure --prefix=/u02/digoal/python2.7.12 --enable-shared
$ make -j 32
$ make install

执行,同时将环境变量加入 /etc/profile

export PATH=/u02/digoal/python2.7.12/bin:$PATH
export LD_LIBRARY_PATH=/u02/digoal/python2.7.12/lib:$LD_LIBRARY_PATH

修改ld.so.conf

# vi /etc/ld.so.conf
/u02/digoal/python2.7.12/lib

# ldconfig

安装llvm, clang

下载软件包

$ wget http://llvm.org/releases/3.9.0/llvm-3.9.0.src.tar.xz
$ wget http://llvm.org/releases/3.9.0/cfe-3.9.0.src.tar.xz
$ wget http://llvm.org/releases/3.9.0/clang-tools-extra-3.9.0.src.tar.xz

/******* 本文不需要
  $ wget http://llvm.org/releases/3.9.0/compiler-rt-3.9.0.src.tar.xz
  $ wget http://llvm.org/releases/3.9.0/libcxx-3.9.0.src.tar.xz
*******/

$ tar -xvf llvm-3.9.0.src.tar.xz
$ tar -xvf cfe-3.9.0.src.tar.xz
$ tar -xvf clang-tools-extra-3.9.0.src.tar.xz

/******* 本文不需要
  $ tar -xvf compiler-rt-3.9.0.src.tar.xz
  $ tar -xvf libcxx-3.9.0.src.tar.xz
*******/

$ mv llvm-3.9.0.src llvm

$ mv cfe-3.9.0.src clang
$ mv clang llvm/tools/

$ mv clang-tools-extra-3.9.0.src extra
$ mv extra llvm/tools/clang/

/******* 本文不需要
  $ mv compiler-rt-3.9.0.src compiler-rt
  $ mv compiler-rt llvm/projects/

  $ mv libcxx-3.9.0.src libcxx
  $ mv libcxx llvm/projects/
*******/

使用cmake安装

$ mkdir mybuild
$ cd mybuild

$ CC=/u02/digoal/gcc4.9.4/gcc cmake -G "Unix Makefiles" ../llvm

编译

$ CC=/u02/digoal/gcc4.9.4/gcc cmake --build .
或使用make 
$ CC=/u02/digoal/gcc4.9.4/gcc make -j 32

安装到目标目录

$ CC=/u02/digoal/gcc4.9.4/gcc cmake -DCMAKE_INSTALL_PREFIX=/u02/digoal/llvm -P cmake_install.cmake

执行,同时将环境变量加入 /etc/profile

export PATH=/u02/digoal/llvm/bin:$PATH
export LD_LIBRARY_PATH=/u02/digoal/llvm/lib:$LD_LIBRARY_PATH

修改ld.so.conf

# vi /etc/ld.so.conf
/u02/digoal/llvm/lib

# ldconfig

参考

1. http://btorpey.github.io/blog/2015/01/02/building-clang/

2. http://clang.llvm.org/get_started.html

3. http://llvm.org/docs/CMake.html

4. cmake –help-variable-list 查看CMAKE支持的变量

5. 查看cmake变量的含义, 例如 cmake –help-variable PROJECT_SOURCE_DIR

PROJECT_SOURCE_DIR
------------------

Top level source directory for the current project.

This is the source directory of the most recent ``project()`` command.

6. http://www.cnblogs.com/ralphjzhang/archive/2011/12/02/2272671.html

7. http://www.cnblogs.com/Frandy/archive/2012/10/20/llvm_clang_libcxx_cxx11.html

8. http://llvm.1065342.n5.nabble.com/llvm-dev-llvm-build-failed-while-Linking-CXX-shared-library-lib-libc-so-td93393.html

clang, GCC优化开关介绍

参考clang man手册

  -cl-fast-relaxed-math   OpenCL only. Sets -cl-finite-math-only and -cl-unsafe-math-optimizations, and defines __FAST_RELAXED_MATH__.
  -cl-finite-math-only    OpenCL only. Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf.
  -cl-opt-disable         OpenCL only. This option disables all optimizations. By default optimizations are enabled.
  -cl-unsafe-math-optimizations
                          OpenCL only. Allow unsafe floating-point optimizations.  Also implies -cl-no-signed-zeros and -cl-mad-enable.
                          Enable device-side debug info generation. Disables ptxas optimizations.
  -ffast-math             Allow aggressive, lossy floating-point optimizations
  -fno-profile-instr-use  Disable using instrumentation data for profile-guided optimization
  -fno-signed-zeros       Allow optimizations that ignore the sign of floating point zeros
                          Use instrumentation data for profile-guided optimization
                          Enable sample-based profile guided optimizations
                          Use instrumentation data for profile-guided optimization. If pathname is a directory, it reads from <pathname>/default.profdata. Otherwise, it reads from file <pathname>.
  -fstrict-enums          Enable optimizations based on the strict definition of an enum's value range
                          Enable optimizations based on the strict rules for overwriting polymorphic C++ objects
  -fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
  -Rpass-analysis=<value> Report transformation analysis from optimization passes whose name matches the given POSIX regular expression
  -Rpass-missed=<value>   Report missed transformations by optimization passes whose name matches the given POSIX regular expression
  -Rpass=<value>          Report transformations performed by optimization passes whose name matches the given POSIX regular expression

clang常用优化开关

-O3 -fstrict-enums -fno-signed-zeros

gcc的优化开关

Optimization Options
    -faggressive-loop-optimizations -falign-functions[=n] -falign-jumps[=n] -falign-labels[=n] -falign-loops[=n] -fassociative-math -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize
    -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves -fcheck-data-deps -fcombine-stack-adjustments -fconserve-stack -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks
    -fcx-fortran-rules -fcx-limited-range -fdata-sections -fdce -fdelayed-branch -fdelete-null-pointer-checks -fdevirtualize -fdse -fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects -ffast-math
    -ffinite-math-only -ffloat-store -fexcess-precision=style -fforward-propagate -ffp-contract=style -ffunction-sections -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity -fgcse-sm -fhoist-adjacent-loads
    -fif-conversion -fif-conversion2 -findirect-inlining -finline-functions -finline-functions-called-once -finline-limit=n -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-pta -fipa-profile -fipa-pure-const
    -fipa-reference -fira-algorithm=algorithm -fira-region=region -fira-hoist-pressure -fira-loop-pressure -fno-ira-share-save-slots -fno-ira-share-spill-slots -fira-verbose=n -fivopts -fkeep-inline-functions
    -fkeep-static-consts -floop-block -floop-interchange -floop-strip-mine -floop-nest-optimize -floop-parallelize-all -flto -flto-compression-level -flto-partition=alg -flto-report -fmerge-all-constants -fmerge-constants
    -fmodulo-sched -fmodulo-sched-allow-regmoves -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg -fno-default-inline -fno-defer-pop -fno-function-cse -fno-guess-branch-probability -fno-inline
    -fno-math-errno -fno-peephole -fno-peephole2 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss -fomit-frame-pointer -foptimize-register-move
    -foptimize-sibling-calls -fpartial-inlining -fpeel-loops -fpredictive-commoning -fprefetch-loop-arrays -fprofile-report -fprofile-correction -fprofile-dir=path -fprofile-generate -fprofile-generate=path -fprofile-use
    -fprofile-use=path -fprofile-values -freciprocal-math -free -fregmove -frename-registers -freorder-blocks -freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop -freschedule-modulo-scheduled-loops
    -frounding-math -fsched2-use-superblocks -fsched-pressure -fsched-spec-load -fsched-spec-load-dangerous -fsched-stalled-insns-dep[=n] -fsched-stalled-insns[=n] -fsched-group-heuristic -fsched-critical-path-heuristic
    -fsched-spec-insn-heuristic -fsched-rank-heuristic -fsched-last-insn-heuristic -fsched-dep-count-heuristic -fschedule-insns -fschedule-insns2 -fsection-anchors -fselective-scheduling -fselective-scheduling2
    -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fshrink-wrap -fsignaling-nans -fsingle-precision-constant -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector -fstack-protector-all
    -fstack-protector-strong -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop
    -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-if-convert-stores -ftree-loop-im -ftree-phiprop -ftree-loop-distribution
    -ftree-loop-distribute-patterns -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize -ftree-parallelize-loops=n -ftree-pre -ftree-partial-pre -ftree-pta -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra
    -ftree-switch-conversion -ftree-tail-merge -ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time -funroll-all-loops -funroll-loops -funsafe-loop-optimizations -funsafe-math-optimizations
    -funswitch-loops -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb -fwhole-program -fwpa -fuse-ld=linker -fuse-linker-plugin --param name=value -O  -O0  -O1  -O2  -O3  -Os -Ofast -Og

gcc常用优化开关

-O3 -flto

参考 https://www.postgresql.org/message-id/7146D19B7FC9D5119F6400D0B7B993080318C991@az33exm22.corp.mot.com  

clang编译PostgreSQL

CC=/u02/digoal/llvm/bin/clang CFLAGS="-O2 -fstrict-enums -fno-signed-zeros" ./configure --prefix=/u02/digoal/soft_bak/pgsql9.5
CC=/u02/digoal/llvm/bin/clang make world -j 32
CC=/u02/digoal/llvm/bin/clang make install-world

性能对比测试

clang 3.9.0对比gcc 6.2.0编译的PostgreSQL。

避免IO瓶颈,使用内存较大的主机,观察profile。

select

1000万记录,全内存命中,基于主键查询压测。400连接。

$ psql
create table test(id int primary key, info text, crt_time timestamp);
insert into test select generate_series(1,10000000);

$ vi test.sql
\set id random(1, 10000000)
SELECT * FROM test where id=:id; 

$ pgbench -M prepared -n -f ./test.sql -h xxx.xxx.xxx.xxx -p 1921 -c 400 -j 400 -T 120

测试结果

-- gcc 6.2.0
tps
1124951

profile
61962.00 10.9% GetSnapshotData               /home/digoal/pgsql9.6_gcc/bin/postgres
20189.00  3.6% _bt_compare                   /home/digoal/pgsql9.6_gcc/bin/postgres
16353.00  2.9% hash_search_with_hash_value   /home/digoal/pgsql9.6_gcc/bin/postgres
14725.00  2.6% AllocSetAlloc                 /home/digoal/pgsql9.6_gcc/bin/postgres
13601.00  2.4% SearchCatCache                /home/digoal/pgsql9.6_gcc/bin/postgres
11787.00  2.1% LWLockAttemptLock             /home/digoal/pgsql9.6_gcc/bin/postgres

-- clang 3.9.0
tps
1120610

profile

61727.00 10.8% GetSnapshotData             /home/digoal/pgsql9.6/bin/postgres
19754.00  3.5% _bt_compare                 /home/digoal/pgsql9.6/bin/postgres
17741.00  3.1% AllocSetAlloc               /home/digoal/pgsql9.6/bin/postgres
15902.00  2.8% hash_search_with_hash_value /home/digoal/pgsql9.6/bin/postgres
13122.00  2.3% LWLockAcquire               /home/digoal/pgsql9.6/bin/postgres

insert

一张表,一个自增序列以及索引,并发插入,异步提交。

$ psql
create table test(id serial primary key, info text, crt_time timestamp) with (autovacuum_enabled=off);
alter sequence test_id_seq cache 100000;

$ vi test.sql
insert into test(info) values (null);

$ pgbench -M prepared -n -P 1 -f ./test.sql -h xxx.xxx.xxx.xxx -p 1921 -c 128 -j 128 -T 120

测试结果

-- gcc 6.2.0
tps
356761

profile

-- clang 3.9.0
tps
372643

profile

update

1000万记录,全内存命中,基于主键查询更新,异步提交。

$ psql
create table test(id int primary key, info text, crt_time timestamp) with (fillfactor=90);
insert into test select generate_series(1,10000000);

$ vi test.sql
\set id random(1, 10000000)
update test set info=info where id=:id;

$ pgbench -M prepared -n -f ./test.sql -h xxx.xxx.xxx.xxx -p 1921 -c 64 -j 64 -T 120

测试结果

-- gcc 6.2.0
tps
273016

profile

-- clang 3.9.0
tps
283776

profile

copy bulk

一张表,一个索引,并发COPY,异步提交。

$ psql
create table test(id int , info text, crt_time timestamp) with (autovacuum_enabled=off);
create index idx on test(id);
copy (select id,null,null from generate_series(1,100000) t(id)) to '/home/digoal/test.csv';

$ vi test.sql
copy test from '/home/digoal/test.csv';

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 16 -j 16 -h /u01/digoal/pg_root1921 -p 1921 -T 100

测试结果

-- gcc 6.2.0
tps
17.81

profile

-- clang 3.9.0
tps
18.146376

profile

如何诊断瓶颈

《Greenplum PostgreSQL –enable-profiling 产生gprof性能诊断代码》

《PostgreSQL 代码性能诊断之 - OProfile & Systemtap》

参考

1. http://www.kitware.com/blog/home/post/1016

2. http://grokbase.com/t/postgresql/pgsql-hackers/10bggd42rt/gcc-vs-clang

3. http://llvm.org/releases/download.html

4. http://www.tuicool.com/articles/Yz2Q7nz

Flag Counter

digoal’s 大量PostgreSQL文章入口