PostgreSQL 使用 pgbench 测试 sysbench 相关case - pg_oltp_bench

9 minute read

背景

pgbench是PostgreSQL的性能测试工具,C写的,调用LIBPQ,效率非常高。pgbench也支持自定义测试脚本,支持自定义随机算法,支持自定义脚本的weight设置等等,用途非常广泛。

sysbench是一个比较流行的测试软件框架,可测试内存,CPU,数据库,存储等。测试时调LUA的脚本进行测试,支持多线程,并发测试。

相比pgbench, sysbench的效率略低,另外pgbench的功能也非常强大,所以如果能把sysbench test case迁移到pgbench,对PG的用户来说不失为一桩美事。

题外话,pg还支持ecpg,嵌入式的开发,所以如果要测试更加复杂的场景,并且又要高效的话,可以尝试使用ecpg。

sysbench test case

sysbench自带了一些lua脚本,用于测试数据库。

https://github.com/akopytov/sysbench/tree/1.0/sysbench/tests/db

Makefile.am	  {common,bulkinsert}.lua missing from Makefile	a month ago
bulk_insert.lua	  Minor cleanups in bulk_insert.lua.	a year ago
common.lua	  Use DROP TABLE IF EXISTS on cleanup.	8 days ago
delete.lua	  Change lua scripts to 'require common' if a testdir wasn't passed	a month ago
insert.lua	  Change lua scripts to 'require common' if a testdir wasn't passed	a month ago
oltp.lua	  Merge branch '0.5' into 1.0	a month ago
oltp_simple.lua	  Change lua scripts to 'require common' if a testdir wasn't passed	a month ago
parallel_prepare.lua	        Change lua scripts to 'require common' if a testdir wasn't passed	a month ago
select.lua	                Change lua scripts to 'require common' if a testdir wasn't passed	a month ago
select_random_points.lua	Add libAttachSQL driver to Sysbench	2 years ago
select_random_ranges.lua	Add libAttachSQL driver to Sysbench	2 years ago
update_index.lua	        Change lua scripts to 'require common' if a testdir wasn't passed	a month ago
update_non_index.lua	        Change lua scripts to 'require common' if a testdir wasn't passed	a month ago

以oltp.lua为例,涉及的SQL如下

function get_range_str()
   local start = sb_rand(1, oltp_table_size)
   return string.format(" WHERE id BETWEEN %u AND %u",
                        start, start + oltp_range_size - 1)
end

function event(thread_id)
   local rs
   local i
   local table_name
   local c_val
   local pad_val
   local query

   table_name = "sbtest".. sb_rand_uniform(1, oltp_tables_count)
   if not oltp_skip_trx then
      db_query(begin_query)
   end

   if not oltp_write_only then

   for i=1, oltp_point_selects do
      rs = db_query("SELECT c FROM ".. table_name .." WHERE id=" ..
                       sb_rand(1, oltp_table_size))
   end

   if oltp_range_selects then

   for i=1, oltp_simple_ranges do
      rs = db_query("SELECT c FROM ".. table_name .. get_range_str())
   end

   for i=1, oltp_sum_ranges do
      rs = db_query("SELECT SUM(K) FROM ".. table_name .. get_range_str())
   end

   for i=1, oltp_order_ranges do
      rs = db_query("SELECT c FROM ".. table_name .. get_range_str() ..
                    " ORDER BY c")
   end

   for i=1, oltp_distinct_ranges do
      rs = db_query("SELECT DISTINCT c FROM ".. table_name .. get_range_str() ..
                    " ORDER BY c")
   end

   end

   end
   
   if not oltp_read_only then

   for i=1, oltp_index_updates do
      rs = db_query("UPDATE " .. table_name .. " SET k=k+1 WHERE id=" .. sb_rand(1, oltp_table_size))
   end

   for i=1, oltp_non_index_updates do
      c_val = sb_rand_str("###########-###########-###########-###########-###########-###########-###########-###########-###########-###########")
      query = "UPDATE " .. table_name .. " SET c='" .. c_val .. "' WHERE id=" .. sb_rand(1, oltp_table_size)
      rs = db_query(query)
      if rs then
        print(query)
      end
   end

   for i=1, oltp_delete_inserts do

   i = sb_rand(1, oltp_table_size)

   rs = db_query("DELETE FROM " .. table_name .. " WHERE id=" .. i)
   
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])

   rs = db_query("INSERT INTO " .. table_name ..  " (id, k, c, pad) VALUES " .. string.format("(%d, %d, '%s', '%s')",i, sb_rand(1, oltp_table_size) , c_val, pad_val))

   end

   end -- oltp_read_only

   if not oltp_skip_trx then
      db_query(commit_query)
   end

end

都是很常见的语句,只不过多了一个sb_rand_str函数,用于生成随机的字符。

pg生成随机字符

可以使用如下C函数生成类似的随机字符。

$ git clone https://github.com/postgrespro/pg_oltp_bench
  or
$ git clone https://github.com/digoal/pg_oltp_bench

创建了一个函数,用于生成随机数值,这个是sysbench中使用的。

pg_oltp_bench.c


#include "postgres.h"

#include "fmgr.h"
#include "mb/pg_wchar.h"
#include "utils/datum.h"

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(sb_rand_str);

Datum
sb_rand_str(PG_FUNCTION_ARGS)
{
        text       *result;
        char       *p,
                           *end;

        result = (text *) DatumGetPointer(datumCopy(PG_GETARG_DATUM(0), false, -1));

        p = VARDATA_ANY(result);
        end = p + VARSIZE_ANY_EXHDR(result);

        while (p < end)
        {
                int len = pg_mblen(p);

                if (len == 1)
                {
                        if (*p == '#')
                                *p = (random() % ('9' - '0' + 1)) + '0';
                        else if (*p == '@')
                                *p = (random() % ('z' - 'a' + 1)) + 'a';
                }
                p += len;
        }

        PG_RETURN_POINTER(result);
}

安装和测试

$ cd pg_oltp_bench
$ make USE_PGXS=1
$ make USE_PGXS=1 install

$ psql


postgres=# create extension pg_oltp_bench ;
postgres=# select sb_rand_str('###########-###########-###########-###########-###########-###########-###########-###########-###########-###########');
                                                       sb_rand_str                                                       
-------------------------------------------------------------------------------------------------------------------------
 20038540132-50323775630-82458365307-53537542912-24813396489-03961491366-81366879005-03843501626-78919266849-32710742245
(1 row)

postgres=# select sb_rand_str('###########-###########-###########-###########-########');
                       sb_rand_str                        
----------------------------------------------------------
 38000383151-08324920063-89353598228-34982407725-07119227
(1 row)

postgres=# select sb_rand_str('###########-###########-###########-###########-########');
                       sb_rand_str                        
----------------------------------------------------------
 50905276291-14523885757-56469681351-25872594628-34314810
(1 row)

符合测试要求。

对比pgbench与sysbench性能

接下来使用select.lua作为测试对比的CASE

初始化

DROP TABLE IF EXISTS sbtest;

CREATE TABLE sbtest(
	id SERIAL PRIMARY KEY,
	k INTEGER DEFAULT '0' NOT NULL,
	c CHAR(120) DEFAULT '' NOT NULL,
	pad CHAR(60) DEFAULT '' NOT NULL);

INSERT INTO sbtest (k, c, pad)
SELECT
	(random() * 10000000)::int + 1 AS k,
	sb_rand_str('###########-###########-###########-###########-###########-###########-###########-###########-###########-###########') AS c,
	sb_rand_str('###########-###########-###########-###########-###########') AS pad
FROM
	generate_series(1, 10000000) j;

CREATE INDEX sbtest_k_idx ON sbtest(k);

1. pgbench

$ vi test.sql


\set id1 random(1, 10000000)
SELECT pad FROM sbtest WHERE id = :id1;

测试

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -h xxx.xxx.xxx.xxx -p 1924 -U postgres postgres -c 400 -j 400 -T 100

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 400
number of threads: 400
duration: 100 s
number of transactions actually processed: 112965285
latency average = 0.349 ms
latency stddev = 0.186 ms
tps = 1128708.538575 (including connections establishing)
tps = 1140986.576274 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set id1 random(1, 10000000)
         0.348  SELECT pad FROM sbtest WHERE id = :id1;

pgbench本身的CPU开销

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                         
47899 digoal  20   0 28.8g  17m  980 S 1549.0  0.0  10:10.19 pgbench -M prepared -n -r -P 1 -f ./test.sql -h xxx.xxx.xxx.xxx -p 1924 -U postgres postgres -c 400 -j 400 -T 100

数据库的开销

top - 22:55:41 up 35 days,  4:24,  1 user,  load average: 157.34, 44.42, 15.25
Tasks: 1747 total, 288 running, 1459 sleeping,   0 stopped,   0 zombie
Cpu(s): 73.0%us, 16.1%sy,  0.0%ni,  0.3%id,  0.0%wa,  0.0%hi, 10.6%si,  0.0%st
Mem:  529321832k total, 192584004k used, 336737828k free,   834960k buffers
Swap:        0k total,        0k used,        0k free, 21107224k cached

2. sysbench

安装软件

$ git clone https://github.com/digoal/sysbench

https://github.com/akopytov/sysbench/blob/1.0/sysbench/tests/db/select.lua

编辑select.lua,使用服务端绑定变量

$ vi select.lua

pathtest = string.match(test, "(.*/)") or ""

dofile(pathtest .. "common.lua")

function thread_init(thread_id)
   set_vars()

   local table_name
   table_name = "sbtest"

   db_query("prepare p1(int) as select pad from " .. table_name .. " WHERE id=$1")
end

function event(thread_id)
   db_query("execute p1(" .. sb_rand(1, oltp_table_size) .. ")")
end

测试

./sysbench_pg   \
--test=lua/select.lua   \
--db-driver=pgsql   \
--pgsql-host=xxx.xxx.xxx.xxx   \
--pgsql-port=1924   \
--pgsql-user=postgres   \
--pgsql-password=postgres   \
--pgsql-db=postgres   \
--oltp-tables-count=64   \
--oltp-table-size=10000000   \
--num-threads=400  \
--max-time=120  \
--max-requests=0 \
--report-interval=5 \
run



OLTP test statistics:
    queries performed:
        read:                            0
        write:                           0
        other:                           69504575
        total:                           69504575
    transactions:                        0      (0.00 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 0      (0.00 per sec.)
    other operations:                    69504575 (579193.85 per sec.)

General statistics:
    total time:                          120.0023s
    total number of events:              69504175
    total time taken by event execution: 43394.3854s
    response time:
         min:                                  0.07ms
         avg:                                  0.62ms
         max:                                 49.63ms
         approx.  95 percentile:               1.91ms

Threads fairness:
    events (avg/stddev):           173760.4375/2683.48
    execution time (avg/stddev):   108.4860/0.25

sysbench本身的CPU开销

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
49712 digoal  20   0 25.1g 114m 1188 S 6383.5  0.0  60:43.46 ./sysbench_pg --test lua/select.lua --db-driver pgsql --pgsql-host xxx.xxx.xxx.xxx .....

数据库的开销

top - 23:09:38 up 35 days,  4:38,  1 user,  load average: 4.20, 5.79, 10.98
Tasks: 1747 total,  49 running, 1698 sleeping,   0 stopped,   0 zombie
Cpu(s): 38.9%us,  8.2%sy,  0.0%ni, 48.6%id,  0.0%wa,  0.0%hi,  4.3%si,  0.0%st
Mem:  529321832k total, 192590260k used, 336731572k free,   835032k buffers
Swap:        0k total,        0k used,        0k free, 21108036k cached

小结

sysbench自身的开销更大,几乎把CPU 64个核用满了,而pgbench只用了25%的CPU。

另一方面pgbench已经把PG数据库的硬件资源全部用光,达到了峰值性能。

sysbench并没有把PostgreSQL数据库的资源打满,只用了50%的硬件资源的样子。

pgbench简单讲解

既然pgbench效率较好,所以简单的讲解一下pgbench的用法。

1. 自定义变量

\set 变量名 表达式/值 


设置range为常量
\set range 10000

设置id为随机值, 范围1到range
\set id random(1, :range)

设置x为range的一半
\set x :range/2

2. 使用变量

使用:引用变量即可

\set id random(1,100)

select * from test where id = :id;

3. 变量支持的表达式

The expression may contain :
  integer constants such as 5432, 
  double constants such as 3.14159, 
  references to variables :variablename, 
  unary operators (+, -) 
    and 
  binary operators (+, -, *, /, %) 
  with their usual precedence and associativity, function calls, and parentheses.

           Examples:

               \set ntellers 10 * :scale
               \set aid (1021 * random(1, 100000 * :scale)) % (100000 * :scale) + 1

表达式中支持的函数

+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|Function                           | Return Type                     | Description                         | Example                        | Result                      |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|abs(a)                             | same as a                       | absolute value                      | abs(-17)                       | 17                          |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|debug(a)                           | same as a                       | print a to stderr,                  | debug(5432.1)                  | 5432.1                      |
|                                   |                                 |         and return a                |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|double(i)                          | double                          | cast to double                      | double(5432)                   | 5432.0                      |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|greatest(a [, ... ] )              | double if any a is double, else | largest value among arguments       | greatest(5, 4, 3, 2)           | 5                           |
|                                   | integer                         |                                     |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|int(x)                             | integer                         | cast to int                         | int(5.4 + 3.8)                 | 9                           |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|least(a [, ... ] )                 | double if any a is double, else | smallest value among arguments      | least(5, 4, 3, 2.1)            | 2.1                         |
|                                   | integer                         |                                     |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|pi()                               | double                          | value of the constant PI            | pi()                           | 3.14159265358979323846      |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|random(lb, ub)                     | integer                         | uniformly-distributed random        | random(1, 10)                  | an integer between 1 and 10 |
|                                   |                                 | integer in [lb, ub]                 |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|random_exponential(lb, ub,         | integer                         | exponentially-distributed random    | random_exponential(1, 10, 3.0) | an integer between 1 and 10 |
|parameter)                         |                                 | integer in [lb, ub],                |                                |                             |
|                                   |                                 |               see below             |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|random_gaussian(lb, ub, parameter) | integer                         | Gaussian-distributed random integer | random_gaussian(1, 10, 2.5)    | an integer between 1 and 10 |
|                                   |                                 | in [lb, ub],                        |                                |                             |
|                                   |                                 |               see below             |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|sqrt(x)                            | double                          | square root                         | sqrt(2.0)                      | 1.414213562                 |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+

4. 传入变量

可以通过pgbench向script传入变量,也可以使用内置的变量。

       -D varname=value
       --define=varname=value
           Define a variable for use by a custom script (see below). Multiple -D options are allowed.

自动变量,很有用,特别是client_id,可以用来拼接表名
+----------+-----------------------------------------------------------------+
|Variable  | Description                                                     |
+----------+-----------------------------------------------------------------+
|scale     | current scale factor                                            |
+----------+-----------------------------------------------------------------+
|client_id | unique number identifying the client session (starts from zero) |
+----------+-----------------------------------------------------------------+

5. 随机值算法

+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|random(lb, ub)                     | integer                         | uniformly-distributed random        | random(1, 10)                  | an integer between 1 and 10 |
|                                   |                                 | integer in [lb, ub]                 |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|random_exponential(lb, ub,         | integer                         | exponentially-distributed random    | random_exponential(1, 10, 3.0) | an integer between 1 and 10 |
|parameter)                         |                                 | integer in [lb, ub],                |                                |                             |
|                                   |                                 |               see below             |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+
|random_gaussian(lb, ub, parameter) | integer                         | Gaussian-distributed random integer | random_gaussian(1, 10, 2.5)    | an integer between 1 and 10 |
|                                   |                                 | in [lb, ub],                        |                                |                             |
|                                   |                                 |               see below             |                                |                             |
+-----------------------------------+---------------------------------+-------------------------------------+--------------------------------+-----------------------------+

6. 模拟程序交互式, 睡眠

       \sleep number [ us | ms | s ]
           Causes script execution to sleep for the specified duration in microseconds (us), milliseconds (ms) or seconds (s). 
	   If the unit is omitted then seconds are the default.  
	   number can be either an integer constant or a :variablename reference to a variable having an integer value.  

           Example:

               \sleep 10 ms

7. 脚本weight

当使用多个脚本时,可以为每个脚本设置weight,即权重。如果不设置,默认为1.

pgbench -f ./test1.sql@100 -f ./test2.sql@200

那么有1/3的机会调用test1.sql, 有2/3的机会调用test2.sql  

详见 man pgbench

https://www.postgresql.org/docs/9.6/static/pgbench.html

pgbench动态表名

目前pgbench不支持动态表名,建议可以修改pgbench程序来实现动态表名,例如client_id是一个很不错的选择。

$ vi test.sql
\set id1 random(1, 10000000)
SELECT pad FROM "sbtest:client_id" WHERE id = :id1;

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -h xxx.xxx.xxx.xxx -p 1924 -U postgres postgres -c 3 -j 3 -T 100 

目前会报错,只支持simple模式。  如果要让prepared模式支持,建议改pgbench代码来支持. 例如使用:::varname时,拼接identifid。   
ERROR:  relation "sbtest$1" does not exist
LINE 1: SELECT pad FROM "sbtest$1" WHERE id = $2;
                        ^
client 0 aborted in state 1: ERROR:  prepared statement "P0_1" does not exist
ERROR:  relation "sbtest$1" does not exist
LINE 1: SELECT pad FROM "sbtest$1" WHERE id = $2;
                        ^
client 1 aborted in state 1: ERROR:  prepared statement "P0_1" does not exist
ERROR:  relation "sbtest$1" does not exist
LINE 1: SELECT pad FROM "sbtest$1" WHERE id = $2;
                        ^
client 2 aborted in state 1: ERROR:  prepared statement "P0_1" does not exist

可以修改这个代码来支持

/*
 * Parse the raw sql and replace :param to $n.
 */
static bool
parseQuery(Command *cmd, const char *raw_sql)
{
        char       *sql,
                           *p;

        sql = pg_strdup(raw_sql);
        cmd->argc = 1;

        p = sql;
        while ((p = strchr(p, ':')) != NULL)
        {
                char            var[12];
                char       *name;
                int                     eaten;

                name = parseVariable(p, &eaten);
                if (name == NULL)
                {
                        while (*p == ':')
                        {
                                p++;
                        }
                        continue;
                }

                if (cmd->argc >= MAX_ARGS)
                {
                        fprintf(stderr, "statement has too many arguments (maximum is %d): %s\n", MAX_ARGS - 1, raw_sql);
                        pg_free(name);
                        return false;
                }

                sprintf(var, "$%d", cmd->argc);
                p = replaceVariable(&sql, p, eaten, var);

                cmd->argv[cmd->argc] = name;
                cmd->argc++;
        }

        cmd->argv[0] = sql;
        return true;
}

小结

sysbench自身的开销更大,几乎把CPU 64个核用满了,而pgbench只用了25%的CPU。

另一方面pgbench已经把PG数据库的硬件资源全部用光,达到了峰值性能。

sysbench并没有把PostgreSQL数据库的资源打满,只用了50%的硬件资源的样子。

pgbench如果要支持动态表名,需要调整代码,此前,可以使用多个FILE来代替此功能。

ecpg参考 https://www.postgresql.org/docs/current/static/ecpg.html

Flag Counter

digoal’s 大量PostgreSQL文章入口