PostgreSQL 那些使用临时文件的操作与跟踪

9 minute read

背景

在数据库中的一些操作,可能会用到临时文件,比如排序,HASH JOIN,中间结果存储,聚合等等。

为了提高数据库的执行效率,一些操作可能会使用内存代替临时存储,仅仅当内存不足时使用临时文件。

通过work_mem可以设置会话Query使用的临时内存的阈值,当然一个Query中如果使用了大量并行的排序等操作时,或者使用了并行计算多个WORKER PROCESS时,可能用到多个WORK_MEM空间,那么内存的使用量也许会比较大。

注意,临时表与Query执行过程中使用的临时文件并不一样,包括我们后面提到的几个临时文件使用跟踪的参数,实际上跟踪的都是Query执行过程中产生的临时文件,而不是临时表。请注意。

在控制临时文件使用量,使用个数的参数上,控制的也是Query执行过程中产生的临时文件,并不会控制临时表使用多少文件。

相关参数

临时表相关

虽然与本文提到的Query使用临时文件不相干,但是还是提一下

#temp_buffers = 8MB                     # min 800kB  

1、temp_buffers (integer)

Sets the maximum number of temporary buffers used by each database session. These are session-local buffers used only for access to temporary tables. The default is eight megabytes (8MB). The setting can be changed within individual sessions, but only before the first use of temporary tables within the session; subsequent attempts to change the value will have no effect on that session.

A session will allocate temporary buffers as needed up to the limit given by temp_buffers. The cost of setting a large value in sessions that do not actually need many temporary buffers is only a buffer descriptor, or about 64 bytes, per increment in temp_buffers. However if a buffer is actually used an additional 8192 bytes will be consumed for it (or in general, BLCKSZ bytes).

#temp_tablespaces = ''                  # a list of tablespace names, '' uses  
                                        # only default tablespace  

2、temp_tablespaces (string)

This variable specifies tablespaces in which to create temporary objects (temp tables and indexes on temp tables) when a CREATE command does not explicitly specify a tablespace. Temporary files for purposes such as sorting large data sets are also created in these tablespaces.

The value is a list of names of tablespaces. When there is more than one name in the list, PostgreSQL chooses a random member of the list each time a temporary object is to be created; except that within a transaction, successively created temporary objects are placed in successive tablespaces from the list. If the selected element of the list is an empty string, PostgreSQL will automatically use the default tablespace of the current database instead.

When temp_tablespaces is set interactively, specifying a nonexistent tablespace is an error, as is specifying a tablespace for which the user does not have CREATE privilege. However, when using a previously set value, nonexistent tablespaces are ignored, as are tablespaces for which the user lacks CREATE privilege. In particular, this rule applies when using a value set in postgresql.conf.

The default value is an empty string, which results in all temporary objects being created in the default tablespace of the current database.

See also default_tablespace.

Query使用临时文件相关

限制一条个PostgreSQL process最多能使用多少临时空间,通常临时空间在事务结束、Query结束后会自动回收。

包括数据库启动时,也会自动清理临时文件。

#temp_file_limit = -1                   # limits per-process temp file space  
                                        # in kB, or -1 for no limit  

1、temp_file_limit (integer)

Specifies the maximum amount of disk space that a process can use for temporary files, such as sort and hash temporary files, or the storage file for a held cursor. A transaction attempting to exceed this limit will be canceled. The value is specified in kilobytes, and -1 (the default) means no limit. Only superusers can change this setting.

This setting constrains the total space used at any instant by all temporary files used by a given PostgreSQL process. It should be noted that disk space used for explicit temporary tables, as opposed to temporary files used behind-the-scenes in query execution, does not count against this limit.

当临时文件使用量大于设置阈值时,记录日志。

#log_temp_files = -1                    # log temporary files equal or larger  
                                        # than the specified size in kilobytes;  
                                        # -1 disables, 0 logs all temp files  

2、log_temp_files (integer)

Controls logging of temporary file names and sizes. Temporary files can be created for sorts, hashes, and temporary query results. A log entry is made for each temporary file when it is deleted. A value of zero logs all temporary file information, while positive values log only files whose size is greater than or equal to the specified number of kilobytes. The default setting is -1, which disables such logging. Only superusers can change this setting.

当一些Query的操作,使用的内存量大于指定阈值时,就会触发使用临时文件。包括排序,IDSTINCT,MERGE JOIN,HASH JOIN,哈希聚合,分组聚合,SRF,递归查询 等操作。

3、work_mem (integer)

Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value defaults to four megabytes (4MB). Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.

跟踪临时文件使用

1、设置跟踪阈值为0KB

log_temp_files = 0  

哪些查询可能用到临时文件,例子

Query要使用的内存超出work_mem的大小时(包括排序,IDSTINCT,MERGE JOIN,HASH JOIN,哈希聚合,分组聚合,SRF,递归查询 等)

1、SRF中间结果很大,超过work_mem的大小。

postgres=# select * from generate_series(1,10000000);  
2018-03-14 10:15:34.284 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,3,"SELECT",2018-03-14 10:15:19 CST,3/5018569,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.0"", size 140000000",,,,,,"select * from generate_series(1,10000000);",,"FileClose, fd.c:1564","psql"  

2、如果srf的结果直接写表(ProjectSet),不使用临时文件,而是直接落盘

postgres=# insert into ttbl1 select generate_series(1,10000000);  
INSERT 0 10000000  

3、如果SRF的结果需要计算后写表(假设计算无法流式的话,一样要使用临时文件)

postgres=# insert into ttbl1 select count(*) from generate_series(1,10000000);  
INSERT 0 1  
2018-03-14 10:42:34.412 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,8232,"INSERT",2018-03-14 10:15:19 CST,3/5018657,372671965,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.8216"", size 140000000",,,,,,"insert into ttbl1 select count(*) from generate_series(1,10000000);",,"FileClose, fd.c:1564","psql"  
insert into ttbl1 select id+1 from generate_series(1,10000000) t(id);  
2018-03-14 10:43:58.365 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,8233,"INSERT",2018-03-14 10:15:19 CST,3/5018658,372671966,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.8217"", size 140000000",,,,,,"insert into ttbl1 select id+1 from generate_series(1,10000000) t(id);",,"FileClose, fd.c:1564","psql"  

insert into ttbl1 select id from generate_series(1,10000000) t(id);

2018-03-14 10:44:41.623 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,8235,"INSERT",2018-03-14 10:15:19 CST,3/5018659,372671968,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.8218"", size 140000000",,,,,,"insert into ttbl1 select id from generate_series(1,10000000) t(id);",,"FileClose, fd.c:1564","psql"  
postgres=# explain insert into ttbl1 select id from generate_series(1,10000000) t(id);  
                                   QUERY PLAN                                     
--------------------------------------------------------------------------------  
 Insert on ttbl1  (cost=0.00..10.00 rows=1000 width=4)  
   ->  Function Scan on generate_series t  (cost=0.00..10.00 rows=1000 width=4)  
(2 rows)  
  
postgres=# explain insert into ttbl1 select  generate_series(1,10000000);  
                      QUERY PLAN                         
-------------------------------------------------------  
 Insert on ttbl1  (cost=0.00..15.02 rows=1000 width=4)  
   ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=4)  
         ->  Result  (cost=0.00..0.01 rows=1 width=0)  
(3 rows)  

4、排序超过了WORK_MEM

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t order by uid1+1 ;  
                                                        QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------  
 Sort  (cost=335108.17..341406.12 rows=2519177 width=12) (actual time=1675.496..2085.586 rows=2519177 loops=1)  
   Output: uid1, uid2, ((uid1 + 1))  
   Sort Key: ((t.uid1 + 1))  
   Sort Method: external merge  Disk: 54240kB  
   Buffers: shared hit=11171, temp read=6780 written=6782  
   ->  Seq Scan on public.t  (cost=0.00..42660.71 rows=2519177 width=12) (actual time=0.010..428.943 rows=2519177 loops=1)  
         Output: uid1, uid2, (uid1 + 1)  
         Buffers: shared hit=11171  
 Planning time: 0.087 ms  
 Execution time: 2373.000 ms  
(10 rows)  
2018-03-14 10:17:34.646 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,5,"EXPLAIN",2018-03-14 10:15:19 CST,3/5018589,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.1"", size 55541760",,,,,,"explain (analyze,verbose,timing,costs,buffers) select * from t order by uid1+1 ;",,"FileClose, fd.c:1564","psql"  

5、merge 或 hash JOIN

当merge join需要排序并且超出排序所需的work_mem时。

postgres=# set enable_mergejoin=off;  
SET  
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from t t1 join t t2 using (uid1,uid2);  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=188386.00..188386.01 rows=1 width=8) (actual time=4198.762..4198.762 rows=1 loops=1)  
   Output: count(*)  
   Buffers: shared hit=22342, temp read=31780 written=23590  
   ->  Hash Join  (cost=83991.43..188294.64 rows=36541 width=0) (actual time=1336.612..3907.501 rows=2519177 loops=1)  
         Inner Unique: true  
         Hash Cond: ((t1.uid1 = t2.uid1) AND (t1.uid2 = t2.uid2))  
         Buffers: shared hit=22342, temp read=31780 written=23590  
         ->  Seq Scan on public.t t1  (cost=0.00..36362.77 rows=2519177 width=8) (actual time=0.010..352.510 rows=2519177 loops=1)  
               Output: t1.uid1, t1.uid2  
               Buffers: shared hit=11171  
         ->  Hash  (cost=36362.77..36362.77 rows=2519177 width=8) (actual time=1294.523..1294.523 rows=2519177 loops=1)  
               Output: t2.uid1, t2.uid2  
               Buckets: 2048  Batches: 4096  Memory Usage: 39kB  
               Buffers: shared hit=11171, temp written=7700  
               ->  Seq Scan on public.t t2  (cost=0.00..36362.77 rows=2519177 width=8) (actual time=0.003..399.532 rows=2519177 loops=1)  
                     Output: t2.uid1, t2.uid2  
                     Buffers: shared hit=11171  
 Planning time: 0.265 ms  
 Execution time: 4198.828 ms  
(19 rows)  

用了不止一个文件(4096),对应Batches个数。

..............  
2018-03-14 10:34:37.866 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,8212,"EXPLAIN",2018-03-14 10:15:19 CST,3/5018640,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.595"", size 15596",,,,,,"explain (analyze,verbose,timing,costs,buffers) select count(*) from t t1 join t t2 using (uid1,uid2);",,"FileClose, fd.c:1564","psql"  
2018-03-14 10:34:37.867 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,8213,"EXPLAIN",2018-03-14 10:15:19 CST,3/5018640,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.4690"", size 15596",,,,,,"explain (analyze,verbose,timing,costs,buffers) select count(*) from t t1 join t t2 using (uid1,uid2);",,"FileClose, fd.c:1564","psql"  

Batches个数。与work_mem设置相关,越大,约少Batches。

postgres=# set work_mem='64MB';  
SET  
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from t t1 join t t2 using (uid1,uid2);  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=188386.00..188386.01 rows=1 width=8) (actual time=3201.812..3201.812 rows=1 loops=1)  
   Output: count(*)  
   Buffers: shared hit=22342, temp read=12924 written=12918  
   ->  Hash Join  (cost=83991.43..188294.64 rows=36541 width=0) (actual time=987.136..2915.131 rows=2519177 loops=1)  
         Inner Unique: true  
         Hash Cond: ((t1.uid1 = t2.uid1) AND (t1.uid2 = t2.uid2))  
         Buffers: shared hit=22342, temp read=12924 written=12918  
         ->  Seq Scan on public.t t1  (cost=0.00..36362.77 rows=2519177 width=8) (actual time=0.010..292.348 rows=2519177 loops=1)  
               Output: t1.uid1, t1.uid2  
               Buffers: shared hit=11171  
         ->  Hash  (cost=36362.77..36362.77 rows=2519177 width=8) (actual time=984.214..984.214 rows=2519177 loops=1)  
               Output: t2.uid1, t2.uid2  
               Buckets: 2097152  Batches: 4  Memory Usage: 40988kB  
               Buffers: shared hit=11171, temp written=6456  
               ->  Seq Scan on public.t t2  (cost=0.00..36362.77 rows=2519177 width=8) (actual time=0.006..324.215 rows=2519177 loops=1)  
                     Output: t2.uid1, t2.uid2  
                     Buffers: shared hit=11171  
 Planning time: 0.245 ms  
 Execution time: 3201.982 ms  
(19 rows)  

6、HASH 或 group agg

postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from t group by (random()*10)::int;  
                                                           QUERY PLAN                                                             
--------------------------------------------------------------------------------------------------------------------------------  
 GroupAggregate  (cost=401830.06..464809.48 rows=2519177 width=12) (actual time=1907.227..2622.407 rows=11 loops=1)  
   Output: count(*), (((random() * '10'::double precision))::integer)  
   Group Key: (((random() * '10'::double precision))::integer)  
   Buffers: shared hit=11171, temp read=24908 written=27447  
   ->  Sort  (cost=401830.06..408128.00 rows=2519177 width=4) (actual time=1869.656..2249.684 rows=2519177 loops=1)  
         Output: (((random() * '10'::double precision))::integer)  
         Sort Key: (((random() * '10'::double precision))::integer)  
         Sort Method: external merge  Disk: 34544kB  
         Buffers: shared hit=11171, temp read=24908 written=27447  
         ->  Seq Scan on public.t  (cost=0.00..55256.60 rows=2519177 width=4) (actual time=0.014..399.124 rows=2519177 loops=1)  
               Output: ((random() * '10'::double precision))::integer  
               Buffers: shared hit=11171  
 Planning time: 0.104 ms  
 Execution time: 2629.891 ms  
(14 rows)  
2018-03-14 10:33:14.781 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,17,"EXPLAIN",2018-03-14 10:15:19 CST,3/5018620,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.7"", size 35373056",,,,,,"explain (analyze,verbose,timing,costs,buffers) select count(*) from t group by (random()*10)::int;",,"FileClose, fd.c:1564","psql"  

7、distinct

postgres=# explain (analyze,verbose,timing,costs,buffers) select count(distinct uid1) from t;  
                                                        QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=42660.71..42660.72 rows=1 width=8) (actual time=1961.611..1961.612 rows=1 loops=1)  
   Output: count(DISTINCT uid1)  
   Buffers: shared hit=11171, temp read=18053 written=18971  
   ->  Seq Scan on public.t  (cost=0.00..36362.77 rows=2519177 width=4) (actual time=0.009..279.037 rows=2519177 loops=1)  
         Output: uid1, uid2  
         Buffers: shared hit=11171  
 Planning time: 0.081 ms  
 Execution time: 1961.654 ms  
(8 rows)  
2018-03-14 10:32:14.004 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,16,"EXPLAIN",2018-03-14 10:15:19 CST,3/5018618,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.6"", size 30318592",,,,,,"explain (analyze,verbose,timing,costs,buffers) select count(distinct uid1) from t;",,"FileClose, fd.c:1564","psql"  

8、递归查询

特别是死循环

postgres=# WITH RECURSIVE t(n) AS (  
    SELECT 1  
  UNION ALL  
    SELECT n+1 FROM t  
)  
SELECT n FROM t LIMIT 10000;  
2018-03-14 11:03:58.574 CST,"postgres","postgres",49481,"[local]",5aa885b7.c149,8238,"SELECT",2018-03-14 10:15:19 CST,3/5018664,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp49481.8219"", size 140000",,,,,,"WITH RECURSIVE t(n) AS (  
    SELECT 1  
  UNION ALL  
    SELECT n+1 FROM t  
)  
SELECT n FROM t LIMIT 10000;",,"FileClose, fd.c:1564","psql"  

《PostgreSQL 递归死循环案例及解法》

普通的CTE 会用到临时文件吗?

取决于内部执行的QUERY,以及执行计划是否有必要使用临时文件,同样使用临时文件的阈值还是work_mem。一条QUERY可以使用多个work_mem,还是看执行计划,你可以简单理解为执行树中,每一个NODE都有机会用到work_mem的大小的内存。

9、在当前会话中跟踪temp file的使用

postgres=# set log_temp_files =0;
SET
postgres=# set client_min_messages =log;
SET

postgres=# create table t_j(id int , info text);
LOG:  statement: create table t_j(id int , info text);
CREATE TABLE
postgres=# insert into t_j select generate_series(1,1000000);
INSERT 0 1000000

postgres=# set work_mem ='4MB';
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from t_j t1 join t_j t2 using (id);
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2056", size 1498584
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2071", size 1498584
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2058", size 1497192
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2073", size 1497192
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2053", size 1495632
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2068", size 1495632
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2050", size 1506096
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2065", size 1506096
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2055", size 1503504
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2070", size 1503504
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2052", size 1505064
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2067", size 1505064
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2051", size 1495416
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2066", size 1495416
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2060", size 1498392
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2075", size 1498392
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2046", size 1495440
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2061", size 1495440
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2049", size 1501536
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2064", size 1501536
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2057", size 1498752
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2072", size 1498752
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2054", size 1498032
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2069", size 1498032
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2059", size 1513032
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2074", size 1513032
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2048", size 1500360
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2063", size 1500360
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2047", size 1489584
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp27966.2062", size 1489584
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=73228.00..73228.01 rows=1 width=8) (actual time=1167.670..1167.671 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=8850, temp read=5532 written=5502
   ->  Hash Join  (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=375.622..1053.586 rows=1000000 loops=1)
         Hash Cond: (t1.id = t2.id)
         Buffers: shared hit=8850, temp read=5532 written=5502
         ->  Seq Scan on public.t_j t1  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.012..155.899 rows=1000000 loops=1)
               Output: t1.id
               Buffers: shared hit=4425
         ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=375.207..375.207 rows=1000000 loops=1)
               Output: t2.id
               Buckets: 131072  Batches: 16  Memory Usage: 3227kB
               Buffers: shared hit=4425, temp written=2736
               ->  Seq Scan on public.t_j t2  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.005..157.634 rows=1000000 loops=1)
                     Output: t2.id
                     Buffers: shared hit=4425
 Planning time: 0.115 ms
 Execution time: 1167.726 ms
(18 rows)

小结

当一些Query的操作,使用的内存量大于work_mem指定阈值时,就会触发使用临时文件。包括排序,IDSTINCT,MERGE JOIN,HASH JOIN,哈希聚合,分组聚合,SRF,递归查询 等操作。

通过设置 log_temp_files ,当会话使用临时文件大小超过了设置大小,就可以跟踪到临时文件的使用。

Flag Counter

digoal’s 大量PostgreSQL文章入口