[Repost] An excellent analysis of the lifecycle of a PostgreSQL query


Background

From the moment a client sends a request until it receives the response, a query generally goes through six stages:

1. Client request.
2. The parser converts the raw query string into a parsed query tree; set the debug_print_parse parameter to observe this stage.
3. The analyzer restructures the query tree according to the rewrite rules (for views, for example) and outputs a Query object; set debug_print_rewritten to observe this stage.
4. The planner produces a plan tree, using either the cost-based optimizer or the genetic query optimizer; set debug_print_plan to observe this stage.
5. The executor recursively processes each node of the plan tree.
6. The result is returned to the client.

For details, see the original article below:

[Original]
https://github.com/pipelinedb/pipelinedb/wiki/Lifecycle-of-a-query

This document discusses each major component of the query path, from client requests down to query execution. Thus it is effectively a breadth-first tour of the code base.

Client request

The lifecycle of a query begins when a client issues a request to the server. PostgreSQL uses a simple message format: a message’s first byte indicates its type (e.g. query, row transfer, open cursor, etc.), and its next four bytes specify its length. The message body is then encoded in whatever format is associated with the given message type. Have a look at some of the message types that PostgreSQL uses.
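
To make the framing concrete, here is a minimal sketch of reading one such message from a connected socket. The struct and helper names are illustrative, not the server's actual code (that lives in pqcomm.c), and a real reader would also loop on short reads:

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>

typedef struct ProtoMessage
{
    char        type;           /* one byte, e.g. 'Q' for a simple query */
    int32_t     len;            /* four bytes: body length plus these four bytes */
    char       *body;           /* len - 4 bytes, layout depends on type */
} ProtoMessage;

static int
read_message(int fd, ProtoMessage *msg)
{
    uint32_t    netlen;
    int32_t     bodylen;

    if (read(fd, &msg->type, 1) != 1)       /* message type byte */
        return -1;
    if (read(fd, &netlen, 4) != 4)          /* length, network byte order */
        return -1;
    msg->len = (int32_t) ntohl(netlen);

    bodylen = msg->len - 4;                 /* the length field includes itself */
    msg->body = malloc(bodylen > 0 ? bodylen : 1);
    if (msg->body == NULL || read(fd, msg->body, bodylen) != bodylen)
        return -1;
    return 0;
}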

The message types used by clients are wrapped in a library called libpq. Its prototypes can be found here: https://github.com/pipelinedb/pipelinedb/tree/master/src/include/libpq.
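
As a quick illustration of the client side, the following program sends a query through libpq; PQexec wraps the SQL in a 'Q' message and blocks until the server's response arrives. The connection string is an assumption to adapt to your setup; build with -lpq:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* connection parameters are assumptions; adjust for your setup */
    PGconn     *conn = PQconnectdb("host=localhost port=5432 dbname=postgres");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* PQexec sends a 'Q' (simple query) message and waits for the result */
    res = PQexec(conn, "SELECT 1");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("result: %s\n", PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}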

Each client request is serviced by a single process. Since each PostgreSQL process is single-threaded, all concurrency happens at the process level. A client can obtain its backend process in one of two ways:

There is already a backend process running that isn’t busy, so it picks up the client. Running backend processes continuously try to read new incoming messages when they’re not actually processing a query, so a process that isn’t doing any work can pick up a client’s request.

No backend processes are available, so a new one is forked by the postmaster process. The postmaster is essentially a proxy server as well as a supervisor for its child processes (a toy sketch of this follows below).
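
The sketch below shows the general shape of such an accept-and-fork loop. backend_main is a hypothetical stand-in for the real child startup path; the actual logic lives in postmaster.c:

#include <sys/socket.h>
#include <unistd.h>

extern void backend_main(int client_fd);    /* hypothetical backend entry point */

static void
server_loop(int listen_fd)
{
    for (;;)
    {
        int client_fd = accept(listen_fd, NULL, NULL);

        if (client_fd < 0)
            continue;                       /* retry on transient errors */

        if (fork() == 0)
        {
            /* child: this process becomes the client's backend */
            close(listen_fd);
            backend_main(client_fd);
            _exit(0);
        }

        /* parent: the postmaster goes back to listening for new clients */
        close(client_fd);
    }
}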

Parser

After a client connects to a backend, the backend process reads the client’s message. Once the message type is known, it is passed to a switch statement that invokes the appropriate functionality for the given message type. For the purposes of this document, we’re going to focus on ‘Q’ messages, which are, not surprisingly, queries.

Query messages are ultimately passed to postgres.c:exec_simple_query. It’s worth noting that the name of this function is somewhat misleading: most queries are processed by exec_simple_query, including ones that are extremely complex. exec_standard_query or exec_query might have been better names for it. However, other query types such as cursor opens, fetches, and BIND statements are processed by their own functions.
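
Schematically, the dispatch looks like the sketch below; it is modeled on the switch in postgres.c:PostgresMain, with only a few message types shown and the message-reading code omitted (dispatch_message is an illustrative name):

extern void exec_simple_query(const char *query_string);

static void
dispatch_message(char firstchar, const char *query_string)
{
    switch (firstchar)
    {
        case 'Q':               /* simple query: the path this document follows */
            exec_simple_query(query_string);
            break;

        case 'X':               /* terminate: the client is closing the connection */
            break;

        default:                /* parse/bind/execute, COPY, etc. in the real code */
            break;
    }
}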

Parsing the query is the first step of exec_simple_query. The parser takes the raw input SQL encoded in the ‘Q’ message and parses it into a tree of query nodes, which is the ideal data structure for representing arbitrary combinations of inputs, outputs, and operations. The node types comprising a parsed query tree are in parsenodes.h. Have a look at some of the more familiar parse nodes such as SelectStmt and InsertStmt to get an idea of how they relate to the raw SQL that produced them.
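
For instance, SelectStmt looks roughly like this (abridged from parsenodes.h, with many fields omitted). For a query like SELECT * FROM planz WHERE id = 42, targetList holds the *, fromClause holds planz, and whereClause holds the expression tree for id = 42:

typedef struct SelectStmt
{
    NodeTag     type;           /* T_SelectStmt */
    List       *targetList;     /* the expressions being selected */
    List       *fromClause;     /* the FROM items */
    Node       *whereClause;    /* WHERE qualification, or NULL */
    List       *groupClause;    /* GROUP BY clauses */
    Node       *havingClause;   /* HAVING expression, or NULL */
    List       *sortClause;     /* ORDER BY clauses */
} SelectStmt;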

If you’d like to see what parse trees actually look like, the debug_print_parse configuration parameter can be set to ‘on’. These configuration files are found in /path/to/dev/installation/<data directory>/pipelinedb.conf.

Analyzer

Once a raw query has been parsed into a tree, it can be semantically analyzed and transformed into one or more equivalent queries. Views are a good example of why this is necessary: a non-materialized view has no underlying table, yet clients can query the view as if it were a regular table. This works because queries against views are rewritten as one or more queries against the actual tables the views refer to. For example, consider the view:

CREATE VIEW example_view AS SELECT * FROM example_table WHERE example_table.col = 'foo';

When a SELECT is issued against example_view,

SELECT * FROM example_view;  

the analyzer will recognize that example_view is a view and it will rewrite the parsed SELECT tree into the tree given by

SELECT * FROM example_table WHERE example_table.col = 'foo';

The analyzer also normalizes query trees as much as possible so that further levels of query processing can safely make assumptions about their input. Take a look at analyze.c:transformStmt to get an idea of how different query types are analyzed and normalized before being passed to the next stage of execution.
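
Abridged, the shape of that function is as follows (the real SELECT case also distinguishes set operations and VALUES lists):

static Query *
transformStmt(ParseState *pstate, Node *parseTree)
{
    Query      *result;

    switch (nodeTag(parseTree))
    {
        case T_SelectStmt:
            result = transformSelectStmt(pstate, (SelectStmt *) parseTree);
            break;
        case T_InsertStmt:
            result = transformInsertStmt(pstate, (InsertStmt *) parseTree);
            break;
        case T_UpdateStmt:
            result = transformUpdateStmt(pstate, (UpdateStmt *) parseTree);
            break;
        case T_DeleteStmt:
            result = transformDeleteStmt(pstate, (DeleteStmt *) parseTree);
            break;
        default:
            /* utility commands are wrapped in a Query rather than transformed */
            result = makeNode(Query);
            result->commandType = CMD_UTILITY;
            result->utilityStmt = (Node *) parseTree;
            break;
    }
    return result;
}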

To see the results of the analyzer, set debug_print_rewritten to ‘on’.

Planner

After a parse tree has been analyzed and transformed, it becomes an actual Query object, which is then passed to the planner. Given a Query, the planner generates another type of tree: a plan tree, whose node types can be found in plannodes.h. Plan trees describe how a query result can actually be obtained as cheaply as possible. For example, if a table has an index on it and a query with a WHERE clause is being planned, the plan will likely use an IndexScan instead of a SeqScan.
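
At its core, the choice between plans is a cost comparison over candidate paths. The sketch below uses a hypothetical, simplified Path struct just to show the idea; the real bookkeeping happens in the planner's path machinery (see pathnode.c:add_path):

typedef struct Path
{
    const char *node_type;      /* e.g. "SeqScan" or "IndexScan" */
    double      startup_cost;   /* estimated cost before the first row */
    double      total_cost;     /* estimated cost to produce all rows */
} Path;

/* keep whichever candidate delivers the full result more cheaply */
static const Path *
cheapest_total_path(const Path *a, const Path *b)
{
    return (a->total_cost <= b->total_cost) ? a : b;
}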

Assume we have a table given by:

CREATE TABLE planz (id integer, data text);

Query plans can be examined using the EXPLAIN ANALYZE command. Note: if you use EXPLAIN ANALYZE while connected to a coordinator process, the underlying plan executed on the datanodes will be hidden:

  
                                                      QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------  
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=1.037..1.037 rows=0 loops=1)  
   Node/s: datanode0, datanode1, datanode2, datanode3  
 Total runtime: 1.136 ms  
(3 rows)  

To understand what plans will actually be carried out on the datanodes, connect to a datanode rather than the coordinator by specifying the datanode’s port:

pipeline -p 5433  
  
EXPLAIN ANALYZE SELECT * FROM planz WHERE id = 42;
                                            QUERY PLAN                                              
--------------------------------------------------------------------------------------------------  
 Seq Scan on planz  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.002..0.002 rows=0 loops=1)
 Total runtime: 0.108 ms  
(2 rows)  
Now let's add an index to see if we can make the plan smarter:

pipeline -c "CREATE INDEX planz_id_index ON planz(id)"
pipeline -p 5433
  
EXPLAIN ANALYZE SELECT * FROM planz WHERE id = 42;
                                                   QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on planz  (cost=4.20..13.67 rows=6 width=36) (actual time=0.029..0.029 rows=0 loops=1)  
   Recheck Cond: (id = 42)  
   ->  Bitmap Index Scan on planz_id_index  (cost=0.00..4.20 rows=6 width=0) (actual time=0.015..0.015 rows=0 loops=1)
         Index Cond: (id = 42)  
 Total runtime: 0.167 ms  
(5 rows)  

That looks better. The bitmap index scan is created in createplan.c:create_bitmap_scan_plan. Plan trees can also be examined by setting debug_print_plan to ‘on’.

Executor

Once a query plan has been chosen, it is passed to the executor. PortalRun is the executor’s entry point, which ultimately calls execMain.c:ExecutePlan. ExecutePlan repeatedly calls execProcnode.c:ExecProcNode on the root plan node, and each node in turn calls ExecProcNode on its children, producing the appropriate output for the given node. The output of a query node is the input of its parent node, and the output of the root node is the final result of the query that is to be consumed by the client.
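
Stripped of error handling, row-count limits, and direction logic, the pull loop at the center of ExecutePlan looks roughly like this (execute_plan_sketch is an illustrative name; the types and called functions are the real ones):

static void
execute_plan_sketch(PlanState *planstate, DestReceiver *dest)
{
    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(planstate);

        if (TupIsNull(slot))
            break;                          /* the plan is exhausted */

        /* hand the tuple to the destination, usually the client */
        (*dest->receiveSlot) (slot, dest);
    }
}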

To maintain state as the query plan is being executed, the executor associates executor nodes with plan tree nodes. These node types are found in execnodes.h. Executor nodes are created when the executor plan is initialized in execMain.c:InitPlan, which recursively calls execProcnode.c:ExecInitNode on each plan node. ExecInitNode takes a plan node as input and builds and returns the corresponding executor node.

Start at the switch statement in ExecProcNode to get an idea of how each node type is executed.
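
Abridged, that switch looks like the following; each executor node type has a routine that returns the node's next tuple, and the real function has dozens of cases:

TupleTableSlot *
ExecProcNode(PlanState *node)
{
    TupleTableSlot *result;

    switch (nodeTag(node))
    {
        case T_SeqScanState:
            result = ExecSeqScan((SeqScanState *) node);
            break;
        case T_IndexScanState:
            result = ExecIndexScan((IndexScanState *) node);
            break;
        case T_SortState:
            result = ExecSort((SortState *) node);
            break;
        default:
            result = NULL;      /* many more node types in the real code */
            break;
    }
    return result;
}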

Client response

Result tuples are sent to the client from within the ExecutePlan function. Each tuple produced by ExecProcNode is passed to the plan’s destination receiver via a function associated with that receiver. The most common receiver is a regular client receiver that is created in exec_simple_query, which uses the printtup.c:printtup function to “print” the tuple to the client connection.
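
The receiver interface itself is small. Abridged from dest.h, it looks like this; the client receiver created in exec_simple_query points receiveSlot at printtup:

typedef struct _DestReceiver DestReceiver;

struct _DestReceiver
{
    /* called once per result tuple */
    void        (*receiveSlot) (TupleTableSlot *slot, DestReceiver *self);
    /* called before any tuples are sent */
    void        (*rStartup) (DestReceiver *self, int operation, TupleDesc typeinfo);
    /* called after all tuples are sent */
    void        (*rShutdown) (DestReceiver *self);
    /* destroy the receiver object itself */
    void        (*rDestroy) (DestReceiver *self);
    CommandDest mydest;         /* identifies the kind of destination */
};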

