PostgreSQL on ECS多云盘的部署、快照备份和恢复

11 minute read

背景

随着阿里云云盘的发展,云盘的性能已经越来越好了。IOPS可以做到十万以上,读写吞吐也超过1GB/s了。相信随着网络的发展,SSD云盘IOPS突破40万,读写吞吐突破4GB/s也不远了。

不过这里的IOPS和吞吐是指并发的IO能力,单次IO的延迟与本地还是不能比(将来RDMA网络也许能解决这个问题)。

PostgreSQL 如何解决SSD云盘IO延迟高缺陷

某些业务对数据库的(要求响应时间很快的写小事务)对单次IO延迟比较敏感,不过PostgreSQL有方法可以解决这个小缺陷。

1、后台IO,(write syscall)

PostgreSQL 的大多数IO为后台IO(bgwriter, backend writer),所以刷shared buffer对IO延迟不敏感。

2、异步提交

事务提交时,写WAL异步提交。不会造成数据不一致,但是当数据库CRASH,可能丢失在wal buffer中未提交的事务(最多10毫秒)。

这种方法是最有效的。

3、组提交

组提交,解决WAL写瓶颈,将多个同时提交的事务的WAL fsync动作合并为单次,从而减少FSYNC次数,提高高并发时的写小事务的TPS。

PostgreSQL 如何充分利用并发IO的能力

由于我们看到的云盘IOPS和读写吞吐指标是并发指标,数据库如何利用好这么好的指标呢?对高并发小事务不是问题,肯定是能将它用起来的,但是对于低并发,长事务(分析型业务),如何利用云盘的IOPS能力和读写吞吐能力呢?

PostgreSQL的并行计算特性可以充分利用云盘的并发IOPS和读写带宽。

多云盘卷组

单块云盘的IOPS能力有上限,容量有上限,读写带宽也有上限。好在ECS支持多块云盘,目前已支持一台ECS挂载16块云盘。

通过多块云盘,组卷条带后,提高读写带宽。以Linux RHEL/CentOS 7.x为例,组卷和条带方法:

1、逻辑卷

yum install lvm2  

2、软RAID

yum install -y mdadm  

3、ZFS

http://zfsonlinux.org/

以centos 7.3为例  
  
wget http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm  
rpm -ivh zfs-release.el7_3.noarch.rpm   
yum install -y zfs  

逻辑卷例子

假设环境中有16块SSD云盘

1、创建PV

pvcreate /dev/vd[b-q]  

2、创建VG

vgcreate -A y -s 128M vgdata01 /dev/vd[b-q]  
-s, --physicalextentsize PhysicalExtentSize[bBsSkKmMgGtTpPeE]  
       Sets  the physical extent size on physical volumes of this volume group.  A size suffix   
       (k for kilobytes up to t for terabytes) is optional, megabytes is the default if no suffix is present.   
       For LVM2 format, the value  
       must be a power of 2 of at least 1 sector (where the sector size is the largest sector size of the   
       PVs currently used in the VG) or, if not a power of 2, at least 128KiB.  For the older LVM1 format, it must be a power  
       of  2  of at least 8KiB.  The default is 4 MiB.  Once this value has been set, it is difficult to   
       change it without recreating the volume group which would involve backing up and restoring data on any logical volumes.  
       However, if no extents need moving for the new value to apply, it can be altered using vgchange -s.  
  
       If the volume group metadata uses lvm1 format, extents can vary in size from 8KiB to 16GiB and   
       there is a limit of 65534 extents in each logical volume.  The default of 4 MiB leads to a maximum logical volume size  of  
       around 256GiB.  
  
       If  the  volume group metadata uses lvm2 format those restrictions do not apply, but having a   
       large number of extents will slow down the tools but have no impact on I/O performance to the logical volume.  The smallest  
       PE is 1KiB  
  
       The 2.4 kernel has a limitation of 2TiB per block device.  

3、创建LV,设置条带

16块盘,每个条带单位为8KB。

lvcreate -A y -i 16 -I 8 -l 100%VG -n lv01 vgdata01  
-i|--stripes Stripes  
       Gives the number of stripes.  This is equal to the number of physical volumes to scatter the logical volume data.    
       When creating a RAID 4/5/6 logical volume, the extra devices which are necessary for parity are internally accounted for.    
       Specifying -i 3 would cause 3 devices for striped and RAID 0 logical volumes, 4 devices for   
       RAID 4/5, 5 devices for RAID 6 and 6 devices for RAID 10.  Alternatively, RAID 0 will stripe  across  2  
       devices,  RAID  4/5  across  3  PVs,  RAID  6  across  5  PVs and RAID 10 across 4 PVs in the volume group   
       if the -i argument is omitted.  In order to stripe across all PVs of the VG if the -i argument is omitted, set  
       raid_stripe_all_devices=1 in the allocation section of lvm.conf (5) or add  
       --config allocation/raid_stripe_all_devices=1  
       to the command.  
  
       Note the current maxima for stripes depend on the created RAID type.  For raid10, the maximum of stripes is 32,   
       for raid0, it is 64, for raid4/5, it is 63 and for raid6 it is 62.  
  
       See the --nosync option to optionally avoid initial syncrhonization of RaidLVs.  
  
       Two implementations of basic striping are available in the kernel.  The original device-mapper implementation   
       is the default and should normally be used.  The alternative implementation using MD, available since  ver‐  
       sion  1.7  of  the  RAID  device-mapper  kernel  target (kernel version 4.2) is provided to facilitate the   
       development of new RAID features.  It may be accessed with --type raid0[_meta], but is best avoided at present  
       because of assorted restrictions on resizing and converting such devices.  
  
-I|--stripesize StripeSize  
       Gives the number of kilobytes for the granularity of the stripes.  
       StripeSize must be 2^n (n = 2 to 9) for metadata in LVM1 format.  For metadata in LVM2 format,   
       the stripe size may be a larger power of 2 but must not exceed the physical extent size.  

4、创建文件系统,设置条带

当数据库数据块=32K时,chunk大小(单块盘读写32KB,再写下一块)。 条带大小512KB(32KB*16)。  
mkfs.ext4 /dev/mapper/vgdata01-lv01 -m 0 -O extent,uninit_bg -E lazy_itable_init=1,stride=8,stripe_width=128 -b 4096 -T largefile -L lv01  
  
或  
  
当数据库数据块=8K时,chunk大小(单块盘读写8KB,再写下一块)。 条带大小128KB(8KB*16)。  
mkfs.ext4 /dev/mapper/vgdata01-lv01 -m 0 -O extent,uninit_bg -E lazy_itable_init=1,stride=2,stripe_width=32 -b 4096 -T largefile -L lv01  
-b block-size  
       Specify the size of blocks in bytes.  Valid block-size values are 1024, 2048 and 4096 bytes per block.    
       If omitted, block-size is heuristically determined by the filesystem size and the expected usage of the  filesys‐  
       tem  (see  the  -T option).  If block-size is preceded by a negative sign ('-'), then mke2fs will use   
       heuristics to determine the appropriate block size, with the constraint that the block size will be at least block-  
       size bytes.  This is useful for certain hardware devices which require that the blocksize be a multiple of 2k.  
  
stride=stride-size  单位为blocks  
       Configure the filesystem for a RAID array with stride-size filesystem blocks. This is the number   
        of blocks read or written to disk before moving to the next disk, which is  sometimes  referred  to  as  the  
       chunk  size.   This mostly affects placement of filesystem metadata like bitmaps at mke2fs   
        time to avoid placing them on a single disk, which can hurt performance.    
        It may also be used by the block allocator.  
  
stripe_width=stripe-width  单位为blocks  
       Configure the filesystem for a RAID array with stripe-width filesystem blocks per stripe.   
        This is typically stride-size * N, where N is the number of data-bearing disks in the RAID   
        (e.g. for RAID  5  there  
       is  one  parity disk, so N will be the number of disks in the array minus 1).    
        This allows the block allocator to prevent read-modify-write of the parity in a   
        RAID stripe if possible when the data is written.  
  
lazy_itable_init[= <0 to disable, 1 to enable>]  
      If enabled and the uninit_bg feature is enabled, the inode table will not be fully   
      initialized by mke2fs.  This speeds up filesystem initialization noticeably,   
      but it requires the kernel to finish initial‐  
      izing the filesystem in the background when the filesystem is first mounted.    
      If the option value is omitted, it defaults to 1 to enable lazy inode table zeroing.  

5、MOUNT文件系统

如果不使用云盘镜像备份,使用这种挂载模式。  
mount -o defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback LABEL=lv01 /data01  
  
或  
  
如果使用云盘镜像备份,使用这种挂载模式。  
mount -o defaults,noatime,nodiratime,nodelalloc,barrier=1,data=ordered LABEL=lv01 /data01  

6、FIO测试

yum install -y fio  
  
fio -filename=/data01/testdir -direct=1 -thread -rw=write -ioengine=libaio -bs=8K -size=16G -numjobs=128 -runtime=60 -group_reporting -name=mytest >/tmp/fio_write.log 2>&1  
fio -filename=/data01/testdir -direct=1 -thread -rw=read -ioengine=libaio -bs=8K -size=16G -numjobs=128 -runtime=60 -group_reporting -name=mytest >/tmp/fio_read.log 2>&1  
fio -filename=/data01/testdir -direct=1 -thread -rw=randwrite -ioengine=libaio -bs=8K -size=16G -numjobs=128 -runtime=60 -group_reporting -name=mytest >/tmp/fio_randwrite.log 2>&1  
fio -filename=/data01/testdir -direct=1 -thread -rw=randread -ioengine=libaio -bs=8K -size=16G -numjobs=128 -runtime=60 -group_reporting -name=mytest >/tmp/fio_randread.log 2>&1  
顺序  
READ: io=72621MB, aggrb=1210.3MB/s, minb=1210.3MB/s, maxb=1210.3MB/s, mint=60003msec, maxt=60003msec  
WRITE: io=36845MB, aggrb=628743KB/s, minb=628743KB/s, maxb=628743KB/s, mint=60007msec, maxt=60007msec  
  
随机  
READ: io=53390MB, aggrb=911160KB/s, minb=911160KB/s, maxb=911160KB/s, mint=60002msec, maxt=60002msec  
WRITE: io=26078MB, aggrb=445032KB/s, minb=445032KB/s, maxb=445032KB/s, mint=60004msec, maxt=60004msec  

数据库部署

1、安装软件

《PostgreSQL on Linux 最佳部署手册》

2、初始化数据库

initdb -E SQL_ASCII -U postgres --lc-collate=C --lc-ctype=en_US.utf8 -D $PGDATA  

3、配置postgresql.conf

port = 1921                               
max_connections = 1000                    
superuser_reserved_connections = 13       
unix_socket_directories = '.'     
shared_buffers = 64GB                     
#  vm.nr_hugepages = 102352    
#  建议shared buffer设置超过64GB时 使用大页,页大小 /proc/meminfo Hugepagesize
maintenance_work_mem = 1GB                
dynamic_shared_memory_type = posix        
vacuum_cost_delay = 0                     
bgwriter_delay = 10ms                     
bgwriter_lru_maxpages = 500               
bgwriter_lru_multiplier = 5.0             
bgwriter_flush_after = 0                  
effective_io_concurrency = 16             
max_worker_processes = 128                
max_parallel_workers_per_gather = 64      
max_parallel_workers = 128                
backend_flush_after = 0           
wal_level = minimal                       
synchronous_commit = off                  
wal_sync_method = fsync           
full_page_writes = on                     
wal_buffers = 1GB                         
wal_writer_delay = 10ms           
wal_writer_flush_after = 0                
checkpoint_timeout = 35min                
max_wal_size = 128GB  
min_wal_size = 8GB  
checkpoint_completion_target = 0.5        
checkpoint_flush_after = 0                
checkpoint_warning = 30s                  
max_wal_senders = 0               
random_page_cost = 1.2                    
parallel_tuple_cost = 0           
parallel_setup_cost = 0   
min_parallel_table_scan_size = 0   
min_parallel_index_scan_size = 0  
effective_cache_size = 200GB  
log_destination = 'csvlog'                
logging_collector = on            
log_directory = 'log'                     
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'   
log_file_mode = 0600                      
log_truncate_on_rotation = on             
log_checkpoints = on  
log_connections = on  
log_disconnections = on  
log_error_verbosity = verbose   
log_timezone = 'PRC'  
log_autovacuum_min_duration = 0   
autovacuum_max_workers = 8                
autovacuum_vacuum_cost_delay = 0          
autovacuum_vacuum_cost_limit = 0          
datestyle = 'iso, mdy'  
timezone = 'PRC'  
lc_messages = 'C'                         
lc_monetary = 'C'                         
lc_numeric = 'C'                          
lc_time = 'C'                             
default_text_search_config = 'pg_catalog.english'  
max_locks_per_transaction = 6400  
pg_ctl start  

4、配置归档,归档目录不能放在数据盘的云盘里面,你可以放到其他云盘,或者放到OSS。

这里为了演示方便,放在本地云盘。(保证与数据云盘不是同一云盘)

mkdir /archivedir  
chown digoal:digoal /archivedir  
vi postgresql.conf  
  
wal_level = replica  
archive_mode = on  
archive_command = 'test ! -f /archivedir/%f && cp %p /archivedir/%f'  
pg_ctl restart -m fast  

5、压测

5.1、写入大吞吐压测

初始化

-- 建表  
do language plpgsql $$   
declare   
begin   
  for i in 0..65535 loop    
    execute 'create table IF NOT EXISTS test_'||i||'(id int, info text, crt_time timestamp) with (autovacuum_enabled=off, toast.autovacuum_enabled=off)';  
  end loop;  
end;  
$$;  
  
-- 删表  
do language plpgsql $$   
declare   
begin   
  for i in 0..65535 loop    
    execute 'drop table IF EXISTS test_'||i;  
  end loop;  
end;  
$$;  
  
-- 写表  
create or replace function batch_ins(int) returns void as $$  
declare  
begin  
  execute 'insert into test_'||$1||' select generate_series(1,10000), md5(random()::text), now()';  
end;  
$$ language plpgsql strict;  

大吞吐写入测试,写入速度直逼200万行/s

vi test.sql  
  
\set id random(0,65535)  
select batch_ins(:id);  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 120  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 32  
number of threads: 32  
duration: 120 s  
number of transactions actually processed: 20801  
latency average = 184.678 ms  
latency stddev = 57.522 ms  
tps = 173.218345 (including connections establishing)  
tps = 173.233151 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.003  \set id random(0,65535)  
       184.684  select batch_ins(:id);  

实际为写WAL的瓶颈,采用UNLOGGED TABLE,写入速度直逼1000万行/s。

transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 32  
number of threads: 32  
duration: 120 s  
number of transactions actually processed: 117367  
latency average = 32.731 ms  
latency stddev = 23.047 ms  
tps = 977.228590 (including connections establishing)  
tps = 977.331826 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.002  \set id random(0,65535)  
        32.727  select batch_ins(:id);  

5.2、10亿数据量,OLTP压测

pgbench -i -s 10000  
pgbench -M prepared -n -r -P 1 -c 64 -j 64 -T 120  
  
transaction type: <builtin: TPC-B (sort of)>  
scaling factor: 10000  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 120 s  
number of transactions actually processed: 580535  
latency average = 13.229 ms  
latency stddev = 30.073 ms  
tps = 4829.300799 (including connections establishing)  
tps = 4829.989470 (excluding connections establishing)  
script statistics:  
 - statement latencies in milliseconds:  
         0.003  \set aid random(1, 100000 * :scale)  
         0.001  \set bid random(1, 1 * :scale)  
         0.001  \set tid random(1, 10 * :scale)  
         0.001  \set delta random(-5000, 5000)  
         0.097  BEGIN;  
         5.650  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;  
         0.153  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;  
         3.122  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;  
         1.631  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;  
         1.418  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);  
         1.148  END;  

5.3、并行读压测

postgres=# \dt+ e  
                   List of relations  
 Schema | Name | Type  |  Owner   | Size  | Description   
--------+------+-------+----------+-------+-------------  
 public | e    | table | postgres | 24 GB |   
(1 row)  
  
-- 32个并行读,24G数据处理耗时0.3秒。 
postgres=# alter table e set (parallel_workers =32);  
postgres=# set max_parallel_workers_per_gather =32;  
postgres=# set min_parallel_table_scan_size =0;  
postgres=# set min_parallel_index_scan_size =0;  
postgres=# set parallel_setup_cost =0;  
postgres=# set parallel_tuple_cost =0;
postgres=# set force_parallel_mode =on;
  
postgres=# select count(*) from e;  
  count     
----------  
 28766181  
(1 row)  
Time: 993.876 ms  
  
postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from e;  
                                                                           QUERY PLAN                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=178479.59..178479.60 rows=1 width=8) (actual time=305.796..305.796 rows=1 loops=1)  
   Output: count(*)  
   Buffers: shared hit=27422  
   ->  Gather  (cost=178479.50..178479.51 rows=32 width=8) (actual time=304.868..305.784 rows=33 loops=1)  
         Output: (PARTIAL count(*))  
         Workers Planned: 32  
         Workers Launched: 32  
         Buffers: shared hit=27422  
         ->  Partial Aggregate  (cost=178479.50..178479.51 rows=1 width=8) (actual time=291.483..291.483 rows=1 loops=33)  
               Output: PARTIAL count(*)  
               Buffers: shared hit=23934  
               Worker 0: actual time=281.287..281.287 rows=1 loops=1  
                 Buffers: shared hit=718  
..........  
               Worker 30: actual time=299.304..299.304 rows=1 loops=1  
                 Buffers: shared hit=438  
               Worker 31: actual time=300.227..300.227 rows=1 loops=1  
                 Buffers: shared hit=460  
               ->  Parallel Index Only Scan using uk_e on public.e  (cost=0.44..176232.78 rows=898689 width=0) (actual time=0.085..178.852 rows=871702 loops=33)  
                     Heap Fetches: 0  
                     Buffers: shared hit=23934  
                     Worker 0: actual time=0.078..173.121 rows=1014806 loops=1  
                       Buffers: shared hit=718  
..........  
                     Worker 30: actual time=0.089..184.251 rows=635007 loops=1  
                       Buffers: shared hit=438  
                     Worker 31: actual time=0.070..184.000 rows=649729 loops=1  
                       Buffers: shared hit=460  
 Planning time: 0.092 ms  
 Execution time: 314.527 ms  
(144 rows)  

基于云盘镜像的备份

备份过程

1、开始备份

select pg_start_backup('xx');  

2、打云盘快照

调用API或在阿里云ECS控制台执行。

3、压测

打快照的过程中,产生一些数据库变更。

使用前面的例子,大吞吐写入测试(使用logged table)、OLTP测试、夹杂建表、删表。

4、云盘快照结束

5、停止压测

6、结束备份

select pg_stop_backup();  

7、切换日志

checkpoint;  
select pg_switch_wal();  
checkpoint;  
select pg_switch_wal();  

8、检查归档已正常

确保打快照过程中产生的WAL全部归档成功。

9、生成每个测试表的CHECKSUM

do language plpgsql $$  
declare  
  n name;  
  res int8;  
begin  
  for n in select tablename from pg_tables where schemaname='public'    
  loop  
    execute 'select sum(hashtext(t.*::text)) from '||n||' t' into res;  
    raise notice 'checksum %: %', n, res;  
  end loop;  
end;  
$$;  
  
NOTICE:  checksum pgbench_history: -422046586146  
NOTICE:  checksum pgbench_tellers: 215006661683  
NOTICE:  checksum pgbench_branches: 158568443210  
......  

基于云盘镜像的恢复

实际上PostgreSQL本身已支持增量备份、块级增量备份的功能,但是既然有云盘快照,可以只备归档和云盘镜像,数据少走一次网络开销。

基于PG本身的备份也简单介绍一下,见本文参考部分。

恢复过程

1、创建新ECS(可以根据镜像创建)

2、如果根据数据库ECS镜像创建,则不需要这一步。

部署postgresql软件,注意需要与主库软件、插件等一致。

3、根据云盘快照创建云盘

4、复原逻辑卷

pvscan  
  
vgscan  
  
lvscan  

5、检查文件系统

fsck.ext4 -y /dev/mapper/vgdata01-lv01  

6、加载文件系统

mount -o defaults,noatime,nodiratime,nodelalloc,barrier=1,data=ordered LABEL=lv01 /data01  

7、清理WAL文件,(因为是多盘快照,快照不可能在同一时间点,因此WAL文件可能出现partial write。wal文件是恢复数据文件的关键,因此必须完整。所以我们从归档来获取WAL。)。

rm -f $PGDATA/pg_wal/*  

注意,如果是单盘,可以只删除最后一个WAL。

8、配置恢复

cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf  
vi $PGDATA/recovery.conf     # (auto copy archive to $PGDATA/pg_wal)  
  
restore_command = 'cp /archivedir/%f %p'           # e.g. 'cp /mnt/server/archivedir/%f %p'  
standby_mode = on  

9、启动数据库

pg_ctl start  

10、等待恢复完成,恢复完成即恢复到最后一个已归档的WAL文件。

11、激活数据库

pg_ctl promote  

12、检查CHECKSUM

do language plpgsql $$  
declare  
  n name;  
  res int8;  
begin  
  for n in select tablename from pg_tables where schemaname='public'    
  loop  
    execute 'select sum(hashtext(t.*::text)) from '||n||' t' into res;  
    raise notice 'checksum %: %', n, res;  
  end loop;  
end;  
$$;  

13、检查所有数据块是否正常。

set vacuum_freeze_min_age =0;  
vacuum freeze;  

小结

1、PostgreSQL 具备WAL和FULL PAGE WRITE,可以实现数据文件的不一致恢复,也就是说备份时的数据文件partial write可以被检查点后的wal full page write PAGE修复。最终实现一致性。

2、多块云盘并行备份,提高了数据备份的速度。

3、多块云盘,提升了整体的读写IOPS和读写带宽。

4、PostgreSQL 通过组提交、异步提交、数据文件异步写,解决了云盘单次IO延迟较本地延迟更高一点的性能问题。也就是说对IO延迟不敏感。

5、PostgreSQL 利用多核并行,可以在单条SQL中,充分利用多块云盘带来的高IOPS和高读写带宽的能力。

6、通过逻辑卷、软RADI、ZFS等手段,可以充分利用多云盘的能力。

参考

1、《PostgreSQL on Linux 最佳部署手册》

2、zfs

《PostgreSQL 最佳实践 - 块级增量备份(ZFS篇)验证 - recovery test script for zfs snapshot clone + postgresql stream replication + archive》

《PostgreSQL 最佳实践 - 块级增量备份(ZFS篇)双机HA与块级备份部署》

《PostgreSQL 最佳实践 - 块级增量备份(ZFS篇)单个数据库采用多个zfs卷(如表空间)时如何一致性备份》

《PostgreSQL 最佳实践 - 块级增量备份(ZFS篇)备份集自动校验》

《PostgreSQL 最佳实践 - 块级增量备份(ZFS篇)方案与实战》

3、man lvm

4、man mdadm

5、备份原理

《PostgreSQL 最佳实践 - 块级别增量备份(pg_rman baseon LSN)源码浅析与使用》

6、时间点恢复原理与最佳实践

《PostgreSQL 最佳实践 - 任意时间点恢复源码分析》

《PostgreSQL 最佳实践 - 在线增量备份与任意时间点恢复》

7、https://github.com/pgbackrest/pgbackrest

Flag Counter

digoal’s 大量PostgreSQL文章入口