Greenplum 大集群应该调整的sshd_config配置

2 minute read

背景

Greenplum是MPP数据库,所以大的集群可能涉及很多的主机以及很多的segments。

Greenplum的很多管理脚本都会涉及ssh的连接,通过SSH进行远程的管理或命令的调用。

因此如果有并发的管理任务,会建立很多的SSH会话。

但是默认情况下Linux的sshd_config配置是比较保守的,没有想到应用会发起那么多的SSH会话。

如果你遇到这样的报错就要关注一下sshd的配置了

ssh_exchange_identification: Connection closed by remote host  

哪些sshd参数会影响Greenplum的使用

主要涉及的是sshd的MaxStartups参数. 我在前篇文章有讲到

https://yq.aliyun.com/articles/57903

1. 指定当前最多有多少个未完成认证的并发连接,由三个值决定

“start:rate:full” (e.g. "10:30:60").    

start表示未完成认证的连接数,当未完成认证的连接数超过start时,rate/100表示新发起的连接有多大的概率被拒绝连接。

如果未完成认证的连接数达到full,则新发起的连接全部拒绝。

     MaxStartups  
             Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon.    
	     Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection.    
	     The default is 10:30:100.  
  
             Alternatively, random early drop can be enabled by specifying the three colon separated values “start:rate:full” (e.g. "10:30:60").    
	     sshd(8) will refuse connection attempts with a probability of “rate/100” (30%) if there are currently “start” (10) unauthenticated connections.    
	     The probability increases linearly and all connection attempts are refused if the number of unauthenticated connections reaches “full” (60).  

外界可以利用这个对SSHD服务进行类似的ddos攻击,例如用户并发的发起连接,并且不输入密码,等待LoginGraceTime超时。

使得未完成认证的连接大于MaxStartups(full),这样的话,用户正常的连接请求就会被drop掉,非常的危险。

如何减轻呢?似乎不好搞,但是如果不是DDOS工具,则可以尝试一下以下方法

  • LoginGraceTime 调低,例如10秒(用户自己要确保10秒能输入密码,否则设置了太复杂的密码就歇菜了)

  • MaxStartups的几个值都调大,例如 300:30:1000

如果因为Greenplum不断的发起SSH请求,导致了主机不能登录,那就糟了。

赶紧改一改MaxStartups吧。

vi /etc/ssh/sshd_confg  
1000:30:3000  

改后之后, 使用 sshd -T 检测一下配置是否正确

sshd -T|grep -i startup  
maxstartups 1000:30:3000  

检查正确后应用sshd配置即可

service sshd restart  
  
or   
  
service sshd reload  
  
or 直接发信号给sshd  (man sshd)    
sshd rereads its configuration file when it receives a hangup signal, SIGHUP,     
by executing itself with the name and options it was started with, e.g. /usr/sbin/sshd.    

调整maxstartups的理由

gpexpand --help   
  
-B batch_size  
Batch size of remote commands to send to a given host before making a one-second pause.   
  Default is 16. Valid values are 1-128.  
The gpexpand utility issues a number of setup commands that may exceed the host's maximum threshold for authenticated connections as defined by MaxStartups in the SSH daemon configuration. The one-second pause allows authentications to be completed before gpexpand issues any more commands.  
The default value does not normally need to be changed. However, it may be necessary to reduce the maximum number of commands if gpexpand fails with connection errors such as 'ssh_exchange_identification: Connection closed by remote host.'  
  
  
http://www.openkb.info/2014/06/greenplum-ssh-connection-issue-due-to.html    
One best practice is to increase the MaxStartups to a value which is larger than the total segments count(primary+mirror).  
  
http://blog.csdn.net/jameswangcnbj/article/details/50801727  
8.vi /etc/ssh/sshd_config  
  
 MaxStartups 10000:30:20000  
 注意service sshd restart  
  

2.

单个连接允许的最大会话数,也”可能”需要调整。

     MaxSessions  
             Specifies the maximum number of open sessions permitted per network connection.    
             The default is 10.  

注意事项

 sshd 的行为可以通过使用命令行选项和配置文件(默认是sshd_config(5))进行控制,但命令行选项会覆盖配置文件中的设置。    

 sshd 会在收到 SIGHUP 信号后重新读取配置文件,但是最初启动的命令行选项仍然有效(仍会覆盖配置文件中的设置)。  

查看当前sshd已有的配置

通过gcore和gdb输出当前sshd的配置

https://yq.aliyun.com/articles/57916

摘录

Greenplum is MPP architecture, and sometimes it utilizes ssh sessions to complete some tasks.

For large clusters, MaxStartups in /etc/ssh/sshd_config needs to be increased to large enough.

What is MaxStartups?

Specifies the maximum number of concurrent unauthenticated connections to the sshd daemon.

Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection.

The default is 10.

Under which situations, Greenplum may use ssh sessions? greenplum有好多用到SSH的

Utility tools, for example, gpexpand, gpinitstandby, gpinitsystem, gpstart, etc.    
Web external table.    
Could be more which I do not remember...    

Symptoms:

1. Utility tools may error out: // 遇到这个错误就改关心一下了

ssh_exchange_identification: Connection closed by remote host  

2. Below writable external table may error out:

create writable external web table test ( col1  int )   
execute 'ssh mdw "mkdir -p ''/tmp/test''; cat > /tmp/test/test_$GP_SEGMENT_ID"'  
format 'TEXT' ( delimiter as '|'  null as E'\\N'  escape as E'\\') ;   
   
insert into test select generate_series(1,3);   
ERROR:  external table test command ended with SHELL TERMINATED by signal UNRECOGNIZED (127)    
(seg26 xxx.xxx.xxx.xxx:40002 pid=23091)   
DETAIL:  Command: execute:ssh mdw "mkdir -p '/tmp/test'; cat > /tmp/test/test_$GP_SEGMENT_ID"  

One best practice is to increase the MaxStartups to a value which is larger than the total segments count(primary+mirror).

1.Increase MaxStartups in /etc/ssh/sshd_config on all servers.

2. Restart sshd on all servers. // or reload or send signal SIGHUP to sshd

/etc/init.d/sshd restart    

祝大家玩得开心,欢迎随时来 阿里云促膝长谈 业务需求 ,恭候光临。

阿里云的小伙伴们加油,努力做 最贴地气的云数据库

Flag Counter

digoal’s 大量PostgreSQL文章入口