当系统服务的MySQL错误日志中,发现大量以下类似信息:经常收到客户关于通信故障错误的问题—客户面临间歇性的”Got an error reading communication packet”错误,这里分析这个错误出现的原因,以及如何解决这个问题。
Aborted connection 1055898 to db: 'xxx' user: 'yyy' host: 'xxx.xxx.xxx.xxx' (Got timeout reading communication packets)
下面看看官网怎么说:
If a client is unable even to connect, the server increments the Aborted_connects status variable.
A client attempts to access a database but has no privileges for it. #客户端没有权限但是尝试访问MySQL数据库 A client uses an incorrect password. #客户端输入的密码有误。 A connection packet does not contain the right information. #连接包不包含正确信息 takes more than connect_timeout seconds to obtain a connect packet. #超过连接时间限制,主要是这个系统变量connect_timeout控制(mysql默认是10s,基本上,除非网络环境极端不好,一般不会超时。)
lIf a client successfully connects but later disconnects improperly or is terminated, the server increments the Aborted_clients status variable
The client program did not call mysql_close() before exiting… #客户端没有进行关闭 The client had been sleeping more than wait_timeout or interactive_timeout seconds without issuing any requests to the server. #客户端睡眠时间超过了wait_timeout或interactive_timeout秒,而没有向服务器发出任何请求。 The client program ended abruptly in the middle of a data transfer. #客户端程序在数据传输过程中突然终止。
Other reasons for problems with aborted connections or aborted clients:
the max_allowed_packet variable value is too small or queries require more memory than you have allocated for mysqld #max_allow_packet设置过小 Use of Ethernet protocol with Linux, both half and full duplex. Some Linux Ethernet drivers have this bug #Linux以太网驱动程序有这个bug A problem with the thread library that causes interrupts on reads.#线程库中导致读取中断的问题。 Badly configured TCP/IP. #tcp/iip 配置信息混乱 Faulty Ethernets, hubs, switches, cables, and so forth. This can be diagnosed properly only by replacing hardware. #故障的以太网、集线器、交换机、电缆等等
https://dev.mysql.com/doc/refman/5.7/en/communication-errors.html
Aborted connection情况下,这也意味着以下几个问题:
set global max_allowed_packet = 1024*1024*1024;
mysql> show variables like '%max_allowed_packet%'; +--------------------------+------------+ | Variable_name | Value | +--------------------------+------------+ | max_allowed_packet | 16777216 | | slave_max_allowed_packet | 1073741824 | +--------------------------+------------+
[mysqld] max_allowed_packet=256M
当然,也可能是其它原因导致的。坦白讲,异常中断是很难诊断的,也有可能是和网络、防火墙有关。
netstat -ano|grep TIME_WAIT tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:55586 TIME_WAIT timewait (32.97/0/0) tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:55367 TIME_WAIT timewait (27.82/0/0) tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:55776 TIME_WAIT timewait (37.09/0/0) tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:56505 TIME_WAIT timewait (54.61/0/0) tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:55553 TIME_WAIT timewait (31.94/0/0) tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:56643 TIME_WAIT timewait (57.73/0/0) tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:55221 TIME_WAIT timewait (23.70/0/0) tcp 0 0 xxx.xxx.xxx.xxx:10054 xxx.xxx.xxx.xxx:55920 TIME_WAIT timewait (41.18/0/0)
检查DNS配置是否有延迟问题。
增加net_read_timeout、net_write_timeout的值,并观察是否还有该错误发生;
mysql> show variables like '%timeout%'; +-----------------------------+----------+ | Variable_name | Value | +-----------------------------+----------+ | connect_timeout | 10 | | interactive_timeout | 1800 | | lock_wait_timeout | 31536000 | | net_read_timeout | 30 | | net_write_timeout | 60 | | wait_timeout | 1800 | +-----------------------------+----------+
tcpdump,netstat -s
mysql> show global variables like '%log_warning%'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | log_warnings | 2 | +---------------+-------+ 1 row in set (0.00 sec)
如果log_warnings的值大于1,mysql会将类似信息写入错误日志:
[Warning] Aborted connection 305628 to db: 'db' user: 'dbuser' host: 'hostname' (Got an error reading communication packets) [Warning] Aborted connection 305627 to db: 'db' user: 'dbuser' host: 'hostname' (Got an error reading communication packets)
可以修改一下log_waring的值:
set global log_warnings=1;
但这样直接修改,重启后会失效,修改配置文件mysql.cnf log_warnings = 1