The master-slave connection is disconnected: Error reading packet from server: Lost connection to MySQL server during query (server_errno=20131)

created at 11-10-2021 views: 29

Problem Description

Received four or five master-slave disconnection and recovery text messages in one night. Check the ERROR information of the production environment (master-slave architecture) MySQL (5.6) slave library error log. Generally speaking, the slave library has lost the connection with the master library. , Causing the IO Thread to reconnect.

2019-12-02 03:46:44 47114 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2019-12-02 03:46:44 47114 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.002295' at position 386140629
2019-12-02 03:46:44 47114 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2019-12-02 03:46:54 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 1, Error_code: 2013
2019-12-02 03:48:04 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 2, Error_code: 2013
2019-12-02 03:49:14 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 3, Error_code: 2013
2019-12-02 03:50:24 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 4, Error_code: 2013
2019-12-02 03:51:34 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 5, Error_code: 2013
2019-12-02 03:52:44 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 6, Error_code: 2013
2019-12-02 03:53:54 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 7, Error_code: 2013
2019-12-02 03:55:04 47114 [ERROR] Slave I/O: error reconnecting to master 'repli@192.168.11.10:3306' - retry-time: 60  retries: 8, Error_code: 2013
2019-12-02 03:56:06 47114 [Note] Slave: connected to master 'repli@192.168.11.10:3306',replication resumed in log 'binlog.002295' at position 386140629
2019-12-02 04:01:00 47114 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2019-12-02 04:01:00 47114 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binlog.002295' at position 45291556

Query official documents:

B.4.2.3 Lost connection to MySQL server There are three likely causes for this error message. Usually it indicates network connectivity trouble and you should check the condition of your network if this error occurs frequently. If the error message includes “during query,” this is probably the case you are experiencing. Sometimes the “during query” form happens when millions of rows are being sent as part of one or more queries. If you know that this is happening, you should try increasing net_read_timeout from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete. More rarely, it can happen when the client is attempting the initial connection to the server. In this case, if your connect_timeout value is set to only a few seconds, you may be able to resolve the problem by increasing it to ten seconds, perhaps more if you have a very long distance or slow connection. You can determine whether you are experiencing this more uncommon cause by using SHOW GLOBAL STATUS LIKE 'Aborted_connects'. It will increase by one for each initial connection attempt that the server aborts. You may see “reading authorization packet” as part of the error message; if so, that also suggests that this is the solution that you need. If the cause is none of those just described, you may be experiencing a problem with BLOB values that are larger than max_allowed_packet, which can cause this error with some clients. Sometime you may see an ER_NET_PACKET_TOO_LARGE error, and that confirms that you need to increase max_allowed_packet. 

According to official documents, step by step investigation:

  • First of all, the network reason can be ruled out, because it is a local area network, there is no problem with the network environment, and there is no firewall interception, etc.;
  • Since the error message of the slave library contains during query, there is no other content such as: reading authorization packet, ER_NET_PACKET_TOO_LARGE, etc., so the reason for this error should be the following:

Sometimes the “during query” form happens when millions of rows are being sent as part of one or more queries. If you know that this is happening, you should try increasing net_read_timeout from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete.

According to official recommendations, the value of the parameter net_read_timeout can be increased

Combining the scenario of master-slave replication, query the binary log, and ask the business personnel, the master database will generate many large transactions during this time period. Since the default read data from the library to the main library is 30 seconds by default (the default value of the net_read_timeout parameter), the slave library connection process will be closed after 30s. Specifically, when the IO Thread of the slave library reads a transaction, the transaction may not be completed within 30 seconds because the transaction is too large. However, the analysis from the library to the transaction is incomplete, and it is considered that there is a problem with the connection to the main library, so the main library will be reconnected.

But after adjusting the net_read_timeout=900 of the slave library, the error still occurs! !

Therefore, the guess may be that when the main library sends data to the slave library, a large transaction cannot be sent within a limited time (net_write_timeout, default 30s), so no matter how long the read time from the library is, the transaction cannot be read.

Therefore adjust the net_write_timeout=300 of the main library

The error was not reported later.

There are two ways to change the value:

  1. Use the set GLOBAL command. Such as: set GLOBAL net_write_timeout=120;
  2. Modify the parameter value in the Mysql configuration file: net_write_timeout=120
    Then restart the database service.
created at:11-10-2021
edited at: 11-10-2021: