Common Causes of Data Discrepancies in MySQL Replication
MySQL replication relies on logical SQL statement replication. During this process, the executed SQL statements are validated, but the overall structure (such as tables and databases) is not checked. In current high availability architectures, the load and performance of a single node can affect the speed of Relay Log replay. If this exceeds capacity, delays are inevitable.
In such cases, during master-slave switching, data discrepancies may occur, potentially leading to severe data inconsistencies. For core enterprise systems, data inconsistency is a critical issue.
Causes of Data Discrepancies
Data inconsistency in master-slave replication can arise from various factors. Based on known cases, the following are common causes:
1. Write Operations on the Slave
- This can happen in multiple scenarios. For example, if the VIP used by the front-end application experiences network heartbeat issues, it might switch between the master and slave nodes, causing writes on both sides—commonly referred to as a split-brain scenario.
- In dual-master setups, both nodes may perform write operations for load balancing or business requirements. However, if the replication process is interrupted or delayed, both nodes may write the same data, resulting in replication failure and increasing the risk of data inconsistency.
- Manual operations on the slave node, such as accidental modifications, can also cause discrepancies.
2. Non-Row Format Binary Logs
Binary logs can be in statement or mixed format, which may lead to unintended changes in data due to SQL statements being applied differently.
3. Stored Procedures or Triggers
Stored procedures or triggers may produce different results due to their local execution characteristics, especially with time-based or auto-increment fields.
auto increment, datetime, timestamp
4. Mismatched sql_mode Settings
Differences in sql_mode settings can lead to varying data handling. For example, the STRICT_TRANS_TABLES mode enforces strict validation, and mismatched settings can result in data truncation and inconsistency.
mysql> CREATE TABLE `t1` (
`id` int NOT NULL,
`name` varchar(10) ,
PRIMARY KEY (`id`)
) ENGINE=InnoDB ;
Query OK, 0 rows affected (0.01 sec)
mysql> show variables like '%sql_mode%';
+---------------+-----------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+---------------+-----------------------------------------------------------------------------------------------------------------------+
| sql_mode | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
+---------------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> set sql_mode="";
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t1(id,name) values(1,"hello beijing nice to meet you");
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> select * from t1;
+----+------------+
| id | name |
+----+------------+
| 1 | hello beij |
+----+------------+
1 row in set (0.00 sec)
5. Skipping Replication Errors
Using options like slave_skip_errors, sql_slave_skip_counter, or gtid_next can skip transactions, wich means they are not applied on the slave, potentially leading to data loss.
sql_slave_skip_counter: Skips N events after the current one in position-based replication, useful for recovering from errors caused by specific SQL statements.gtid_next: Manually skips transactions when there are GTID conflicts, ensuring precise GTID information and transaction numbers.slave_skip_errors: Skips errors listed in the configuration, allowing replication to continue but risking data inconsistency.
Example configuration:
[mysqld]
slave_skip_errors=1062,1053,1032
6. Double 1 Configuration
The sync_binlog and innodb_flush_log_at_trx_commit parameters control how logs are written to disk. If these are not set to 1, data inconsistency can occur in case of crashes.
7. Semi-Synchronous Replication
Improper configuration of semi-synchronous replication can introduce data inconsistency risks. The paramter rpl_semi_sync_master_wait_point determines whether the master waits for an acknowledgment after committing or before committing.
8. Replication Filtering Rules
Replication filtering rules such as ignore/do/rewrite can cause issues when cross-schema operations are performed.
9. binlog_row_image Settings
The binlog_row_image parameter defines what data is recorded in the binary log. It can be set to FULL, MINIMAL, or NOBLOB.
| Value | Description |
|---|---|
| full | Records all before and after images regardless of primary key constraints. |
| minimal | Retains only primary key columns in the before image and modified columns in the after image if a primary key exists. Otherwise, retains all before image and modified columns in the after image. |
| noblob | Excludes text/blob columns from before and after images if a primary key exists. Otherwise, retains all before and after images. |
10. sql_log_bin Parameter
The sql_log_bin parameter controls whether changes are logged in the binary log. If disabled, changes won't be replicated to the slave.
mysql> SET @@SESSION.SQL_LOG_BIN= 0;
Query OK, 0 rows affected (0.00 sec)
mysql> update t1 set age=10 where id =4;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> SET @@SESSION.SQL_LOG_BIN= 1;
11. Schema Inconsistencies
Discrepancies in table structures, such as differences in primary keys, unique indexes, or column types, can lead to unexpected issues.
12. Foreign Key Constraints
If foreign_key_checks is disabled on the slave, foreign key validation may not occur during SQL playback.
13. Backup Consistency Loss
Inconsistent backups, such as those taken without using single-transaction or master-data options, can result in replication issues.
14. Version Mismatches
Incompatibilities between the master and slave versions can lead to functional mismatches, especially when the master uses features unsupported by the slave.
15. Internal Bugs
MySQL bugs can also cause replication inconsistencies. For example, even if the slave processes are runing, replication may appear stuck.
Mitigating Data Discrepancies
Despite widespread use of MySQL, data inconsistencies still occur. These can stem from individual conditions or combinations of multiple factors. Timely detection and resolution are essential.
Preventing Inconsistencies
- Avoid multi-node writes; ensure business logic prevents conflicts.
- Avoid using `slave_skip_errors` unless absolutely necessary.
- Use double 1 settings for data safety.
- Set the slave to read-only mode.
- Enable semi-synchronous replication with `after_sync` mode.
- Use row-based binary logs and `binlog_row_image` set to `FULL`.
- Be cautious when promoting a slave to master during replication delays.
- Avoid cross-schema operations in replication filters.
- Limit the use of stored procedures and triggers that depend on special fields.
- Enforce strict `sql_mode` settings.
- Ensure consistent table structures between master and slave.
- Guarantee consistency during backup and recovery.
Detecting Inconsistencies
Regular monitoring and periodic consistency checks are crucial for identifying discrepancies early.
Fixing Data Discrepancies
1. Rebuild the Slave
Rebuilding the slave is slow and may impact query performance. However, it's often the most reliable solution when inconsistencies occur.
2. Use Percona Toolkit
Tools like pt-table-checksum and pt-table-sync help detect and resolve inconsistencies quickly without interrupting replication.
3. Manual Fixes
Manual repairs require temporarily stopping replication and addressing inconsistencies based on error messages.
Other than these, regular backups are essential. If all other methods fail, restoring from a backup is the final option.