Fading Coder

One Final Commit for the Last Sprint

Home > Tech > Content

Common Causes of Data Discrepancies in MySQL Replication

Tech May 18 2

MySQL replication relies on logical SQL statement replication. During this process, the executed SQL statements are validated, but the overall structure (such as tables and databases) is not checked. In current high availability architectures, the load and performance of a single node can affect the speed of Relay Log replay. If this exceeds capacity, delays are inevitable.

In such cases, during master-slave switching, data discrepancies may occur, potentially leading to severe data inconsistencies. For core enterprise systems, data inconsistency is a critical issue.

Causes of Data Discrepancies

Data inconsistency in master-slave replication can arise from various factors. Based on known cases, the following are common causes:

1. Write Operations on the Slave

  • This can happen in multiple scenarios. For example, if the VIP used by the front-end application experiences network heartbeat issues, it might switch between the master and slave nodes, causing writes on both sides—commonly referred to as a split-brain scenario.
  • In dual-master setups, both nodes may perform write operations for load balancing or business requirements. However, if the replication process is interrupted or delayed, both nodes may write the same data, resulting in replication failure and increasing the risk of data inconsistency.
  • Manual operations on the slave node, such as accidental modifications, can also cause discrepancies.

2. Non-Row Format Binary Logs

Binary logs can be in statement or mixed format, which may lead to unintended changes in data due to SQL statements being applied differently.

3. Stored Procedures or Triggers

Stored procedures or triggers may produce different results due to their local execution characteristics, especially with time-based or auto-increment fields.

auto increment, datetime, timestamp

4. Mismatched sql_mode Settings

Differences in sql_mode settings can lead to varying data handling. For example, the STRICT_TRANS_TABLES mode enforces strict validation, and mismatched settings can result in data truncation and inconsistency.

mysql> CREATE TABLE `t1` (
       `id` int NOT NULL,
       `name` varchar(10) ,
       PRIMARY KEY (`id`)
     ) ENGINE=InnoDB ;
Query OK, 0 rows affected (0.01 sec)

mysql> show variables like '%sql_mode%';
+---------------+-----------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                                                 |
+---------------+-----------------------------------------------------------------------------------------------------------------------+
| sql_mode      | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
+---------------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> set sql_mode="";
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1(id,name) values(1,"hello beijing nice to meet you");
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select * from t1;
+----+------------+
| id | name       |
+----+------------+
|  1 | hello beij |
+----+------------+
1 row in set (0.00 sec)


5. Skipping Replication Errors

Using options like slave_skip_errors, sql_slave_skip_counter, or gtid_next can skip transactions, wich means they are not applied on the slave, potentially leading to data loss.

  • sql_slave_skip_counter: Skips N events after the current one in position-based replication, useful for recovering from errors caused by specific SQL statements.
  • gtid_next: Manually skips transactions when there are GTID conflicts, ensuring precise GTID information and transaction numbers.
  • slave_skip_errors: Skips errors listed in the configuration, allowing replication to continue but risking data inconsistency.

Example configuration:

[mysqld]
slave_skip_errors=1062,1053,1032

6. Double 1 Configuration

The sync_binlog and innodb_flush_log_at_trx_commit parameters control how logs are written to disk. If these are not set to 1, data inconsistency can occur in case of crashes.

7. Semi-Synchronous Replication

Improper configuration of semi-synchronous replication can introduce data inconsistency risks. The paramter rpl_semi_sync_master_wait_point determines whether the master waits for an acknowledgment after committing or before committing.

8. Replication Filtering Rules

Replication filtering rules such as ignore/do/rewrite can cause issues when cross-schema operations are performed.

9. binlog_row_image Settings

The binlog_row_image parameter defines what data is recorded in the binary log. It can be set to FULL, MINIMAL, or NOBLOB.

Value Description
full Records all before and after images regardless of primary key constraints.
minimal Retains only primary key columns in the before image and modified columns in the after image if a primary key exists. Otherwise, retains all before image and modified columns in the after image.
noblob Excludes text/blob columns from before and after images if a primary key exists. Otherwise, retains all before and after images.

10. sql_log_bin Parameter

The sql_log_bin parameter controls whether changes are logged in the binary log. If disabled, changes won't be replicated to the slave.

mysql> SET @@SESSION.SQL_LOG_BIN= 0;
Query OK, 0 rows affected (0.00 sec)
mysql> update t1 set age=10 where id =4;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> SET @@SESSION.SQL_LOG_BIN= 1;

11. Schema Inconsistencies

Discrepancies in table structures, such as differences in primary keys, unique indexes, or column types, can lead to unexpected issues.

12. Foreign Key Constraints

If foreign_key_checks is disabled on the slave, foreign key validation may not occur during SQL playback.

13. Backup Consistency Loss

Inconsistent backups, such as those taken without using single-transaction or master-data options, can result in replication issues.

14. Version Mismatches

Incompatibilities between the master and slave versions can lead to functional mismatches, especially when the master uses features unsupported by the slave.

15. Internal Bugs

MySQL bugs can also cause replication inconsistencies. For example, even if the slave processes are runing, replication may appear stuck.

Mitigating Data Discrepancies

Despite widespread use of MySQL, data inconsistencies still occur. These can stem from individual conditions or combinations of multiple factors. Timely detection and resolution are essential.

Preventing Inconsistencies

- Avoid multi-node writes; ensure business logic prevents conflicts.
- Avoid using `slave_skip_errors` unless absolutely necessary.
- Use double 1 settings for data safety.
- Set the slave to read-only mode.
- Enable semi-synchronous replication with `after_sync` mode.
- Use row-based binary logs and `binlog_row_image` set to `FULL`.
- Be cautious when promoting a slave to master during replication delays.
- Avoid cross-schema operations in replication filters.
- Limit the use of stored procedures and triggers that depend on special fields.
- Enforce strict `sql_mode` settings.
- Ensure consistent table structures between master and slave.
- Guarantee consistency during backup and recovery.

Detecting Inconsistencies

Regular monitoring and periodic consistency checks are crucial for identifying discrepancies early.

Fixing Data Discrepancies

1. Rebuild the Slave

Rebuilding the slave is slow and may impact query performance. However, it's often the most reliable solution when inconsistencies occur.

2. Use Percona Toolkit

Tools like pt-table-checksum and pt-table-sync help detect and resolve inconsistencies quickly without interrupting replication.

3. Manual Fixes

Manual repairs require temporarily stopping replication and addressing inconsistencies based on error messages.

Other than these, regular backups are essential. If all other methods fail, restoring from a backup is the final option.

Related Articles

Understanding Strong and Weak References in Java

Strong References Strong reference are the most prevalent type of object referencing in Java. When an object has a strong reference pointing to it, the garbage collector will not reclaim its memory. F...

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Introduction Server-Side Template Injection (SSTI) is a vulnerability in web applications where user input is improper handled within the template engine and executed on the server. This exploit can r...

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Django’s Admin panel is highly user-friendly, and pairing it with TinyMCE, an effective rich text editor, simplifies content management significantly. Combining the two is particular useful for bloggi...

Leave a Comment

Anonymous

◎Feel free to join the discussion and share your thoughts.