MySQL
Lag de réplica MySQL : 9 causes classées par fréquence
Seconds_Behind_Master est un entier menteur. Voici un véritable ordre de diagnostic — apply mono-thread, longues transactions, migrations de schéma, réseau — avec le SQL pour chaque.
Seconds_Behind_Master is a lying integer. It’s a delta between two clocks that could already be skewed, computed only when the IO thread is healthy. When you’re lagging, you need to know why, not how much. Here are the nine common causes ranked by how often they’re the culprit on a real production cluster, with the SQL to confirm each.
On this page
First two minutes
SHOW REPLICA STATUS\G -- (or SHOW SLAVE STATUS on 5.7) -- Look at: -- Replica_IO_Running, Replica_SQL_Running (must be Yes) -- Last_IO_Errno, Last_SQL_Errno (must be 0) -- Seconds_Behind_Source (the lying integer — but useful as a trend) -- Replica_SQL_Running_State (the SQL thread's current activity)
If Replica_SQL_Running_State is “Reading event from the relay log” but progressing slowly, you’re bottlenecked on apply. If it’s “Waiting for an event from Coordinator” for long stretches, you’re bottlenecked on parallel apply slot allocation.
Nine causes, ranked by frequency
| # | Cause | How to confirm |
|---|---|---|
| 1 | Single-threaded apply on a write-heavy primary | Check replica_parallel_workers; if 1, that’s your fix |
| 2 | Long transaction holding a lock | Replica side: SHOW PROCESSLIST, look for the apply thread waiting |
| 3 | DDL replication | SELECT * FROM performance_schema.events_statements_history on replica |
| 4 | Network bandwidth saturation | Compare relay-log growth rate vs. binlog generation rate |
| 5 | Disk IO saturation on replica | iostat / Performance Schema file_summary_by_instance |
| 6 | Different binlog_format causing extra work | RBS replicating row events while replica needs statement-level |
| 7 | Large GTID transaction (one statement = whole table) | Check Executed_Gtid_Set diff |
| 8 | Replica running heavy reads (BI queries) | Replica QPS > expected from app reads |
| 9 | Hardware downgrade vs primary | Replica instance class smaller than primary |
Parallel replication changes the picture
From 8.0 with replica_parallel_type=LOGICAL_CLOCK and replica_parallel_workers > 1, multiple workers apply non-conflicting transactions in parallel. Setup:
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK'; SET GLOBAL replica_parallel_workers = 8; SET GLOBAL replica_preserve_commit_order = ON;
What to monitor (and what to alert on)
- Lag in binlog bytes (not seconds) — accurate even when the SQL thread is stuck.
- Apply throughput in events/s; sudden drop = SQL thread blocked.
- Worker imbalance from
replication_applier_status_by_worker— alert when 95th-percentile worker lag exceeds median by 5×. - GTID gaps (
Retrieved_Gtid_SetvsExecuted_Gtid_Set) for unexpected pauses.
FAQ
Is Seconds_Behind_Source ever accurate?+
Should I disable replica_preserve_commit_order for max throughput?+
GTID vs binlog position — which to track?+
Keep reading
MySQL
MySQL Performance Schema vs sys schema: a 2026 monitoring guide
Performance Schema is unreadable. sys schema is friendly but lossy. Here's exactly which to use for which production question, with the eight queries every MySQL DBA should know by heart.
Postgres
Postgres slow queries: 12 causes and how to find each one
A field-tested playbook for diagnosing a slow Postgres query in production — from missing indexes to plan flips to bloated tables — with the SQL to find each cause and the fix.