Obsfly

MySQL

MySQL replica lag: 9 causes, ranked by frequency

Seconds_Behind_Master is a lying integer. Here is the real diagnostic order (single-threaded apply, long transactions, schema migrations, network), with the SQL to confirm each one.

Published · 10 min read

Seconds_Behind_Master is a lying integer. It’s a delta between two clocks that could already be skewed, computed only when the IO thread is healthy. When you’re lagging, you need to know why, not how much. Here are the nine common causes ranked by how often they’re the culprit on a real production cluster, with the SQL to confirm each.

On this page
  1. First two minutes
  2. Nine causes, ranked
  3. Parallel replication changes the picture
  4. What to monitor
  5. FAQ

First two minutes

SHOW REPLICA STATUS\G  -- (or SHOW SLAVE STATUS on 5.7)

-- Look at:
-- Replica_IO_Running, Replica_SQL_Running    (must be Yes)
-- Last_IO_Errno, Last_SQL_Errno              (must be 0)
-- Seconds_Behind_Source                      (the lying integer — but useful as a trend)
-- Replica_SQL_Running_State                  (the SQL thread's current activity)

If Replica_SQL_Running_State is “Reading event from the relay log” but lag keeps growing, you’re bottlenecked on apply. If the worker threads sit in “Waiting for an event from Coordinator” for long stretches while lag grows, the workers are starved: the coordinator can’t dispatch transactions in parallel (dependency or commit-order constraints), so adding more workers won’t help.
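To see the same picture per worker rather than from the coordinator’s single state line, the Performance Schema applier tables (MySQL 8.0) are more direct:

```sql
-- Where each applier worker actually is: the last transaction it
-- applied, the one it is applying now, and any per-worker error.
SELECT WORKER_ID,
       SERVICE_STATE,
       LAST_APPLIED_TRANSACTION,
       APPLYING_TRANSACTION,
       LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker;
```

A worker with an empty APPLYING_TRANSACTION during heavy primary write load is idle: the bottleneck is upstream of it.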

Nine causes, ranked by frequency

  1. Single-threaded apply on a write-heavy primary. Confirm: check replica_parallel_workers; if it is 1, that’s your fix.
  2. Long transaction holding a lock. Confirm: SHOW PROCESSLIST on the replica; look for the apply thread stuck waiting.
  3. DDL replication. Confirm: SELECT * FROM performance_schema.events_statements_history on the replica; look for long-running ALTERs.
  4. Network bandwidth saturation. Confirm: compare relay-log growth rate on the replica vs. binlog generation rate on the primary.
  5. Disk IO saturation on the replica. Confirm: iostat, or Performance Schema file_summary_by_instance.
  6. binlog_format causing extra work. Confirm: check binlog_format; STATEMENT re-executes expensive queries on the replica, while ROW with binlog_row_image=FULL produces bulky events for wide tables.
  7. One huge transaction (a single statement touching a whole table). Confirm: diff Retrieved_Gtid_Set vs. Executed_Gtid_Set and check relay-log event sizes.
  8. Replica serving heavy reads (BI queries). Confirm: replica QPS higher than expected from app traffic.
  9. Replica on smaller hardware than the primary. Confirm: compare instance classes.
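For cause #2, information_schema.innodb_trx lists every open InnoDB transaction and how long it has been running. On the replica it exposes an apply thread blocked behind a lock; on the primary it flags the long writer that produced the oversized event in the first place:

```sql
-- Open transactions, oldest first; a trx_started far in the past on
-- the replica usually means the applier is blocked behind a lock.
SELECT trx_id,
       trx_state,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS open_for_s,
       trx_rows_modified,
       trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started
LIMIT 10;
```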

Parallel replication changes the picture

From 8.0 (the variables were named slave_parallel_* before 8.0.26), with replica_parallel_type=LOGICAL_CLOCK and replica_parallel_workers > 1, multiple workers apply non-conflicting transactions in parallel. Setup, with the SQL thread stopped as these variables require:

STOP REPLICA SQL_THREAD;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL replica_parallel_workers = 8;
SET GLOBAL replica_preserve_commit_order = ON;
START REPLICA SQL_THREAD;
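How much parallelism the replica can extract is bounded by how the source tags transaction dependencies. A sketch, assuming an 8.0 source you control (and the 8.0 default of transaction_write_set_extraction=XXHASH64): WRITESET tracking marks transactions as independent whenever their write sets don’t overlap, not only when they happened to group-commit together.

```sql
-- On the SOURCE: tag transactions as parallelizable by write-set
-- overlap instead of by group-commit timing.
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';
```

This typically raises apply parallelism the most on workloads with many small, disjoint writes.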

What to monitor (and what to alert on)

  • Lag in binlog bytes (not seconds) — accurate even when the SQL thread is stuck.
  • Apply throughput in events/s; sudden drop = SQL thread blocked.
  • Worker imbalance from replication_applier_status_by_worker — alert when 95th-percentile worker lag exceeds median by 5×.
  • GTID gaps (Retrieved_Gtid_Set vs Executed_Gtid_Set) for unexpected pauses.
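The per-worker lag figure above can be pulled straight from the Performance Schema; a sketch using the 8.0 applier timestamp columns:

```sql
-- Per-worker apply delay: original commit time on the source vs. the
-- moment this worker finished applying the same transaction.
SELECT WORKER_ID,
       TIMESTAMPDIFF(SECOND,
         LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
         LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP) AS apply_delay_s
FROM performance_schema.replication_applier_status_by_worker
ORDER BY apply_delay_s DESC;
```

Feed the max and the median of apply_delay_s into your alerting to catch the worker-imbalance case.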

FAQ

Is Seconds_Behind_Source ever accurate?
When IO and SQL threads are healthy and clocks aren't skewed, yes. Use it for trend, not for debugging.

Should I disable replica_preserve_commit_order for max throughput?
Only on read-only replicas where reads don't depend on causal consistency. Otherwise keep it on.

GTID vs binlog position — which to track?
GTID is canonical in modern setups. Binlog position is still useful when comparing relay-log positions on the replica side.
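The GTID-gap check from the monitoring list is one function call: GTID_SUBTRACT of what the replica has received against what it has executed yields exactly the transactions still queued in the relay log.

```sql
-- Received but not yet applied: the backlog the SQL thread still owes.
SELECT GTID_SUBTRACT(RECEIVED_TRANSACTION_SET,
                     @@GLOBAL.gtid_executed) AS not_yet_applied
FROM performance_schema.replication_connection_status;
```

An empty result with lag still reported points the blame at clock skew or a stalled IO thread rather than apply.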


Watch your databases the way you watch your services.

Book a 30-minute demo. We spec out your fleet together and quote your first 30-day deal.
