Slow queries in Postgres: 12 causes and how to find each one
A field-tested playbook for diagnosing a slow Postgres query in production — from missing indexes to plan flips and bloated tables — with the SQL for each cause and the fix.
Most “Postgres is slow” tickets are not really about Postgres. They’re about one query that’s slow, sometimes, on some path, and you have an hour to figure out which of a dozen possible causes is the real one. This is the playbook we use — twelve causes, the SQL to diagnose each, and the fix.
The order matters: the cheap-to-rule-out causes come first.
Step 1 — confirm it’s actually slow
Before you spend an hour, make sure the symptom is real. p99 charts lie at low call counts; mean latency hides bimodal queries. Pull the actual time per call:
SELECT queryid, calls, mean_exec_time, stddev_exec_time, max_exec_time
FROM pg_stat_statements
WHERE query ILIKE '%table_name%'
ORDER BY total_exec_time DESC;
If stddev_exec_time is 5–10× mean_exec_time, the query is bimodal — usually fast, sometimes catastrophic. Diagnose the catastrophic path, not the mean.
Step 2 — get a real plan
Run EXPLAIN (ANALYZE, BUFFERS, VERBOSE) on the slow path with the actual parameters the production code uses. Not with literal values you guessed.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT … FROM orders WHERE customer_id = 4128;
Or, if you can’t reproduce: enable auto_explain with auto_explain.log_min_duration = '500ms' and wait for it to fire. The plan ends up in the Postgres log, untouched by you.
The 12 causes (in order of likelihood)
1. Missing index
Telltale: a Seq Scan with a filter that removes >90% of rows. EXPLAIN ANALYZE shows actual rows=N against Rows Removed by Filter: M, with M >> N.
Seq Scan on orders  (cost=0..50000 rows=1000000 width=...) (actual time=120..2400 rows=4 loops=1)
  Filter: (customer_id = 4128)
  Rows Removed by Filter: 999996
Fix:
CREATE INDEX CONCURRENTLY ON orders (customer_id);
Be paranoid about CONCURRENTLY in production — without it, CREATE INDEX holds a SHARE lock that blocks all writes to the table for the duration of the build. (See Postgres lock chains.)
2. Stale statistics, wrong row estimates
Telltale: EXPLAIN ANALYZE shows rows=10 in the estimate but actual rows=2,000,000. The planner picked the wrong join order.
ANALYZE orders;  -- refresh statistics for one table
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;  -- finer histograms
ANALYZE orders;  -- SET STATISTICS takes effect only on the next ANALYZE
For correlated columns, create extended statistics:
CREATE STATISTICS s_orders (dependencies, ndistinct) ON customer_id, region FROM orders;
3. Sequential scan on a large table
Sometimes intentional (small table, planner is right). Sometimes a sign that an index exists but the planner ignores it. Run EXPLAIN with set enable_seqscan = off; in your session — if the index plan is faster, you have a cost-model mismatch (often random_page_cost too high on SSD).
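A quick session-local check, reusing the orders query from above; nothing here persists beyond your session except the ALTER SYSTEM at the end:
SET enable_seqscan = off;   -- force the planner to price the index plan
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 4128;
RESET enable_seqscan;

-- If the index plan wins, recalibrate for SSDs (the default 4.0 assumes spinning disks):
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();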
4. Lock waits
Telltale: query is “sometimes fast, sometimes 30s.” pg_stat_activity shows wait_event_type = Lock during the slow path.
Diagnosis: capture pg_blocking_pids() at 1 Hz and reconstruct the lock chain. Full recipe in Postgres lock chains.
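A one-shot version of that capture, run while the query is stuck:
SELECT pid, pg_blocking_pids(pid) AS blocked_by,
       wait_event_type, wait_event,
       now() - query_start AS stuck_for,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';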
5. TOAST and oversized columns
Telltale: a SELECT * on a wide table is slow even when the WHERE clause uses a primary key. The cost is in detoasting large compressed columns (often jsonb or text bodies).
Fix: don’t SELECT *. Project only what you need.
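For example, with a hypothetical events table whose payload column is a large jsonb, the fix is just the projection:
-- Slow: detoasts (and decompresses) payload for every row it returns
SELECT * FROM events WHERE id = 42;

-- Fast: the TOASTed column is never read
SELECT id, status, created_at FROM events WHERE id = 42;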
6. Bloat
Telltale: cache hit ratio drops, shared_blks_read climbs, the same query gets slower week-over-week despite no change in data volume. pg_stat_user_tables.n_dead_tup is large relative to n_live_tup.
SELECT relname, n_live_tup, n_dead_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 20;
Fix: tune autovacuum per-table. VACUUM (FULL, VERBOSE) rewrites the table — run it only when you can take an exclusive lock.
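A sketch of the per-table tuning, with thresholds that are a starting point rather than gospel:
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.02,   -- vacuum at ~2% dead tuples, not the 20% default
  autovacuum_analyze_scale_factor = 0.01   -- keep planner statistics fresh too
);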
7. Connection pool saturation
Telltale: query latency spikes correlate with the connection count in pg_stat_activity. Postgres itself is fine; the application is starving for connections.
Fix: PgBouncer in transaction mode. Cap session-mode pool at <100; transaction mode handles 10x more workload at the same backend count.
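A minimal pgbouncer.ini sketch; the database name and sizes are illustrative, and default_pool_size should track your core count, not your client count:
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
pool_mode = transaction
default_pool_size = 20      ; backend connections per database/user pair
max_client_conn = 2000      ; client connections PgBouncer will accept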
8. Sort spilling to disk
Telltale: EXPLAIN ANALYZE shows Sort Method: external merge Disk: 1234kB.
Fix: bump work_mem for the role/session. Be careful — work_mem applies per sort/hash node, not per query, so a complex query can multiply it.
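Scope the bump per role or per session instead of raising the global default (the reporting role is hypothetical):
ALTER ROLE reporting SET work_mem = '256MB';  -- only this role's sessions get the bigger budget

-- Or just for the current session:
SET work_mem = '256MB';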
9. Hash join with too-small work_mem
Telltale: Hash Batches: 32 when a single batch would fit. Same fix as #8.
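In the plan it looks like this (numbers illustrative); Batches above 1 means the hash table spilled to temp files:
Hash  (actual time=... rows=500000 loops=1)
  Buckets: 65536  Batches: 32  Memory Usage: 3201kB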
10. Plan flip
Telltale: the query was fast, nothing was deployed, and now it's slow. Same SQL, different plan. auto_explain proves it: yesterday's plan used an index, today's does a hash join.
Causes: stats updated, parameter values cross a histogram boundary, table-row count crosses a planner threshold. Fix: extended statistics, or pg_hint_plan as a pin-of-last-resort. Better: have a tool that tells you the plan flipped — see Obsfly Explain Plan.
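If you do pin, pg_hint_plan reads a hint from a comment at the start of the query (the index name here is hypothetical):
/*+ IndexScan(orders orders_customer_id_idx) */
SELECT * FROM orders WHERE customer_id = 4128;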
11. Slow IO (network, disk, replica lag)
Telltale: blk_read_time dominates total_exec_time.
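To quantify it from pg_stat_statements (requires track_io_timing = on; PG 17 renames the column to shared_blk_read_time):
SELECT queryid, calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(blk_read_time::numeric, 1)  AS read_ms,
       round((100 * blk_read_time / NULLIF(total_exec_time, 0))::numeric, 1) AS io_pct
FROM pg_stat_statements
ORDER BY blk_read_time DESC
LIMIT 20;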
On RDS, hit the IOPS ceiling and your latency triples. On Aurora, replica lag pushes reads to the primary. Fix: cloud-side IOPS / autoscaling, or split read workload differently.
12. PL/pgSQL or trigger overhead
Telltale: EXPLAIN ANALYZE on the SQL is fast, but the calls count in pg_stat_statements is far higher than expected. A trigger or function is multiplying executions.
Fix: set pg_stat_statements.track = all to see the inner statements.
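With track = all enabled, the nested statements appear with toplevel = false (PG 14+):
SELECT left(query, 60) AS query, calls,
       round(total_exec_time::numeric, 1) AS total_ms
FROM pg_stat_statements
WHERE toplevel = false        -- statements run inside functions and triggers
ORDER BY total_exec_time DESC
LIMIT 20;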
What to set up so this isn’t manual next time
- Enable pg_stat_statements with track = all (see guide)
- Enable auto_explain with log_min_duration = 500ms
- Enable track_io_timing for blk_read_time data
- Enable track_planning for plan-time visibility (PG 13+)
- Capture pg_stat_activity at 1 Hz for lock-chain reconstruction
- Capture EXPLAIN plans on slow signatures and store them — plan-flips need a baseline to compare against
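The first four map directly to postgresql.conf. One way to wire them up (loading modules via shared_preload_libraries needs a restart):
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements,auto_explain'
pg_stat_statements.track = all
pg_stat_statements.track_planning = on    # PG 13+
auto_explain.log_min_duration = '500ms'
auto_explain.log_analyze = on             # log actual times/rows; adds instrumentation overhead
track_io_timing = on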
Keep reading
pg_stat_statements: the complete 2026 guide
Every column, every gotcha, the queries you should run today, and why pg_stat_statements is still the most useful 80 lines of telemetry in Postgres — even with five new alternatives in 2026.
Postgres lock chains: how to find the session blocking yours
A practical walkthrough of pg_locks, pg_blocking_pids, and the recursive CTE that gives you the full chain — including the AccessExclusiveLocks that quietly take your DB down.
Why your Postgres p99 latency lies — and what to track instead
p99 over 1m windows is the most-displayed and most-misleading number on every DBM dashboard. Here's the histogram math, the seasonality math, and a saner default.