Obsfly
postgres / activity / pg_locks · chainliveRecursive chain · 6 sessions blocked · oldest 47spid 4128ACTIVEpid 5031Lockpid 6210Lockpid 7012Lockpid 5302Lockpid 6498Lockroot holder · idle in transaction · 47s

Postgres

Chaînes de verrous Postgres : trouver la session qui bloque la vôtre

Visite pratique de pg_locks, pg_blocking_pids et de la CTE récursive qui donne la chaîne complète — y compris les AccessExclusiveLocks qui plantent silencieusement votre base.

Published ·11 min read

Most Postgres outages don’t come from slow queries. They come from queries waiting on locks. Reproducing this on a developer’s laptop is hard — the chain forms and dissolves in seconds. Catching it in production needs a specific set of queries you should have memorized. This post is that set.

On this page
  1. What a lock chain is
  2. The two views you need
  3. pg_blocking_pids — the easy mode
  4. The recursive CTE for the full chain
  5. Reading wait_event_type and wait_event
  6. AccessExclusiveLock — the silent killer
  7. Automating capture for postmortems
  8. FAQ

What a lock chain is

Two sessions: A is holding lock X, B is waiting for X. That’s a chain of length 1. Add C waiting on B (because B holds lock Y that C wants), and you have length 2. In bad incidents the chain stretches to 50+ — and then connection limits hit, and your app falls over with confusing errors that look like a DB outage but are really one stuck transaction.

The two views you need

  • pg_locks — every lock currently held or wanted, by every session
  • pg_stat_activity — every session, what it’s running, when it started, what it’s waiting on

Lock chains live in pg_locks. Context (which query, which user, since when) lives in pg_stat_activity. You always join the two.

pg_blocking_pids — the easy mode

Postgres 9.6+ has a function that does the work for you. It returns the array of PIDs blocking a given PID:

SELECT
  pid,
  pg_blocking_pids(pid) AS blocked_by,
  state,
  wait_event_type,
  wait_event,
  query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
ORDER BY query_start;

This gives you length-1 visibility — every session, who is blocking it. Good enough for most incidents. For the full chain, recurse.

The recursive CTE for the full chain

This walks the graph from any blocked session up to the root holder.

WITH RECURSIVE chain AS (
  -- seed: every blocked session
  SELECT pid,
         pg_blocking_pids(pid) AS blocked_by,
         1 AS depth,
         ARRAY[pid] AS path
  FROM pg_stat_activity
  WHERE cardinality(pg_blocking_pids(pid)) > 0
  UNION ALL
  -- recurse: follow the blockers
  SELECT a.pid,
         pg_blocking_pids(a.pid) AS blocked_by,
         c.depth + 1,
         c.path || a.pid
  FROM chain c
  JOIN unnest(c.blocked_by) AS bp(pid) ON TRUE
  JOIN pg_stat_activity a ON a.pid = bp.pid
  WHERE NOT a.pid = ANY(c.path)   -- prevent infinite loops on cycles
)
SELECT depth, path, a.state, a.wait_event_type,
       a.wait_event, age(now(), a.query_start) AS age,
       substring(a.query for 100) AS q
FROM chain c
JOIN pg_stat_activity a USING (pid)
ORDER BY depth DESC, age DESC;

The session at the bottom of the chain (highest depth) is your culprit. If its state is 'idle in transaction' with a long age, you’ve found a stuck transaction.

Reading wait_event_type and wait_event

wait_event_typeCommon wait_event valuesWhat it usually means
Locktransactionid, relation, tupleWaiting on another session to commit/release
LWLockBufferContent, ProcArrayLock, WALInsertInternal Postgres latch — usually a sign of true contention or a hot page
IODataFileRead, WALSync, BufFileReadDisk-level wait, not lock-level
ClientClientRead, ClientWriteWaiting on the client app — connection idle or slow consumer
IPCMessageQueueSend, ParallelFinishInternal coordination — usually parallel workers

For lock-chain triage, focus on wait_event_type = 'Lock'. Other types matter, but they’re different problem classes.

AccessExclusiveLock — the silent killer

AccessExclusiveLock blocks everything, including read queries. Operations that quietly take this lock:

  • ALTER TABLE in most variants (rewrite, type change, NOT NULL add)
  • VACUUM FULL and CLUSTER
  • DROP TABLE / INDEX
  • REINDEX without CONCURRENTLY
  • CREATE INDEX without CONCURRENTLY
  • Some TRUNCATE patterns

Find them with:

SELECT pid, mode, granted, relation::regclass, query
FROM pg_locks l
JOIN pg_stat_activity a USING (pid)
WHERE mode = 'AccessExclusiveLock'
ORDER BY granted, query_start;

Anything ungranted is waiting; anything granted is the active blocker.

Automating capture for postmortems

Lock chains dissolve before you finish reading the runbook. The only reliable way to debug them is to capture pg_stat_activity + pg_locks at 1 Hz, persistently, so you can scrub backward to the moment.

That’s exactly what Obsfly Query Activity and Deadlock Detection do — sample at 1 Hz, store in ClickHouse for 30 days, render the lock graph at any point in time. If you’d rather build it yourself, the recipe is: cron a snapshot every second to a side table, plus a ring-buffer to bound storage.

FAQ

Why does pg_blocking_pids return a list and not a single PID?+
A session can be blocked by multiple holders simultaneously (e.g. it wants a lock that several other sessions hold in shared mode). Postgres reflects that.
What's the difference between a lock chain and a deadlock?+
A deadlock is a cycle. Postgres detects them in 1 second and aborts one session — they're loud. A lock chain is just a long wait line; nobody dies, everything stalls. That's worse.
Can I terminate a session safely?+
pg_cancel_backend(pid) cancels the current statement; pg_terminate_backend(pid) drops the session entirely. Cancel first, then terminate if cancel doesn't take effect within a few seconds.
How do I prevent lock chains in the first place?+
Short transactions. Set statement_timeout and lock_timeout. Always use CONCURRENTLY for index ops in production. Avoid 'idle in transaction' by configuring idle_in_transaction_session_timeout.

Keep reading

· · ·

Surveillez vos bases comme vos services.

Réservez une démo de 30 minutes. Nous spécifions votre flotte ensemble et chiffrons votre premier deal de 30 jours.

Chaînes de verrous Postgres : trouver la session qui bloque la vôtre · Obsfly