PostgreSQL Monitoring
Postgres monitoring that actually finds the slow query.
Obsfly reads pg_stat_statements, pg_stat_activity, auto_explain, and pg_locks at 1 Hz, normalizes everything to a query signature, and shows you the slow paths before your users do.
Why monitor Postgres
Postgres ships with deep instrumentation (dozens of per-query columns in pg_stat_statements, full plan capture via auto_explain, lock visibility via pg_locks), but using it well requires plumbing it into a system that retains, correlates, and alerts. Obsfly is that system, with no agent on the database itself — only a read-only monitoring user.
What we scrape
Obsfly reads Postgres through the surfaces operators already know. No driver changes, no extensions installed by us, no agent on the database itself.
pg_stat_statements
Per-signature execution counts, total/mean/stddev time, rows touched, buffer hit ratios, WAL volume.
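For reference, the kind of top-N query this view supports, using the PG 13+ column names (total_exec_time and friends; PG 12 and older call them total_time, mean_time, stddev_time):

-- Top 10 query signatures by total execution time.
SELECT queryid,
       calls,
       round(total_exec_time::numeric, 1)  AS total_ms,
       round(mean_exec_time::numeric, 2)   AS mean_ms,
       round(stddev_exec_time::numeric, 2) AS stddev_ms,
       rows,
       shared_blks_hit,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;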
pg_stat_activity
Live session state, current query, wait_event_type / wait_event, blocking PIDs.
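You can pull a one-shot version of this yourself — every non-idle session that is currently waiting, with the PIDs blocking it:

-- Waiting sessions, what they wait on, and who blocks them (PG 9.6+).
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       left(query, 60)       AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND wait_event IS NOT NULL;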
auto_explain
EXPLAIN ANALYZE plans captured automatically when queries cross a latency threshold.
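A minimal sketch of the knobs involved, shown here as session-level SETs; in production auto_explain is normally loaded cluster-wide via shared_preload_libraries in postgresql.conf:

LOAD 'auto_explain';                          -- session-level load; usually needs superuser
SET auto_explain.log_min_duration = '500ms';  -- capture plans for anything slower than 500 ms
SET auto_explain.log_analyze = on;            -- include actual row counts and timings
SET auto_explain.log_buffers = on;            -- include buffer usage in the captured plan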
pg_locks + pg_blocking_pids()
Lock chains and AccessExclusiveLock detection sampled at 1 Hz.
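One level of that chain is a single query away (the full recursive CTE is covered in the lock-chains deep dive below):

-- Who is waiting on whom, one level deep (PG 9.6+).
SELECT waiting.pid              AS waiting_pid,
       blocking.pid             AS blocking_pid,
       left(waiting.query, 50)  AS waiting_query,
       left(blocking.query, 50) AS blocking_query
FROM pg_stat_activity AS waiting
JOIN LATERAL unnest(pg_blocking_pids(waiting.pid)) AS b(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid;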
pg_stat_user_tables / pg_stat_user_indexes
Bloat, dead tuples, last vacuum, last analyze, index usage and unused indexes.
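The raw signals are directly queryable — dead tuples and maintenance recency, worst tables first:

-- Dead-tuple counts and vacuum/analyze recency per table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;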
pg_stat_replication / pg_stat_wal_receiver
Per-replica lag in bytes and seconds, sync state, write/flush/replay LSNs.
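The same numbers, self-served from the primary:

-- Per-replica lag in bytes and as intervals (run on the primary, PG 10+).
SELECT application_name,
       client_addr,
       state,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication;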
pg_stat_io (PG 16+)
Per-backend, per-context I/O breakdown — replaces guesswork about who's doing what to disk.
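A quick look at what that surfaces:

-- Physical I/O by backend type, object, and context (PG 16+).
SELECT backend_type,
       object,
       context,
       reads,
       writes,
       extends,
       evictions
FROM pg_stat_io
WHERE coalesce(reads, 0) > 0 OR coalesce(writes, 0) > 0
ORDER BY coalesce(reads, 0) + coalesce(writes, 0) DESC;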
Common Postgres pains, and how Obsfly surfaces each
Slow query that's only sometimes slow
Sign
stddev_exec_time is 5–10× mean_exec_time in pg_stat_statements.
Fix
Capture EXPLAIN on the slow path with auto_explain. Check for plan flips and parameter sensitivity.
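Finding these yourself takes one query against pg_stat_statements (PG 13+ column names):

-- Signatures whose latency variance dwarfs their mean: plan-flip suspects.
SELECT queryid,
       calls,
       round(mean_exec_time::numeric, 2)   AS mean_ms,
       round(stddev_exec_time::numeric, 2) AS stddev_ms
FROM pg_stat_statements
WHERE calls > 100
  AND stddev_exec_time > 5 * mean_exec_time
ORDER BY stddev_exec_time DESC
LIMIT 20;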
Random latency spikes correlated with autovacuum
Sign
VACUUM events in logs align with the spikes; pg_stat_user_tables.last_autovacuum coincides.
Fix
Per-table autovacuum tuning. Lower autovacuum_vacuum_scale_factor on hot tables.
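A sketch of that per-table tuning — the table name is a placeholder; 0.02 makes autovacuum fire at roughly 2% dead tuples instead of the 20% default:

-- Hypothetical hot table: vacuum at ~2% dead tuples rather than 20%.
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);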
Replica lag growing under write load
Sign
pg_stat_replication.write_lag climbing; replay_lag larger than write_lag.
Fix
Single-threaded replay is the bottleneck. Bump shared_buffers on replicas, or split read load to a less-laggy replica.
Connection refusals despite low CPU
Sign
New connections fail with "FATAL: sorry, too many clients already" while pg_stat_activity shows mostly idle sessions.
Fix
PgBouncer in transaction mode. Stock Postgres handles ~100 backends well; transaction pooling serves 10× the clients on the same backends.
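To check the sign yourself, count backends by state and compare the total against max_connections:

-- Backend counts by state; background workers report a NULL state.
SELECT coalesce(state, 'background') AS state,
       count(*)                      AS sessions
FROM pg_stat_activity
GROUP BY state
ORDER BY sessions DESC;

SHOW max_connections;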
Bloated tables, week-over-week query slowdowns
Sign
pg_stat_user_tables.n_dead_tup > 20% of n_live_tup; cache hit ratio drops slowly.
Fix
Aggressive autovacuum on the affected tables. VACUUM (FULL) only during planned maintenance.
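The sign as a query, plus the online fix — the table name is a placeholder, and VACUUM FULL stays in the maintenance window because it takes an AccessExclusiveLock:

-- Tables past the 20% dead-tuple threshold.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 0.2 * greatest(n_live_tup, 1)
ORDER BY n_dead_tup DESC;

-- Online fix: reclaims space for reuse without an exclusive lock.
VACUUM (VERBOSE, ANALYZE) orders;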
Obsfly features for Postgres
Feature
Query Summary
Top-N normalized queries with p50 / p95 / p99 latency, QPS, total time, rows touched, and plan-change history.
Feature
Explain Plan
Auto-captured EXPLAIN (ANALYZE, BUFFERS) plans on slow queries, plan diff over time, regression detection.
Feature
Deadlock Detection
Catch every deadlock with full lock-chain context, victim and aggressor stacks, and remediation suggestions.
Feature
Anomaly Detection
ML-driven anomaly detection on every metric. Forecast bands, change-point detection, no thresholds to tune.
FAQ
Does Obsfly install anything on my Postgres host?
No. The Obsfly agent runs on a separate host (or as a sidecar) and connects via the standard Postgres protocol with a read-only monitoring user. No extensions installed by us — pg_stat_statements is the only required extension, and most installations have it.
Does it work with RDS, Aurora, Cloud SQL, Crunchy Bridge?
Yes — every managed Postgres offering. The agent uses standard libpq; managed providers expose pg_stat_statements, pg_stat_activity, and pg_locks identically.
What about pgBouncer in front of Postgres?
Obsfly scrapes both — pgBouncer's stats (SHOW POOLS) for connection metrics, and Postgres directly for query metrics. Connection attribution is preserved through transaction pooling.
Does the monitoring user need superuser?
No. The pg_read_all_stats role (PG 10+) is enough. We provide a setup script that creates the user with minimal privileges.
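A minimal sketch of what such a script does — the role name and password are placeholders, not the actual script:

-- Read-only monitoring role (PG 10+).
CREATE ROLE obsfly_monitor LOGIN PASSWORD 'change-me';
GRANT pg_read_all_stats TO obsfly_monitor;
-- pg_monitor is a broader alternative (adds pg_read_all_settings and pg_stat_scan_tables):
-- GRANT pg_monitor TO obsfly_monitor;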
How is this different from pgBadger or pganalyze?
pgBadger is log-based and offline. pganalyze is closest to Datadog DBM in scope. Obsfly ships the same query-and-plan analysis plus AI-native anomaly detection, BYOC and Sovereign deployments, and per-DB pricing — pganalyze charges per server.
Deep dives on Postgres
Postgres
pg_stat_statements: the complete 2026 guide
Every column, every gotcha, the queries you should run today, and why pg_stat_statements is still the most useful 80 lines of telemetry in Postgres — even with five new alternatives in 2026.
Postgres
Postgres slow queries: 12 causes and how to find each one
A field-tested playbook for diagnosing a slow Postgres query in production — from missing indexes to plan flips to bloated tables — with the SQL to find each cause and the fix.
Postgres
Postgres lock chains: how to find the session blocking yours
A practical walkthrough of pg_locks, pg_blocking_pids, and the recursive CTE that gives you the full chain — including the AccessExclusiveLocks that quietly take your DB down.
Postgres
Why your Postgres p99 latency lies — and what to track instead
p99 over 1m windows is the most-displayed and most-misleading number on every DBM dashboard. Here's the histogram math, the seasonality math, and a saner default.
· · ·
See Obsfly on your Postgres.
20-min demo. We connect to a sample Postgres on the call and reproduce your slowest query in the tool.