MongoDB Monitoring
Le monitoring MongoDB qui attrape le lag de réplica à temps.
Obsfly agrège serverStatus, db.stats, currentOp et la sortie du profiler en latence par signature, wait events et bandes de prévision pour chaque réplica et shard.
Why monitor MongoDB
Monitoring MongoDB well means scraping four different command surfaces and reasoning about replica sets, oplog windows, sharded chunks, and aggregation pipelines. Obsfly does all of that out of the box.
What we scrape
Obsfly reads MongoDB through the surfaces operators already know. No driver changes, no extensions installed by us, no agent on the database itself.
serverStatus()
Host-level rollups: opcounters, connections, WiredTiger cache, network, locks.
db.stats()
Per-database storage, index size, collection count, document count.
db.currentOp()
Live in-flight operations sampled at 1 Hz with lock waits and plan summaries.
system.profile (per-DB profiler)
Persisted slow-op log with execution stats, examined/returned ratios, write conflicts.
rs.status() / db.getReplicationInfo()
Replica set health, per-member lag, oplog window, election history.
$indexStats aggregation
Per-index access counts to find unused indexes.
Key metrics tracked
Common MongoDB pains, and how Obsfly surfaces each
Slow aggregation pipeline
Sign
executionStats.totalDocsExamined >> nReturned in explain output.
Fix
Reorder $match before $group/$lookup. Add index on $match fields. Profile the heavy stage with stages[i].executionTimeMillisEstimate.
Replica lag spikes under write load
Sign
rs.status() shows secondaries' optimeDate falling behind primary.
Fix
Single-threaded oplog application by default. Bump WiredTiger cache on replicas; consider write concern w:majority for important writes.
WiredTiger cache thrashing
Sign
'pages evicted by application threads' growing; query latency rising despite stable workload.
Fix
Working set exceeds cache. Bump WiredTiger cache to ~50% of host RAM (default). Or shard.
Hot shard in a sharded cluster
Sign
Per-shard opcounters show one shard handling most writes; balancer not moving chunks.
Fix
Shard key choice. Often the shard key is too low-cardinality or monotonically increasing. Range-sharding by ObjectId is the classic anti-pattern.
vs Datadog DBM for MongoDB
Obsfly features for MongoDB
Feature
Query Summary
Top-N normalized queries with p50 / p95 / p99 latency, QPS, total time, rows touched, and plan-change history.
Feature
Query Activity
Live query stream with wait events, lock chains, slow-query alerts, and sample-once-per-second activity snapshots.
Feature
Anomaly Detection
ML-driven anomaly detection on every metric. Forecast bands, change-point detection, no thresholds to tune.
Feature
Schema Monitoring
Track every schema change, index health, bloat, drift from baseline, and slow-growing tables.
FAQ
What MongoDB versions and topologies?+
4.4, 5.0, 6.0, 7.0, 8.0. Standalone, replica sets, and sharded clusters. MongoDB Atlas and Amazon DocumentDB.
Does the profiler add load?+
Level 1 (slow ops only) at slowms=100ms typically adds <2% on OLTP. Level 2 (all ops) is for diagnosis only.
What permissions does the monitoring user need?+
clusterMonitor on the cluster + read on each database whose system.profile you want to scrape. We supply the createRole script.
Atlas Performance Advisor — do I still need this?+
Atlas covers slow-query advice on Atlas-hosted clusters only. Obsfly gives the same coverage plus self-hosted, plus replica/oplog/sharded metrics not in Performance Advisor.
Deep dives on MongoDB
MongoDB
MongoDB performance monitoring in production: a 2026 guide
Four surfaces (serverStatus, db.stats, currentOp, profiler), a sane default for what to scrape from each, and how to reason about replica lag, oplog window, and aggregation pipeline cost.
AI
Anomaly detection on database metrics: why thresholds fail and what works
A walk through forecast bands, change-point detection, multi-variate anomaly, and the seasonality math that makes 'p99 over 200ms' the wrong alert by default — with the Postgres example that broke our last threshold.
· · ·
See Obsfly on your MongoDB.
20-min demo. We connect to a sample MongoDB on the call and reproduce your slowest query in the tool.