Obsfly

Elasticsearch Monitoring

Elasticsearch monitoring that watches the JVM heap before the cluster falls over.

Obsfly scrapes _cluster/health, _cluster/stats, _nodes/stats, and the slow log every 15s — surfacing heap pressure, GC pauses, indexing throughput, and search latency tails across every node.

Why monitor Elasticsearch

Elasticsearch in production is mostly a JVM-tuning game with shard-allocation politics on top. The metrics that matter — heap pressure, GC time, indexing back-pressure, queue overflow — are buried in node stats. Obsfly surfaces them.

What we scrape

Obsfly reads Elasticsearch through the surfaces operators already know. No driver changes, no extensions installed by us, no agent on the cluster itself.

_cluster/health

Cluster status (green/yellow/red), unassigned shards, pending tasks.

_cluster/stats

Total shards, indices, fielddata size, query/fetch latency.

_nodes/stats

Per-node JVM heap, GC, thread pools, HTTP, transport.

Slow log (settings index.search.slowlog.*)

Slow searches and indexes captured per request.

_cat/shards / _cat/recovery

Shard placement and recovery state.
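A sketch of what one 15-second scrape reduces to. The payload shape below matches the standard _cluster/health response; the helper name is ours for illustration, not Obsfly's API.

```python
import json

def health_summary(payload: dict) -> dict:
    """Reduce a _cluster/health response to the fields worth alerting on."""
    return {
        "status": payload["status"],  # green / yellow / red
        "unassigned_shards": payload["unassigned_shards"],
        "pending_tasks": payload["number_of_pending_tasks"],
    }

# Abridged response shape from GET /_cluster/health.
sample = json.loads("""
{"cluster_name": "prod", "status": "yellow",
 "unassigned_shards": 3, "number_of_pending_tasks": 0}
""")
print(health_summary(sample))
```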

Key metrics tracked

JVM heap used %
> 75% sustained means GC pressure; > 85% means trouble.
GC old / young pause time
Pauses > 1s stall query threads and inflate latency.
Search latency p99 per index
From _nodes/stats search.fetch_time and slow log.
Indexing rate vs queue capacity
Bulk thread pool queue depth and rejections.
Unassigned shards
Cluster yellow → red in 1 metric.
Fielddata size / circuit breaker
Old-style fielddata can OOM nodes.
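The heap bands above can be written as a trivial classifier. Thresholds are the ones stated on this page; the function name is illustrative.

```python
def heap_state(heap_used_percent: float) -> str:
    """Map JVM heap usage to the alert bands described above."""
    if heap_used_percent > 85:
        return "critical"   # circuit-breaker trips and OOM territory
    if heap_used_percent > 75:
        return "pressure"   # sustained GC pressure
    return "ok"

# heap_used_percent comes from _nodes/stats:
# nodes.<node_id>.jvm.mem.heap_used_percent
print(heap_state(72), heap_state(78), heap_state(90))  # ok pressure critical
```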

Common Elasticsearch pains, and how Obsfly surfaces each

Old-gen GC pauses spiking

Sign

Old-gen GC count growing; pause time > 1s; heap usage stays high after collection.

Fix

Either the heap is too small, or fielddata is bloating it. Increase heap (max 30.5 GB for compressed oops), or migrate fielddata-heavy fields to doc_values.
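A minimal sketch of the doc_values migration, assuming a hypothetical logs index with an analyzed message field. Instead of enabling heap-resident fielddata on the text field, aggregations target a keyword sub-field, which stores on-disk doc_values by default.

```python
import json

# Hypothetical index "logs": aggregate on the keyword sub-field
# "message.raw" (doc_values on disk) rather than turning on
# fielddata for the analyzed "message" field (resident on heap).
mapping = {
    "mappings": {
        "properties": {
            "message": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword"}  # doc_values: true by default
                }
            }
        }
    }
}
# PUT /logs with this body; point aggregations at "message.raw".
print(json.dumps(mapping, indent=2))
```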

Indexing rate drops under load

Sign

Bulk thread pool ("write" in 7.x+) rejections climb; queue depth is saturated.

Fix

Increase queue size (cautiously). Better: shard your indices more, or switch to time-series data streams (TSDS).
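A sketch of pulling those two signals out of a _nodes/stats payload. The sample shape is abridged, the helper name is ours, and the pool is named "write" in Elasticsearch 7+ ("bulk" in older versions), so the lookup tries both.

```python
def bulk_pressure(nodes_stats: dict) -> list:
    """Per node: indexing thread-pool queue depth and cumulative rejections."""
    out = []
    for node_id, node in nodes_stats["nodes"].items():
        # "write" in 7.x+, "bulk" in older clusters
        pool = node["thread_pool"].get("write") or node["thread_pool"].get("bulk", {})
        out.append((node.get("name", node_id),
                    pool.get("queue", 0),
                    pool.get("rejected", 0)))
    return out

# Abridged shape from GET /_nodes/stats/thread_pool.
sample = {"nodes": {"abc123": {"name": "data-1",
    "thread_pool": {"write": {"queue": 180, "rejected": 42}}}}}
print(bulk_pressure(sample))  # [('data-1', 180, 42)]
```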

Unassigned shards stay yellow/red

Sign

_cluster/health shows unassigned > 0; _cat/shards shows reason.

Fix

Allocation explain API: GET /_cluster/allocation/explain. Common causes: disk watermark exceeded, allocation filter mismatch.
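A sketch of summarizing the explain response into one line per stuck shard. The sample payload is abridged to the fields shown, and the helper name is illustrative.

```python
def explain_unassigned(explain: dict) -> str:
    """Summarize GET /_cluster/allocation/explain for an unassigned shard."""
    shard = f"[{explain['index']}][{explain['shard']}]"
    reason = explain.get("unassigned_info", {}).get("reason", "unknown")
    blockers = [
        d["explanation"]
        for node in explain.get("node_allocation_decisions", [])
        for d in node.get("deciders", [])
        if d.get("decision") == "NO"
    ]
    return f"{shard} unassigned ({reason}): " + ("; ".join(blockers) or "no blocking decider")

# Abridged explain response: node left, disk watermark blocks reallocation.
sample = {
    "index": "logs-2024.01", "shard": 0,
    "unassigned_info": {"reason": "NODE_LEFT"},
    "node_allocation_decisions": [{"deciders": [
        {"decision": "NO",
         "explanation": "the node is above the high disk watermark"}]}],
}
print(explain_unassigned(sample))
```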

vs Datadog DBM for Elasticsearch

Datadog Elasticsearch ships node-stats scraping. Obsfly adds shard-level allocation history, slow-log structured parsing per index, and JVM heap forecast bands — predicting OOM hours ahead.
Full Datadog DBM comparison →

FAQ

OpenSearch supported?

Yes — OpenSearch exposes the same APIs and runs on the same JVM. Obsfly supports both Elasticsearch (OSS and Elastic's commercial distribution) and OpenSearch.

Versions?

Elasticsearch 7.x, 8.x, 9.x. OpenSearch 1.x, 2.x, 3.x. Older 6.x works with reduced detail.

· · ·

See Obsfly on your Elasticsearch.

20-min demo. We connect to a sample Elasticsearch on the call and reproduce your slowest query in the tool.

Elasticsearch monitoring — slow log, cluster health, JVM, anomalies · Obsfly