Obsfly

Elasticsearch Monitoring

Elasticsearch monitoring that predicts GC pause storms.

Obsfly reads cluster health, JVM heap pressure, slow logs, and shard allocation, and turns them into forecast bands that catch GC storms and hot nodes 30 days ahead.

Why monitor Elasticsearch

Elasticsearch in production is mostly a JVM-tuning game with shard-allocation politics on top. The metrics that matter — heap pressure, GC time, indexing back-pressure, queue overflow — are buried in node stats. Obsfly surfaces them.

What we scrape

Obsfly reads Elasticsearch through the surfaces operators already know. No driver changes, no extensions installed by us, no agent on the database itself.

_cluster/health

Cluster status (green/yellow/red), unassigned shards, pending tasks.

_cluster/stats

Total shards, indices, fielddata size, query/fetch latency.

_nodes/stats

Per-node JVM heap, GC, thread pools, HTTP, transport.

Slow log (settings index.search.slowlog.* and index.indexing.slowlog.*)

Slow search and indexing operations, captured per request.

_cat/shards / _cat/recovery

Shard placement and recovery state.
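As a rough illustration of how the _nodes/stats fields map to the metrics below, here is a minimal Python sketch that pulls heap usage, old-generation GC counters, and write thread pool queue depth and rejections out of a parsed response. The sample payload is a hand-trimmed, hypothetical excerpt (node ID and values are invented); the field paths follow the public _nodes/stats layout, and summarize_nodes is an illustrative helper, not part of Obsfly or any client library.

```python
from typing import Any, Dict, List

# Hypothetical, heavily trimmed _nodes/stats response (node ID and values invented).
SAMPLE_NODES_STATS: Dict[str, Any] = {
    "nodes": {
        "a1b2c3": {
            "name": "es-data-0",
            "jvm": {
                "mem": {"heap_used_percent": 82},
                "gc": {
                    "collectors": {
                        "young": {"collection_count": 1200, "collection_time_in_millis": 9000},
                        "old": {"collection_count": 14, "collection_time_in_millis": 21000},
                    }
                },
            },
            "thread_pool": {"write": {"queue": 180, "rejected": 37}},
        }
    }
}


def summarize_nodes(stats: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the per-node fields used for heap, GC, and back-pressure alerts."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        old_gc = node["jvm"]["gc"]["collectors"]["old"]
        # Missing write pool (e.g. older releases) falls back to zeros.
        write_pool = node.get("thread_pool", {}).get("write", {})
        rows.append({
            "node_id": node_id,
            "name": node.get("name", node_id),
            "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
            "old_gc_count": old_gc["collection_count"],
            "old_gc_time_ms": old_gc["collection_time_in_millis"],
            "write_queue": write_pool.get("queue", 0),
            "write_rejected": write_pool.get("rejected", 0),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_nodes(SAMPLE_NODES_STATS):
        print(row)
```

In a live setup the dict would come from GET /_nodes/stats/jvm,thread_pool parsed as JSON. On older 6.x releases the write pool is named bulk, so this sketch would report zeros there rather than fail.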

Key metrics tracked

JVM heap used %
> 75% sustained means GC pressure; > 85% means trouble.
GC old / young pause time
Pauses longer than 1 s stall in-flight queries and inflate latency.
Search latency p99 per index
From _nodes/stats search timings (query_time_in_millis, fetch_time_in_millis) and the slow log.
Indexing rate vs queue capacity
Write (bulk) thread pool queue depth and rejections.
Unassigned shards
One count that tracks the cluster from yellow (unassigned replicas) to red (unassigned primaries).
Fielddata size / circuit breaker
Heap-resident fielddata (the pre-doc_values model) can OOM a node; the fielddata circuit breaker is the last guard.
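The heap thresholds above only matter when the pressure is sustained. A minimal Python sketch of one way to encode that, assuming "sustained" means every sample in a recent window exceeds the threshold; the window length of 5 is an arbitrary assumption, not an Obsfly default, and classify_heap is a hypothetical helper.

```python
from typing import Sequence

GC_PRESSURE_PCT = 75  # sustained above this: GC pressure
TROUBLE_PCT = 85      # sustained above this: trouble


def classify_heap(samples: Sequence[float], window: int = 5) -> str:
    """Classify heap_used_percent samples (oldest first).

    Assumption: 'sustained' means every one of the last `window` samples
    exceeds the threshold. Returns 'insufficient_data' if the window is not full.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return "insufficient_data"
    recent = samples[-window:]
    lowest = min(recent)  # a healthy post-GC dip pulls this below the threshold
    if lowest > TROUBLE_PCT:
        return "trouble"
    if lowest > GC_PRESSURE_PCT:
        return "gc_pressure"
    return "ok"


print(classify_heap([70, 78, 80, 79, 81, 77]))  # gc_pressure
```

Using the window minimum means a single drop after collection clears the alert, which matches the intuition that a healthy heap sawtooths back down once GC runs.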

Common Elasticsearch pains, and how Obsfly surfaces each

Old-gen GC pauses spiking

Sign

Old-gen GC count growing; pause time > 1s; heap usage stays high after collection.

Fix

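The heap is too small, or fielddata is bloating it. Increase the heap while staying below the compressed-oops threshold (roughly 30 GB; Elastic documents 26 GB as safe on most systems), or move sorting and aggregations from fielddata to doc_values.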
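Because _nodes/stats reports old-gen GC as cumulative counters, the average time per collection over an interval comes from differencing two samples. A minimal Python sketch, assuming the jvm.gc.collectors.old counters are polled periodically; OldGcSample and avg_old_gc_pause_ms are hypothetical names, and the 1 s cutoff is the sign described above. Collection time approximates pause time here, and because this is an average, one long pause can hide among shorter ones.

```python
from dataclasses import dataclass
from typing import Optional

PAUSE_ALERT_MS = 1000.0  # "pause time > 1s" from the sign above


@dataclass
class OldGcSample:
    collection_count: int           # jvm.gc.collectors.old.collection_count
    collection_time_in_millis: int  # jvm.gc.collectors.old.collection_time_in_millis


def avg_old_gc_pause_ms(prev: OldGcSample, curr: OldGcSample) -> Optional[float]:
    """Average old-gen collection time between two samples, or None.

    Returns None when no collections ran, or when counters went backwards
    (cumulative since JVM start, so a decrease means the node restarted).
    """
    d_count = curr.collection_count - prev.collection_count
    d_time = curr.collection_time_in_millis - prev.collection_time_in_millis
    if d_count <= 0 or d_time < 0:
        return None
    return d_time / d_count


prev = OldGcSample(collection_count=14, collection_time_in_millis=21_000)
curr = OldGcSample(collection_count=17, collection_time_in_millis=25_500)
pause = avg_old_gc_pause_ms(prev, curr)
print(pause, pause is not None and pause > PAUSE_ALERT_MS)  # 1500.0 True
```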

Indexing rate drops under load

Sign

Write (bulk) thread pool rejections climb; queue depth is saturated.

Fix

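Increase the queue size, cautiously: a larger queue buffers more pending requests on the heap rather than adding throughput. Better: add primary shards so indexing spreads across more nodes, or switch to time-series data streams (TSDS).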
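Both signs can be computed from two polls of the write thread pool. A minimal Python sketch, assuming samples of thread_pool.write.queue (current depth) and thread_pool.write.rejected (cumulative) taken a known number of seconds apart; the configured queue size would come from node settings (for example the queue_size column of _cat/thread_pool), and the value 1000 and the 90% saturation cutoff are illustrative assumptions. WritePoolSample, rejection_rate_per_s, and is_saturated are hypothetical names.

```python
from typing import NamedTuple


class WritePoolSample(NamedTuple):
    timestamp_s: float  # when the sample was taken (epoch seconds)
    queue: int          # thread_pool.write.queue (current depth)
    rejected: int       # thread_pool.write.rejected (cumulative)


def rejection_rate_per_s(prev: WritePoolSample, curr: WritePoolSample) -> float:
    """Rejections per second between two samples; 0.0 on reset or bad interval."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    delta = curr.rejected - prev.rejected
    if elapsed <= 0 or delta < 0:  # clock problem, or counter reset after restart
        return 0.0
    return delta / elapsed


def is_saturated(sample: WritePoolSample, queue_size: int, fraction: float = 0.9) -> bool:
    """True when queue depth is at or above `fraction` of the configured queue size."""
    return sample.queue >= fraction * queue_size


prev = WritePoolSample(timestamp_s=0.0, queue=120, rejected=37)
curr = WritePoolSample(timestamp_s=60.0, queue=950, rejected=217)
print(rejection_rate_per_s(prev, curr))     # 3.0
print(is_saturated(curr, queue_size=1000))  # True
```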

Unassigned shards stay yellow/red

Sign

_cluster/health shows unassigned_shards > 0; _cat/shards shows the reason in its unassigned.reason column.

Fix

Allocation explain API: GET /_cluster/allocation/explain. Common causes: disk watermark exceeded, allocation filter mismatch.
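To see which cause dominates before reaching for the explain API, the _cat/shards output can be grouped by unassigned.reason. A minimal Python sketch; the rows are a hypothetical sample of what GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason returns, and both helper functions are illustrative.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical rows as returned by
#   GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason
SAMPLE_CAT_SHARDS: List[Dict[str, Optional[str]]] = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED", "unassigned.reason": None},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "logs-2", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "logs-2", "shard": "0", "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "ALLOCATION_FAILED"},
]


def unassigned_by_reason(rows: List[Dict[str, Optional[str]]]) -> Dict[Tuple[str, str], int]:
    """Count unassigned shards keyed by (prirep, reason); prirep is 'p' or 'r'."""
    counts: Counter = Counter()
    for row in rows:
        if row.get("state") != "UNASSIGNED":
            continue
        reason = row.get("unassigned.reason") or "UNKNOWN"
        counts[(row.get("prirep") or "?", reason)] += 1
    return dict(counts)


def any_primary_unassigned(rows: List[Dict[str, Optional[str]]]) -> bool:
    """An unassigned primary is what turns the cluster red."""
    return any(r.get("state") == "UNASSIGNED" and r.get("prirep") == "p" for r in rows)


counts = unassigned_by_reason(SAMPLE_CAT_SHARDS)
print(counts)  # {('r', 'NODE_LEFT'): 2, ('p', 'ALLOCATION_FAILED'): 1}
print(any_primary_unassigned(SAMPLE_CAT_SHARDS))  # True
```

For a shard this flags, the allocation explain API accepts a request body such as {"index": "logs-2", "shard": 0, "primary": true} and reports why each node declined it.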

vs Datadog DBM for Elasticsearch

Datadog's Elasticsearch integration scrapes node stats. Obsfly adds shard-level allocation history, structured per-index slow-log parsing, and JVM heap forecast bands that predict OOM hours ahead.
Full Datadog DBM comparison →
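Obsfly's forecasting model is not described on this page. Purely to illustrate what a forecast band and an OOM lead time mean, here is a deliberately naive Python sketch: an ordinary least-squares trend over recent heap samples, a band of plus or minus k residual standard deviations, and the first future step where the upper band reaches a threshold. Every choice in it (linear trend, k = 2, the 95% threshold, hourly samples, the sample values) is an assumption, and all function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Optional, Sequence, Tuple


def linear_fit(ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b*t for t = 0..n-1; returns (a, b)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = fmean(ys)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def forecast_band(ys: Sequence[float], horizon: int, k: float = 2.0) -> List[Tuple[float, float, float]]:
    """(lower, mid, upper) for each of the next `horizon` steps: trend +/- k * residual std."""
    a, b = linear_fit(ys)
    resid_sd = pstdev([y - (a + b * t) for t, y in enumerate(ys)])
    n = len(ys)
    band = []
    for step in range(1, horizon + 1):
        mid = a + b * (n - 1 + step)
        band.append((mid - k * resid_sd, mid, mid + k * resid_sd))
    return band


def steps_until_breach(ys: Sequence[float], threshold: float, horizon: int, k: float = 2.0) -> Optional[int]:
    """First future step whose upper band reaches `threshold`, or None within `horizon`."""
    for step, (_, _, upper) in enumerate(forecast_band(ys, horizon, k), start=1):
        if upper >= threshold:
            return step
    return None


heap_pct = [60, 62, 64, 66, 68]  # hypothetical hourly heap_used_percent samples
print(steps_until_breach(heap_pct, threshold=95, horizon=48))  # 14
```

With hourly samples, the result reads as "the upper band crosses 95% in about 14 hours". Raw JVM heap samples sawtooth with every collection, so a real forecaster would likely need to model that shape, for example by fitting only post-collection lows.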

FAQ

OpenSearch supported?

Yes. OpenSearch exposes the same core APIs and runs on the JVM, so both Elasticsearch (OSS and Elastic's commercial distribution) and OpenSearch are supported.

Versions?

Elasticsearch 7.x, 8.x, 9.x. OpenSearch 1.x, 2.x, 3.x. Older 6.x works with reduced detail.

· · ·

See Obsfly on your Elasticsearch.

20-min demo. We connect to a sample Elasticsearch on the call and reproduce your slowest query in the tool.
