Elasticsearch Monitoring
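Elasticsearch monitoring that predicts GC pause storms before they hit.
Obsfly reads cluster health, JVM heap pressure, slow logs, and shard allocation, and translates them into forecast bands that catch GC storms and hot nodes up to 30 days ahead.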
Why monitor Elasticsearch
Elasticsearch in production is mostly a JVM-tuning game with shard-allocation politics on top. The metrics that matter — heap pressure, GC time, indexing back-pressure, queue overflow — are buried in node stats. Obsfly surfaces them.
What we scrape
Obsfly reads Elasticsearch through the surfaces operators already know. No driver changes, no extensions installed by us, no agent on the database itself.
_cluster/health
Cluster status (green/yellow/red), unassigned shards, pending tasks.
_cluster/stats
Total shards, indices, fielddata size, query/fetch latency.
_nodes/stats
Per-node JVM heap, GC, thread pools, HTTP, transport.
Slow log (settings index.search.slowlog.* and index.indexing.slowlog.*)
Slow search and indexing operations, captured per request.
_cat/shards / _cat/recovery
Shard placement and recovery state.
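As a rough illustration of the read-only surfaces listed above, the sketch below fetches an endpoint over plain HTTP and flattens a `_nodes/stats` response into the per-node fields this page keeps returning to. It is not Obsfly's collector. The helper names `fetch_json` and `summarize_nodes`, the base URL, and the absence of authentication are assumptions made for the example.

```python
import json
from urllib.request import urlopen


def fetch_json(base_url, path, timeout=5.0):
    """GET a read-only Elasticsearch endpoint and decode the JSON body.

    Assumes an unauthenticated HTTP endpoint; real clusters usually need
    TLS and credentials, which are left out of this sketch.
    """
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_nodes(nodes_stats):
    """Flatten a _nodes/stats response into one small dict per node name."""
    summary = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        old_gc = jvm.get("gc", {}).get("collectors", {}).get("old", {})
        write_pool = node.get("thread_pool", {}).get("write", {})
        summary[node.get("name", node_id)] = {
            "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
            "old_gc_count": old_gc.get("collection_count"),
            "old_gc_time_ms": old_gc.get("collection_time_in_millis"),
            "write_queue": write_pool.get("queue"),
            "write_rejected": write_pool.get("rejected"),
        }
    return summary
```

Against a local test cluster, something like `summarize_nodes(fetch_json("http://localhost:9200", "_nodes/stats/jvm,thread_pool"))` would return one entry per node; the same `fetch_json` helper can read `_cluster/health` for status and unassigned shard counts.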
Key metrics tracked
Common Elasticsearch pains, and how Obsfly surfaces each
Old-gen GC pauses spiking
Sign
Old-gen GC count growing; pause time > 1s; heap usage stays high after collection.
Fix
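The heap is too small, or fielddata is bloating it. Increase the heap while staying under the compressed-oops cutoff (at most about 30.5 GB, and lower on some systems), or move aggregations and sorting from fielddata to doc_values.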
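The three signs above can be checked from two consecutive samples of one node's `jvm` object in `_nodes/stats`. The following is a minimal sketch, not Obsfly's detection logic: the function name `old_gc_pressure` and the 75% heap limit are assumptions, and average old-gen collection time is used as a proxy for pause time, which overstates pauses for mostly concurrent collectors.

```python
def old_gc_pressure(prev, curr, pause_ms_limit=1000.0, heap_pct_limit=75):
    """Compare two samples of a node's `jvm` stats object.

    prev/curr are the `jvm` sections of two _nodes/stats responses for the
    same node. The limits are illustrative defaults, not recommendations.
    """
    def old(sample):
        return sample["gc"]["collectors"]["old"]

    # Counters are cumulative since node start, so work on deltas.
    d_count = old(curr)["collection_count"] - old(prev)["collection_count"]
    d_time = (old(curr)["collection_time_in_millis"]
              - old(prev)["collection_time_in_millis"])

    # A non-positive delta means no collections, or a restart reset the counters.
    avg_pause_ms = d_time / d_count if d_count > 0 else 0.0
    heap_pct = curr["mem"]["heap_used_percent"]

    return {
        "collections": d_count,
        "avg_pause_ms": avg_pause_ms,
        "heap_used_percent": heap_pct,
        "alert": (d_count > 0
                  and avg_pause_ms > pause_ms_limit
                  and heap_pct >= heap_pct_limit),
    }
```

The alert fires only when all three signs coincide: collections happened in the interval, they averaged more than one second, and the heap is still mostly full afterwards.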
Indexing rate drops under load
Sign
Write thread pool rejections climb (the pool was named bulk before Elasticsearch 6.3); queue depth is saturated.
Fix
Increase the queue size (cautiously, since a longer queue only holds more pending work in memory). Better: spread indexing across more primary shards, or switch to time-series data streams (TSDS).
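Rejections are reported as a cumulative counter, so the useful signal is the rate between two samples. A small sketch of that calculation follows, using the `thread_pool.write` object from `_nodes/stats`. The function name `write_pool_backpressure` is hypothetical, and `queue_size` is assumed to be supplied by the caller (for example from the node info API), since the stats object itself does not carry the configured capacity.

```python
def write_pool_backpressure(prev, curr, interval_s, queue_size):
    """Summarize indexing back-pressure between two write-pool samples.

    prev/curr: the `thread_pool.write` objects of two _nodes/stats responses
    for the same node, taken interval_s seconds apart.
    queue_size: configured queue capacity, provided by the caller.
    """
    rejected = curr["rejected"] - prev["rejected"]
    if rejected < 0:
        # Counter went backwards: the node restarted, so count from zero.
        rejected = curr["rejected"]

    return {
        "rejections_per_s": rejected / interval_s,
        "queue_depth": curr["queue"],
        "queue_fill": curr["queue"] / queue_size,
        "active_threads": curr["active"],
    }
```

A sustained non-zero `rejections_per_s` together with a `queue_fill` near 1.0 matches the sign described above; a brief spike with a mostly empty queue is more likely a short burst than a capacity problem.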
Cluster stays yellow/red with unassigned shards
Sign
_cluster/health reports unassigned_shards > 0; _cat/shards can show the reason when the unassigned.reason column is requested.
Fix
Ask the allocation explain API why the shard is stuck: GET /_cluster/allocation/explain. Common causes: disk watermark exceeded, allocation filter mismatch.
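The allocation explain response lists, per node, which allocation deciders refused the shard. The sketch below builds a request body for one specific shard and pulls out the refusing deciders. The helper names `explain_request_body` and `blocking_deciders` are made up for illustration, and the sample response in the usage note is trimmed to the fields the code reads.

```python
import json


def explain_request_body(index, shard=0, primary=True):
    """JSON body for GET /_cluster/allocation/explain targeting one shard."""
    return json.dumps({"index": index, "shard": shard, "primary": primary})


def blocking_deciders(explain_response):
    """Return (node_name, decider, explanation) for every decider that said NO."""
    blockers = []
    for node in explain_response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blockers.append((
                    node.get("node_name"),
                    decider.get("decider"),
                    decider.get("explanation"),
                ))
    return blockers
```

In the two common causes named above, the refusing decider is typically `disk_threshold` for an exceeded disk watermark and `filter` for an allocation filter mismatch, so grouping the result by decider name usually points straight at the fix.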
vs Datadog DBM for Elasticsearch
Obsfly features for Elasticsearch
Feature
Query Summary
Top-N normalized queries with p50 / p95 / p99 latency, QPS, total time, rows touched, and plan-change history.
Feature
Anomaly Detection
ML-driven anomaly detection on every metric. Forecast bands, change-point detection, no thresholds to tune.
Feature
Forecast
Capacity forecasts for QPS, IOPS, storage, connections — predict outages weeks ahead.
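To make the idea of a capacity forecast concrete, here is a deliberately simple sketch: fit a straight line to daily samples of a metric (for example disk usage in percent) and estimate how many days remain until it crosses a threshold. This is an assumed stand-in for illustration, not Obsfly's forecasting model, which this page does not describe; the function name `days_until_threshold` is hypothetical.

```python
def days_until_threshold(daily_values, threshold):
    """Estimate days from the last sample until a linear trend hits threshold.

    Fits y = intercept + slope * t by ordinary least squares over day
    indices 0..n-1. Returns None when there are too few samples or the
    trend is flat or decreasing.
    """
    n = len(daily_values)
    if n < 2:
        return None

    t_mean = (n - 1) / 2
    y_mean = sum(daily_values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(daily_values))

    slope = sxy / sxx
    if slope <= 0:
        return None  # no crossing predicted

    intercept = y_mean - slope * t_mean
    crossing_day = (threshold - intercept) / slope
    # Clamp at zero when the fitted line is already past the threshold.
    return max(0.0, crossing_day - (n - 1))
```

A straight line ignores seasonality and sudden changes in growth, which is exactly what forecast bands and change-point detection are meant to handle; the sketch only shows the basic shape of the calculation.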
FAQ
Is OpenSearch supported?
Yes. OpenSearch exposes the same APIs and runs on the JVM, so Obsfly covers both Elasticsearch (OSS and Elastic.co's commercial distribution) and OpenSearch.
Which versions?
Elasticsearch 7.x, 8.x, 9.x. OpenSearch 1.x, 2.x, 3.x. Older 6.x works with reduced detail.