Apache Cassandra Monitoring

Cassandra monitoring that spots the compaction backlog in time.

Obsfly extracts JMX MBeans, compaction stats, hint queue size, and repair status, and translates them into actionable forecast bands per node.

Book a Cassandra demo · vs Datadog DBM

Why monitor Cassandra

Cassandra's pathologies are unique: hot partitions, compaction storms, repair backlogs, and DC-to-DC latency tail. Generic DBM tools miss the JMX surface that exposes them.

What we scrape

Obsfly reads Cassandra through the surfaces operators already know. No driver changes, no extensions installed by us, no agent on the database itself.

JMX MBeans (org.apache.cassandra.metrics)

Read/Write latency histograms per keyspace and table.

JMX (org.apache.cassandra.db)

Compaction state, hint queue, dropped messages.

JMX (org.apache.cassandra.net)

Cross-DC and cross-node messaging metrics.

Slow query log (5.0+)

Per-CQL slow execution log with attribution.

system.* tables

system.peers, system.local for topology, hints state.

Key metrics tracked

Read/write latency p99 per table

Per-keyspace, per-table histograms.
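As an illustration of how a p99 can be read off a latency histogram, here is a minimal sketch. The bucket layout (a list of upper bounds with one count per bucket) is an assumption made for the example and is not Cassandra's internal histogram format; `percentile_from_buckets` is a hypothetical helper, not part of Obsfly or Cassandra.

```python
from typing import Sequence


def percentile_from_buckets(
    upper_bounds: Sequence[float], counts: Sequence[int], q: float
) -> float:
    """Return the upper bound of the bucket that contains quantile q (0 < q <= 1).

    The result is an upper estimate: the true value lies somewhere inside
    the returned bucket.
    """
    if len(upper_bounds) != len(counts):
        raise ValueError("upper_bounds and counts must have the same length")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")

    target = q * total  # number of observations at or below the quantile
    running = 0
    for bound, count in zip(upper_bounds, counts):
        running += count
        if running >= target:
            return bound
    return upper_bounds[-1]


# Hypothetical read-latency buckets in milliseconds.
bounds_ms = [1, 2, 5, 10, 50]
counts = [50, 30, 15, 3, 2]
p99_ms = percentile_from_buckets(bounds_ms, counts, 0.99)
```

Because the estimate is bounded by bucket width, finer buckets near the tail give a tighter p99.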

Pending compactions

Total and per-table; alert when the count has been growing for more than 30 minutes.
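One way to express a "growing for more than 30 minutes" rule is sketched below. It assumes samples arrive as `(unix_timestamp, pending_compactions)` pairs in ascending time order; the function name and the exact definition of "growing" (never decreasing, with a net increase) are choices made for this example, not a description of Obsfly's alerting logic.

```python
from typing import Sequence, Tuple


def growing_for(
    samples: Sequence[Tuple[float, int]], window_s: float = 1800.0
) -> bool:
    """True if the latest run of non-decreasing samples spans at least
    window_s seconds and ends higher than it started."""
    if len(samples) < 2:
        return False

    # Walk backwards from the newest sample while the series never decreases.
    start = len(samples) - 1
    while start > 0 and samples[start - 1][1] <= samples[start][1]:
        start -= 1

    first_ts, first_val = samples[start]
    last_ts, last_val = samples[-1]
    return (last_ts - first_ts) >= window_s and last_val > first_val


# Samples every 5 minutes: steady growth over 35 minutes.
steady = [(i * 300, 10 + i) for i in range(8)]
# Same period, but compaction caught up 20 minutes ago.
recovered = [(0, 10), (300, 11), (600, 12), (900, 5),
             (1200, 6), (1500, 7), (1800, 8), (2100, 9)]
```

With these inputs, `growing_for(steady)` fires and `growing_for(recovered)` does not, because the drop at the 900-second mark resets the run.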

Hinted handoff queue depth

If it stays above 0 for long, some replicas are missing writes.

Dropped messages per minute

By type (READ, MUTATION, RANGE_SLICE, etc.).
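Dropped-message metrics are typically exposed as cumulative counters, so a per-minute rate has to be derived from two successive readings. The sketch below assumes that shape and treats a decrease in the counter as a reset (for example after a node restart); `per_minute_rate` is a hypothetical helper for illustration.

```python
def per_minute_rate(
    prev_count: int, prev_ts: float, curr_count: int, curr_ts: float
) -> float:
    """Convert two readings of a cumulative counter into events per minute."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")

    delta = curr_count - prev_count
    if delta < 0:
        # Counter went backwards: assume it restarted from zero,
        # so everything counted so far happened since the reset.
        delta = curr_count

    return delta * 60.0 / elapsed


# 60 dropped MUTATION messages over a 120-second scrape interval.
rate = per_minute_rate(prev_count=100, prev_ts=0.0, curr_count=160, curr_ts=120.0)
```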

Read repair rate

Background repairs triggered per query; a high rate signals inconsistent replicas.

Tombstone scan ratio

Per-query tombstones examined; high values mean delete-heavy workload.

Common Cassandra pains, and how Obsfly surfaces each

Compaction backlog growing under write load

Sign

The pending compactions count climbs; SSTable count per partition grows.

Fix

Increase concurrent_compactors and tune compaction_throughput_mb_per_sec (named compaction_throughput in Cassandra 4.1 and later). Consider switching from STCS to LCS for read-heavy tables.

Hot partition detected

Sign

Latency tail dominated by one or two partitions; read repair rate spikes.

Fix

This is a schema problem. Re-shard the partition key, for example by adding a bucket component, so that load spreads across more partitions.
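A common way to re-shard a hot partition is to add a bucket to the partition key. The sketch below shows the idea under stated assumptions: the names `bucketed_key` and `all_partitions`, the bucket count of 16, and the use of SHA-256 for bucket assignment are all illustrative choices, not something prescribed by Cassandra or Obsfly.

```python
import hashlib
from typing import List, Tuple


def bucketed_key(natural_key: str, item_id: str, buckets: int = 16) -> Tuple[str, int]:
    """Return a composite partition key (natural_key, bucket).

    The bucket is derived from item_id with a stable hash, so the same item
    always lands in the same partition regardless of process or host.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return natural_key, bucket


def all_partitions(natural_key: str, buckets: int = 16) -> List[Tuple[str, int]]:
    """Partitions a reader must query to see every item for natural_key."""
    return [(natural_key, b) for b in range(buckets)]


# Writes for one hot sensor are now spread over up to 16 partitions.
key = bucketed_key("sensor-42", "event-0001")
fan_out = all_partitions("sensor-42")
```

The trade-off is on the read side: a query for everything under one natural key now fans out over all buckets, so the bucket count should stay as small as the write load allows.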

Cross-DC tail latency

Sign

Reads at QUORUM or EACH_QUORUM consistency show long latency tails because they cross DC boundaries.

Fix

Use LOCAL_QUORUM or LOCAL_ONE where consistency allows. Check DC link health via dropped messages.
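The arithmetic behind this advice can be worked through in a few lines. The sketch assumes a hypothetical keyspace with a per-DC replication factor map; the function names are illustrative. It relies on the standard definitions: a quorum is floor(replicas / 2) + 1, QUORUM counts replicas across all DCs, LOCAL_QUORUM counts only the coordinator's DC, and EACH_QUORUM needs a quorum in every DC.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def replicas_required(rf_by_dc: Dict[str, int], local_dc: str) -> Dict[str, int]:
    """Number of replica responses each consistency level waits for."""
    return {
        "QUORUM": quorum(sum(rf_by_dc.values())),
        "LOCAL_QUORUM": quorum(rf_by_dc[local_dc]),
        "EACH_QUORUM": sum(quorum(rf) for rf in rf_by_dc.values()),
    }


def quorum_must_cross_dc(rf_by_dc: Dict[str, int], local_dc: str) -> bool:
    """True when QUORUM cannot be satisfied by local replicas alone."""
    return quorum(sum(rf_by_dc.values())) > rf_by_dc[local_dc]


# Hypothetical topology: two DCs with replication factor 3 each.
rf = {"dc1": 3, "dc2": 3}
needed = replicas_required(rf, local_dc="dc1")
crosses = quorum_must_cross_dc(rf, local_dc="dc1")
```

With RF 3 in each of two DCs, QUORUM needs 4 responses but only 3 replicas are local, so every QUORUM request waits on at least one remote replica. LOCAL_QUORUM needs 2 and never leaves the DC.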

vs Datadog DBM for Cassandra

Datadog's Cassandra integration scrapes JMX with limited per-table granularity. Obsfly extracts every Cassandra-specific MBean and surfaces hot partitions, repair lag, and tombstone ratios as first-class metrics with forecast bands.

Full Datadog DBM comparison →

Obsfly features for Cassandra

Feature

Query Summary

Top-N normalized queries with p50 / p95 / p99 latency, QPS, total time, rows touched, and plan-change history.

Feature

Anomaly Detection

ML-driven anomaly detection on every metric. Forecast bands, change-point detection, no thresholds to tune.
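To make the idea of a forecast band concrete, here is a deliberately naive sketch: a band of mean plus or minus k standard deviations over recent history. This is a simple statistical stand-in chosen for illustration and is not Obsfly's model; the function names and the default k = 3 are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def band(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper bounds: mean +/- k population standard deviations."""
    if not history:
        raise ValueError("history must not be empty")
    centre = mean(history)
    spread = pstdev(history)
    return centre - k * spread, centre + k * spread


def is_anomalous(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """True when value falls outside the band computed from history."""
    low, high = band(history, k)
    return value < low or value > high


# Hypothetical p99 read latencies in milliseconds.
recent_p99_ms = [10, 12, 11, 9, 10, 12, 11, 9]
normal = is_anomalous(recent_p99_ms, 11)
spike = is_anomalous(recent_p99_ms, 20)
```

A band derived from the metric's own history adapts per table and per node, which is the property a fixed threshold lacks. Production models typically also account for seasonality and trend, which this sketch ignores.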

Feature

Configuration Tracking

Database parameter inventory, drift from baseline, recommended values, change history with attribution.

FAQ

Cassandra vs ScyllaDB: both supported?

Yes. ScyllaDB exposes Cassandra-compatible JMX (or the modern HTTP REST API). The agent picks the best-available surface.

Versions?

Apache Cassandra 3.11, 4.x, 5.0. DataStax Enterprise. ScyllaDB 5.x and 6.x.

Deep dives on Cassandra

Anomaly detection on database metrics: why thresholds fail and what works

A walk through forecast bands, change-point detection, multi-variate anomaly, and the seasonality math that makes 'p99 over 200ms' the wrong alert by default — with the Postgres example that broke our last threshold.

· · ·

See Obsfly on your Cassandra.

20-min demo. We connect to a sample Cassandra on the call and reproduce your slowest query in the tool.

Book a demo · Read the documentation