ClickHouse Monitoring
ClickHouse monitoring that doesn't become a hot spot itself.
Obsfly reads system.query_log, system.parts, system.merges, and system.replication_queue selectively and pre-aggregated, so your monitoring never becomes the slowest query.
Why monitor ClickHouse
ClickHouse has pathologies of its own: too-many-parts errors, slow merges, distributed queries stuck on a single shard. Generic DBM tools miss them. Obsfly ships ClickHouse-native metrics out of the box.
What we scrape
Obsfly reads ClickHouse through the surfaces operators already know. No driver changes, no extensions installed by us, no agent on the database itself.
system.query_log
Per-query execution: type, query_kind, duration, read/written rows, memory usage.
system.parts / system.parts_columns
Active part counts per table, total bytes, granule count.
system.merges / system.mutations
In-flight merges, mutation backlog, per-table merge throughput.
system.replication_queue
Replicated table operations queued, errors, blocked merges.
system.metrics / system.events / system.asynchronous_metrics
Counters for everything from compressed bytes to file descriptors.
system.processes
Live queries, memory consumption, elapsed time.
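As an illustration of what "selective and pre-aggregated" can look like, here is a minimal sketch that builds a time-bounded rollup over system.query_log. It is not Obsfly's actual query: the function name `build_query_log_rollup`, the window, and the top-N limit are assumptions made for the example. The column names (`type`, `query_kind`, `normalized_query_hash`, `query_duration_ms`, `read_rows`, `written_rows`, `memory_usage`, `event_date`, `event_time`) are standard system.query_log columns.

```python
def build_query_log_rollup(window_seconds: int = 60, top_n: int = 50) -> str:
    """Return SQL that aggregates recent finished queries on the server.

    Hypothetical helper for illustration; not part of any Obsfly API.
    """
    window_seconds = int(window_seconds)
    top_n = int(top_n)
    if window_seconds <= 0 or top_n <= 0:
        raise ValueError("window_seconds and top_n must be positive")

    # Filtering on event_date and event_time keeps the scan bounded to the
    # most recent data, and GROUP BY returns one row per normalized query
    # instead of one row per execution.
    return f"""
SELECT
    normalized_query_hash,
    query_kind,
    count() AS executions,
    quantiles(0.5, 0.95, 0.99)(query_duration_ms) AS latency_ms,
    sum(read_rows) AS read_rows,
    sum(written_rows) AS written_rows,
    max(memory_usage) AS peak_memory_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_date >= toDate(now() - INTERVAL {window_seconds} SECOND)
  AND event_time >= now() - INTERVAL {window_seconds} SECOND
GROUP BY normalized_query_hash, query_kind
ORDER BY sum(query_duration_ms) DESC
LIMIT {top_n}
""".strip()


if __name__ == "__main__":
    print(build_query_log_rollup(window_seconds=60, top_n=20))
```

The point of the design is that the server returns at most `top_n` aggregated rows per scrape, so the cost of monitoring stays roughly constant even when query volume grows.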
Common ClickHouse pains, and how Obsfly surfaces each
'Too many parts' errors blocking inserts
Sign
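Inserts fail with TOO_MANY_PARTS; system.parts shows the active part count in a single partition at or above parts_to_throw_insert (300 in older releases, higher by default in recent ones).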
Fix
Batch inserts to fewer, larger writes. Tune background_pool_size. Consider min_bytes_for_wide_part.
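To make the sign above concrete, the following sketch flags partitions that are approaching the insert limit before inserts start failing. It assumes the rows come from a per-partition count over system.parts (the `ACTIVE_PARTS_SQL` string), and that the caller passes in the server's own parts_to_throw_insert value. The names `PartitionParts`, `partitions_at_risk`, and `warn_ratio` are hypothetical and exist only for this example.

```python
from typing import Iterable, List, NamedTuple

# Assumed source query: active parts counted per partition on the server.
ACTIVE_PARTS_SQL = """
SELECT database, table, partition_id, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table, partition_id
""".strip()


class PartitionParts(NamedTuple):
    database: str
    table: str
    partition_id: str
    active_parts: int


def partitions_at_risk(
    rows: Iterable[PartitionParts],
    throw_limit: int,
    warn_ratio: float = 0.8,
) -> List[PartitionParts]:
    """Return partitions whose active part count is near throw_limit.

    throw_limit should be the server's parts_to_throw_insert setting;
    warn_ratio is an illustrative early-warning fraction of that limit.
    """
    if throw_limit <= 0 or not 0 < warn_ratio <= 1:
        raise ValueError("throw_limit must be > 0 and warn_ratio in (0, 1]")
    threshold = throw_limit * warn_ratio
    risky = [row for row in rows if row.active_parts >= threshold]
    # Worst offenders first, so an alert can name the most urgent partition.
    return sorted(risky, key=lambda row: row.active_parts, reverse=True)
```

For example, with `throw_limit=300` and the default `warn_ratio`, a partition holding 250 active parts would be reported while one holding 100 would not.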
Slow merges, growing parts count
Sign
system.merges has long-running merges; parts count climbs week-over-week.
Fix
The usual cause is a storage I/O ceiling or merge-thread starvation. Inspect background_pool_size and the max_bytes_to_merge_at_* settings.
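One way to spot long-running merges is to look at the `elapsed` and `progress` columns of system.merges. The sketch below estimates the remaining time of each merge by linear extrapolation and filters those that have already run longer than a caller-chosen limit. The `MergeSample` type, the function names, and the threshold are assumptions for illustration, and linear extrapolation is only a rough guess because merge speed is not constant.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MergeSample:
    """One row of system.merges, reduced to the fields used here."""
    database: str
    table: str
    elapsed: float   # seconds since the merge started
    progress: float  # fraction of work done, between 0 and 1


def estimated_remaining_seconds(merge: MergeSample) -> Optional[float]:
    """Linearly extrapolate the remaining time; None if no progress yet."""
    if merge.progress <= 0:
        return None
    return merge.elapsed * (1.0 - merge.progress) / merge.progress


def long_running_merges(
    merges: Iterable[MergeSample], max_elapsed_seconds: float
) -> List[MergeSample]:
    """Return merges that have already run longer than the given limit."""
    return [m for m in merges if m.elapsed > max_elapsed_seconds]
```

A merge that has run for 600 seconds and reports 25% progress would be estimated at roughly 1800 more seconds under this model.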
Replicated table stuck
Sign
system.replication_queue shows operations with last_exception set.
Fix
Inspect the exception. Common causes: a schema mismatch between replicas or an exhausted ZooKeeper quota.
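As a rough sketch of how the replication sign can be checked, the function below counts queue entries with a non-empty `last_exception` per table. It assumes the rows were fetched from system.replication_queue as dictionaries with `database`, `table`, and `last_exception` keys; the function name `failing_replicated_tables` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def failing_replicated_tables(
    entries: Iterable[Dict[str, str]],
) -> List[Tuple[Tuple[str, str], int]]:
    """Count queue entries with an exception per (database, table).

    Returns pairs of ((database, table), failing_entries), most affected
    table first.
    """
    counts: Counter = Counter()
    for entry in entries:
        # An empty last_exception means the entry has not failed.
        if entry.get("last_exception"):
            counts[(entry["database"], entry["table"])] += 1
    return counts.most_common()
```

Grouping by table first keeps the alert short; the individual exception texts can then be read for the one or two tables at the top of the list.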
Obsfly features for ClickHouse
Query Summary
Top-N normalized queries with p50 / p95 / p99 latency, QPS, total time, rows touched, and plan-change history.
Query Activity
Live query stream with wait events, lock chains, slow-query alerts, and sample-once-per-second activity snapshots.
Anomaly Detection
ML-driven anomaly detection on every metric. Forecast bands, change-point detection, no thresholds to tune.
FAQ
Self-hosted, ClickHouse Cloud, Altinity: are they all supported?
Yes. The agent connects via the native protocol and reads the system database. Cloud and Altinity expose the same system tables.
Obsfly runs on ClickHouse. Do you eat your own dog food?
Yes. Our internal observability for the Obsfly data plane is Obsfly scraping its own ClickHouse. Running it ourselves 24/7 is what hardens the integration.
· · ·
See Obsfly on your ClickHouse.
A 20-minute demo. We connect to a sample ClickHouse on the call and reproduce your slowest query in the tool.