[Dashboard: clickhouse / system.parts · live parts approaching parts_to_throw_insert (5000), alerting before inserts stall · events: 12 merges in progress, 4200/5000 · metrics: 4 merges, 1820/5000 · logs: 2 merges, 980/5000 · spans: 8 merges, 3100/5000]

ClickHouse

ClickHouse in production: monitoring without becoming a hot spot yourself

system.query_log is huge. system.parts even more so. Here is what you actually need to scrape, and how to monitor a ClickHouse cluster without burning half its CPU on system queries.

Published · 11 min read

ClickHouse’s self-monitoring tables are great until your monitoring tool becomes a top-5 query in the cluster you’re trying to monitor. The trick is selective scraping: full granularity on system.merges and system.replication_queue, sampled granularity on system.query_log, and almost nothing on system.parts.

On this page
  1. What to scrape (and what to skip)
  2. Merges & mutations
  3. Replication queue health
  4. Memory and uncompressed cache
  5. Four anomaly patterns
  6. FAQ

What to scrape — and what to skip

  • system.metrics + system.events: cheap, gauges + counters. Scrape every 15s.
  • system.query_log: expensive at high cardinality. Sample 1 in 100, or aggregate via a materialized view.
  • system.merges: read all rows, low frequency. 1Hz is fine.
  • system.parts: huge on big tables. Don’t scrape directly; pre-aggregate with a query that groups by table and is rate-limited (see the sketch after this list).
  • system.replication_queue: critical. Alert on rows present for > 5 minutes.
  • system.mutations: long-running ALTERs. Alert on is_done = 0 for > 30m.
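
For system.parts, a minimal sketch of the pre-aggregation meant above: one row per table, run on a schedule (say once a minute) rather than on every scrape.

-- Pre-aggregated view of system.parts: one row per table instead of one per part.
SELECT
    database,
    table,
    count() AS active_parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS on_disk
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC;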

Merges and mutations: the most-actionable signals

-- How many merges in flight, and how big?
SELECT
    table,
    count() AS active,
    sum(num_parts) AS parts,
    formatReadableSize(sum(total_size_bytes_compressed)) AS size,
    max(elapsed) AS oldest_sec
FROM system.merges
GROUP BY table
ORDER BY active DESC;

Healthy: most merges complete in < 1 minute. Unhealthy: a single merge running for 30+ minutes usually means the part is too large or the host is IO-saturated.
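
One way to surface the unhealthy case directly, assuming the 30-minute threshold above fits your workload:

-- Merges that have been running for more than 30 minutes.
SELECT
    database,
    table,
    round(elapsed) AS elapsed_sec,
    round(progress * 100, 1) AS pct_done,
    formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges
WHERE elapsed > 1800;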

Replication queue health

SELECT
    database, table,
    count() AS pending,
    countIf(num_postponed > 0) AS postponed,
    max(num_tries) AS retries,
    min(create_time) AS oldest
FROM system.replication_queue
GROUP BY database, table
HAVING pending > 0;
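
To drive the “rows present for > 5 minutes” alert from the scrape list, a sketch that computes queue age per table (the 300-second threshold comes from that rule; adjust as needed):

-- Age of the oldest pending replication entry per table.
SELECT
    database,
    table,
    dateDiff('second', min(create_time), now()) AS oldest_age_sec
FROM system.replication_queue
GROUP BY database, table
HAVING oldest_age_sec > 300;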

Memory and the uncompressed cache

Three memory pools to track:

  • OS RSS — RSS minus mark cache minus uncompressed cache, the real working set.
  • MarkCacheBytes — column index pages. Should be near mark_cache_size.
  • UncompressedCacheBytes — the most commonly skipped, because the cache is off by default; turn it on for narrow OLAP workloads.
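
The raw inputs can be read from system.asynchronous_metrics in one cheap query (MemoryResident is the raw RSS before subtracting the caches); the metric names below are the common ones but may vary slightly between versions.

-- Cache and RSS sizes in one place.
SELECT metric, formatReadableSize(value) AS size
FROM system.asynchronous_metrics
WHERE metric IN ('MemoryResident', 'MarkCacheBytes', 'UncompressedCacheBytes');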

Four anomaly patterns specific to ClickHouse

  • Too many parts: When inserts come faster than merges, parts pile up; INSERT throws “Too many parts” at 300/partition. Alert at 200.
  • Mutation deadlock: an ALTER ... UPDATE/DELETE with an expensive predicate runs forever; it shows up as a row in system.mutations whose parts_to_do stays high (query sketch after this list).
  • Replication divergence: absolute_delay climbing on one replica only.
  • Compression ratio drift: a falling ratio means the data shape is changing, usually JSON columns or new high-cardinality dimensions.
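
For the mutation pattern, a sketch that surfaces stuck mutations, reusing the 30-minute rule from the scrape list:

-- Mutations still running past the 30-minute alert threshold.
SELECT
    database,
    table,
    mutation_id,
    command,
    parts_to_do,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
  AND create_time < now() - INTERVAL 30 MINUTE;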

FAQ

How do I sample query_log without losing the slow ones?
Use a materialized view that filters on query_duration_ms > 100: it captures every slow query without sampling, plus a 1% sample of fast queries for latency distributions.
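
A sketch of that filter as a plain SELECT (thresholds from the answer above; the 1% sample here is deterministic, via a hash of query_id, rather than truly random):

-- Every slow query plus ~1% of fast ones. The same predicate can back a
-- materialized view or a scheduled scrape.
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    memory_usage,
    normalizeQuery(query) AS query_pattern
FROM system.query_log
WHERE type = 'QueryFinish'
  AND (query_duration_ms > 100 OR cityHash64(query_id) % 100 = 0);
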
Should I monitor system.parts at all?
Yes — but pre-aggregated: count and total_bytes by table, not per-row. A direct SELECT * on system.parts of a busy table is itself a 30-second query.
ZooKeeper / ClickHouse Keeper monitoring?
Track session count, watch count, and request latency. Keeper-specific exporters are more reliable than scraping ClickHouse for ZK status.


