[Dashboard: clickhouse / system.parts. Live part counts approaching parts_to_throw_insert (5000) per table: events 4200 (12 merges in progress), metrics 1820 (4), logs 980 (2), spans 3100 (8). Alerts fire before inserts stall.]

ClickHouse

ClickHouse in production: monitor without becoming the hot-spot yourself

system.query_log is huge. system.parts is bigger still. Here is what you actually need to extract, and how to monitor a ClickHouse cluster without spending half of its CPU on system queries.


ClickHouse’s self-monitoring tables are great until your monitoring tool becomes a top-5 query in the cluster you’re trying to monitor. The trick is selective scraping: full granularity on system.merges and system.replication_queue, sampled granularity on system.query_log, and almost nothing on system.parts.

On this page
  1. What to scrape (and what to skip)
  2. Merges & mutations
  3. Replication queue health
  4. Memory and uncompressed cache
  5. Four anomaly patterns
  6. FAQ

What to scrape — and what to skip

  • system.metrics + system.events: cheap, gauges + counters. Scrape every 15s.
  • system.query_log: expensive at high cardinality. Sample 1 in 100, or aggregate via a materialized view.
  • system.merges: read all rows, low frequency. 1Hz is fine.
  • system.parts: huge on big tables. Don’t scrape directly. Pre-aggregate via a query that groups by table and is rate-limited; a sketch follows this list.
  • system.replication_queue: critical. Alert on rows present for > 5 minutes.
  • system.mutations: long-running ALTERs. Alert on is_done = 0 for > 30m.
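
For system.parts, the pre-aggregated query might look like the sketch below; it touches only active parts and caps the result set, so it stays cheap even on large clusters:

-- Part counts and on-disk size per table, cheap enough to run every minute
SELECT
    database,
    table,
    count() AS part_count,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS on_disk
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY part_count DESC
LIMIT 20;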

Merges and mutations: the most-actionable signals

-- How many merges in flight, and how big?
SELECT
    table,
    count() AS active,
    sum(num_parts) AS parts,
    formatReadableSize(sum(total_size_bytes_compressed)) AS size,
    max(elapsed) AS oldest_sec
FROM system.merges
GROUP BY table
ORDER BY active DESC;

Healthy: most merges complete in < 1 minute. Unhealthy: a single merge running for 30+ minutes usually means the part is too large or the host is IO-saturated.
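
The mutations half of this heading is just as queryable. A minimal sketch against system.mutations, matching the 30-minute alert above (column names as of recent ClickHouse versions):

-- Unfinished mutations, oldest first; anything older than 30m should page
SELECT
    database,
    table,
    mutation_id,
    command,
    parts_to_do,
    create_time,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time ASC;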

Replication queue health

-- Pending replication entries per table
SELECT
    database,
    table,
    count() AS pending,
    countIf(num_postponed > 0) AS postponed,
    max(num_tries) AS retries,
    min(create_time) AS oldest
FROM system.replication_queue
GROUP BY database, table
ORDER BY pending DESC;

Memory and the uncompressed cache

Three memory pools to track (a query to pull all three follows the list):

  • OS RSS — RSS minus mark cache minus uncompressed cache, the real working set.
  • MarkCacheBytes — column index pages. Should be near mark_cache_size.
  • UncompressedCacheBytes — often skipped because the cache is off by default; turn it on for narrow OLAP workloads.
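
All three are gauges; the cache sizes are exposed through system.asynchronous_metrics. A minimal sketch, assuming the metric names used by recent ClickHouse versions:

-- Resident memory and cache sizes in one cheap query
SELECT
    metric,
    formatReadableSize(value) AS size
FROM system.asynchronous_metrics
WHERE metric IN ('MemoryResident', 'MarkCacheBytes', 'UncompressedCacheBytes');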

Four anomaly patterns specific to ClickHouse

  • Too many parts: when inserts arrive faster than merges can keep up, parts pile up and INSERT throws “Too many parts” at 300 per partition. Alert at 200; see the sketch after this list.
  • Mutation deadlock: an ALTER with an expensive predicate runs effectively forever. Visible as a row in system.mutations with a high parts_to_do.
  • Replication divergence: absolute_delay climbing on one replica only.
  • Compression ratio drift: a falling ratio means the data shape is changing; usually JSON columns or new high-cardinality dimensions.
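
For the first pattern, the alert query might look like this sketch: count active parts per partition and flag anything past the 200 threshold (adjust to your configured parts_to_throw_insert):

-- Partitions creeping toward the "Too many parts" limit
SELECT
    database,
    table,
    partition,
    count() AS part_count
FROM system.parts
WHERE active
GROUP BY database, table, partition
HAVING part_count >= 200
ORDER BY part_count DESC;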

FAQ

How do I sample query_log without losing the slow ones?
A materialized view filtering on query_duration_ms > 100 captures every slow query without sampling; keep a 1% sample of fast queries for distributions.
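
A minimal sketch of such a view; the monitoring database and table names are hypothetical, and it assumes your version flushes query_log through the normal insert path so that attached materialized views fire (verify on a staging cluster first):

-- Keep every slow query, plus ~1% of fast ones for latency distributions
CREATE MATERIALIZED VIEW monitoring.query_log_sampled
ENGINE = MergeTree
ORDER BY (event_date, event_time)
AS SELECT
    event_date,
    event_time,
    query_duration_ms,
    query,
    read_rows,
    memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
  AND (query_duration_ms > 100 OR cityHash64(query_id) % 100 = 0);
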
Should I monitor system.parts at all?
Yes, but pre-aggregated: count and total_bytes by table, not per-row. A direct SELECT * against system.parts on a busy table is itself a 30-second query.
ZooKeeper / ClickHouse Keeper monitoring?
Track session count, watch count, and request latency. Keeper-specific exporters are more reliable than scraping ClickHouse for ZK status.


· · ·

Watch your databases the way you watch your services.

Book a 30-minute demo. We’ll spec out your fleet together and quote your first 30-day deal.
