Database capacity forecasts that warn 30 days ahead
Linear regression isn't enough. ARIMA is overkill. Prophet works if you know which external variables to feed it. A practical recipe for forecasts with 30 days of lead time.
Linear extrapolation pages you 12 hours before the disk fills. Useful for the on-call, useless for provisioning — you can’t resize an EBS volume on Saturday in 12 hours and have it land cleanly. The actually-useful forecast is one that pages 30 days out, with the math right enough to trust. Here’s what works.
Why linear extrapolation fails
- Seasonality. Disk grows faster on weekdays. A linear fit on 7 days extrapolates the wrong slope.
- Step changes. Last week’s deploy doubled write rate. Linear fit smooths it; the real curve has a kink.
- Non-stationarity. Growth rate itself changes over time (acquisition spike, seasonal product launch).
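The weekday-seasonality failure is easy to demonstrate. A minimal sketch with made-up numbers (10 GB/day weekday growth, 2 GB/day on weekends): fit a line on the last 7 days of cumulative usage and the slope you get depends on which weekday the window ends, even though the workload never changed.

```python
from statistics import mean

def ols_slope(ys):
    # least-squares slope of ys against x = 0..n-1
    xs = range(len(ys))
    xbar, ybar = mean(xs), mean(ys)
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

# Hypothetical growth: 10 GB/day on weekdays, 2 GB/day on weekends.
weekly = [10, 10, 10, 10, 10, 2, 2]            # Mon..Sun
usage, total = [], 0.0
for g in weekly * 4:                           # four weeks of cumulative usage
    total += g
    usage.append(total)

true_rate = mean(weekly)                       # ~7.71 GB/day long-run
# Fit on "the last 7 days", once per possible end-of-window weekday.
slopes = [ols_slope(usage[i:i + 7]) for i in range(7)]
print(f"true rate {true_rate:.2f} GB/day, "
      f"fitted slope ranges {min(slopes):.2f}..{max(slopes):.2f} GB/day")
```

Same series, same model: the fitted slope swings roughly 15% below to 19% above the true rate purely from window phase. Extrapolated 30 days out, that spread is the difference between a calm resize and a surprise page.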
The model stack that works
Three models, ensemble:
- Prophet for the seasonality + holiday + changepoint backbone. Fast to fit per-series, robust on noisy data.
- ETSformer / N-BEATS for high-cardinality scenarios where you fit thousands of series. Neural forecasters handle long histories and cross-series structure better than per-series Prophet fits.
- Linear baseline as a safety floor. If the ensemble disagrees with linear by > 3×, alert the operator that something needs review.
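The safety-floor check can be sketched as a ratio test on the two days-until-capacity estimates. Everything here is illustrative (the function name, the 3x default); the point is that the check is symmetric, so it fires whichever model is the optimistic one:

```python
def needs_review(ensemble_days_to_full, linear_days_to_full, ratio=3.0):
    """Flag when the ensemble and the linear baseline disagree by > `ratio`x.

    Inputs are days-until-capacity estimates; names are illustrative.
    """
    lo, hi = sorted([ensemble_days_to_full, linear_days_to_full])
    return lo > 0 and hi / lo > ratio

# Ensemble says 90 days, naive linear says 20 days: 4.5x apart, flag it.
print(needs_review(90, 20))   # True
print(needs_review(28, 33))   # False: close enough to trust
```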
# Prophet recipe: minimum viable forecast
from prophet import Prophet

m = Prophet(
    daily_seasonality=True,
    weekly_seasonality=True,
    yearly_seasonality='auto',
    changepoint_prior_scale=0.05,  # tune up for spiky workloads
    seasonality_prior_scale=10,
)
m.add_country_holidays(country_name='US')
m.add_regressor('deploy_count')  # exogenous; must exist in history_df and future
m.fit(history_df)

future = m.make_future_dataframe(periods=30, freq='D')
future['deploy_count'] = predicted_deploys(future)  # your estimate of future deploys
fcst = m.predict(future)
# Use the yhat_lower / yhat_upper bounds, not yhat alone: the band is what alerts.

Exogenous variables that move the needle
- Day-of-week / business-day flag — the single biggest accuracy gain.
- Holidays — country-specific. Black Friday, Lunar New Year, regional holidays for B2C.
- Deploy events — regimes change after deploys. Inject as event markers; Prophet handles them as “holidays”.
- Marketing campaign flags — if the team can post events to the metrics pipeline, you get free correlation.
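The deploy-event regressor above is just a per-day count. A minimal sketch of building that 'deploy_count' column from raw event timestamps (the sample timestamps and date range are made up), stdlib only:

```python
from collections import Counter
from datetime import datetime, timedelta, date

# Hypothetical deploy timestamps, e.g. from a CI webhook feed.
deploys = [
    datetime(2024, 6, 3, 14, 5),
    datetime(2024, 6, 3, 18, 40),
    datetime(2024, 6, 5, 9, 12),
]

# Bucket into a per-day count: this becomes the 'deploy_count' regressor column.
per_day = Counter(d.date() for d in deploys)

start, days = date(2024, 6, 1), 7
regressor = [per_day.get(start + timedelta(days=i), 0) for i in range(days)]
print(regressor)  # [0, 0, 2, 0, 1, 0, 0]
```

The same shape works for marketing-campaign flags: a 0/1 column per day instead of a count.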
Evaluating forecasts honestly
- Backtest with sliding window. Fit on weeks 1-4, predict week 5, score. Slide.
- Score with MAPE for level metrics (disk, connections), SMAPE for noisier metrics (QPS).
- Track calibration of bands: if your 90% interval contains the actual value 70% of the time, your bands are too narrow.
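The scoring loop above can be sketched in a few lines. The model wrapper `fit_predict` is a placeholder name, and the toy series and naive last-value model exist only so the skeleton runs end to end:

```python
def mape(actual, forecast):
    # mean absolute percentage error; assumes nonzero level metrics
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    # symmetric MAPE: bounded, better behaved as values approach zero
    return sum(2 * abs(a - f) / (abs(a) + abs(f))
               for a, f in zip(actual, forecast)) / len(actual)

def backtest(series, fit_predict, train_days=28, horizon=7, step=7):
    """Sliding-window backtest: fit on `train_days`, score the next `horizon`."""
    scores, i = [], 0
    while i + train_days + horizon <= len(series):
        train = series[i:i + train_days]
        actual = series[i + train_days:i + train_days + horizon]
        scores.append(mape(actual, fit_predict(train, horizon)))
        i += step
    return scores

# Toy linear series and a naive "repeat the last value" model as fit_predict.
series = [100 + 3 * t for t in range(60)]
scores = backtest(series, lambda train, h: [train[-1]] * h)
print([round(s, 3) for s in scores])
```

Swap the lambda for your Prophet or neural wrapper; the window arithmetic stays the same.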
Alerting on forecast breaches
The alert isn’t “disk is full.” It’s “disk will be full in N days.”
# Pseudo-rule
if forecast.crosses_threshold(metric='disk_used',
                              threshold=0.85 * disk_total,
                              within_days=30):
    page(severity='warning',
         message=f"Disk on {host} forecast to breach 85% in {days} days")

Useful: tier severity by lead time. 30-day = warning (planning), 7-day = high (provision now), 24h = critical (page on-call).