AWS
RDS Performance Insights: where it stops and what you actually need next
PI is free up to 7 days, ships with every RDS, and surfaces top SQL by wait class. It also stops short on plan history, multi-host correlation, multi-engine fleets, alerting, and AI suggestions. Here's where the line is and what to bolt on.
If you’re on RDS, Performance Insights is right there. Free for 7 days of retention, ships with every RDS instance, one toggle to enable. Every senior DBA you ask will tell you “PI is a starter, you’ll need more.” The advice is correct. This post is about exactly where the line is.
Seven things PI does well
- Wait-event analysis. The chart-by-wait-class view is genuinely good: it’s what AWS borrowed from the Oracle ASH playbook and ported across engines.
- Top SQL by elapsed time / IO / CPU. The SQL digest table answers “what was slow in the last hour” without instrumentation.
- SQL digest normalization. Same shape as pg_stat_statements / MySQL Performance Schema. You can correlate PI digests with your own queries.
- 7 days free. The default retention is enough to investigate yesterday’s incident. Free is the right price.
- CloudWatch integration. DBLoad metrics export to CloudWatch so you can set alarms via the AWS-native stack.
- Zero-install. No agent, no exporter, no extension activation (for Postgres ≥ 14). One console toggle.
- Aurora-native. Works identically across Aurora Postgres, Aurora MySQL, and standard RDS engines. One UI.
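To make the CloudWatch integration point concrete, here is a minimal sketch of the parameters for a `DBLoad` alarm. It only builds the request dictionary and does not call AWS. The instance name, SNS topic ARN, and the "alarm when load exceeds vCPU count" rule are illustrative assumptions, not AWS defaults.

```python
from typing import Any


def dbload_alarm_params(instance_id: str, vcpus: int, sns_topic_arn: str) -> dict[str, Any]:
    """Build keyword arguments for CloudWatch's PutMetricAlarm call.

    Assumption: alarm when average DBLoad stays above the instance's
    vCPU count for five minutes. This is a rule of thumb, not an AWS default.
    """
    return {
        "AlarmName": f"dbload-above-vcpus-{instance_id}",
        "Namespace": "AWS/RDS",
        "MetricName": "DBLoad",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint
        "EvaluationPeriods": 5,    # five consecutive breaching minutes
        "Threshold": float(vcpus),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical instance and topic, for illustration only.
params = dbload_alarm_params(
    "orders-db", 8, "arn:aws:sns:us-east-1:123456789012:db-alerts"
)
# With boto3 installed and credentials configured, this would be applied as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain function makes it easy to stamp out one alarm per instance, which is about as far as the AWS-native stack takes you.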
Seven gaps that bite around month two
| Gap | What it means in practice | When it hurts |
|---|---|---|
| 1. No plan history | PI shows the latest plan only. You can’t see when the optimizer chose a different one or correlate plan flips with regressions. | Plan-flip incident → 2 hours of bisection instead of 5 minutes. |
| 2. No multi-host correlation | Each PI dashboard is per-instance. To see your whole RDS fleet you flip between tabs. | Fleet of 20+ DBs becomes unmanageable. |
| 3. Single-engine view | If your stack is Postgres + MongoDB + Redis you have three different consoles. | Most real fleets are polyglot. |
| 4. Threshold-only alerting via CloudWatch | DBLoad is one metric. You build the rest in CloudWatch Logs Insights and SNS yourself. | Multi-variate anomalies (qps + p99 + lock wait moving together) don’t fit CloudWatch’s single-metric threshold model. |
| 5. No anomaly / forecast | PI is reactive. There’s no “hey, this metric is trending toward breach in 9 days” surface. | Capacity planning is still spreadsheet work. |
| 6. No AI suggestions | PI shows you the slow query. It doesn’t propose a rewrite or an index. | Junior engineers in the on-call rotation can’t self-serve. |
| 7. 7 days retention free, longer is paid | Long-term retention (2 yr) is $0.01 / vCPU-hour. For a 20-vCPU fleet running all year that’s ~$1,750 / yr (0.01 × 20 × 8,760 h) just for retention. | Compliance + post-mortem use cases need 12+ mo retention. |
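The retention arithmetic in row 7 is easy to check for your own fleet. This sketch takes the $0.01 per vCPU-hour rate quoted in the table as given (it is not verified against current AWS pricing) and assumes every vCPU runs all year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760, ignoring leap years


def annual_retention_cost(vcpus: int, rate_per_vcpu_hour: float = 0.01) -> float:
    """Yearly cost of paid retention for a fleet that runs around the clock."""
    return vcpus * rate_per_vcpu_hour * HOURS_PER_YEAR


cost = annual_retention_cost(20)
print(f"${cost:,.0f} per year")  # prints "$1,752 per year"
```

Swap in your own vCPU total and the rate from your bill; the function name and default rate are only illustrative.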
Three bolt-on patterns we see work
PI + CloudWatch Logs Insights + Lambda
Use PI for the live console, ship slow-query logs to CloudWatch, run scheduled Lambdas that compute derived metrics and post to SNS. Works. Total build is ~3 engineer-weeks. The downside is everything is on AWS — multi-cloud teams hit a wall.
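As a rough sketch of the "derived metric" a scheduled Lambda might compute, the function below fires only when qps, p99, and lock wait all deviate together. The event shape, metric names, and the 2-sigma threshold are assumptions for illustration. A real handler would fetch the series from CloudWatch and publish to SNS instead of returning a flag.

```python
from statistics import mean, stdev


def zscore(history: list[float], latest: float) -> float:
    """Standard score of `latest` against `history`; 0.0 if history is flat."""
    spread = stdev(history)
    return 0.0 if spread == 0 else (latest - mean(history)) / spread


def joint_anomaly(history: dict[str, list[float]], latest: dict[str, float],
                  threshold: float = 2.0) -> bool:
    """True only when every metric is at least `threshold` sigmas above baseline."""
    return all(zscore(history[name], latest[name]) >= threshold for name in history)


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style entry point; `event` carries hypothetical pre-fetched series."""
    fired = joint_anomaly(event["history"], event["latest"])
    # A real function would publish to SNS here when `fired` is True.
    return {"anomaly": fired}


baseline = {
    "qps": [100, 104, 98, 101, 97],
    "p99_ms": [40, 42, 39, 41, 38],
    "lock_wait_ms": [5, 6, 4, 5, 5],
}
all_spiking = handler({"history": baseline,
                       "latest": {"qps": 160, "p99_ms": 95, "lock_wait_ms": 30}})
only_qps = handler({"history": baseline,
                    "latest": {"qps": 160, "p99_ms": 41, "lock_wait_ms": 5}})
```

Requiring all three signals to move together is one way to cut noise from a traffic spike that the database absorbs without trouble.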
PI + Grafana + custom exporter
Wire PI’s CloudWatch metrics into Grafana, build per-query dashboards with a custom pg_stat_statements exporter on the side. Common pattern. Now you maintain two systems — Grafana for cross-cutting, PI for the AWS-native deep dive.
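The custom exporter in this pattern can be quite small. The sketch below renders rows from `pg_stat_statements` in the Prometheus text format that Grafana stacks commonly scrape. The metric names are made up for illustration, and the rows are assumed to come from a query such as `SELECT queryid, calls, total_exec_time FROM pg_stat_statements` (`total_exec_time` is the Postgres 13+ column name).

```python
def sample(metric: str, queryid: int, value: float) -> str:
    """One exposition line: metric{queryid="..."} value"""
    return f'{metric}{{queryid="{queryid}"}} {value}'


def to_prometheus(rows: list[dict]) -> str:
    """Render pg_stat_statements rows as Prometheus text exposition."""
    lines = ["# TYPE pg_stat_statements_calls_total counter"]
    for row in rows:
        lines.append(sample("pg_stat_statements_calls_total",
                            row["queryid"], row["calls"]))
    lines.append("# TYPE pg_stat_statements_exec_seconds_total counter")
    for row in rows:
        seconds = row["total_exec_time"] / 1000.0  # column is in milliseconds
        lines.append(sample("pg_stat_statements_exec_seconds_total",
                            row["queryid"], seconds))
    return "\n".join(lines) + "\n"


# Hypothetical rows, as a database driver might return them.
rows = [
    {"queryid": 42, "calls": 10, "total_exec_time": 2500.0},
    {"queryid": 77, "calls": 3, "total_exec_time": 120.0},
]
output = to_prometheus(rows)
```

Serving `output` over HTTP and scheduling the query are left out; those are the parts you end up maintaining alongside PI.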
PI + commercial DBM
Keep PI for the “what’s burning right now” AWS-native view. Add a DBM tool for plan history, multi-host correlation, multi-engine, anomaly, and AI. That’s the pattern we see on every RDS fleet over ~30 instances.
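To show what "plan history" means mechanically, here is a minimal sketch that fingerprints the shape of an `EXPLAIN (FORMAT JSON)` plan and flags when it changes for a given query. It is an illustration of the idea under simple assumptions (node type, relation, and index define the shape), not how any particular DBM tool implements it.

```python
import hashlib
import json


def plan_shape(node: dict) -> list:
    """Reduce a plan node to its structure, dropping costs and row counts."""
    return [
        node.get("Node Type"),
        node.get("Relation Name"),
        node.get("Index Name"),
        [plan_shape(child) for child in node.get("Plans", [])],
    ]


def plan_fingerprint(explain_json: list) -> str:
    """Short stable hash; EXPLAIN JSON is a one-element list with a "Plan" key."""
    shape = plan_shape(explain_json[0]["Plan"])
    return hashlib.sha256(json.dumps(shape).encode()).hexdigest()[:12]


def record_plan(history: dict, queryid: int, explain_json: list) -> bool:
    """Store the fingerprint; return True when it differs from the last one (a flip)."""
    fp = plan_fingerprint(explain_json)
    seen = history.setdefault(queryid, [])
    flipped = bool(seen) and seen[-1] != fp
    if not seen or flipped:
        seen.append(fp)
    return flipped


# Hypothetical plans for the same query on different days.
seq = [{"Plan": {"Node Type": "Seq Scan", "Relation Name": "orders", "Total Cost": 1200.0}}]
seq_recosted = [{"Plan": {"Node Type": "Seq Scan", "Relation Name": "orders", "Total Cost": 1350.0}}]
idx = [{"Plan": {"Node Type": "Index Scan", "Relation Name": "orders",
                 "Index Name": "orders_pkey", "Total Cost": 8.3}}]

history: dict = {}
first = record_plan(history, 42, seq)
same_shape = record_plan(history, 42, seq_recosted)
flip = record_plan(history, 42, idx)
```

Timestamping each stored fingerprint is what lets you line a flip up against a latency regression.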
What paid PI retention buys you (and doesn’t)
AWS’s “Performance Insights Premium” tier extends retention to 24 months at higher per-vCPU cost. It does not add plan history, anomaly detection, AI suggestions, or multi-DB fan-out. The premium tier solves problem #7 above and leaves the other six in place.
When you’ve outgrown PI
- 15+ RDS instances. Per-instance tabs are no longer workable.
- Multi-engine fleet. Postgres + MySQL + Mongo or similar polyglot.
- You had a plan-flip incident. You promised the post-mortem this wouldn’t happen again, and PI alone can’t deliver that.
- Compliance retention. 12+ months on slow-query samples is mandated.
- Forecast / capacity planning is on the roadmap. PI doesn’t do this; it’s a separate tool either way.
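For the forecasting item, the simplest version of "this metric is trending toward breach" is a straight-line fit. The sketch below assumes one sample per day and a linear trend, which is a deliberately naive model. The function name and the sample values are illustrative.

```python
from typing import Optional


def days_until_breach(daily_values: list[float], limit: float) -> Optional[float]:
    """Fit a least-squares line to daily samples and extrapolate to `limit`.

    Returns None when the trend is flat or falling, 0.0 if already at the limit.
    """
    n = len(daily_values)
    if n < 2:
        return None
    if daily_values[-1] >= limit:
        return 0.0
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_values) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_values))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    breach_day = (limit - intercept) / slope
    return max(breach_day - (n - 1), 0.0)  # days counted from the latest sample


# Hypothetical storage-used percentages, one per day, against a 76% limit.
remaining = days_until_breach([50, 52, 54, 56, 58], limit=76)
print(remaining)  # prints 9.0
```

Real capacity planning needs seasonality and confidence bounds, which is why it tends to live in a separate tool.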
FAQ
Is Performance Insights enough to ditch a DBM tool entirely?
Does Performance Insights work on Aurora Serverless v2?
Can I use Performance Insights with Obsfly?
What's the real cost difference?