Obsfly

RDS Performance Insights: where it stops and what you actually need next

PI is free for up to 7 days of retention, ships with every RDS instance, and surfaces top SQL by wait class. It stops short on plan history, multi-host correlation, multi-engine fleets, alerting, and AI suggestions. Here's where the line is and what to bolt on.

Published · 11 min read

If you’re on RDS, Performance Insights is right there. Free for 7 days of retention, ships with every RDS instance, one toggle to enable. Every senior DBA you ask will tell you “PI is a starter, you’ll need more.” The advice is correct. This post is about exactly where the line is.

On this page
  1. Seven things PI does well
  2. Seven gaps that bite around month two
  3. Three bolt-on patterns
  4. What RDS Extended Support buys you (and doesn't)
  5. When you've outgrown PI
  6. FAQ

Seven things PI does well

  • Wait-event analysis. The chart-by-wait-class view is genuinely good: it’s what AWS borrowed from the Oracle ASH playbook and ported across engines.
  • Top SQL by elapsed time / IO / CPU. The SQL digest table answers “what was slow in the last hour” without instrumentation.
  • SQL digest normalization. Same shape as pg_stat_statements / MySQL Performance Schema. You can correlate PI digests with your own queries.
  • 7 days free. The default retention is enough to investigate yesterday’s incident. Free is the right price.
  • CloudWatch integration. DBLoad metrics export to CloudWatch so you can set alarms via the AWS-native stack.
  • Zero-install. No agent, no exporter, no extension activation (for Postgres ≥ 14). One console toggle.
  • Aurora-native. Works identically across Aurora Postgres, Aurora MySQL, and standard RDS engines. One UI.
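
The CloudWatch integration in the list above can be turned into an alarm with a few lines. This is a minimal sketch, assuming PI publishes DBLoad under the AWS/RDS namespace with a DBInstanceIdentifier dimension, and assuming "average load above the vCPU count" is a reasonable first rule. The alarm name, period, and evaluation window are illustrative choices, not AWS defaults.

```python
def dbload_alarm_params(instance_id: str, vcpus: int, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm on DBLoad."""
    return {
        "AlarmName": f"{instance_id}-dbload-above-vcpus",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "DBLoad",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint
        "EvaluationPeriods": 5,    # five consecutive breaching minutes
        "Threshold": float(vcpus),  # assumption: load above vCPU count means saturation
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_dbload_alarm(instance_id: str, vcpus: int, sns_topic_arn: str) -> None:
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("cloudwatch").put_metric_alarm(
        **dbload_alarm_params(instance_id, vcpus, sns_topic_arn)
    )
```

Keeping the parameter builder separate from the API call makes the rule easy to review and unit-test per instance before anything is created in the account.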

Seven gaps that bite around month two

| Gap | What it means in practice | When it hurts |
|-----|---------------------------|---------------|
| 1. No plan history | PI shows the latest plan only. You can’t see when the optimizer chose a different one or correlate plan flips with regressions. | Plan-flip incident → 2 hours of bisection instead of 5 minutes. |
| 2. No multi-host correlation | Each PI dashboard is per-instance. To see your whole RDS fleet you flip between tabs. | A fleet of 20+ DBs becomes unmanageable. |
| 3. Single-engine view | If your stack is Postgres + MongoDB + Redis you have three different consoles. | Most real fleets are polyglot. |
| 4. Threshold-only alerting via CloudWatch | DBLoad is one metric. You build the rest in CloudWatch Logs Insights and SNS yourself. | Multi-variate anomalies (qps + p99 + lock wait together) are hard to express in CloudWatch’s threshold model. |
| 5. No anomaly / forecast | PI is reactive. There’s no “hey, this metric is trending toward breach in 9 days” surface. | Capacity planning is still spreadsheet work. |
| 6. No AI suggestions | PI shows you the slow query. It doesn’t propose a rewrite or an index. | Junior engineers in the on-call rotation can’t self-serve. |
| 7. 7 days retention free, longer is paid | Long-term retention (2 yr) is $0.01 / vCPU-hour. For a 20-vCPU fleet that’s ~$1,750 / yr (20 × $0.01 × 8,760 hours) just for storage. | Compliance and post-mortem use cases need 12+ months of retention. |
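
To make gap 1 concrete, here is a rough sketch of what "plan history" means when you build it yourself: fingerprint the shape of each plan and record when it changes. It assumes you already collect `EXPLAIN (FORMAT JSON)` output for a query on some schedule. The function names and the in-memory `history` dict are hypothetical, and a real tool would persist this and track far more detail.

```python
import hashlib
import json


def plan_fingerprint(plan_node: dict) -> str:
    """Hash the plan's shape (node types, relations, indexes), ignoring costs and row estimates."""

    def shape(node: dict) -> list:
        return [
            node.get("Node Type"),
            node.get("Relation Name"),
            node.get("Index Name"),
            [shape(child) for child in node.get("Plans", [])],
        ]

    encoded = json.dumps(shape(plan_node)).encode()
    return hashlib.sha256(encoded).hexdigest()[:16]


def record_plan(history: dict, query_id: str, explain_json: list) -> bool:
    """Store a new fingerprint for query_id when the plan changes; return True on a flip."""
    fingerprint = plan_fingerprint(explain_json[0]["Plan"])
    seen = history.setdefault(query_id, [])
    flipped = bool(seen) and seen[-1] != fingerprint
    if not seen or flipped:
        seen.append(fingerprint)
    return flipped
```

Because costs and row estimates are left out of the hash, routine statistics drift does not count as a flip. Only a change in node types, relations, or indexes does.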

Three bolt-on patterns we see work

PI + CloudWatch Logs Insights + Lambda

Use PI for the live console, ship slow-query logs to CloudWatch, run scheduled Lambdas that compute derived metrics and post to SNS. Works. Total build is ~3 engineer-weeks. The downside is everything is on AWS — multi-cloud teams hit a wall.
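
A scheduled Lambda for this pattern might look roughly like the sketch below. It assumes Postgres-style slow-query log lines containing `duration: <n> ms`, and an event payload carrying `log_group` and `topic_arn` keys. Both keys, the 5-minute window, and the 1-second threshold are illustrative assumptions rather than anything PI or CloudWatch prescribes.

```python
import re
import time

DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms")


def slow_query_stats(messages: list, threshold_ms: float = 1000.0) -> dict:
    """Derive statement count, slow-statement count, and worst duration from log lines."""
    durations = [
        float(match.group(1))
        for message in messages
        if (match := DURATION_RE.search(message))
    ]
    slow = [d for d in durations if d >= threshold_ms]
    return {
        "statements": len(durations),
        "slow": len(slow),
        "max_ms": max(durations, default=0.0),
    }


def handler(event, context):
    """Scheduled Lambda entry point: query the last 5 minutes of logs, alert via SNS."""
    import boto3  # lazy import so slow_query_stats can be tested offline

    logs, sns = boto3.client("logs"), boto3.client("sns")
    now = int(time.time())
    query_id = logs.start_query(
        logGroupName=event["log_group"],  # hypothetical event key
        startTime=now - 300,
        endTime=now,
        queryString="fields @message | filter @message like /duration:/ | limit 1000",
    )["queryId"]
    # Logs Insights queries are asynchronous, so poll until the query finishes.
    while (result := logs.get_query_results(queryId=query_id))["status"] in ("Scheduled", "Running"):
        time.sleep(1)
    messages = [f["value"] for row in result["results"] for f in row if f["field"] == "@message"]
    stats = slow_query_stats(messages)
    if stats["slow"] > 0:
        sns.publish(TopicArn=event["topic_arn"], Subject="Slow queries", Message=str(stats))
    return stats
```

The parsing logic is kept in a pure function so the derived metric can be tested without AWS access. Most of the maintenance burden in this pattern lands in that function as log formats and thresholds change.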

PI + Grafana + custom exporter

Wire PI’s CloudWatch metrics into Grafana, build per-query dashboards with a custom pg_stat_statements exporter on the side. Common pattern. Now you maintain two systems — Grafana for cross-cutting, PI for the AWS-native deep dive.
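
The custom exporter half of this pattern can be sketched as a function that turns `pg_stat_statements` rows into Prometheus text format for Grafana to chart. The metric names below are made up for illustration, the query assumes the `total_exec_time` column (the name used from PostgreSQL 13 onward), and fetching rows and serving the output over HTTP are left out.

```python
# Query an exporter might run; total_exec_time is reported in milliseconds.
QUERY = """
SELECT queryid, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 50
"""


def render_metrics(rows: list) -> str:
    """Render pg_stat_statements rows (as dicts) in Prometheus text exposition format."""
    calls = ["# TYPE pg_stat_statements_calls_total counter"]
    seconds = ["# TYPE pg_stat_statements_exec_seconds_total counter"]
    for row in rows:
        label = f'{{queryid="{row["queryid"]}"}}'
        calls.append(f"pg_stat_statements_calls_total{label} {row['calls']}")
        seconds.append(
            f"pg_stat_statements_exec_seconds_total{label} {row['total_exec_time'] / 1000.0}"
        )
    return "\n".join(calls + seconds) + "\n"
```

Labeling by `queryid` rather than by query text keeps label cardinality bounded by the `LIMIT` and keeps SQL text out of the metrics store.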

PI + commercial DBM

Keep PI for the “what’s burning right now” AWS-native view. Add a DBM tool for plan history, multi-host correlation, multi-engine, anomaly, and AI. That’s the pattern we see on every RDS fleet over ~30 instances.

What RDS Extended Support buys you (and doesn’t)

AWS’s “Performance Insights Premium” tier extends retention to 24 months at higher per-vCPU cost. It does not add plan history, anomaly detection, AI suggestions, or multi-DB fan-out. The premium tier solves problem #7 above and leaves the other six in place.

When you’ve outgrown PI

  • 15+ RDS instances. Per-instance tabs are no longer workable.
  • Multi-engine fleet. Postgres + MySQL + Mongo or similar polyglot.
  • You had a plan-flip incident. You promised the post-mortem this wouldn’t happen again, and PI alone can’t deliver that.
  • Compliance retention. 12+ months on slow-query samples is mandated.
  • Forecast / capacity planning is on the roadmap. PI doesn’t do this; it’s a separate tool either way.
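
For a sense of what the forecasting item involves at its simplest, the sketch below fits a straight line to daily samples of a metric and extrapolates to a limit. It is a deliberately naive illustration: the function name is hypothetical, it assumes one evenly spaced sample per day, and real capacity tools account for seasonality and noise.

```python
from typing import Optional


def days_until_breach(samples: list, limit: float) -> Optional[float]:
    """Least-squares linear fit over daily samples; days from the last sample until limit."""
    n = len(samples)
    if n < 2:
        return None  # not enough data to fit a trend
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    if slope <= 0:
        return None  # flat or shrinking: no breach on this trend
    intercept = mean_y - slope * mean_x
    breach_day = (limit - intercept) / slope
    return max(breach_day - (n - 1), 0.0)


if __name__ == "__main__":
    # Hypothetical daily storage-used percentages against a 74% alert line.
    print(days_until_breach([50, 52, 54, 56], limit=74))
```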

FAQ

Is Performance Insights enough to ditch a DBM tool entirely?
For a Postgres-only shop with a fleet under ~10 instances, yes. The moment you add a second engine or a 15-month retention requirement, backfilling the gaps becomes full-time engineering work.
Does Performance Insights work on Aurora Serverless v2?
Yes, identically. The PI agent runs in the AWS-managed compute layer.
Can I use Performance Insights with Obsfly?
Yes, and most of our RDS customers do. They keep PI enabled for the live console and AWS-native view, and use Obsfly for plan history, multi-host, multi-engine, anomaly, and AI rewrite.
What's the real cost difference?
PI free tier is $0 with 7-day retention. PI Premium is roughly $0.01–$0.03/vCPU-hour for 24-month retention, or about $200–$1,500/mo for a typical fleet. Obsfly Team is $39/DB/mo flat; replacing PI Premium on a 30-DB fleet runs $1,170/mo, with all seven gaps closed.
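
The comparison above mixes two pricing models, per vCPU-hour and flat per database, so it helps to run the arithmetic for your own fleet. This sketch uses the rates quoted in this answer and an assumed 730 hours per month. The function names and the 4-vCPUs-per-database example are hypothetical.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def pi_retention_monthly(vcpus: int, rate_per_vcpu_hour: float) -> float:
    """Monthly cost of per-vCPU-hour pricing for a fleet's total vCPU count."""
    return vcpus * rate_per_vcpu_hour * HOURS_PER_MONTH


def flat_per_db_monthly(databases: int, price_per_db: float = 39.0) -> float:
    """Monthly cost of flat per-database pricing ($39/DB/mo in the answer above)."""
    return databases * price_per_db


def breakeven_vcpus_per_db(rate_per_vcpu_hour: float, price_per_db: float = 39.0) -> float:
    """vCPUs per database at which both pricing models cost the same."""
    return price_per_db / (rate_per_vcpu_hour * HOURS_PER_MONTH)


if __name__ == "__main__":
    total_vcpus = 30 * 4  # hypothetical: 30 databases with 4 vCPUs each
    for rate in (0.01, 0.03):
        print(f"${rate}/vCPU-hour: ${pi_retention_monthly(total_vcpus, rate):,.0f}/mo")
    print(f"flat per-DB: ${flat_per_db_monthly(30):,.0f}/mo")
```

Which model is cheaper depends mostly on instance size: per-vCPU pricing favors many small instances, and flat per-database pricing favors fewer large ones.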
