
Alerts

Intro

  • When things go wrong, you need to know about it
  • Set up an on-call rotation among all your developers

Why This Matters

  • Things will go wrong. You need to know when, where, and how they do.

Our Recommendation

  • Use an incident management and paging tool such as FireHydrant or PagerDuty
  • Developers should be on call for the services they develop
  • SREs / Operations personnel should also be in the rotation

How To Do It

  • Set thresholds and triggers on observability data and metrics
  • Run weekly on-call rotations
  • Hold a retrospective on every alert
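
The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not how FireHydrant or PagerDuty implement it: the rule fields, the metric name http_5xx_rate, the sample values, and the names in the rotation are all hypothetical. It shows a threshold that only fires after several consecutive breaching samples (to avoid paging on a single spike) and a weekly rotation that includes both developers and an SRE.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ThresholdRule:
    """Fire when a metric stays above `threshold` for `for_samples` consecutive samples."""
    metric: str          # hypothetical metric name
    threshold: float
    for_samples: int     # how many recent samples must all breach


def should_fire(rule: ThresholdRule, samples: list[float]) -> bool:
    """Return True if the most recent `for_samples` readings all exceed the threshold."""
    if rule.for_samples < 1 or len(samples) < rule.for_samples:
        return False
    recent = samples[-rule.for_samples:]
    return all(value > rule.threshold for value in recent)


def on_call_for(day: date, rotation: list[str], start: date) -> str:
    """Pick the on-call person for `day`, handing over every 7 days from `start`."""
    weeks_elapsed = (day - start).days // 7
    return rotation[weeks_elapsed % len(rotation)]


# Assumed example values, for illustration only.
rule = ThresholdRule(metric="http_5xx_rate", threshold=0.05, for_samples=3)
rotation = ["dev-alice", "dev-bob", "sre-carol"]
start = date(2024, 1, 1)  # rotation begins on a Monday

if should_fire(rule, [0.01, 0.06, 0.07, 0.09]):
    person = on_call_for(date(2024, 1, 10), rotation, start)
    print(f"page {person}: {rule.metric} above {rule.threshold}")
```

In practice the paging tool owns the schedule and the monitoring system evaluates the rule; the sketch only makes the logic concrete so it can be discussed in a retrospective.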

Alternatives/Notes

  • Wait for people to notice that something is broken

Dependencies

Dependents

Production Infrastructure Guide