Alerts
Intro
- When things go wrong, you need to know about it
- Set up an on-call rotation among all your developers
Why This Matters
- Things will go wrong. You need to know when, where, and how they do.
Our Recommendation
- Use an incident management and paging tool (FireHydrant, PagerDuty, etc.)
- Developers should be on call for the services they develop
- SREs / Operations personnel should also be in the rotation
How To Do It
- Set thresholds and triggers on observability data and metrics
- Run weekly on-call rotations
- Hold a retrospective on every alert, consistently
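The first two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how FireHydrant or PagerDuty work internally: the names ThresholdRule, should_alert, on_call_for, and route_alert are hypothetical, as are the metric name, the 5% limit, and the roster. In practice the paging tool handles scheduling and delivery; the sketch just shows the logic of "a threshold fires, and the alert goes to whoever is on call this week".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical alert rule: fire when a metric stays above a limit."""
    metric: str
    limit: float
    consecutive_samples: int  # breaches in a row required before alerting


def should_alert(rule: ThresholdRule, samples: list[float]) -> bool:
    """Return True if the most recent N samples all exceed the limit."""
    n = rule.consecutive_samples
    if len(samples) < n:
        return False  # not enough data to decide yet
    return all(value > rule.limit for value in samples[-n:])


def on_call_for(day: date, roster: list[str], rotation_start: date) -> str:
    """Pick the on-call person for a weekly rotation.

    The roster advances by one person every 7 days from rotation_start.
    """
    if not roster:
        raise ValueError("roster must not be empty")
    weeks_elapsed = (day - rotation_start).days // 7
    return roster[weeks_elapsed % len(roster)]


def route_alert(
    rule: ThresholdRule,
    samples: list[float],
    day: date,
    roster: list[str],
    rotation_start: date,
) -> Optional[str]:
    """Return a page message for the current on-call person, or None."""
    if not should_alert(rule, samples):
        return None
    who = on_call_for(day, roster, rotation_start)
    return f"PAGE {who}: {rule.metric} above {rule.limit}"


# Assumed example values: developers and an SRE share one roster.
roster = ["dev-alice", "dev-bob", "sre-carol"]
rotation_start = date(2024, 1, 1)  # a Monday
rule = ThresholdRule(metric="error_rate", limit=0.05, consecutive_samples=3)

message = route_alert(
    rule, [0.01, 0.06, 0.07, 0.09], date(2024, 1, 10), roster, rotation_start
)
print(message)
```

Requiring several consecutive breaches before paging is one simple way to avoid waking someone for a single noisy sample. Every page that does fire is then a candidate for the retrospective step.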
Alternatives/Notes
- Wait for people to notice
