Crafting sustainable on-call rotations – Increment: On-Call
https://increment.com/on-call/crafting-sustainable-on-call-rotations/
Distributed Systems Observability [Book]
https://www.oreilly.com/library/view/distributed-systems-observability/9781492033431/
Engineering Uber's On-Call Dashboard | Uber Engineering Blog
https://eng.uber.com/on-call-dashboard/
Google - Site Reliability Engineering: Monitoring Distributed Systems
https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
Google - Site Reliability Engineering
https://landing.google.com/sre/books/
Implementing Model-Agnosticism in Uber's Real-Time Anomaly Detection Platform
https://eng.uber.com/anomaly-detection/
Microservices vs The World
https://adamdallis.com/2019/02/09/microservices-vs-the-world/
Observability at Scale: Building Uber's Alerting Ecosystem | Uber Engineering Blog
https://eng.uber.com/observability-at-scale/
Operating a large distributed system in a reliable way: practices I learned | Hacker News
https://news.ycombinator.com/item?id=20462349
Some items from my "reliability list"
https://rachelbythebay.com/w/2019/07/21/reliability/
Some items from my "reliability list" | Hacker News
https://news.ycombinator.com/item?id=20522868
You Are Not Google – Bradfield
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb