Session Name: Site Reliability Engineering: Anti-patterns in Everyday Life and What They Teach Us
Real-world experience and things that go wrong are two of life’s best teachers. This talk will explore key elements of scalable large-system design and Site Reliability Engineering (SRE) principles* through anti-patterns encountered in real life. Find out what lessons can be gleaned from watching the dynamics in a crowded cafe or dealing with a security issue during a hotel stay. Learn about fundamental SRE principles and practices, including:
- Avoiding cascading failures
- Not feeding the machines with human toil
- Writing blameless postmortems
- Engineering solutions to eliminate classes of errors rather than implementing point fixes
These principles will be framed through a lens of the suboptimal while demonstrating the impact of SRE anti-patterns on user trust.
* SRE is often thought of as a specific implementation of the DevOps interface.
Jennifer Petoff is a Senior Program Manager for Google's Site Reliability Engineering team based in Dublin, Ireland. She is the global lead for Google’s SRE EDU program and is one of the co-editors of the best-selling book, Site Reliability Engineering: How Google Runs Production Systems.