Chris Jones is a Privacy Engineer. From 2007 to 2017, he was a Site Reliability Engineer at Google. Among other projects, he was tech lead and an editor for Google's book, "Site Reliability Engineering" (O'Reilly, 2016).
Session: SLOs and Error Budgets
100% is almost never the right reliability target for a service, and service level agreements (SLAs) aren't the right tool for SREs to manage a service. These two (apparent) heresies are fundamental to how Google SRE thinks about running large-scale distributed computing services: we set service level objectives (SLOs) that express how reliable a service needs to be, then manage the service to maximize product development and feature velocity within the agreed "error budget." We'll discuss the differences between service level indicators, objectives, and agreements; error budgets in practice; and how this approach brings product managers, product developers, and SREs together in a spirit of peaceful coexistence and cooperation.
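The error budget is the complement of the SLO: whatever fraction of time the service is allowed to fall short of its target. The sketch below shows that arithmetic. It assumes a simple time-based availability SLO measured over a fixed window. The function names and the `can_launch` rule (keep shipping while budget remains) are hypothetical illustrations, not details taken from the talk.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed unavailability in `window` for a time-based availability SLO.

    Assumption: slo=0.999 means "up 99.9% of the time"; the remaining
    0.1% of the window is the error budget.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo, window)
    if budget == timedelta(0):
        # A 100% target leaves no budget: any downtime at all overspends it.
        return 0.0 if downtime == timedelta(0) else float("-inf")
    return 1.0 - downtime / budget


def can_launch(slo: float, window: timedelta, downtime: timedelta) -> bool:
    """Hypothetical policy: allow new launches only while budget remains."""
    return budget_remaining(slo, window, downtime) > 0.0


if __name__ == "__main__":
    month = timedelta(days=30)
    print(error_budget(0.999, month))  # about 43.2 minutes
    print(budget_remaining(0.999, month, timedelta(minutes=10)))
    print(can_launch(0.999, month, timedelta(hours=1)))
```

With a 99.9% SLO over 30 days, the budget is about 43.2 minutes of unavailability. Ten minutes of downtime spends roughly a quarter of it. An hour of downtime overspends it. A 100% target leaves a budget of zero, so there is no room for launches, experiments, or ordinary failures. That is one concrete way to see why 100% is almost never the right target.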