
Reducing Trauma in Production with SLOs and Chaos Engineering

Oct 28, 2021 9:11:58 AM By All Day DevOps

Customer experience is a shared responsibility across the entire organization. While many organizations leave reliability to the site reliability engineering (SRE) team, it needs to be built in from the very beginning and extends beyond that team, according to Mandi Walls and Julie Gunderson.

Mandi Walls is a DevOps Advocate on the Community and Advocacy Team at PagerDuty. Before joining PagerDuty, Mandi spent several years at Chef Software, working with customers and community members in the US and Europe. Originally a large-scale systems administrator, she’s also worked on operations at AOL and MovieFone.

Julie Gunderson is a Sr. Reliability Advocate at Gremlin, where she works to further the adoption of Chaos Engineering principles and methodologies. Over the last seven years, Julie has been actively involved in the DevOps space. She is passionate about helping individuals, teams, and organizations understand how to leverage best practices and develop amazing cultures.

Together, Mandi and Julie will be speaking at this year's All Day DevOps (ADDO), the world's largest DevOps conference, which will stream live for 24 hours starting at 3 a.m. ET on October 28, 2021.

What are SLOs, and why are they important?

Mandi and Julie will discuss Service Level Objectives (SLOs), why they are important to the organization, and how to define and set them.

Going beyond SLOs, attendees will learn what Chaos Engineering is and find practical ways to ensure compliance and resilience with best practices. The duo will show you how to focus your goals and error budgets with examples that lead to reliability and improved user experience.

They’ll also discuss improving overall production reliability by combining Service Level Indicators (SLIs) and SLOs with qualitative measurement analysis and Chaos Engineering to understand production sites, prioritize improvements, and make them more reliable.
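For readers new to the terms, the relationship between an SLI, an SLO, and an error budget can be sketched in a few lines of Python. This is a minimal illustration, not material from the talk: the function name `error_budget`, the 99.9% target, and the request counts are all hypothetical, and it assumes a simple availability SLI measured as the share of successful requests.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compare an availability SLI with an SLO target (all inputs are hypothetical)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # SLI: the fraction of requests that succeeded in the measurement window.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: how many failures the SLO tolerates in the same window.
    allowed_failures = (1 - slo_target) * total_requests

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
        "slo_met": sli >= slo_target,
    }


# Assumed example: a 99.9% SLO over one million requests, 400 of which failed.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(report)
```

With these assumed numbers, the SLO allows roughly 1,000 failed requests, so about 600 remain in the budget. A team could spend that remainder on riskier changes or on planned Chaos Engineering experiments.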

“It is important to focus on what customers see and prefer, as well as their tolerance for bad experiences and for slowdowns versus simply using random measurement,” says Mandi Walls. “It is important to truly know and understand what customers are really after and really like.”

It seems like a given, but organizations that want to improve their products and services need to understand how to make experiences more efficient for both internal users and customers.

“With a shift for so many verticals to digital, there is a need for all interactions to be interactive and pleasant -- and one of our internal goals should be to hope that the customer ultimately gets the most out of the product,” says Walls.

DevOps Means Continuous Improvement

For the more seasoned folks in the DevOps space, this presentation will focus on constantly evaluating what is important to their users. It will look at behaviors, tolerances, and constant evaluation as part of internal goal-setting.

And for those newer to the DevOps practice, you’ll learn how to improve over time. The biggest lesson? You’re never really done. Continuous improvement and Agile planning are fundamental to development. In today’s world, you never really get to the point where you put something into the world and you’re done with it. Instead, developers need to think of the process as a continuous cycle of refreshing and revising rather than a straight path to the end of the road.

Thinking of leveraging SLOs and Chaos Engineering?

Wondering what it would look like if your data layer or microservice went offline? How would you protect against outages or slowdowns so that users don’t pick up and move somewhere else, taking their business with them? The ideal attendee for this talk at All Day DevOps is someone within a larger enterprise organization whose basic metrics are CPU utilization or memory, which don’t necessarily tell you how an application is used. If you are wondering what types of tools are out there and what comes next, Walls and Gunderson want you to know that approaches such as Chaos Engineering are available.

Mandi Walls and Julie Gunderson will also discuss the importance of trial and error. They will stress that it is perfectly fine to practice and test in production, because you want to understand how users might behave later on, and this is the time to learn from your mistakes.

The virtual event gathers more than 25,000 DevOps professionals for free, hands-on education from 180+ speakers, along with peer-to-peer insights and networking with professionals worldwide. View the speaker lineup, including six different tracks, and register to attend on October 28.