"I Have an SLO. Now What?"

Apr 16, 2021 10:33:11 AM By Taurai Mutimutema

In this talk, Alex Hidalgo explains how to use your SLOs to make your customers, your engineers, and your business happier. You can watch the entire talk here.

Since 2020, the internet has been flooded with talks on how to create service level objectives (SLOs) and what not to do. Rather predictably, many blogs recycle antiquated advice, like "you should ship features if and only if you have pockets deep enough to rectify errors, else focus on reliability." However, if you already have SLO data, you can convert it into stakeholder joy, and Alex Hidalgo shows you how.

Why We Need SLOs

To set a good foundation for the discussion, Alex explained why we need SLOs in the first place. When a new app grows, it can get crowded by elements added out of necessity. It's easy to see how. First, you aim to ace the MVP, but then you realize that the app is growing, so you start collecting logs and metrics, along with traces.

Properly measuring metrics in such a situation requires that you have a reliability stack. A good one uses service level indicators (SLIs) to measure what customers consider important, and SLOs to set targets for those measurements. The gap between an SLO target and 100% is your error budget. But what do you do after you have SLIs and SLOs?
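As a rough illustration of how an SLI, an SLO target, and an error budget relate, here is a minimal sketch. It assumes an event-based SLI (good events divided by total events); the function names and the sample numbers are hypothetical and do not come from the talk.

```python
def sli(good_events: int, total_events: int) -> float:
    """Event-based SLI: the fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_bad = error_budget(slo_target, total_events)
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad
```

For example, with a hypothetical 99.9% target over 1,000,000 requests, roughly 1,000 requests may fail. If 500 have failed so far, `budget_remaining(0.999, 999_500, 1_000_000)` comes out at about 0.5, meaning half of the budget is left.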

What You Can Do With SLOs

1. Maintain a Balance Between Features and Reliability Shipments

Think back to the balance between feature work and reliability work. Your error budget tells you how much room you have for either, so approach that ratio with caution. Systems only fail because of change, and shipping reliability features technically counts as change.

2. Establish the Scope of Your Work

Even when you're careful with your changes, you may not own the entire codebase of your project, for example because it relies on open-source code. To account for this, recalculate your SLIs so that they reflect the parts you depend on but don't control. This gives you more attainable SLO thresholds.
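One way to sanity-check whether a threshold is attainable is to estimate the reliability ceiling that your dependencies impose. The sketch below is an illustration only, not the method from the talk: it assumes every dependency is a hard, serial dependency and that failures are independent, in which case the availabilities multiply. The function names and numbers are hypothetical.

```python
import math
from typing import Iterable


def reliability_ceiling(dependency_availabilities: Iterable[float]) -> float:
    """Best availability you can offer when every dependency must be up.

    Assumes hard serial dependencies with independent failures.
    """
    return math.prod(dependency_availabilities)


def is_attainable(slo_target: float, dependency_availabilities: Iterable[float]) -> bool:
    """True when the SLO target does not exceed the dependency ceiling."""
    return slo_target <= reliability_ceiling(dependency_availabilities)
```

With two hypothetical dependencies at 99.95% and 99.99%, the ceiling is about 99.94%, so a 99.9% target is within reach while a 99.99% target is not.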

3. Examine Your Risk Factors

SLO data also allows you to check your risk factors as often as you need. You'll quickly see when you're not being reliable and why. These become your risks. Security, deployment frequency (and quality), infrastructure upkeep, and downtime are all potential risks you unearth with each examination. Even when no complaints are coming from the customer, you might find that your entire error budget has been used up. If this happens to you, Alex suggests that you slow down and make time for the little changes that the error budget allows.
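A common way to notice that a budget is being used up before customers complain is to track the burn rate, defined as the observed error rate divided by the error rate the SLO allows. The following is a minimal sketch under that definition. The `recommend` helper and its thresholds are hypothetical illustrations of the "slow down" advice, not a policy stated in the talk.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget would be spent exactly at the end
    of the SLO window if the current error rate continued.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    observed_error_rate = bad_events / total_events
    return observed_error_rate / (1.0 - slo_target)


def recommend(budget_remaining: float, current_burn_rate: float) -> str:
    """Hypothetical policy: map budget state to a suggested posture."""
    if budget_remaining <= 0.0:
        return "slow down"  # budget exhausted: favor small, safe changes
    if current_burn_rate > 1.0:
        return "proceed with caution"  # on track to overspend the budget
    return "ship"
```

For instance, 20 failures out of 10,000 requests against a 99.9% target is a burn rate of about 2, which means the budget is being spent roughly twice as fast as the window allows.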

4. Experimentation and Chaos Engineering

Experimentation, even to the extent of breaking systems intentionally, gives you new data on your breaking points. SLO data can warn you of these thresholds, but only if you spend time testing. For instance, SLO data can tell you if you have a good error budget and easy rollback. Try new updates and learn your limits.

5. Run Stress Tests

This activity is similar in outcome to item 4 above, except these tests aim to discover very specific breaking points.

[Figure: Load test, stress test, and blackholes]

6. Turn Things Off

You can use SLO data to learn what happens if you turn things off. It's better to turn things off than experience unannounced failures that could frustrate your customers, right? Often you'll discover that some units actually break down because they're dependent on the points you turn off. Then, you'll be able to make them more reliable.

7. Do Nothing!

Yes. You can actually do nothing if you know that users are happy but you're getting signals that you're being unreliable. These numbers can be wrong sometimes. You can choose to do nothing when suggested actions undo the purpose of the entire operation. Say an upgrade is needed, but it will take too long. You can (probably should) do nothing.

8. Make Better Service Reliability Reports

Combing through your SLO data should allow you to explain why patterns are happening. The resulting error budget adjustments are better reflections of each measured variable.

9. Make Better Decisions

This part is crucial. When you have more data, you have more knowledge about how your business works—which means you can have better conversations about your business with the right people. Ultimately, this helps you make better decisions about your technology.

[Figure: Better Conversations = Better Decisions]

Improve Your Business With SLOs

SLO data can help you improve nearly every facet of your business. The best part is, you probably already have SLO data available to you—and if you don't, it's easy to get started. What are you waiting for?