Responding to incidents is work. It’s unplanned, sometimes chaotic, and often stressful. Incident response should get better over time, but many organizations find improvement difficult and often backslide into bad practices. Teams tackling too many incidents see more burnout and have less time for work that impacts the bottom line. Getting better at handling incidents takes practice and resources, as well as changes to culture and improvements to tooling. We want to prioritize the most important issues, the problems that impact users, while delegating lower-priority issues to automation. In the long term, reducing the number of incidents that responders have to deal with will improve team engagement, reduce burnout, and recapture time to spend on more important tasks. In this talk, we’ll cover a number of methods that will have a positive impact on incident response, from crafting alerts, to writing automation, to setting good practices that prevent frustration among your team.
Mandi Walls is a DevOps Advocate on the Community and Advocacy Team at PagerDuty. Before joining PagerDuty, Mandi spent a number of years at Chef Software, working with customers and community members in the US and Europe. Originally a large-scale systems administrator, Mandi has focused on IT automation; organizational culture and change; and community.