Session Name: Continuous Resilience: Extending DevOps Practices for Systems Resilience
Continuous resilience is the dream: ensuring our complex sociotechnical systems can prepare for, recover from, and adapt to failure gracefully on an ongoing basis – whether infrastructure outages, misconfigurations, or the omnipresent scourge of attackers. But many software engineering teams struggle to chart a new course to nurture resilience. Must we change all our practices? Do we have the budget? How do we even start? This talk bestows a beacon of hope by presenting multiple opportunities for software engineers to extend existing practices towards Continuous Resilience. We’ll start by delving into the value of automation like IaC, how CI/CD forms a critical inner loop of a larger feedback cycle, and how modularity proffers adaptive capacity during a crisis. Then, we’ll explore how we can align our assumptions about our systems with reality, especially as they evolve over time. We’ll cover how to model adverse scenarios with decision trees and conduct continuous resilience stress tests (i.e., chaos experiments) to generate real-world evidence of system behavior during adversity. By the end of the talk, you’ll understand how to apply DevOps practices to sustain resilience in sociotechnical systems. Many of the practices you already know – like IaC, CI/CD, and modular design – have resilience benefits that might surprise (and delight!) you, and new practices – like decision trees and resilience stress tests – offer foundational first steps in your resilience quest.
Kelly Shortridge is a Senior Principal Engineer at Fastly and lead author of Security Chaos Engineering: Sustaining Resilience in Software and Systems (O'Reilly Media). Shortridge is best known as an expert on resilience in complex systems, applying behavioral economics to cybersecurity, and bringing security out of the dark ages. Shortridge is a frequent keynote speaker, advisor, and author and serves on the editorial board of ACM Queue magazine.