Session Name: Managing Systems in an Age of Dynamic Complexity
Why is it that a single server can often have better uptime than a public cloud service?
We used to manage systems. Now many of us write and run dynamic control planes instead: the systems that run our user-facing systems. We find the dynamic control plane pattern in software-defined networking, in service meshes, in some load balancers, and in job orchestration systems.
This talk looks at the common architectural shapes of dynamic control planes, and at some examples of how they fail spectacularly; many major cloud outages are caused by dynamic control plane issues. Why are dynamic control planes so hard to run, and what can we do about it?
Laura Nolan's background is in Site Reliability Engineering, software engineering, distributed systems, and computer science. She wrote the 'Managing Critical State' chapter in the O'Reilly 'Site Reliability Engineering' book and contributed to the more recent 'Seeking SRE'. She is a member of the USENIX SREcon steering committee.