Tom Leaman is a senior manager of the Runtime Engineering org at The Vanguard Group, where he works with teams to ensure the reliability of their microservice platforms. When he isn’t studying how to reinforce technical and human systems, he likes to pretend that he can bake and woodwork.
Session: Practice Makes Perfect - Developing Expertise Through Chaos Engineering
Our technical systems are getting more complicated by the day. Whether by intention or by accident, this complexity has the same effect on our ability to manage the applications our clients depend on: it gets a lot harder. When the system produces a ‘surprise’ and no longer performs as assumed, effective incident response is critical. The engineers involved must quickly align behind a common goal, communicate efficiently, and coordinate their actions predictably to return system behavior to normal. The high level of cohesion needed to act this way doesn’t develop overnight, and relying on live incidents to build this expertise can be painful and costly.
In this talk, we’ll look at how teams can prepare themselves for the worst of incidents by covering:
* The critical building blocks of teamwork that are necessary to bring surprises to resolution;
* How to incorporate deliberate practice into the workday to build up incident response muscle memory; and
* How chaos engineering practices such as GameDays can realistically simulate the team’s reaction to a real surprise.