In operations, we often find ourselves dominated by the urgent. The site is down *right now*! All hands on deck! Much has been said about the dangers of pager fatigue, toil, and urgent tactical work. We in Site Reliability Engineering pride ourselves on being aware of this, and on being proactive rather than reactive. It turns out, though, that this proactivity has limits. In this talk, we would like to tell a story about the far end of the spectrum: work that is critically important but has a long time horizon. Most organizations are not set up to handle this well, SRE included.
Tony is a Staff Software Engineer at Google. To date, he has focused mainly on building reliable, large-scale distributed systems, but is slowly starting to understand that scaling people and projects is the greater challenge.