The Etsy organization has grown significantly over the last five years. As a company grows, more thought must be put into the communication techniques it uses and into how people acquire technical proficiency through them. Mastery manifests in a variety of ways, including understanding how a system fails and recovers, which patterns make it secure, what adds to or detracts from its maintainability, debuggability, or performance, which practices best spread knowledge, and how we learn from failures.
At the organizational level we tend to achieve this primarily through tooling, though also through process and education. This talk will cover several communication techniques that have helped foster a Just Culture, one in which an effort is made to balance safety and accountability.
The first techniques we will cover are architecture and operability reviews, each of which has a distinct purpose. An architecture review aims to understand the costs and benefits of a proposed solution and to discuss alternatives. These exploratory conversations generally focus on technical departures, building confidence in the new and different systems that may be introduced. An operability review ensures that we know when a system is working and how we will know when it is broken. The talk will cover the formats we use for both meetings and the questions that can be posed.
The second technique is about how we deal with failure. Anyone who has worked with technology at scale is familiar with failure. Even with all the planning and thought that goes into architecture and operability reviews, we still encounter it. By investigating mistakes in a way that focuses on the situational aspects of a failure’s mechanism, and on the decision-making process of the individuals proximate to the failure, an organization can come out safer than it would have if it had simply punished the actors involved as remediation. This is what we call the Blameless PostMortem, and we’ll dive into the structure of how to approach these meetings.
As your organization grows, more thought has to be put into how people communicate. If you start to implement some of these techniques early on, they can assist with technology changes over the years, and introduce processes that deal with failure in mature ways.
John Goulah works in New York City, and has over a decade of experience scaling infrastructure for media- and e-commerce-based platforms. He strives for non-mundane tasks and has automated himself out of his last few endeavors, which has landed him in his current role as a senior engineering manager at Etsy, the leading marketplace for handmade goods. He has been working there for almost five years on developer tools and deployment infrastructure.
©2015, O'Reilly Media, Inc.