Testing in production has gotten a bad rap. People act like testing in production implies you aren’t doing due diligence with your tests before production. But it’s more like a fact of life: you can only catch the easy bugs in staging—the known-unknowns, the things you predicted would fail, and the things that have failed before. Which isn’t nothing, but it’s no better than running tests on your laptop.
Most interesting problems are only going to manifest under real workloads, on real data, with real users doing unpredictable things under real concurrency and resource pressure. So you should use much fewer of your scarce engineering cycles poring over staging and much more of them building guard rails for prod. Production—where your customers live—is the only environment that matters. Time spent interacting with nonprod systems is wasted time. Replicas are not valuable for helping build your instincts, your skill set, your intuition. Secondary environments actually train you to expect faulty assumptions and to take dangerous shortcuts and run terrifying commands. You should force people to develop and test on production as much as possible and interact with production every day.
Charity Majors dives into tooling and shares ways to harden production and make it safe for engineers to do their work directly on it—from deploys and canarying to feature flags, instrumentation and observability, human practices and workflows, and much more.
Charity Majors is the cofounder and CTO of Honeycomb, a startup that provides the first and (thus far) only observability solution for modern systems. She’s the coauthor of Database Reliability Engineering (the unicorn book) and previously worked at Parse, Facebook, and Linden Lab. She tests in prod.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com