Testing in production has gotten a bad rap. People act like testing in production implies you aren’t doing due diligence with your tests before production. But it’s more like a fact of life: you can only catch the easy bugs in staging—the known-unknowns, the things you predicted would fail, and the things that have failed before. Which isn’t nothing, but it’s no better than running tests on your laptop.
Most interesting problems are only going to manifest under real workloads, on real data, with real users doing unpredictable things under real concurrency and resource pressure. So you should use much fewer of your scarce engineering cycles poring over staging and much more of them building guard rails for prod. Production—where your customers live—is the only environment that matters. Time spent interacting with nonprod systems is wasted time. Replicas are not valuable for helping build your instincts, your skill set, your intuition. Secondary environments actually train you to expect faulty assumptions and to take dangerous shortcuts and run terrifying commands. You should force people to develop and test on production as much as possible and interact with production every day.
Charity Majors dives into tooling and shares ways to harden production and make it safe for engineers to do their work directly on it—from deploys and canarying to feature flags, instrumentation and observability, human practices and workflows, and much more.
Charity Majors is the cofounder and CTO of Honeycomb, a new startup focused on mining machine data. Previously, Charity ran infrastructure at Parse and was an engineering manager at Facebook. She also worked with the RocksDB team to build and deploy the world’s first Mongo + Rocks in production. Charity likes single malt scotch.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org