At the start of the Crimean War in 1853, Britain’s Royal Navy needed 90 new gunboats ready to fight in the Baltic in just 90 days. Assembling the boats was straightforward; the challenge was building all of the engine sets in time. Marine engineer John Penn did an unusual thing: he took a pair of reference engines, disassembled them, and distributed the pieces to the best machine shops across Britain. These workshops, the microservices of their day, each built 90 sets of their allocated parts, which were then assembled into the engines for the new gunboats, ready for battle.
This was the nineteenth century. How could the Admiralty be certain that the parts from all these independent workshops would come together to form 90 high-powered engines? The answer lay in a crucial piece of standardization: the Whitworth thread, the world’s first national screw thread standard, devised by Sir Joseph Whitworth in 1841. By the time the Royal Navy came knocking, this standard had been adopted by workshops across Britain, so John Penn could be confident that engine parts built by any workshop to the Whitworth standard would fit together.
Snowplow’s Alexander Dean uses the story of the Crimean War gunboats to argue that our data-processing architectures urgently require a standardization of their own, in the form of schema registries. Like the Whitworth screw thread, a schema registry, such as Confluent Schema Registry or Snowplow’s own Iglu, allows enterprises to standardize on a set of business entities that can be used throughout their batch and stream processing architectures. And like the artisanal workshops of 1850s Britain, microservices can then work on narrowly defined data-processing tasks, confident that their inputs and outputs will be compatible with those of their peers.
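To make the registry idea described above concrete, here is a minimal sketch of one in Python. It is an illustration only, not how Confluent Schema Registry or Iglu work: the `Schema` and `SchemaRegistry` classes, the `order_placed` entity, and its fields are all hypothetical, and real registries expose their own APIs and schema languages. The sketch assumes a schema is just a named, versioned list of required fields with expected types.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    """Hypothetical schema: a versioned description of one business entity."""
    name: str        # business entity, e.g. "order_placed" (assumed name)
    version: int     # illustrative integer version, not a real versioning scheme
    required: dict   # field name -> expected Python type


class SchemaRegistry:
    """Hypothetical in-memory registry shared by producers and consumers."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema):
        key = (schema.name, schema.version)
        if key in self._schemas:
            # A published version is never silently overwritten.
            raise ValueError(f"{schema.name} v{schema.version} is already registered")
        self._schemas[key] = schema

    def validate(self, name, version, event):
        """Return a list of problems; an empty list means the event conforms."""
        schema = self._schemas[(name, version)]
        problems = []
        for field_name, expected_type in schema.required.items():
            if field_name not in event:
                problems.append(f"missing field: {field_name}")
            elif not isinstance(event[field_name], expected_type):
                problems.append(f"wrong type for field: {field_name}")
        return problems


registry = SchemaRegistry()
registry.register(Schema("order_placed", 1, {"order_id": str, "total_cents": int}))

# A producing service checks its output before emitting it; a consuming
# service can run the same check on its input.
good_problems = registry.validate(
    "order_placed", 1, {"order_id": "A-17", "total_cents": 1299}
)
bad_problems = registry.validate("order_placed", 1, {"order_id": "A-18"})
```

Because both sides validate against the same registered schema, neither service needs to know anything about the other's implementation, which is the role the Whitworth standard played for the engine workshops.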
Alexander outlines the rationale for putting a schema registry at the heart of your business, before moving on to the practicalities of an implementation: a side-by-side comparison of the available registries, best practices for schema versioning, and strategies for schema federation across different companies, including Snowplow’s own Iglu Central.
Alexander Dean is cofounder and technical lead at Snowplow Analytics, an enterprise-strength open source event analytics platform.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.