Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

What Crimean War gunboats teach us about the need for schema registries

Alexander Dean (Snowplow Analytics Ltd)
5:25pm–6:05pm Wednesday, 09/28/2016
Data-driven business
Location: 1 E 15/1 E 16
Level: Intermediate

Prerequisite knowledge

  • Basic experience with representing data using schemas in a format such as JSON/JSON Schema, Avro, Thrift, or Protocol Buffers
  • Familiarity with the general concepts of data processing (no experience in a specific technology such as Hadoop, Spark, or Storm required)
  • High-level understanding of the difference between a monolithic and a microservices-based software architecture

What you'll learn

  • Discover why every organization should implement a schema registry
  • Understand how the schemas in this registry can provide a common language for all data processing throughout your organization, allowing you to assemble your data pipeline from many smaller microservices

Description

    At the start of the Crimean War in 1853, Britain’s Royal Navy needed 90 new gunboats ready to fight in the Baltic in just 90 days. Assembling the boats was straightforward. The challenge was to build all of the engine sets in time. Marine engineer John Penn did an unusual thing: he took a pair of reference engines, disassembled them, and distributed the pieces to the best machine shops across Britain. These workshops, the microservices of their day, each built 90 sets of their allocated parts, which were then assembled into the engines for the new gunboats, ready for battle.

    This was the nineteenth century. How could the admiralty be certain that the parts from all these independent workshops would come together to form 90 high-powered engines? The answer lay in a crucial piece of standardization: the Whitworth thread, the world’s first national screw thread standard, devised by Sir Joseph Whitworth in 1841. By the time the Royal Navy came knocking, this standard had been adopted by workshops across Britain, so John Penn could be confident that engine parts built by any workshop to the Whitworth standard would fit together.

    Snowplow’s Alexander Dean uses the story of the Crimean War gunboats to argue that our data-processing architectures urgently require a standardization of their own, in the form of schema registries. Like the Whitworth screw thread, a schema registry, such as Confluent Schema Registry or Snowplow’s own Iglu, allows enterprises to standardize on a set of business entities that can be used throughout their batch and stream processing architectures. Like the artisanal workshops in 1850s Britain, microservices can work on narrowly defined data processing tasks, confident that their inputs and outputs will be compatible with those of their peers.
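
    To make the idea concrete, here is a minimal sketch of two services agreeing on a record shape through a shared registry. It is an illustration only: the SchemaRegistry class, the page_view entity, its fields, and the tiny type vocabulary are all hypothetical, and the sketch does not reflect the API of Confluent Schema Registry or Iglu.

```python
# Hypothetical field-type vocabulary for this sketch (not JSON Schema or Avro).
TYPES = {"string": str, "integer": int}


class SchemaRegistry:
    """Toy in-memory registry mapping (name, version) to a field spec."""

    def __init__(self):
        self._schemas = {}

    def register(self, name, version, fields):
        key = (name, version)
        if key in self._schemas:
            raise ValueError(f"{name} v{version} is already registered")
        self._schemas[key] = dict(fields)

    def validate(self, name, version, record):
        """Return a list of problems; an empty list means the record conforms."""
        spec = self._schemas[(name, version)]
        problems = []
        for fname, ftype in spec.items():
            if fname not in record:
                problems.append(f"missing field: {fname}")
                continue
            value = record[fname]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, TYPES[ftype]):
                problems.append(f"wrong type for {fname}: expected {ftype}")
        for extra in sorted(set(record) - set(spec)):
            problems.append(f"unexpected field: {extra}")
        return problems


# One shared registry, one agreed business entity (both assumed for the example).
registry = SchemaRegistry()
registry.register(
    "page_view", 1,
    {"user_id": "string", "url": "string", "duration_ms": "integer"},
)


def produce_page_view(user_id, url, duration_ms):
    """Producer service: refuses to emit a record that breaks the schema."""
    record = {"user_id": user_id, "url": url, "duration_ms": duration_ms}
    problems = registry.validate("page_view", 1, record)
    if problems:
        raise ValueError(problems)
    return record


def total_duration(records):
    """Consumer service: sums durations, skipping nonconforming records."""
    total = 0
    for record in records:
        if registry.validate("page_view", 1, record):
            continue
        total += record["duration_ms"]
    return total
```

    Neither function knows anything about the other; each depends only on the registered schema, which is the role the Whitworth standard played for the workshops.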

    Alexander outlines the rationale for putting a schema registry at the heart of your business, then moves on to the practicalities of implementation: a side-by-side comparison of the available registries, best practices for schema versioning, and strategies for federating schemas across different companies, including Snowplow’s own Iglu Central.

    Alexander Dean

    Snowplow Analytics Ltd

    Alexander Dean is cofounder and technical lead at Snowplow Analytics, an enterprise-strength open source event analytics platform.