Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Managing data chaos in the world of microservices

Oleksii Kachaiev (Attendify)
3:30pm–4:10pm Thursday, 09/13/2018
Data engineering and architecture
Location: 1A 10
Level: Intermediate

Who is this presentation for?

  • Software engineers, software architects, and CTOs

Prerequisite knowledge

  • Familiarity with microservices and distributed data challenges
  • A basic understanding of why we need different storage solutions for different needs

What you'll learn

  • Understand why encapsulating data is good for your services but bad for your data architecture; explore solutions to the problems of data observability, fetching data from different origins, and data versioning; and learn how to approach these problems in your organization

Description

Microservices have been one of the hottest topics in recent years, and the industry is shifting toward splitting applications into smaller and smaller independent units. This is happening for very good reasons: you can gain a lot in terms of both technology and organizational scalability. Many infrastructure tools have been developed to support the movement, from schedulers, deploy automation, and service discovery systems to development tools like distributed tracers, log aggregators, and analyzers, and we’ve invented and reinvented protocols to make communication between microservices even more efficient. However, one problem is often overlooked: the data layer is being diluted by the very encapsulation that microservices need in order to grow and evolve.

As we move toward more independently encapsulated services, we’re experiencing dramatically increased challenges managing data, including:

  • Observability, knowledge sharing, and data discovery (Who owns that piece of the data? Where can I find that thing?)
  • Querying the data (What API should I expose for others? How can I get this info from that dataset? Should I cache this or re-query when necessary?)
  • Structural and semantic changes in the datasets (Can I add a new field here? Who’s using this record, and how can I update it without breaking any other services?)

These problems are common, but most of our effort and attention is directed at infrastructure, for which generic solutions are easier to find. Making sense of the data, on the other hand, is hardly a generalizable problem. There have been many attempts to tame the chaos associated with independent dataset management. Oleksii Kachaiev discusses high-level approaches to building a shareable abstraction layer that separates “physical” details from logical concerns, as well as specific technologies you can leverage.

The growing complexity of your data layer may overshadow the benefits of the microservices architecture you’ve deployed, so the sooner you start working on a solution, the easier it will be to manage the chaos.

Oleksii Kachaiev

Attendify

Oleksii (Alexey) Kachaiev is the CTO at Attendify, where he spends his days coding in Clojure, Haskell, and Rust. His interests include algebra and protocols. Alexey is the author of the Muse and Fn.py libraries and is an active contributor to Aleph and other open source projects.