Streaming data systems, so-called fast data, promise accelerated access to information, leading to new innovations and competitive advantages. But they aren’t just faster versions of big data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. Dean Wampler outlines what you need to know to exploit fast data successfully.
Big data started with an emphasis on batch-oriented architectures, where data is captured in large, scalable stores and then processed using batch jobs. To reduce the gap between data arrival and information extraction, these architectures are now evolving to be stream oriented, where data is processed as it arrives. While a new buzzword, fast data is also a new opportunity for innovation in how your organization leverages data.
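To make the batch versus stream distinction concrete, here is a minimal Python sketch. It is illustrative only and not taken from the session: the function names `batch_average` and `streaming_average` and the sample readings are assumptions, and a production system would rely on an engine such as Apache Spark, Apache Flink, or Akka Streams rather than plain Python.

```python
from typing import Iterable, Iterator, List


def batch_average(stored_readings: List[float]) -> float:
    """Batch style: all data is captured first, then processed in one job."""
    return sum(stored_readings) / len(stored_readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: update the result as each record arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count  # an up-to-date answer after every record


# Hypothetical sample data, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]
print(batch_average(readings))            # one answer, after all data is stored
print(list(streaming_average(readings)))  # one answer per arriving record
```

The batch function cannot answer until the full data set is available, while the streaming function narrows the gap between data arrival and information extraction by emitting a result per record.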
However, fast data architectures introduce new challenges for your organization. Whereas a batch job might run for hours, a stream processing application might run for weeks or months. This raises the bar for making these systems resilient against traffic spikes, hardware and network failures, and so forth. The microservice world has faced these challenges for a while. Your data teams will likely need to evolve to resemble the teams you already have for your microservices-based systems. In fact, you’ll probably merge these teams over time, as your microservices do more data processing and your data systems leverage your microservices.
Dean Wampler is an expert in streaming data systems, focusing on applications of machine learning and artificial intelligence (ML/AI). He’s head of developer relations at Anyscale, which is developing Ray for distributed Python, primarily for ML/AI. Previously, he was an engineering VP at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers, and he’s the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent conference speaker and tutorial teacher, he’s also the co-organizer of several conferences around the world and several user groups in Chicago. He earned his PhD in physics from the University of Washington.
©2018, O’Reilly UK Ltd