Real-time event stream processing is becoming a staple of data-driven decision making in many organizations, but providing this capability is not without challenges. Real-time decision-support systems must handle high event traffic and execute sophisticated analyses while remaining highly available and scalable, with strong data consistency and integrity guarantees. Satisfying all of these requirements simultaneously is difficult.
Yu-Xi Lim and Michal Wegrzyn outline a high-throughput distributed software pattern capable of processing event streams in real time. The pattern makes it easy to define an event flow (i.e., a stream-in/stream-out transformer), shard the state, pass the state fragments to independent state machines that each process an independent stream shard, and then splice the resulting state fragments into a global state if necessary. With proper orchestration, this framework can perform asynchronous checkpointing and seamlessly recover individually failed instances, making it highly reliable and highly available.
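To make the shard, process, and splice steps concrete, here is a minimal single-process sketch in Python. It is not the speakers' framework: the names `shard_of`, `CountingMachine`, `partition`, `process`, `splice`, and `recover` are hypothetical, the per-key event counter stands in for a real analysis, and checkpointing is done synchronously for simplicity, whereas the pattern described above checkpoints asynchronously.

```python
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_of(key, num_shards=NUM_SHARDS):
    """Map an event key to a shard using a stable hash (crc32)."""
    return crc32(key.encode("utf-8")) % num_shards


class CountingMachine:
    """Toy state machine that owns one state fragment and counts events per key."""

    def __init__(self):
        self.state = {}   # state fragment for this shard
        self.offset = 0   # number of events from this shard applied so far

    def apply(self, event):
        key = event["key"]
        self.state[key] = self.state.get(key, 0) + 1
        self.offset += 1

    def checkpoint(self):
        """Return a snapshot that does not share mutable data with the machine."""
        return {"state": dict(self.state), "offset": self.offset}

    @classmethod
    def restore(cls, snapshot):
        machine = cls()
        machine.state = dict(snapshot["state"])
        machine.offset = snapshot["offset"]
        return machine


def partition(events, num_shards=NUM_SHARDS):
    """Split one event stream into per-shard streams, preserving order within a shard."""
    shards = [[] for _ in range(num_shards)]
    for event in events:
        shards[shard_of(event["key"], num_shards)].append(event)
    return shards


def process(shards):
    """Run one independent state machine per stream shard."""
    machines = [CountingMachine() for _ in shards]
    for machine, shard in zip(machines, shards):
        for event in shard:
            machine.apply(event)
    return machines


def splice(machines):
    """Merge state fragments into a global state; keys are disjoint across shards."""
    global_state = {}
    for machine in machines:
        global_state.update(machine.state)
    return global_state


def recover(snapshot, shard_events):
    """Rebuild a failed machine from its last checkpoint and replay later events."""
    machine = CountingMachine.restore(snapshot)
    for event in shard_events[snapshot["offset"]:]:
        machine.apply(event)
    return machine
```

For example, partitioning the events with keys `a, b, a, c, b, a`, running `process` on the shards, and calling `splice` on the resulting machines yields the counts a=3, b=2, c=1. Because each key always maps to the same shard, a single failed machine can be rebuilt with `recover` from its own snapshot and shard stream without touching the other shards.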
Yu-Xi and Michal illustrate the framework via a case study of Teralytics, a company that analyzes cellular location data to derive transportation patterns and compute transportation-related metrics. Yu-Xi and Michal demonstrate how the algorithm for computing one such metric can be formulated in the framework and how the resulting component reflects the traits given above.
Yu-Xi Lim is lead data scientist at Teralytics, where he leads the technical team in the company’s Singapore office. Yu-Xi is interested in applying data science to retail and travel. Previously, he led teams at Southeast Asian ecommerce giant Lazada and at TravelShark; was vice president of engineering at payment startup Fastacash; and was a software engineer in Microsoft’s Windows Division. Yu-Xi holds a PhD in electrical and computer engineering from Georgia Tech, where he did research on WiFi positioning systems.
Michal Wegrzyn is a software engineer at Teralytics, where he is responsible for developing batch and streaming big data analytics applications. He previously worked with CellVision and Orange Poland, where he built and delivered telco analytics applications. He holds a master’s degree in electronics and telecommunications from AGH University of Science and Technology.