Streaming data continuously from Kafka allows users to gain insights faster, but when these pipelines fail, they can leave users panicked about data loss when restarting their application. Jordan Hambleton and Guru Medasani explain how offset management gives users the ability to restore the state of the stream throughout its lifecycle, deal with unexpected failure, and improve the accuracy of results.
Jordan and Guru demonstrate how Apache Spark integrates with Apache Kafka for streaming data in a distributed and scalable fashion, covering considerations and approaches for building fault-tolerant streams and detailing a few strategies of offset management to easily recover a stream and prevent data loss.
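As a rough illustration of the general idea behind offset management (not the speakers' implementation, and not the Spark or Kafka API), the sketch below simulates a partitioned log with plain Python lists and keeps offsets in a JSON file. The names `FileOffsetStore` and `run_batch` and the `fail_before_output` flag are hypothetical, introduced only for this example. It assumes offsets are saved only after a batch's output has been written, so a restart resumes from the last saved position.

```python
import json
import os
import tempfile


class FileOffsetStore:
    """Hypothetical durable store: a JSON file mapping partition -> next offset."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # No file yet means nothing has been consumed.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            # JSON object keys are strings; convert them back to ints.
            return {int(p): o for p, o in json.load(f).items()}

    def save(self, offsets):
        # Write to a temp file, then rename, so a crash never leaves a partial file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(offsets, f)
        os.replace(tmp, self.path)


def run_batch(log, store, sink, max_records, fail_before_output=False):
    """Read up to max_records per partition, write output, then save offsets."""
    offsets = store.load()
    next_offsets = dict(offsets)
    batch = []
    for partition, records in log.items():
        start = offsets.get(partition, 0)
        end = min(start + max_records, len(records))
        batch.extend(records[start:end])
        next_offsets[partition] = end
    if fail_before_output:
        # Simulated crash: no output written, no offsets saved.
        raise RuntimeError("simulated failure")
    sink.extend(batch)        # 1. write results
    store.save(next_offsets)  # 2. only then record progress


# Usage: one successful batch, one crash, then a restart with a fresh store object.
path = os.path.join(tempfile.mkdtemp(), "offsets.json")
log = {0: ["a", "b", "c"], 1: ["x", "y"]}  # stand-in for two topic partitions
sink = []
run_batch(log, FileOffsetStore(path), sink, max_records=2)
try:
    run_batch(log, FileOffsetStore(path), sink, max_records=2, fail_before_output=True)
except RuntimeError:
    pass
run_batch(log, FileOffsetStore(path), sink, max_records=2)
```

In this simulation the failed batch is simply re-read after the restart, so no record is skipped. A crash between writing output and saving offsets would instead replay that batch, which is why this ordering is usually described as at-least-once unless the output write is idempotent.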
Topics include:
Jordan Hambleton is a Consulting Manager and Senior Architect at Cloudera, where he partners with customers to build and manage scalable enterprise products on the Hadoop stack. Previously, Jordan was a member of technical staff at NetApp, where he designed and implemented the NRT operational data store that continually manages automated support for all of their customers’ production systems.
Guru Medasani is a Data Science Architect at Domino Data Lab. He helps small and large enterprises build efficient machine learning pipelines. Previously, he was a senior solutions architect at Cloudera, where he helped customers build big data platforms and leverage technologies like Apache Hadoop and Apache Spark to solve complex business problems. Some of the business applications he’s worked on include applications for collecting, storing, and processing huge amounts of machine and sensor data, image processing applications on Hadoop, machine learning models to predict consumer demand, and tools to perform advanced analytics on large volumes of data stored in Hadoop.
View a complete list of Strata Data Conference contacts
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Comments
Thanks for the note Mauricio. The slides have been uploaded.
Guys, could you post your slides here? tx