For this tutorial, we'll be using a virtual machine (VM) with Kafka installed on it. You must install VirtualBox 5.0.10 or above BEFORE you arrive onsite. Your computer must have a 64-bit processor and at least 3 GB of free RAM. Some computers ship with VT-x disabled in the BIOS by default. If the VM fails to boot, enable this setting.
After you have downloaded the VM, decompress it, then import the decompressed VM into VirtualBox.
Participants should have a basic familiarity with Java or Python and be comfortable with Linux-based tools.
Learn about stream data and Apache Kafka from several core Kafka contributors. During this three-hour tutorial, Ewen Cheslack-Postava, Joseph Adler, Jesse Anderson, and Ian Wrigley explain what Kafka is, demonstrate how it works, and explore using Kafka to build modern data applications. Ewen, Joseph, Jesse, and Ian also discuss key architectural concepts and developer APIs as they guide participants through hands-on labs to build an application that can publish data to Kafka and subscribe to receive data from Kafka.
This tutorial is ideal for application developers, extraction-transformation-load (ETL) developers, or data scientists who need to interact with Kafka clusters as a source of, or destination for, stream data.
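As a rough illustration of the publish-and-subscribe pattern described above, the following Python sketch models a topic as an append-only log in which each subscriber tracks its own read offset. It is a toy in-memory model, not the Kafka client API and not the tutorial's lab code; the names `ToyLog`, `ToySubscriber`, and the `clicks` topic are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a broker: one append-only list per topic.

    Hypothetical illustration only; this is not how a Kafka client is used.
    """

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message and return its offset (its position in the log)."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset):
        """Return all messages at or after the given offset."""
        return self._topics[topic][offset:]


class ToySubscriber:
    """Tracks its own read position, so subscribers read independently."""

    def __init__(self, log, topic):
        self._log = log
        self._topic = topic
        self._offset = 0

    def poll(self):
        """Fetch unread messages and advance this subscriber's offset."""
        messages = self._log.read(self._topic, self._offset)
        self._offset += len(messages)
        return messages


log = ToyLog()
log.publish("clicks", "page=/home")
log.publish("clicks", "page=/pricing")

first = ToySubscriber(log, "clicks")
second = ToySubscriber(log, "clicks")

batch_one = first.poll()   # everything published so far
log.publish("clicks", "page=/signup")
batch_two = first.poll()   # only the message published since the last poll
replay = second.poll()     # a late subscriber still reads the full log
```

Because messages stay in the log after being read, the second subscriber can start from offset 0 and replay everything, independently of the first. That retained-log behavior is what the sketch is meant to convey.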
Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. Jesse trains employees on big data—including cutting-edge technology like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students at companies ranging from startups to Fortune 100 companies the skills to become data engineers. He is widely regarded as an expert in the field and recognized for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers and has been covered in such prestigious media outlets as the Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.
Ewen Cheslack-Postava is an engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data. Ewen received his PhD from Stanford University, where he developed Sirikata, an open source system for massive virtual environments. His dissertation defined a novel type of spatial query giving significantly improved visual fidelity and described a system for efficiently processing these queries at scale.
Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, Verisign, and LinkedIn. Currently, he is director of product management and data science at Confluent. He is the holder of several patents for computer security and cryptography and the author of Baseball Hacks and R in a Nutshell. He graduated from MIT with a BSc and MEng in computer science and electrical engineering.
Ian Wrigley is a Technical Director at StreamSets, the company behind the industry’s first data operations platform. Over his 25-year career, Ian has taught tens of thousands of students subjects ranging from C programming to Hadoop development and administration.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.