The tutorial includes a live demo of the full project on Cloudera's QuickStart VM. The demo code is available on GitHub; download it to follow along.
Implementing a scalable, low-latency architecture requires understanding a broad range of frameworks, such as Kafka, HBase, HDFS, Flume, Spark, Spark Streaming, and Impala, among many others. The good news is that there’s an abundance of resources—books, websites, conferences, etc.—for gaining a deep understanding of these related projects. The bad news is there’s still a scarcity of information on how to integrate these components to implement complete solutions.
Ted Malaska walks you through building a fraud-detection system, using an end-to-end case study to provide a concrete example of how to architect and implement real-time systems with Apache Hadoop ecosystem components like Kafka, HBase, Impala, and Spark. Along the way, Ted covers best practices and considerations for architecting real-time applications, giving developers, architects, and project leads who already know Hadoop or similar distributed data processing systems more insight into how these components can be leveraged to implement real-world applications.
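The core of such a pipeline is a per-event scoring step that consults a customer profile. As a minimal sketch in plain Python: the function name `score_event`, the thresholds, and the in-memory profile store below are illustrative assumptions standing in for the HBase-backed profiles and the Kafka/Spark Streaming plumbing the session covers.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Illustrative stand-in for an HBase-backed customer profile store;
# a real deployment would look profiles up per micro-batch inside
# the Spark Streaming job rather than keep them in process memory.
@dataclass
class Profile:
    recent_amounts: deque = field(default_factory=lambda: deque(maxlen=20))

profiles = defaultdict(Profile)

def score_event(account: str, amount: float) -> bool:
    """Flag a transaction with a simple rolling-average outlier rule.
    The threshold (10x) and minimum history (5 events) are arbitrary
    assumptions for illustration, not the session's actual rules."""
    history = profiles[account].recent_amounts
    avg = sum(history) / len(history) if history else 0.0
    # Suspicious if the amount is 10x the rolling average and we have
    # enough history to trust that average.
    suspicious = len(history) >= 5 and amount > 10 * avg
    history.append(amount)
    return suspicious

# Usage: a stream of (account, amount) events, e.g. consumed from Kafka.
events = [("a1", 20.0)] * 6 + [("a1", 500.0)]
flags = [score_event(acct, amt) for acct, amt in events]
# The six small transactions build the profile; the 500.0 outlier is flagged.
```

Keeping the rule stateless apart from the profile lookup is what lets the scoring step scale horizontally: each Kafka partition can be scored independently as long as events for one account land on one partition.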
Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent conference speaker, and a frequent blogger on data architectures.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.