Rapid advancements are causing a dramatic evolution in both the storage and processing capabilities in the open source big data software ecosystem. These advancements include projects like:
These storage and processing systems provide a powerful platform to implement data processing applications on batch and streaming data. While these advancements are exciting, they also add a new array of tools that architects and developers need to understand when architecting modern data processing solutions.
Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging components to reliably integrate multiple data sources, perform real-time and batch data processing, reliably store massive volumes of data, and efficiently query and process large datasets. Along the way, they discuss considerations and best practices for utilizing these components to implement solutions, cover common challenges and how to address them, and provide practical advice for building your own modern, real-time data architectures.
Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache Yarn, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at many conferences, and a frequent blogger on data architectures.
Jonathan Seidman is a software engineer on the cloud team at Cloudera. Previously, he was a lead engineer on the big data team at Orbitz, helping to build out the Hadoop clusters supporting the data storage and analysis needs of one of the most heavily trafficked sites on the internet. Jonathan is a cofounder of the Chicago Hadoop User Group and the Chicago Big Data Meetup and a frequent speaker on Hadoop and big data at industry conferences such as Hadoop World, Strata, and OSCON. Jonathan is the coauthor of Hadoop Application Architectures from O’Reilly.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org