What are the essential components of a data platform? John Akred, Mauricio Vacas, and Stephen O’Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
By tracing the flow of data from source to output, John, Mauricio, and Stephen explore the options and considerations for each component: acquisition from internal and external data sources; ingestion (offline and real-time processing); storage; analytics (batch and interactive); and data services (exposing data to applications). They also offer advice on tool selection, explain the function of the major Hadoop components and other big data technologies such as Spark and Kafka, and discuss integration with legacy systems.
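To make the source-to-output flow concrete, here is a minimal sketch of the five stages as plain Python functions. It is an illustration only, not the speakers' architecture: every function name, the sample records, and the in-memory list standing in for storage are assumptions, and a production platform would replace each stage with tools such as Kafka, Hadoop, or Spark.

```python
from collections import defaultdict


def acquire():
    """Acquisition: collect raw events from internal and external sources.

    The records are hard-coded, hypothetical sample data.
    """
    internal = [
        {"source": "internal", "user": "a", "amount": "10"},
        {"source": "internal", "user": "b", "amount": "5"},
    ]
    external = [
        {"source": "external", "user": "a", "amount": "7"},
        {"source": "external", "user": "c", "amount": "oops"},  # malformed on purpose
    ]
    return internal + external


def ingest(raw_events):
    """Ingestion: parse amounts to integers and drop records that fail to parse."""
    clean = []
    for event in raw_events:
        try:
            clean.append({**event, "amount": int(event["amount"])})
        except ValueError:
            continue  # skip malformed records
    return clean


def store(events, storage):
    """Storage: append validated events to a list that stands in for a data store."""
    storage.extend(events)
    return storage


def analyze_batch(storage):
    """Analytics (batch): total the amount per user over everything stored."""
    totals = defaultdict(int)
    for event in storage:
        totals[event["user"]] += event["amount"]
    return dict(totals)


def serve(totals, user):
    """Data services: expose a per-user lookup to applications (0 if unknown)."""
    return totals.get(user, 0)


# Wire the stages together in source-to-output order.
storage = store(ingest(acquire()), [])
totals = analyze_batch(storage)
print(serve(totals, "a"))
```

Keeping each stage behind its own function mirrors the idea that components can be chosen and swapped independently, as long as the data handed between stages keeps the same shape.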
With over 15 years in advanced analytical applications and architecture, John Akred is dedicated to helping organizations become more data-driven. As CTO of Silicon Valley Data Science, John combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.
A leading expert on big data architecture and Hadoop, Stephen O’Sullivan has 20 years of experience creating scalable, high-availability data and application solutions. A veteran of @WalmartLabs, Sun, and Yahoo, Stephen leads data architecture and infrastructure at Silicon Valley Data Science.
Mauricio Vacas is a data engineer at Silicon Valley Data Science, where he has worked across multiple areas of the data platform, from ingestion to analysis and visualization. Previously, Mauricio was a tech arch manager with 5+ years of experience in Accenture’s R&D group and its big data practice, where he managed the development of a cloud-based data platform used by Accenture’s data science teams to create analytic models for multiple customer projects. Mauricio is passionate about technology and its ability to make a difference in people’s lives.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.