Building, running, and governing a data lake and production data applications on Hadoop is often a difficult process, marked by slow development cycles and painful operations. Traditional development tools and techniques are missing from the Hadoop ecosystem, and mastering data ingestion and integration, as well as enterprise governance and security, has become a formidable challenge when building big data solutions. The challenge only increases as the Hadoop ecosystem continues to grow, use cases mature, SLAs intensify, and services become customer facing and revenue generating. The IT organization owns the task of mitigating these issues, but more importantly, it also has an opportunity to help the business reduce time to insights and make better decisions faster by providing a modern self-service environment for its data.
Jonathan Gray proposes a modern, unified integration architecture that helps IT mitigate these issues while enabling businesses to reduce time to insights and make decisions faster through a modern self-service environment. Drawing on his experience as an early committer on Apache HBase, building real-time systems on Hadoop at Facebook, and working with customers at Cask, Jonathan explores the benefits of the Cask Data Application Platform (CDAP), the first unified integration platform for big data and the IoT. CDAP ensures data and process consistency between applications and underlying infrastructure technologies, across multiple environments, and between different parts of the IT organization. It also provides a single environment for design, operations, data science, and governance for data lakes, data applications, and the IoT. Jonathan discusses the requirements for building and running modern production applications on Hadoop and outlines the architecture CDAP offers to address these challenges in the context of common use cases.
Jonathan Gray is the founder and CEO of Cask. Jonathan is an entrepreneur and software engineer with a background in startups, open source, and all things data. Previously, he was a software engineer at Facebook, where he helped drive HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production. An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase. He is now a core contributor and active committer in the community. Jonathan holds a bachelor’s degree in electrical and computer engineering from Carnegie Mellon University.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.