Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Fixing what’s broken: Big data in the enterprise (sponsored by Cask)

Jonathan Gray (Cask)
11:50am–12:30pm Wednesday, March 15, 2017
Sponsored
Location: 230 B

What you'll learn

  • Explore the standardization, automation, and deep integration technologies in Hadoop and Spark that allow companies, developers, and users to focus on application logic and insights rather than infrastructure and integration

Description

Enterprises are increasingly looking to deploy big data technologies to transform their business, but the time to generate value from their data often exceeds even their most pessimistic expectations. Hadoop and Spark provide unprecedented scale and flexibility at a low cost compared to data warehouses. However, the messy and diverse nature of big data means that users have to stitch together disparate systems, resulting in unwanted complexity and inefficiency. Even simple tasks like data ingestion or transformation can be cumbersome, requiring many lines of complex, hand-written code. The sheer volume of data, petabytes distributed across a cluster, further complicates operations, security, and data governance, and the shortage of skilled people is a major barrier to using and operationalizing this modern data architecture.

Moreover, many of these advantages create downstream issues. Schema-on-read allows more flexibility but is turning data lakes into data swamps. The wide choice of open source technologies promises a rich, diverse ecosystem but in practice yields specialized, divergent options that can trigger integration headaches (e.g., multiple storage layers, many processing engines, and various workflow engines and schedulers). Point solutions are limited: they cannot easily be put into production and often require custom integration code. Finally, breaking down data silos and democratizing data is not easily achievable, because the platform has severe usability shortcomings for business users (it requires the command line and code).

Jonathan Gray explores the standardization, automation, and deep integration technologies in Hadoop and Spark that allow companies, developers, and users to focus on application logic and insights rather than infrastructure and integration.

Topics include:

  • A simplified API for data integration and app development on big data
  • Pervasive metadata, lineage, and usage analytics
  • A layer of abstraction to ensure portability, reusability, and future-proofing
  • Self-service user experience for citizen integrators and business users
  • Packaged solutions and prebuilt components for rapid time-to-value
  • Sophisticated security, audit, and encryption for compliance needs
  • API support, management, and replication to operationalize projects

This session is sponsored by Cask.


Jonathan Gray

Cask

Jonathan Gray is the founder and CEO of Cask. Jonathan is an entrepreneur and software engineer with a background in startups, open source, and all things data. Prior to founding Cask, he was a software engineer at Facebook, where he helped drive HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production. An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and refocusing the open source strategy of the company. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase. He is now a core contributor and active committer in the community. Jonathan holds a bachelor’s degree in electrical and computer engineering from Carnegie Mellon University.
