Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

DataOps: An Agile methodology for data-driven organizations

Ellen Friedman (MapR Technologies)
2:40pm–3:20pm Wednesday, March 7, 2018
Data engineering and architecture
Location: LL21 E/F
Level: Non-technical
Average rating: 4.43 (7 ratings)

Who is this presentation for?

  • Data scientists, application architects, managers, and developers

Prerequisite knowledge

  • A basic understanding of data science workflows (e.g., what a model is, how it's developed, and what deploying a model means)

What you'll learn

  • Learn about the emerging practice of DataOps and how to develop a DataOps methodology

Description

No longer a new idea, big data is fast becoming a core competency for many organizations. According to surveys by New Vantage Partners, as of 2016, 62% of Fortune 1000 firms and industry leaders reported having at least one big data application in production, double the share that reported the same in 2013. By 2017, over 80% said their big data investments were successful. But the 2017 report goes on to highlight the major challenge now: the difficulty of organizational and cultural change around big data.

Another challenge involves the practical logistics of data and application management that are necessary to deliver value in real-world settings. Data science and machine learning techniques are playing an increasingly important role in driving value for big data projects. However, as data science and machine learning start to move from R&D to production, organizations are finding unexpected challenges. For instance, with machine learning, it turns out that selecting models and tuning parameters is the easy part; much harder are the logistical aspects, that is, the work involved in curating training datasets, versioning datasets, training models, benchmarking models, deploying them to production, and improving them iteratively. Overcoming these logistical challenges is critical to an organization’s ability to derive value from data-intensive applications.

DataOps is an emerging practice that helps with these challenges. At its core is cross-skill communication among data scientists, data engineers, application developers, and operations staff, with a better focus on a shared, data-driven goal. This collaboration fosters an Agile process for flexibility and fast time to value. A successful DataOps practice is also a good fit for emerging approaches designed to deal with the logistical aspects of data-intensive applications.

Ellen Friedman offers an overview of DataOps and explains how to implement it.

Topics include:

  • What DataOps is and why it improves focus and flexibility
  • The steps needed to build a DataOps approach
  • Why this style of work makes it more likely that projects stay on time and focused
  • The connection between DataOps and microservices
  • How DataOps provides a good fit for use of a global data fabric

Ellen Friedman

MapR Technologies

Ellen Friedman is principal technologist for MapR Technologies. Ellen is a committer on the Apache Drill and Apache Mahout projects and coauthor of a number of books on computer science, including Machine Learning Logistics, Streaming Architecture, the Practical Machine Learning series, and Introduction to Apache Flink. Ellen has been an invited speaker at Strata Data conferences, Big Data London, Big Data Paris, Berlin Buzzwords, Nike Tech Talks, the University of Sheffield Methods Institute in the UK, and NoSQL Matters Barcelona. She holds a PhD in biochemistry.

Comments on this page are now closed.

Comments

Andrea Santurbano | SENIOR CONSULTANT
04/02/2018 4:08pm PDT

Hi, when will the slides be available?