Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

DataOps: An Agile methodology for data-driven organizations

Ellen Friedman (Independent)
2:40pm–3:20pm Wednesday, March 7, 2018
Average rating: 4.43 (7 ratings)

Who is this presentation for?

  • Data scientists, application architects, managers, and developers

Prerequisite knowledge

  • A basic understanding of data science workflows (e.g., what a model is, how it's developed, and what deploying a model means)

What you'll learn

  • Learn about the emerging practice of DataOps and how to develop a DataOps methodology

Description

No longer a new idea, big data is fast becoming a core competency for many organizations. According to surveys by New Vantage Partners, as of 2016, 62% of Fortune 1000 firms and industry leaders reported having at least one big data application in production, double the share that reported the same in 2013. By 2017, over 80% said their big data investments were successful. But the 2017 report also highlights the major challenge that remains: managing the organizational and cultural change that big data requires.

Another challenge involves the practical logistics of data and application management needed to deliver value in real-world settings. Data science and machine learning techniques play an increasingly important role in driving value from big data projects. However, as data science and machine learning move from R&D into production, organizations are running into unexpected challenges. With machine learning, for instance, selecting models and tuning parameters turns out to be the easy part. Much harder are the logistics: curating training datasets, versioning datasets, training models, benchmarking models, deploying them to production, and improving them iteratively. Overcoming these logistical challenges is critical to an organization’s ability to derive value from data-intensive applications.

DataOps is an emerging practice that helps with these challenges. At its core is cross-skill communication among data scientists, data engineers, application developers, and operations staff, with a sharper focus on a shared, data-driven goal. This collaboration fosters an Agile process that delivers flexibility and fast time to value. A successful DataOps practice is also a good fit for emerging approaches designed to handle the logistics of data-intensive applications.

Ellen Friedman offers an overview of DataOps and explains how to implement it.

Topics include:

  • What DataOps is and why it improves focus and flexibility
  • The steps needed to build a DataOps approach
  • Why this style of work makes teams more likely to stay on schedule and on target
  • The connection between DataOps and microservices
  • Why DataOps is a good fit for a global data fabric

Ellen Friedman

Independent

Ellen Friedman is a data technologist with a Ph.D. in biochemistry. She is a committer on the Apache Drill and Apache Mahout projects and co-author of books including AI & Analytics in Production, Machine Learning Logistics, Streaming Architecture, the Practical Machine Learning series, and Introduction to Apache Flink, all published by O’Reilly Media. Ellen has been a keynote speaker at JFokus in Stockholm, Big Data London, and NoSQL Matters Barcelona, and an invited speaker at Strata Data conferences, Berlin Buzzwords, Nike Tech Talks, and the University of Sheffield Methods Institute.


Comments

Andrea Santurbano | SENIOR CONSULTANT
04/02/2018 4:08pm PDT

Hi, when will the slides be available?