No longer a new idea, big data is fast becoming a core competency for many organizations. According to surveys by NewVantage Partners, as of 2016, 62% of Fortune 1000 firms and industry leaders reported having at least one big data application in production, double the number that reported the same in 2013. By 2017, over 80% said their big data investments were successful. But the 2017 report goes on to highlight the major challenge now: the difficulty of organizational and cultural change around big data.
Another challenge involves the practical logistics of data and application management that are necessary to deliver value in real-world settings. Data science and machine learning techniques are playing an increasingly important role in driving value for big data projects. However, as data science and machine learning move from R&D to production, organizations are running into unexpected challenges. With machine learning, for instance, it turns out that selecting models and tuning parameters is the easy part. Much harder are the logistical aspects: curating training datasets, versioning datasets, training models, benchmarking models, deploying them to production, and improving them iteratively. Overcoming these logistical challenges is critical to an organization's ability to derive value from data-intensive applications.
DataOps is an emerging practice that helps with these challenges. At its core is cross-skill communication among data scientists, data engineers, application developers, and operations staff, with a sharper focus on a shared, data-driven goal. This collaboration fosters an Agile process that provides flexibility and fast time to value. A successful DataOps practice is also a good fit for emerging approaches designed to deal with the logistical aspects of data-intensive applications.
Ellen Friedman offers an overview of DataOps and explains how to implement it.
Ellen Friedman is principal technologist for MapR Technologies. Ellen is a committer on the Apache Drill and Apache Mahout projects and coauthor of a number of books on computer science, including Machine Learning Logistics, Streaming Architecture, the Practical Machine Learning series, and Introduction to Apache Flink. Ellen has been an invited speaker at Strata Data conferences, Big Data London, Big Data Paris, Berlin Buzzwords, Nike Tech Talks, the University of Sheffield Methods Institute in the UK, and NoSQL Matters Barcelona. She holds a PhD in biochemistry.
©2018, O'Reilly Media, Inc.