20–23 April 2020

Building real-world data pipelines

Ted Dunning (MapR, an HPE company)
14:05–14:45 Wednesday, April 22, 2020
Location: Capital Suite 13

Who is this presentation for?

Data engineers, data architects, developers

Level

Intermediate

Description

Data pipelines are fast becoming a standard fixture in modern systems. Everybody with data has one. Or a dozen. Knowledge of how to build and maintain these pipelines, however, isn’t nearly as widespread as knowledge of, say, how to build a data warehouse. The truth is out there, though, and with the right tools, building a maintainable pipeline doesn’t have to be nearly as hard as it seems. Just as DevOps can simplify the development and deployment of conventional software, an MLOps approach can simplify building large-scale data pipelines. The benefits are similar in that you gain control and tighten development cycles, but many of the tools and concepts are different.

Ted Dunning demystifies the core building blocks of such pipelines and explains how to use tools such as TensorFlow Extended (TFX), scikit-learn, Apache Flink, and Apache Beam to build, maintain, and monitor them.

Prerequisite knowledge

  • A basic understanding of what a model is and roughly how machine learning is used to build models from data

What you'll learn

  • The most current techniques for building and maintaining data pipelines

Ted Dunning

MapR, an HPE company

Ted Dunning is the chief technology officer at MapR, an HPE company. He’s also a board member for the Apache Software Foundation, a PMC member, and a committer on a number of projects. Ted has years of experience with machine learning and other big data solutions across a range of sectors. He’s contributed to clustering, classification, and matrix decomposition algorithms in Mahout and to the new Mahout Math library, and he designed the t-digest algorithm used in several open source projects and by a variety of companies. Previously, Ted was chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud-detection systems for ID Analytics (LifeLock). Ted has coauthored a number of books on big data topics, including several published by O’Reilly related to machine learning, and has 24 issued patents to date plus a dozen pending. He holds a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.
