Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Executive Briefing: What it takes to use machine learning in fast data pipelines

Dean Wampler (Anyscale)
16:35–17:15 Thursday, 2 May 2019



What you'll learn

  • Understand the business justification for transitioning from batch-oriented big data to stream-oriented fast data, including the delivery of stream-based, ML/AI services
  • Discover the main challenges faced when deploying these technologies together
  • Explore solutions to these challenges, including criteria to use when evaluating choices


Dean Wampler helps you develop a conceptual understanding of the challenges faced by your teams as they develop and deploy machine learning and artificial intelligence services integrated with fast data (streaming) pipelines. While combining these technologies is challenging, the benefits include timely delivery of innovative services to your customers.

Dean begins by briefly discussing machine learning use cases that are best delivered as streaming data applications. He then explores the main challenges faced when deploying these technologies together and outlines solutions to these challenges, including criteria to use when evaluating choices. Along the way, he explains the tools your teams are already talking about and the role they play.

Topics include:

  • Bridging the gap between data science tools and methods and the data engineering tools and methods needed for robust production delivery
  • How fast data pipelines are forcing changes to data architectures to meet higher demands for reliability, resiliency, and dynamic scalability
  • Performance implications of different AI/ML and fast data tools and techniques
  • Deploying updates to ML/AI capabilities into running pipelines without forcing restarts

Dean Wampler


Dean Wampler is an expert in streaming data systems, focusing on applications of machine learning and artificial intelligence (ML/AI). He’s head of developer relations at Anyscale, which is developing Ray for distributed Python, primarily for ML/AI. Previously, he was an engineering VP at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers, and he’s the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent conference speaker and tutorial teacher, he’s also the co-organizer of several conferences around the world and several user groups in Chicago. He earned his PhD in physics from the University of Washington.