TensorFlow is all kinds of fancy, from helping startups raising their series A in Silicon Valley to detecting if something is a cat. However, when things start to get “real,” you may find yourself no longer just dealing with mnist.csv but instead needing do large-scale data prep as well as training.
Holden Karau details how to use TensorFlow in conjunction with Apache Spark, Flink, and Beam to create a full machine learning pipeline—including the annoying feature engineering and data prep components that we like to pretend don’t exist. Holden also explains why these feature prep stages need to be integrated into the serving layer. She concludes by examining changing industry trends, like Apache Arrow, and how they impact cross-language development for things like deep learning. Even if you’re not trying to raise a round of funding in Silicon Valley, this talk will give you tools to do interesting machine learning problems at scale.
Holden Karau is a transgender Canadian open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. Previously, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.
Comments on this page are now closed.
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com