Acme Corporation, a global leader in commerce marketing, classifies 4.5B products a day into ~4,500 categories using Google Taxonomy. At 600 TB of data per day, Acme Corporation has the largest Hadoop cluster in Europe. Manu Mukerji walks you through Acme Corporation’s machine learning example for universal catalogs, explaining how the training and test sets are generated and annotated; how they are created when no public training data is available; how the model is pushed to production, automatically evaluated, and used; how Acme Corporation built a Hadoop/Spark pipeline using different types of models to predict various values; the issues that arise when applying ML at scale in production; and lessons learned along the way.
Manu Mukerji is senior director of data, machine learning, and analytics at 8×8. Manu’s background lies in cloud computing and big data, where he has worked on systems handling billions of transactions per day in real time. He enjoys building and architecting scalable, highly available data solutions and has extensive experience working in online advertising and social media.
©2018, O'Reilly Media, Inc.