Real-life ML workloads typically require more than training and predicting: data often needs to be preprocessed and postprocessed, sometimes in multiple steps. Thus, developers and data scientists have to train and deploy not just a single algorithm but a sequence of algorithms that will collaborate in delivering predictions from raw data.
Julien Simon outlines how to use Apache Spark MLlib to build ML pipelines and discusses scaling options when datasets grow huge. As the cloud is a popular way to scale, he dives into how to implement inference pipelines on AWS using Apache Spark and scikit-learn, as well as ML algorithms implemented by Amazon.
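To make the idea of a pipeline concrete, here is a minimal sketch using scikit-learn, one of the libraries named above. It is not taken from the talk: the toy data, the step names, and the choice of a scaler followed by logistic regression are assumptions made purely for illustration. The point it shows is that preprocessing and the model are trained and invoked as a single object, so raw data goes in and predictions come out.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two raw features on very different scales.
X_train = [
    [1.0, 200.0], [2.0, 180.0], [3.0, 240.0],
    [8.0, 900.0], [9.0, 950.0], [10.0, 1000.0],
]
y_train = [0, 0, 0, 1, 1, 1]

# Chain a preprocessing step and a classifier into one estimator.
# The step names "scale" and "classify" are arbitrary labels.
pipeline = Pipeline(steps=[
    ("scale", StandardScaler()),         # preprocessing: standardize each feature
    ("classify", LogisticRegression()),  # model: fit on the scaled features
])

# fit() trains every step in order; predict() replays the same
# preprocessing on new raw data before calling the classifier.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict([[1.5, 190.0], [9.5, 980.0]])
```

Because the fitted pipeline is one object, it can be saved and deployed as a unit, which avoids the preprocessing used at inference time drifting from the preprocessing used at training time. Spark MLlib offers a similarly named `Pipeline` abstraction for datasets too large for a single machine.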
Julien Simon is a technical evangelist at AWS. Previously, Julien spent 10 years as a CTO and vice president of engineering at a number of top-tier web startups. He’s particularly interested in all things architecture, deployment, performance, scalability, and data. Julien frequently speaks at conferences and technical workshops, where he helps developers and enterprises bring their ideas to life thanks to the Amazon Web Services infrastructure.
©2019, O'Reilly Media, Inc.