Presented By O’Reilly and Cloudera

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Leveraging Spark and deep learning frameworks to understand data at scale

Vartika Singh (Cloudera), Juan Yu (Cloudera), Marton Balassi (Cloudera), Steven Totman (Cloudera)

9:00–12:30 Tuesday, 22 May 2018

Data science and machine learning
Location: Capital Suite 15
Level: Intermediate

Average rating: 3.75 (4 ratings)

Who is this presentation for?

Data analysts, software engineers, and data scientists

Prerequisite knowledge

A basic understanding of data pipelines, Spark, and machine learning
A working knowledge of Scala and Python

Materials or downloads needed in advance

N/A

What you'll learn

Learn preprocessing and ingestion techniques and tools ideal for different kinds of datasets
Understand the nuances of deployment at scale for training and inference across datasets and frameworks

Description

The increasing complexity of learning algorithms and deep neural networks, combined with the growing size of datasets and parameter counts, has made it challenging to exploit existing large-scale data processing pipelines for training and inference. Vartika Singh, Marton Balassi, Steven Totman, and Juan Yu outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. Vartika, Marton, Steven, and Juan walk you through different tools and frameworks, ranging from Spark for preprocessing to deep learning frameworks for training and inference, targeting the nuances in the datasets as they relate to algorithm optimization techniques, frameworks, and scale.

Vartika Singh

Cloudera

Vartika Singh is a field data science architect at Cloudera. Previously, Vartika was a data scientist applying machine learning algorithms to real-world use cases ranging from clickstream to image processing. She has 12 years of experience designing and developing solutions and frameworks utilizing machine learning techniques.

Juan Yu

Cloudera

Juan Yu is a software engineer at Cloudera working on the Impala project, where she helps customers investigate, troubleshoot, and resolve escalations and analyzes performance issues to identify bottlenecks, failure points, and security holes. Juan also implements enhancements in Impala to improve customer experience. Previously, Juan was a software engineer at Interactive Intelligence and held developer positions at Bluestreak, Gameloft, and Engenuity.

Marton Balassi

Cloudera

Marton Balassi is a solutions architect at Cloudera, where he focuses on data science and stream processing with big data tools. Marton is a PMC member at Apache Flink and a regular contributor to open source. He is a frequent speaker at big data-related conferences and meetups, including Hadoop Summit, Spark Summit, and Apache Big Data.

Steven Totman

Cloudera

Steven Totman is the financial services industry lead for Cloudera’s Field Technology Office, where he helps companies monetize their big data assets using Cloudera’s Enterprise Data Hub. Prior to Cloudera, Steve ran strategy for a mainframe-to-Hadoop company and drove product strategy at IBM for DataStage and Information Server after joining with the Ascential acquisition. He architected IBM’s InfoSphere product suite and led the design and creation of governance and metadata products like Business Glossary and Metadata Workbench. Steve holds several patents for data-integration and governance/metadata-related designs.

Comments

Juan Yu | SOFTWARE ENGINEER

5/06/2018 3:57 BST

Hey Stephanie,

Sorry for the slow response. We had some issues uploading the slides. I put my part on GitHub; you can get it here:
https://github.com/yjwater/strata-london/blob/master/Machine%20Learning%250BFrom%20idea%20to%20productization.pdf

stephanie werli | DATA ENGINEER

23/05/2018 10:33 BST

Hello, I was really interested in Juan Yu’s part. Is there a way to get the slides of her presentation? Thanks.
