Presented By O’Reilly and Cloudera

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Leveraging Spark and deep learning frameworks to understand data at scale

Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera)

1:30pm–5:00pm Tuesday, 09/11/2018

Data science and machine learning
Location: 1E 07/08
Level: Intermediate

Secondary topics: Deep Learning

Who is this presentation for?

Data analysts, software engineers, and data scientists

Prerequisite knowledge

A basic understanding of data pipelines, Spark, and machine learning
A working knowledge of Scala and Python

Materials or downloads needed in advance

A laptop (you'll be provided with a cloud-based environment in which to run the sample datasets; you'll also have the option of downloading a VM or running the programs on your own computer, although this is not ideal)

What you'll learn

Learn preprocessing and ingestion techniques and tools ideal for different kinds of datasets
Understand the nuances of deployment at scale for training and inference across datasets and frameworks

Description

The increasing complexity of learning algorithms and deep neural networks, combined with the size of the data and the number of model parameters, has made it challenging to use existing large-scale data processing pipelines for training and inference. Vartika Singh, Alan Silva, Alex Bleakley, Steven Totman, Mirko Kämpf, and Syed Nasar outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. You’ll explore a range of tools and frameworks, from Spark for preprocessing to deep learning frameworks for training and inference, with attention to how the particular characteristics of each dataset relate to algorithm optimization techniques, choice of framework, and scale.

Vartika Singh

Cloudera

Vartika Singh is a field data science architect at Cloudera. Previously, Vartika was a data scientist applying machine learning algorithms to real-world use cases ranging from clickstream to image processing. She has 12 years of experience designing and developing solutions and frameworks utilizing machine learning techniques.

Alan Silva

Cloudera

Alan Silva is a solutions architect and data scientist for Latin America (LATAM) at Cloudera, where he focuses on developing new solutions with machine learning algorithms, using Marvin as a workflow to support data science and machine learning projects. Alan has experience with a wide range of security systems and network technologies; his technical background includes cryptography, mathematics, network protocols, distributed systems, operating systems, application security, and secure software development. He holds an MSc in computer science from the Federal University of São Carlos (UFSCar), a postgraduate degree in cryptography and network security from Fluminense Federal University (UFF), and a BS in mathematics.

Alex Bleakley

Cloudera

Alex Bleakley is the manager of the Machine Learning Solutions Architecture team at Cloudera. Alex combines core machine learning skills with six years of experience implementing practical data solutions across multiple industries to lead a team focused on taking machine learning solutions to production at big data scale.

Steven Totman

Cloudera

Steven Totman is the financial services industry lead for Cloudera’s Field Technology Office, where he helps companies monetize their big data assets using Cloudera’s Enterprise Data Hub. Prior to Cloudera, Steve ran strategy for a mainframe-to-Hadoop company and drove product strategy at IBM for DataStage and Information Server, having joined IBM through its acquisition of Ascential. He architected IBM’s InfoSphere product suite and led the design and creation of governance and metadata products such as Business Glossary and Metadata Workbench. Steve holds several patents for data integration and governance/metadata-related designs.

Mirko Kämpf

Cloudera

Mirko Kämpf is a solutions architect on the CEMEA team at Cloudera, where he applies tools from the Hadoop ecosystem, such as Spark, HBase, and Solr, to solve customers’ problems and works on graph-based knowledge representation using Apache Jena to enable semantic search at scale. Mirko’s research focuses on time-dependent networks and time series analysis at scale. He loves delivering data-centric workshops and has spoken at several big data conferences and meetups. He holds a PhD in statistical physics.

Syed Nasar

Cloudera

Syed Nasar is a solutions architect at Cloudera. A big data and machine learning professional whose expertise extends to artificial intelligence, machine learning, and computer vision, he has worked with a number of enterprises to bridge big data technologies with advanced statistical analysis, machine learning, and deep learning, creating high-quality data products and intelligent systems that drive strategy and investment decisions. Syed is the founder of the Nashville Artificial Intelligence Society. His research interests include NLP, deep learning (mainly RNNs and GANs), distributed systems, machine learning at scale, and emerging technologies. He holds a master’s degree in interactive intelligence from the Georgia Institute of Technology.

©2018, O'Reilly Media, Inc.