Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Leveraging Spark and deep learning frameworks to understand data at scale

Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera)
1:30pm–5:00pm Tuesday, 09/11/2018
Data science and machine learning
Location: 1E 07/08
Level: Intermediate
Secondary topics:  Deep Learning
Average rating: 1.00 (1 rating)

Who is this presentation for?

  • Data analysts, software engineers, and data scientists

Prerequisite knowledge

  • A basic understanding of data pipelines, Spark, and machine learning
  • A working knowledge of Scala and Python

Materials or downloads needed in advance

  • A laptop (You'll be provided with a cloud-based environment for working with the sample datasets; you can also download a VM or use a local computer to run the programs, although this option is not ideal.)

What you'll learn

  • Learn preprocessing and ingestion techniques and tools ideal for different kinds of datasets
  • Understand the nuances of deployment at scale for training and inference across datasets and frameworks


The increasing complexity of learning algorithms and deep neural networks, combined with the size of data and parameters, has made it challenging to exploit existing large-scale data processing pipelines for training and inference. Vartika Singh, Alan Silva, Alex Bleakley, Steven Totman, Mirko Kämpf, and Syed Nasar outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. You’ll explore different tools and frameworks, ranging from Spark for preprocessing to deep learning frameworks for training and inference, targeting the nuances in the datasets as they relate to algorithm optimization techniques, frameworks, and scale.

Vartika Singh


Vartika Singh is a field data science architect at Cloudera. Previously, Vartika was a data scientist applying machine learning algorithms to real-world use cases ranging from clickstream to image processing. She has 12 years of experience designing and developing solutions and frameworks utilizing machine learning techniques.

Alan Silva


Alan Silva is a solutions architect and data scientist for Latin America (LATAM) at Cloudera, where he focuses on developing new solutions with machine learning algorithms, using Marvin as a workflow to support data science and machine learning projects. Alan has experience with a wide range of security systems and network technologies; his technical background includes cryptography, mathematics, network protocols, distributed systems, operating systems, application security, and secure software development. He holds an MSc in computer science from the Federal University of São Carlos (UFSCar), a postgraduate degree in cryptography and network security from Fluminense Federal University (UFF), and a BS in mathematics.

Alex Bleakley


Alex Bleakley is the manager of the Machine Learning Solutions Architecture team at Cloudera. Alex combines core machine learning skills with six years of experience implementing practical data solutions across multiple industries to lead a team focused on taking machine learning solutions to production at big data scale.

Steven Totman


Steven Totman is the financial services industry lead for Cloudera’s Field Technology Office, where he helps companies monetize their big data assets using Cloudera’s Enterprise Data Hub. Prior to Cloudera, Steve ran strategy for a mainframe-to-Hadoop company and drove product strategy at IBM for DataStage and Information Server after joining with the Ascential acquisition. He architected IBM’s InfoSphere product suite and led the design and creation of governance and metadata products like Business Glossary and Metadata Workbench. Steve holds several patents for data integration and governance- and metadata-related designs.

Mirko Kämpf


Mirko Kämpf is a solutions architect on the CEMEA team at Cloudera, where he applies tools from the Hadoop ecosystem, such as Spark, HBase, and Solr, to solve customers’ problems and works on graph-based knowledge representation using Apache Jena to enable semantic search at scale. Mirko’s research focuses on time-dependent networks and time series analysis at scale. He loves to deliver data-centric workshops and has spoken at several big data-related conferences and meetups. He holds a PhD in statistical physics.

Syed Nasar


Syed Nasar is a solutions architect at Cloudera. As a big data and machine learning professional, his expertise extends to artificial intelligence, machine learning, and computer vision, and he has worked with a number of enterprises in bridging big data technologies with advanced statistical analysis, machine learning, and deep learning to create high-quality data products and intelligent systems that drive strategy and investment decisions. Syed is the founder of the Nashville Artificial Intelligence Society. His research interests include NLP, deep learning (mainly RNNs and GANs), distributed systems, machine learning at scale, and emerging technologies. He holds a master’s degree in interactive intelligence from the Georgia Institute of Technology.

Comments on this page are now closed.


Deepthi Kolluru | SOFTWARE ENGINEER - 2 TECH
09/11/2018 7:52am EDT

Is anything available before session to download?