Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Data science at scale: Using Spark and Hadoop

Bruce Martin (Cloudera)
9:00am - 5:00pm Monday, September 26 & Tuesday, September 27
Location: 1 C03

All training courses take place 9:00am - 5:00pm, Monday, September 26 through Tuesday, September 27, and are limited in size to maintain a high level of hands-on learning and instructor interaction.

Participants should plan to attend both days of training. Training passes do not include access to tutorials on Tuesday.

Audio or video recording, live-streaming, or broadcasting of the training is strictly prohibited without the prior written consent of O'Reilly and Cloudera. The training event is subject to the applicable Cloudera Training Terms, excluding the Description of Services, which does not apply.

Prerequisite knowledge

  • A working knowledge of the Linux command line
  • Proficiency in a scripting language (Python is strongly preferred; Perl or Ruby is sufficient.)
What you'll learn

  • Learn how Spark and Hadoop enable data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities
Description

    Data scientists build information platforms to provide deep insight and answer previously unimaginable questions. Spark and Hadoop are transforming how data scientists work by allowing interactive and iterative data analysis at scale. Learn how Spark and Hadoop enable data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities.

    Bruce Martin explores what data scientists do, the problems they solve, and the tools and techniques they use. Through in-class simulations and exercises, Bruce walks you through applying data science methods to real-world challenges in different industries, offering preparation for and experience with data scientist roles in the field.

    Topics include:

    • How to identify potential business use cases where data science can provide impactful results
    • How to obtain, clean, and combine disparate data sources to create a coherent picture for analysis
    • What statistical methods to leverage for data exploration that will provide critical insight into your data
    • Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
    • What machine-learning technique to use for a particular data science project
    • How to implement and manage recommenders using Spark’s MLlib and how to set up and evaluate data experiments
    • The pitfalls of deploying new analytics projects to production at scale

    Bruce Martin


    Bruce Martin is a senior instructor at Cloudera, where he teaches courses on data science, Apache Spark, Apache Hadoop, and data analysis. Previously, Bruce was principal architect and director of advanced concepts at SunGard Higher Education, where he developed the software architecture for SunGard’s Course Signals Early Intervention System, which uses machine learning algorithms to predict the success of students enrolled in university courses. His other roles have included senior staff engineer at Sun Microsystems and researcher at Hewlett-Packard Laboratories. Bruce has written many papers on data management and distributed system technologies and frequently presents his work at academic and industrial conferences. Bruce has authored patents on distributed object technologies. He holds a PhD and master’s degree in computer science from the University of California, San Diego, and a bachelor’s degree in computer science from the University of California, Berkeley.

    Comments on this page are now closed.


    09/24/2016 3:23am EDT

    Looking forward to this training.

    09/23/2016 9:10am EDT

    Same question as Marvin. What do I need to download or install beforehand?

    Marvin Watts
    09/08/2016 4:56pm EDT

    As a first time attendee, I would like to know if any prerequisite software or hardware is needed for this training? Not software or hardware experience, but actual software or hardware.

    Bruce Martin
    09/08/2016 6:46am EDT

    Rajan, no, you do not. I will present basic concepts of machine learning and apply them in the context of a recommender system.

    09/08/2016 6:08am EDT

    Do I need any experience with machine learning techniques to attend this class?