Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Practical data science on Hadoop

Brandon MacKenzie (IBM), John Rollins (IBM), Jacques Roy (IBM), Chris Fregly (PipelineAI), Mokhtar Kandil (IBM)
9:00am–5:00pm Tuesday, 09/29/2015
Location: 1B 03
Average rating: 2.50 (12 ratings)

Materials or downloads needed in advance

Hands-on lab environment will be provided by IBM. Participants may want to bring a small USB drive to save their files, if desired.


In this three-day course, you will:

  • Learn how to use machine learning, text analysis, and real-time analytics to solve frequently
    encountered, high-value business problems.
  • Understand data science methodology and the end-to-end problem-solving workflow, including
    data preparation, model building and validation, and model deployment.
  • Use Apache Spark and other tools for analytics.


Day 1

  • Fundamental data science methodology
  • Overview of selected machine learning methods
  • Hands-on labs with Spark MLlib and SystemML libraries
  • Descriptive statistics
  • Feature transformations
  • Supervised and unsupervised methods
  • Diagnostics

Day 2

  • Text analytics concepts
  • Text analytics development, testing, and deployment
  • Continuous analytics (streaming)
  • Hands-on labs on text analytics and streaming

Day 3

  • Recommendation engines with hands-on lab
  • Using Apache Spark with IBM SPSS Modeler
  • What’s coming in data science
  • Spark and hardware accelerators
  • Machine learning pipelines with hands-on lab
  • Productization with Spark

Target Audience

Data scientists and business analysts.
Some knowledge of R and/or Python is preferable but not required.





Brandon MacKenzie is the Data Science on Hadoop leader on IBM’s Worldwide Technical Sales team for Information Management Software. He is an expert on statistical processing in Hadoop and HPC environments. Brandon earned his master’s degree from The University of Edinburgh.


John Rollins


John B. Rollins, Ph.D., is a data scientist in IBM’s Analytics division. His background spans data mining, engineering, and econometrics across many industries. He holds seven patents and has authored a best-selling engineering textbook and many technical papers. He holds doctoral degrees in economics and petroleum engineering from Texas A&M University.


Jacques Roy


Jacques Roy is a member of the IBM worldwide analytics platform technical team, specializing in big data streaming analytics. He has also worked in many technology areas, including operating systems, databases, and application development. He is the author of multiple books, most recently The Power of Now: Real-Time Analytics and IBM InfoSphere Streams, and is a regular contributor to IBM Data magazine. Jacques has presented at many conferences, including IBM’s Information on Demand (IOD).


Chris Fregly


Chris Fregly is founder and research engineer at PipelineAI, a San Francisco-based streaming machine learning and artificial intelligence startup. Previously, Chris was a distributed systems engineer at Netflix, a data solutions engineer at Databricks, and a founding member of the IBM Spark Technology Center in San Francisco. Chris is a regular speaker at conferences and meetups throughout the world. He’s also an Apache Spark contributor, a Netflix Open Source committer, founder of the Global Advanced Spark and TensorFlow meetup, author of the upcoming book Advanced Spark, and creator of the O’Reilly video series Deploying and Scaling Distributed TensorFlow in Production.

Mokhtar Kandil


Comments on this page are now closed.


Kshitija Gokhale
09/28/2015 10:48am EDT

It says up top
“Hands-on lab environment will be provided by IBM. Participants may want to bring a small USB drive to save their files, if desired.”

Carlos Miron
09/26/2015 8:26am EDT

Do I need to bring my own laptop?

Tija Gokhale
09/24/2015 6:04pm EDT

For the hands-on training, will I need to bring my own laptop, or will the computing platform be provided by IBM?

Armen Donigian
09/23/2015 5:11pm EDT

can u post a link to training materials or things we need to download/setup prior to arrival?

Ben Lorica
08/10/2015 6:55am EDT


“When I attend this 3-day training, is there still time for visiting sessions or it is 3days fulltime”
>> This training will coincide with the sessions, so it probably won’t be possible to visit sessions while attending this training.

You can see this from the “daily grid” for Tue/Wed/Thu.

Alexander Bij
08/10/2015 5:29am EDT

When I attend this 3-day training, is there still time for visiting sessions or it is 3days fulltime.
Then I should buy a TrainingsTicket instead.

Pradipti Pal
06/03/2015 9:18am EDT

Would this be a one-on-one session including practicals?
Is there a verified certificate for this course?

Suresh Devanathan
06/01/2015 1:11am EDT

Can you please list any pre-requisites for this training?

Kathy Yu
05/22/2015 11:53am EDT

Hi Zhibo – this is a three-day course that runs from Tuesday-Thursday.

Zhibo Zheng
05/22/2015 10:17am EDT

Is this a one-day, two-day, or three-day course?