Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

In-Person Training
Data science at scale: Using Spark and Hadoop

Bruce Martin (Cloudera)
Monday, March 13 & Tuesday, March 14, 9:00am - 5:00pm
Average rating: 4.00 (1 rating)

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Tuesday.

Bruce Martin walks you through applying data science methods to real-world challenges in different industries, offering preparation for data scientist roles in the field. Join in to learn how Spark and Hadoop enable data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities.

What you'll learn, and how you can apply it

  • Learn how Spark and Hadoop enable data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities

Data scientists build information platforms to provide deep insight and answer previously unimaginable questions. Spark and Hadoop are transforming how data scientists work by allowing interactive and iterative data analysis at scale.

Bruce Martin explores what data scientists do, the problems they solve, and the tools and techniques they use. Through in-class simulations and exercises, Bruce walks you through applying data science methods to real-world challenges in different industries, offering preparation for and experience with data scientist roles in the field.

Topics include:

  • How to identify potential business use cases where data science can provide impactful results
  • How to obtain, clean, and combine disparate data sources to create a coherent picture for analysis
  • What statistical methods to leverage for data exploration that will provide critical insight into your data
  • Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
  • What machine-learning technique to use for a particular data science project
  • How to implement and manage recommenders using Spark’s MLlib and how to set up and evaluate data experiments
  • The pitfalls of deploying new analytics projects to production at scale

About your instructor

Bruce Martin is a senior instructor at Cloudera, where he teaches courses on data science, Apache Spark, Apache Hadoop, and data analysis. Previously, Bruce was principal architect and director of advanced concepts at SunGard Higher Education, where he developed the software architecture for SunGard’s Course Signals Early Intervention System, which uses machine-learning algorithms to predict the success of students enrolled in university courses. Bruce’s other roles have included senior staff engineer at Sun Microsystems and researcher at Hewlett-Packard Laboratories. Bruce has written many papers on data management and distributed system technologies and frequently presents his work at academic and industrial conferences. Bruce holds patents on distributed object technologies. Bruce holds a PhD and master’s degree in computer science from the University of California at San Diego and a bachelor’s degree in computer science from the University of California, Berkeley.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.

Comments on this page are now closed.

Comments

Roopesh Chetty | VICE PRESIDENT - DATA ANALYTICS
03/12/2017 1:02am PST

Hi Bruce,

Are you planning to post an updated agenda for these two days?

Bruce Martin | SENIOR INSTRUCTOR
03/08/2017 1:52am PST

Hi Kranthi,
We will provide each student with a VM in the cloud. You need a laptop with a modern browser to access it. A better way to access the VM is with an RDP client, so before the course you may want to download the Microsoft RDP client for Mac from the Mac App Store.

Regards,
Bruce

Kranthi . | ENGINEER
03/07/2017 8:27pm PST

Hi Bruce,

Looking forward to the training. As a preliminary step, what computer resources will we need? Is a Mac laptop sufficient, or do we need access to a Sandbox VM? Please let us know.

Regards,
Kranthi

Sophia DeMartini | SENIOR SPEAKER MANAGER
03/07/2017 3:57am PST

Hi Phillip,

It will depend on whether everyone who signed up for the training shows up, but we can try. Please email me at speakers@oreilly.com, and we can discuss this.

Thank you,
Sophia

Phillip Kriegel | SYSTEMS ENGINEER
03/07/2017 3:53am PST

Hi! I registered a bit late, and the class was full, so I signed up for a second choice. Will walk-ins be admitted in the event that someone who registered doesn’t show up?

Bruce Martin | SENIOR INSTRUCTOR
02/09/2017 11:27pm PST

Hi Kranthi,

The hands-on exercises use Python and SQL. You are free to do them in Scala if you like, but the solutions are provided in Python.

Kranthi . | ENGINEER
02/09/2017 9:13pm PST

Hi,

Is this course going to be in Scala? Please let me know.

Regards,
Kranthi

Sophia DeMartini | SENIOR SPEAKER MANAGER
02/06/2017 5:29am PST

Hi Mehul,

When I receive an updated agenda from the trainer for this course, I will post it to this page.

Thanks,
Sophia

MEHUL RAMANI
02/06/2017 5:23am PST

@Sophia,

Thank you for the response. Is it possible to share the syllabus/agenda for the training?

Sophia DeMartini | SENIOR SPEAKER MANAGER
02/02/2017 6:41am PST

Hi Mehul,

Yes, this will be a hands-on training.

Thank you,
Sophia

MEHUL RAMANI
02/02/2017 6:25am PST

Is this a hands-on training?