
Effective Data Science With Scalding

Vitaly Gordon (LinkedIn)
Data Science
Ballroom F
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: ****.
(4.00, 3 ratings)

Tutorial Prerequisites

Clone this repository and follow the readme instructions to run the example. This will install all the necessary dependencies on your machine.

Tutorial Description

Data products are the driving force behind new multi-billion-dollar companies, and many of the things we do on a day-to-day basis have machine learning algorithms behind them. Unfortunately, even though data science is a concept invented in the 21st century, in practice its state more closely resembles software engineering in the late 20th century.

The pioneers of data science did a great job of making it very accessible and fairly easy to pick up, but since its beginning circa 2005, not much effort has been made to bring it up to par with modern software engineering practices. Machine learning is software, and as such it should follow standard software engineering practices; however, the current tools of the trade are not modular, maintainable, or reusable. In this tutorial we will learn to work with Scalding, a Scala DSL that provides both the simplicity of languages like Apache Pig and the power of a full-fledged functional JVM language.


Vitaly Gordon

Senior Data Scientist, LinkedIn

Vitaly Gordon is a senior data scientist on the LinkedIn Product Data Science team, where he develops data products that most of you use every day. Prior to LinkedIn, Vitaly founded the data science team at LivePerson and worked in the elite 8200 unit, leading a team of researchers in developing algorithms to fight terrorism. His contributions have been recognized through a number of awards, including the “Life Source” award, given each year for the work deemed most high-impact in saving lives. Vitaly holds a B.Sc. in Computer Science and an MBA from the Israeli Institute of Technology.