Making Open Work
May 8–9, 2017: Training & Tutorials
May 10–11, 2017: Conference
Austin, TX

Distinguish pop music from heavy metal using Apache Spark MLlib

2:35pm–3:15pm Wednesday, May 10, 2017
Data, Big and Small, TensorFlow
Location: Meeting Room 18 C/D
Level: Beginner
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Software engineers

Prerequisite knowledge

  • A basic understanding of distributed computing concepts
  • The ability to read Java code
  • A working knowledge of Apache Spark (useful but not required)

What you'll learn

  • Understand, from an applied perspective, how machine learning can be used in real applications
  • Discover that machine-learning algorithms are suitable for the Java ecosystem and can be easily integrated in Java applications (particularly with Apache Spark MLlib)
  • Explore the supervised machine-learning pipeline and its main components (e.g., feature extraction and cross-validation), as well as Word2Vec
  • See a fun natural language processing pipeline in action


Machine learning may be overhyped nowadays, but there is still a strong belief that the field belongs exclusively to data scientists with a deep mathematical background who leverage the Python (scikit-learn, Theano, TensorFlow, etc.) or R ecosystems and use specific tools like RStudio, MATLAB, or Octave. There is some truth to this, but Java engineers can also get the best of the machine-learning world from an applied perspective by using our native language and familiar frameworks like Apache Spark. Taras Matyashovsky explains how to use Apache Spark MLlib to build a supervised learning NLP pipeline that distinguishes pop music from heavy metal, and how to have fun in the process. Along the way, Taras offers an overview of the simplest machine-learning tasks and algorithms, such as regression, classification, and clustering.
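
The shape of a supervised text-classification pipeline (extract features, train on labeled examples, predict a label) can be sketched in a few lines of plain Java. This is a toy stand-in under stated assumptions, not Spark MLlib's API and not the speaker's code: word counts take the place of Word2Vec, a nearest-centroid rule with cosine similarity takes the place of a trained classifier, and the class name GenreSketch and the four sample lines are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a supervised text-classification pipeline (NOT Spark MLlib).
class GenreSketch {
    // One summed bag-of-words vector ("centroid") per genre label.
    private final Map<String, Map<String, Double>> centroids = new HashMap<>();

    // Feature extraction: lowercase, split on non-letters, count each token.
    static Map<String, Double> features(String text) {
        Map<String, Double> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    // Training: add the example's counts to the centroid of its label.
    // Cosine similarity ignores vector length, so a sum behaves like a mean.
    void addExample(String label, String text) {
        Map<String, Double> centroid =
                centroids.computeIfAbsent(label, k -> new HashMap<>());
        features(text).forEach((token, count) -> centroid.merge(token, count, Double::sum));
    }

    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Prediction: the label whose centroid is most similar to the query.
    // Returns null only when no examples were added; if no word overlaps
    // with any centroid, the choice among labels is arbitrary.
    String predict(String text) {
        Map<String, Double> query = features(text);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Double>> e : centroids.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    // Invented sample lines, not real lyrics and not the talk's dataset.
    static GenreSketch demoModel() {
        GenreSketch model = new GenreSketch();
        model.addExample("pop", "baby I love you tonight we dance");
        model.addExample("pop", "my heart sings love all night");
        model.addExample("metal", "darkness and fire burn the night");
        model.addExample("metal", "blood and steel scream in darkness");
        return model;
    }

    public static void main(String[] args) {
        GenreSketch model = demoModel();
        System.out.println(model.predict("love me baby"));   // prints "pop"
        System.out.println(model.predict("fire and blood")); // prints "metal"
    }
}
```

In a real Spark MLlib pipeline, each of these steps would presumably be a separate pipeline stage, and cross-validation would be used to choose parameters on held-out data; the sketch skips evaluation entirely.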

Taras Matyashovsky


Taras Matyashovsky is a software engineer at Lohika, as well as a frequent speaker, the founder of the Morning@Lohika tech talks, and a program committee member of the JEEConf and XP Days Ukraine conferences. Primarily focused on the development of complex distributed systems and R&D activities, Taras is currently interested in microservices architecture, big data trends, and applied machine learning.