Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)
2:00pm–2:40pm Thursday, 09/13/2018
Secondary topics: Deep Learning, Media, Marketing, Advertising

Who is this presentation for?

  • Machine and deep learning practitioners and big data professionals

Prerequisite knowledge

  • A basic understanding of Apache Spark, machine learning, and deep learning

What you'll learn

  • Learn how to use BigDL on Apache Spark, apply DL techniques to solve real-world use cases like job search, and deploy DL workloads in the cloud


Collaborative filtering recommends items by identifying other users with similar tastes, but it tends to misfire when little is known about a user's history or when new items are introduced into the mix. Incorporating context and natural language processing (NLP) is one way to improve recommendations. In addition, recently developed deep neural networks have shown promise by capturing nonlinear relationships in the user-item dataset.
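The cold-start problem described above can be illustrated with a minimal sketch of user-based collaborative filtering. The interaction matrix, the function names, and the similarity-weighted scoring rule are all assumptions chosen for illustration; they are not taken from the talk.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_items(target, others):
    """Score each item as a similarity-weighted sum of other users' interactions."""
    scores = [0.0] * len(target)
    for other in others:
        sim = cosine(target, other)
        for j, interacted in enumerate(other):
            scores[j] += sim * interacted
    return scores


# Hypothetical data: rows are job seekers, columns are jobs (1 = applied).
history = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

# A user with history gets a useful signal from similar users.
active_scores = score_items(history[0], history[1:])

# A brand-new user has no history, so every similarity (and every score) is zero.
cold_scores = score_items([0, 0, 0, 0], history)
```

In this toy setup the active user receives a nonzero score for the third job because a similar user applied to it, while the new user receives no signal at all. That gap is what context such as résumé text is meant to fill.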

In the talent attraction industry, short hire cycles limit the history available for job advertisements and job seekers. As a result, most job recommendation systems rely on keyword search. Unfortunately, this short keyword context lacks the expressiveness to adequately describe the job seeker’s intent. In contrast, résumés offer a source of much richer context in natural language.

Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to use the distributed deep learning framework BigDL on Apache Spark to predict the probability that a candidate will apply to specific jobs based on their résumé. The approach includes document embedding using the pretrained Global Vectors for Word Representation (GloVe) model and neural collaborative filtering using deep neural networks. The deep learning algorithms in BigDL produce much better results than a cosine similarity measure or traditional ALS (alternating least squares), as measured by precision and recall metrics.
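As a rough illustration of the cosine similarity baseline and the evaluation metrics mentioned above, the sketch below embeds a résumé and several job descriptions by averaging word vectors, ranks the jobs by cosine similarity, and computes precision and recall at k. It does not use BigDL and is not the speakers' model. The three-dimensional vectors are made-up stand-ins for pretrained GloVe vectors, and all names and data are hypothetical.

```python
from math import sqrt

# Hypothetical 3-dimensional vectors standing in for pretrained GloVe embeddings.
TOY_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "spark": [0.8, 0.2, 0.0],
    "engineer": [0.7, 0.3, 0.1],
    "nurse": [0.0, 0.9, 0.2],
    "patient": [0.1, 0.8, 0.3],
    "care": [0.0, 0.7, 0.4],
}


def embed(text):
    """Document embedding: average the vectors of all known words."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_jobs(resume, jobs):
    """Return job ids ordered from most to least similar to the résumé."""
    resume_vec = embed(resume)
    return sorted(
        jobs,
        key=lambda job_id: cosine(resume_vec, embed(jobs[job_id])),
        reverse=True,
    )


def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k ranked ids against the relevant set."""
    hits = sum(1 for job_id in ranked[:k] if job_id in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical job descriptions and the jobs this candidate actually applied to.
jobs = {
    "data_engineer": "spark engineer",
    "backend_dev": "python engineer",
    "rn": "nurse patient care",
}
applied = {"data_engineer", "backend_dev"}

ranked = rank_jobs("python spark engineer", jobs)
precision, recall = precision_recall_at_k(ranked, applied, k=2)
```

A learned model such as neural collaborative filtering would replace the fixed cosine score with a network trained on observed applications, but it could be evaluated with the same precision and recall computation.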

Guoqiong Song


Guoqiong Song is a software engineer on the big data technology team at Intel, where she works in the area of big data analytics. She’s engaged in developing and optimizing distributed deep learning frameworks on Apache Spark.

Wenjing Zhan


Wenjing Zhan is a data scientist at Talroo, where she is in charge of predictive machine learning. Previously, Wenjing aided in search relevance through classification modeling and has done data engineering with Apache Spark and machine learning in Scala, R, and Python. She holds a master’s degree in statistics from the University of Texas at Austin.

Jacob Eisinger


Jacob Eisinger is the director of data at Talroo, where he is responsible for the Special Projects initiative to pilot and validate high-impact business models and technologies. Previously, Jacob led search, personalization, data warehouse, bot detection, and machine learning at Talroo and was a member of the Emerging Technologies Group at IBM, where he worked with technologies like Bluemix, Apache Spark, Apache Kafka, OAuth, and web service standards. Jacob is an accomplished inventor with over 20 patent applications. He holds a bachelor’s degree in computer science from Virginia Tech.