Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)
2:00pm–2:40pm Thursday, 09/13/2018
Secondary topics: Deep Learning, Media, Marketing, Advertising

Who is this presentation for?

  • Machine and deep learning practitioners and big data professionals

Prerequisite knowledge

  • A basic understanding of Apache Spark, machine learning, and deep learning

What you'll learn

  • Learn how to use BigDL on Apache Spark, apply DL techniques to solve real-world use cases like job search, and deploy DL workloads in the cloud

Description

Collaborative filtering recommends items by identifying other users with similar tastes, but it tends to misfire when little is known about a user’s history or when new items are introduced. Incorporating context and natural language processing (NLP) is one way to improve recommendations. In addition, recently developed deep neural networks have shown promise by capturing nonlinear relationships in user-item data.
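To make the cold-start problem concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is an illustration only, not the speakers' implementation: the user names, job IDs, and the `recommend` helper are all hypothetical, and cosine similarity over application histories is just one common way to find "similar" users.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse interaction dicts keyed by item."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no history is similar to nobody
    return dot / (norm_u * norm_v)


def recommend(target, interactions, top_n=3):
    """Score unseen items by similarity-weighted votes from other users."""
    seen = interactions.get(target, {})
    scores = {}
    for other, items in interactions.items():
        if other == target:
            continue
        sim = cosine(seen, items)
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for item, weight in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * weight
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical toy data: 1.0 means the user applied to the job.
interactions = {
    "alice": {"job_a": 1.0, "job_b": 1.0},
    "bob": {"job_a": 1.0, "job_b": 1.0, "job_c": 1.0},
    "carol": {"job_d": 1.0},
    "dave": {},  # new user with no history (cold start)
}

print(recommend("alice", interactions))  # alice overlaps with bob
print(recommend("dave", interactions))   # dave has no history to match on
```

In this toy data, alice receives a recommendation because she shares history with bob, while dave, who has no history, receives none. A newly posted job that nobody has applied to would likewise never be recommended, which is the gap that content-based signals such as résumé text are meant to fill.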

In the talent attraction industry, short hiring cycles limit the history available for job advertisements and job seekers. As a result, most job recommendation systems rely on keyword search. Unfortunately, a short keyword query lacks the expressiveness to adequately describe a job seeker’s intent. In contrast, résumés offer much richer context in natural language.

Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to use the distributed deep learning framework BigDL on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé. The approach combines document embedding using the pretrained Global Vectors for Word Representation (GloVe) model with neural collaborative filtering using deep neural networks. As measured by precision and recall, the deep learning algorithms in BigDL produce much better results than a cosine similarity measure or traditional alternating least squares (ALS).
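As a rough illustration of the cosine similarity baseline and the evaluation metrics mentioned above, the following plain-Python sketch embeds a résumé and several job descriptions by averaging word vectors, ranks the jobs by cosine similarity, and computes precision and recall at k. Several things here are assumptions rather than details from the talk: the three-dimensional vectors are made-up stand-ins for pretrained GloVe embeddings, averaging is only one simple way to build a document embedding, and the job IDs, texts, and relevance labels are hypothetical. It does not use BigDL or show the neural collaborative filtering model.

```python
from math import sqrt

# Hypothetical 3-dimensional word vectors standing in for pretrained GloVe
# embeddings, which typically have 50 to 300 dimensions.
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "spark": [0.8, 0.2, 0.1],
    "engineer": [0.7, 0.3, 0.0],
    "nurse": [0.0, 0.9, 0.2],
    "patient": [0.1, 0.8, 0.3],
    "care": [0.0, 0.7, 0.4],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_jobs(resume, jobs):
    """Return job IDs ordered from most to least similar to the résumé."""
    resume_vec = embed(resume)
    return sorted(jobs, key=lambda j: cosine(resume_vec, embed(jobs[j])), reverse=True)


def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top k results (k > 0, relevant non-empty)."""
    hits = len(set(ranked[:k]) & relevant)
    return hits / k, hits / len(relevant)


# Hypothetical résumé, job descriptions, and relevance labels.
resume = "Python engineer with Spark experience"
jobs = {
    "data_engineer": "Spark engineer",
    "backend_dev": "Python engineer",
    "rn": "Nurse for patient care",
}
applied = {"data_engineer", "backend_dev"}  # jobs the candidate applied to

ranked = rank_jobs(resume, jobs)
precision, recall = precision_recall_at_k(ranked, applied, k=2)
print(ranked, precision, recall)
```

With these toy vectors the nursing job ranks last, and both precision and recall at k=2 are 1.0. Real résumés and job postings are far noisier, which is where a learned model would be expected to outperform a fixed similarity measure.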

Guoqiong Song

Intel

Guoqiong Song is a senior deep learning software engineer on the big data technology team at Intel. She’s interested in developing and optimizing distributed deep learning algorithms on Spark. She holds a PhD in atmospheric and oceanic sciences with a focus on numerical modeling and optimization from UCLA.

Wenjing Zhan

Talroo

Wenjing Zhan is a data scientist at Talroo, where she is in charge of predictive machine learning. Previously, Wenjing worked on search relevance through classification modeling and did data engineering with Apache Spark and machine learning in Scala, R, and Python. She holds a master’s degree in statistics from the University of Texas at Austin.

Jacob Eisinger

Talroo

Jacob Eisinger is the director of data at Talroo, where he is responsible for the Special Projects initiative to pilot and validate high-impact business models and technologies. Previously, Jacob led search, personalization, data warehouse, bot detection, and machine learning at Talroo and was a member of the Emerging Technologies Group at IBM, where he worked with technologies like Bluemix, Apache Spark, Apache Kafka, OAuth, and web service standards. Jacob is an accomplished inventor with over 20 patent applications. He holds a bachelor’s degree in computer science from Virginia Tech.