Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Building career advisory tools for the tech sector using machine learning

Simon Hughes (Dice), Yuri Bykov (Dice)
2:40pm–3:20pm Thursday, March 8, 2018
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Data scientists, data engineers, IT professionals, and programmers

Prerequisite knowledge

  • A basic understanding of machine learning, data science, and statistics

What you'll learn

  • Explore the machine learning algorithms behind Dice's career advisory tools for technology professionals and the technologies used to build, deploy, and monitor them in production


As the leading job board for IT professionals in the US, Dice is constantly looking for ways to provide value to its customers beyond a job search and a resume database. The company recently released several free career advisory tools for technology professionals, including a salary predictor, a tool that recommends the next skills to learn, and a career path explorer. Simon Hughes and Yuri Bykov offer an overview of the machine learning algorithms behind these tools and the technologies used to build, deploy, and monitor them in production.

Behind all of Dice’s data science solutions lie an in-depth taxonomy of technology skills and a semantic matching algorithm that determines how similar two skill sets are to one another. Simon and Yuri explain how Dice extracts skills from text using the Apache Lucene libraries, how skills are standardized, and how the company uses matrix factorization algorithms to measure the similarity between skills and sets of skills. This algorithm allows Dice to determine how related two skills are to one another, even if they have never occurred together in the same profile or job description.
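As a rough illustration of how a factorization can relate two skills that never appear together, the following minimal sketch builds a skill co-occurrence matrix from three invented profiles and keeps its top two eigenvectors (a truncated eigendecomposition computed by power iteration). The profiles, the rank of 2, and all function names are assumptions made for this example; the abstract does not say which factorization Dice uses.

```python
import math

# Hypothetical toy profiles; real input would be skills extracted from text.
profiles = [
    {"java", "spring"},
    {"java", "hibernate"},
    {"photoshop", "illustrator"},
]
skills = sorted(set().union(*profiles))
index = {skill: i for i, skill in enumerate(skills)}
n = len(skills)

# Symmetric skill-by-skill co-occurrence counts.
cooc = [[0.0] * n for _ in range(n)]
for profile in profiles:
    for a in profile:
        for b in profile:
            cooc[index[a]][index[b]] += 1.0


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def top_eigenpairs(matrix, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0] * len(m)
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(m, v)))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next one.
        for i in range(len(m)):
            for j in range(len(m)):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


# Rank-2 latent representation of every skill (assumed rank, chosen for the toy data).
pairs = top_eigenpairs(cooc, k=2)
embedding = {
    skill: [math.sqrt(max(lam, 0.0)) * vec[index[skill]] for lam, vec in pairs]
    for skill in skills
}


def similarity(a, b):
    """Cosine similarity between two skills in the latent space."""
    u, v = embedding[a], embedding[b]
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norms if norms else 0.0
```

In this toy data "spring" and "hibernate" never share a profile, yet both co-occur with "java", so `similarity("spring", "hibernate")` comes out close to 1 while `similarity("spring", "photoshop")` stays close to 0.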

The Dice market value tool enables technology professionals to gauge what their predicted salary should be given their background, work history, level of experience, and location. Simon and Yuri share how they trained a regression model to predict users’ salaries, how the features were selected, and how they used an ensemble of different models to outperform simpler modeling approaches. One unique aspect of the salary prediction tool is that it also suggests the most relevant skills the technology professional should learn next to maximize their future earning potential. Simon and Yuri describe how they combined Dice’s skill similarity model with its salary predictor to achieve this.
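One way to picture the combination of a salary predictor with a skill similarity model is sketched below. Two toy regression models are averaged into an ensemble, and candidate skills are first filtered by relevance to the user's existing skills and then ranked by predicted salary gain. Every number, the similarity table, the 0.5 threshold, and the function names are invented for illustration and are not taken from Dice's tools.

```python
# Hypothetical per-skill salary premiums for two toy models.
PREMIUMS_A = {"java": 10000.0, "scala": 15000.0, "kotlin": 8000.0, "photoshop": 20000.0}
PREMIUMS_B = {"java": 12000.0, "scala": 13000.0, "kotlin": 10000.0, "photoshop": 18000.0}

# Hypothetical pairwise similarities, standing in for a skill similarity model.
SIMILARITY = {
    frozenset(("java", "scala")): 0.8,
    frozenset(("java", "kotlin")): 0.7,
    frozenset(("java", "photoshop")): 0.05,
}


def model_a(skills, years):
    return 50000.0 + 3000.0 * years + sum(PREMIUMS_A.get(s, 0.0) for s in skills)


def model_b(skills, years):
    return 55000.0 + 2500.0 * years + sum(PREMIUMS_B.get(s, 0.0) for s in skills)


def predict_salary(skills, years):
    """Ensemble by simple averaging of the individual model predictions."""
    predictions = [model(skills, years) for model in (model_a, model_b)]
    return sum(predictions) / len(predictions)


def recommend_skills(skills, years, candidates, min_similarity=0.5):
    """Rank relevant candidate skills by the predicted salary gain from adding them."""
    baseline = predict_salary(skills, years)
    scored = []
    for candidate in candidates:
        if candidate in skills:
            continue
        relevance = max(
            (SIMILARITY.get(frozenset((candidate, s)), 0.0) for s in skills),
            default=0.0,
        )
        if relevance < min_similarity:
            continue  # Skip skills unrelated to the user's background.
        gain = predict_salary(skills | {candidate}, years) - baseline
        scored.append((candidate, gain))
    return sorted(scored, key=lambda item: item[1], reverse=True)


recommendations = recommend_skills({"java"}, 5, ["scala", "kotlin", "photoshop"])
```

Here "photoshop" carries the largest toy premium but is filtered out as irrelevant to a Java developer, which is the point of using similarity as a gate before ranking by salary gain.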

Simon and Yuri conclude by examining the career path explorer tool, which allows users to explore potential career paths relevant to their current career. Mapping how professionals change positions over the course of their careers required each job title to be mapped to a canonical title. Transition probabilities can then be calculated from users’ work histories. However, the initial prototypes built using only transition data provided unsatisfactory results; the most likely next step in someone’s career isn’t always the most interesting to users. Dice wanted the application to show users possible transitions that would further their careers in different ways rather than just show the most common path. Simon and Yuri detail how they used supply and demand information and salary information to let users view relevant career paths that make them more in demand or improve their earning potential, and to inform them of the skills they need to learn to make each transition.
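The title canonicalization and transition-probability steps can be sketched as a first-order Markov chain over job titles, followed by a re-ranking that favors salary gain over raw likelihood. The alias table, work histories, and median salaries below are made up for this example, and the re-ranking rule is only one plausible reading of the approach described above.

```python
from collections import Counter, defaultdict

# Hypothetical alias table mapping raw titles to canonical titles.
ALIASES = {
    "software developer": "software engineer",
    "sr. software engineer": "senior software engineer",
}

# Hypothetical median salaries per canonical title.
MEDIAN_SALARY = {
    "software engineer": 100000,
    "senior software engineer": 120000,
    "data scientist": 130000,
}


def canonical(title):
    """Normalize a raw job title and map known aliases to a canonical form."""
    key = title.strip().lower()
    return ALIASES.get(key, key)


def transition_probabilities(histories):
    """Estimate P(next title | current title) from ordered work histories."""
    counts = defaultdict(Counter)
    for history in histories:
        titles = [canonical(t) for t in history]
        for current, nxt in zip(titles, titles[1:]):
            if current != nxt:
                counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for current, nexts in counts.items()
    }


def rank_by_salary_gain(title, probabilities):
    """Order observed next steps by salary gain rather than by likelihood."""
    current = canonical(title)
    options = probabilities.get(current, {})
    return sorted(
        options,
        key=lambda t: MEDIAN_SALARY[t] - MEDIAN_SALARY[current],
        reverse=True,
    )


histories = [
    ["Software Developer", "Sr. Software Engineer"],
    ["Software Engineer", "Senior Software Engineer"],
    ["Software Engineer", "Senior Software Engineer"],
    ["Software Engineer", "Data Scientist"],
]
probabilities = transition_probabilities(histories)
ranked = rank_by_salary_gain("Software Developer", probabilities)
```

With these toy histories the most common move from software engineer is to senior software engineer (probability 0.75), but ranking by salary gain surfaces data scientist first, which mirrors the observation that the most likely path is not always the most interesting one.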

Simon Hughes

Simon Hughes is the chief data scientist at technology professional recruiting site Dice, where he develops multiple recommender engines for matching job seekers with jobs and optimizes the accuracy and relevancy of Dice’s job and candidate search. More recently, Simon has been instrumental in building the machine intelligence behind the Career Explorer portion of Dice’s website, which allows users to gauge their market value and explore potential career paths. Simon is a PhD candidate in machine learning and natural language processing at DePaul University, where he is researching machine learning approaches for determining causal relations in student essays, with a view to building more intelligent essay-grading software.

Yuri Bykov

Yuri Bykov is director of data science at Dice, where he and his team leverage machine learning, NLP, big data, information retrieval, and other scientific disciplines to research and build innovative data products and services that help tech professionals manage their careers. Yuri started his career as a software developer, moving into BI and data analytics before finding his passion in data science. He holds an MBA and MIS from the University of Iowa.

Comments on this page are now closed.


02/05/2018 2:00am PST

Please let us know if there is anything in particular you’d like us to focus on. Links to the tools are included in the proposal above, so please feel free to check them out and ask us any questions you may have.