Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

From an archived data field to GO-JEK’s world-class product feature for customer experience

Divya Choudhary (University of Southern California)
2:40pm-3:20pm Wednesday, March 27, 2019
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Data scientists, analysts, and product managers



Prerequisite knowledge

  • A basic understanding of Python and an analytical mindset

What you'll learn

  • Understand how to drive a big product feature through data science and machine learning
  • Explore the n-gram language model, DBSCAN, and k-means clustering


As for any service company, the customer experience while booking a service is of prime importance to GO-JEK, a technology startup based in Jakarta, Indonesia, that specializes in ride hailing. With data flowing into its systems from more than 18 services, data fields that had already been archived turned out to be the best source for improving how customers book rides on the GO-JEK app.

Divya Choudhary explains how GO-JEK uses the free-form chat messages and notes, written in a local language, that customers send to their drivers while waiting for a ride. From these it extracts information about pickup points and their names (some of which even Google Maps does not know) that is hard to obtain elsewhere, and uses it to build a world-class customer pickup experience. Join in to learn how GO-JEK applied machine learning and natural language processing to this customer notes data, along with bookings data, to build a product feature that shows customers all nearby pickup gates, with their appropriate names, when they book a car or ride service. Divya shares the use case and problem statement, the solution, the data processing system, major algorithmic decisions, the final output feature, and lessons learned.
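To make the idea of mining pickup-point names from customer notes concrete, here is a minimal Python sketch of n-gram counting. It is an illustration only, not GO-JEK's pipeline: the sample notes, the stopword list, and the function names (`tokenize`, `ngrams`, `top_phrases`) are all assumptions invented for this example.

```python
import re
from collections import Counter

# Hypothetical customer notes, invented for illustration (not GO-JEK data).
NOTES = [
    "saya di lobby utara ya pak",
    "tunggu di lobby utara",
    "jemput di pintu timur mall",
    "di lobby utara dekat atm",
    "pintu timur ya",
]

# Hypothetical filler words removed during preprocessing.
STOPWORDS = {"saya", "di", "ya", "pak", "tunggu", "jemput", "dekat"}


def tokenize(note):
    """Lowercase a note, keep alphabetic tokens, and drop filler words."""
    return [t for t in re.findall(r"[a-z]+", note.lower()) if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_phrases(notes, n=2, k=3):
    """Count n-grams across all notes and return the k most frequent."""
    counts = Counter()
    for note in notes:
        counts.update(ngrams(tokenize(note), n))
    return [(" ".join(gram), count) for gram, count in counts.most_common(k)]


print(top_phrases(NOTES, n=2, k=2))
# [('lobby utara', 3), ('pintu timur', 2)]
```

Phrases that recur across many notes near the same location are candidate pickup-point names. Dropping stopwords before forming n-grams is a simplification: it can join words that were not adjacent in the original note, which is one reason corpus preprocessing needs care.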

Topics include:

  • A machine learning clustering technique
  • DBSCAN versus k-means: How to know when to use what
  • The wonders of language modeling
  • The key: Preprocessing the corpus
  • The great potential of n-gram modeling
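
As a rough illustration of the DBSCAN versus k-means topic, the sketch below implements a small DBSCAN in plain Python and runs it on made-up pickup coordinates. Everything here is an assumption for demonstration: the points are hypothetical planar coordinates in meters, and `eps` and `min_pts` are arbitrary values, not parameters from the talk.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = noise  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
    return labels


# Hypothetical pickup locations in meters (invented for illustration).
POINTS = [
    (0, 0), (1, 0), (0, 1), (1, 1),   # dense group, e.g. one gate
    (50, 50), (51, 50), (50, 51),     # second dense group
    (200, -100),                      # isolated pickup
]

print(dbscan(POINTS, eps=2.0, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, -1]
```

DBSCAN finds the number of clusters from density and marks the isolated pickup as noise. K-means, by contrast, needs the number of clusters chosen in advance and assigns every point, including the isolated one, to some cluster.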

Divya Choudhary

University of Southern California

Divya Choudhary is a researcher and graduate student in data science at USC. A computer science engineer turned decision scientist turned data scientist, Divya is known for her business understanding, her approach to problem solving, her work in machine learning and NLP, and her ability to drive data science problems through to final execution. She has four years’ experience unveiling the wonders of data using data science. Previously, she was a data scientist at GO-JEK and worked closely with the boards of directors of three startups in India and Indonesia. She’s a yoga lover, painter, poetess, and avid trekker and wanderer who’s best at talking to people and learning about them.