Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

From an archived data field to GO-JEK’s world-class product feature for customer experience

Divya Choudhary (University of Southern California)

2:40pm–3:20pm Wednesday, March 27, 2019

Data Science, Machine Learning & AI
Location: 2009

Secondary topics: Text and Language processing and analysis, Transportation and Logistics

Average rating: 4.50 (2 ratings)

Who is this presentation for?

Data scientists, analysts, and product managers

Level

Intermediate

Prerequisite knowledge

A basic understanding of Python and an analytical mindset

What you'll learn

Understand how to drive a big product feature through data science and machine learning
Explore the n-gram language model, DBSCAN, and k-means clustering

Description

As at any service company, the customer experience while booking a service is of prime importance to GO-JEK, a technology startup based in Jakarta, Indonesia, that specializes in ride hailing. With data flowing into its systems from more than 18 services, data fields that had already been archived turned out to be the best source for improving how customers book rides on the GO-JEK app.

Divya Choudhary explains how GO-JEK uses the free-form chat messages and notes, written in a local language, that customers send to their drivers while waiting for a ride to arrive. From these, the team carves out information about pickup points and their names (some of which even Google Maps has no idea of) to help create a world-class customer pickup experience feature. Join in to learn how GO-JEK applied machine learning and natural language processing to this customer notes data, along with bookings data, to come up with a product feature that lets customers see all nearby pickup gates with their appropriate names when booking a car or ride service. Divya shares the use case and problem statement, the solution, the system for data processing, major algorithmic decisions, the final output feature, and lessons learned.
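To make the note-mining idea concrete, here is a minimal sketch of n-gram counting over customer notes. It is not GO-JEK's pipeline: the sample notes are invented for illustration, and the helper names (`tokenize`, `ngrams`, `top_ngrams`) are hypothetical. It only shows how a frequently repeated word pair can surface as a candidate pickup-point name.

```python
import re
from collections import Counter

def tokenize(note):
    """Lowercase a note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", note.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(notes, n=2, k=3):
    """Count n-grams across all notes and return the k most frequent."""
    counts = Counter()
    for note in notes:
        counts.update(ngrams(tokenize(note), n))
    return counts.most_common(k)

# Invented sample notes (assumption: not real GO-JEK data).
notes = [
    "tunggu di lobby utara ya",
    "saya di lobby utara",
    "depan lobby utara mall",
    "di pintu selatan",
]

top = top_ngrams(notes, n=2, k=1)
print(top)  # [(('lobby', 'utara'), 3)]
```

In a real system the preprocessing step would matter far more than the counting: spelling variants, abbreviations, and filler words would all need handling before the counts become trustworthy.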

Topics include:

A machine learning clustering technique
DBSCAN versus k-means: How to know when to use what
The wonders of language modeling
The key: Preprocessing the corpus
The great potential of n-gram modeling
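As a rough illustration of the DBSCAN versus k-means question, the sketch below runs both on the same toy points: two tight groups plus one stray point. Everything here is an assumption made for the example: the coordinates are arbitrary planar units rather than real pickup locations, the `eps` and `min_samples` values are chosen by hand, and both algorithms are small pure-Python versions written for readability (in practice a library such as scikit-learn would be the usual choice).

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point
                queue.extend(reachable)
    return labels

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from caller-supplied centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])
        centroids = updated
    return labels

# Toy data (assumption): two tight groups and one stray point at the end.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0)]

dbscan_labels = dbscan(points, eps=1.5, min_samples=3)
kmeans_labels = kmeans(points, centroids=[points[0], points[4]])

print(dbscan_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(kmeans_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The difference shows up in the last point: DBSCAN marks it as noise and needs no cluster count up front, while k-means must be told k and forces the stray point into one of the two clusters, pulling that centroid toward it.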

Divya Choudhary

University of Southern California

Divya Choudhary is a researcher and graduate student in data science at USC. A computer science engineer turned decision scientist turned data scientist, Divya is known for her business understanding, her approach to problem solving, her work in machine learning and NLP, and her ability to drive data science problems through to final execution. She has four years’ experience unveiling the wonders of data using data science. Previously, she was a data scientist at GO-JEK and worked closely with the boards of directors of three startups in India and Indonesia. She’s a yoga lover, painter, poetess, and avid trekker and wanderer who’s best at talking to people and learning about them.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com