Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Schedule: Media, Advertising, Entertainment sessions

11:15–11:55 Wednesday, 23 May 2018

Revolutionizing the newsroom with artificial intelligence

Data science and machine learning, Data-driven business management, Emerging technologies and case studies, Expo Hall
Location: Expo Hall Level: Beginner

Dan Gilbert (News UK), Jonathan Leslie (Pivigo)

Average rating: 3.75 (4 ratings)

In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to assist in this task.

11:15–11:55 Wednesday, 23 May 2018

Web analytics at scale with Druid at Naver

Data engineering and architecture
Location: S11B Level: Intermediate

Jason Heo (Naver), Dooyong Kim (Navercorp)

Average rating: 3.00 (1 rating)

Naver.com is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver's web analytics system, built with Druid.

11:15–11:55 Wednesday, 23 May 2018

Finding bias in social media recommendations

Data science and machine learning, Law, ethics, and governance
Location: Capital Suite 14

Guillaume Chaslot (AlgoTransparency)

Average rating: 4.17 (6 ratings)

An increasing number of ex-Google and ex-Facebook employees state that social media is starting to control us rather than the other way around. How can we determine if social media is a pure reflection of people's interests or if it pushes us toward specific narratives? Guillaume Chaslot explores methodologies to find out which narratives are favored by social media recommendation engines.

12:05–12:45 Wednesday, 23 May 2018

Deep learning for recommender systems

Data science and machine learning
Location: Capital Suite 13 Level: Intermediate

Nick Pentreath (IBM)

Average rating: 4.43 (7 ratings)

In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.

12:05–12:45 Wednesday, 23 May 2018

Fairness and diversity in online social systems

Data science and machine learning
Location: Capital Suite 12

Elisa Celis (EPFL)

Average rating: 4.25 (4 ratings)

There is a pressing need to design new algorithms that are socially responsible in how they learn and socially optimal in the manner in which they use information. Elisa Celis explores the emergence of bias in algorithmic decision making and presents first steps toward developing a systematic framework to control biases in classical problems, such as data summarization and personalization.

16:35–17:15 Wednesday, 23 May 2018

Using Siamese CNNs for removing duplicate entries from real estate listing databases

Big data and data science in the cloud, Data science and machine learning
Location: Capital Suite 13 Level: Intermediate

Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)

Average rating: 4.00 (1 rating)

Aggregation of geospecific real estate databases results in duplicate entries for properties located near geographical boundaries. Sergey Ermolin and Olga Ermolin detail an approach that identifies duplicate entries by analyzing the images accompanying real estate listings, using a transfer learning Siamese architecture based on the VGG-16 CNN topology.

11:15–11:55 Thursday, 24 May 2018

Big data, big quality: Data quality at Spotify

Data engineering and architecture
Location: S11B Level: Intermediate

Irene Gonzálvez (Spotify)

Average rating: 3.88 (8 ratings)

Irene Gonzálvez shares Spotify's process for ensuring data quality, covering why and how the company became aware of its importance, the products it has developed, and future strategy.

11:15–11:55 Thursday, 24 May 2018

Accelerating development velocity of production ML systems with Docker

Data engineering and architecture, Streaming systems and real-time applications
Location: Capital Suite 7 Level: Intermediate

Kinnary Jangla (Pinterest)

Average rating: 3.00 (5 ratings)

Having trouble coordinating development of your production ML system across a team of developers? Microservices drifting and causing problems debugging? Kinnary Jangla explains how Pinterest dockerized the services powering its home feed and how this affected the engineering productivity of its ML teams while increasing uptime and ease of deployment.



©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com