Presented By O’Reilly and Cloudera

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Media, Marketing, Advertising sessions

11:20am–12:00pm Wednesday, 09/12/2018

BlazeIt: An exploratory video analytics engine

Location: 1A 08 Level: Advanced

Daniel Kang (Stanford University)

Average rating: 4.00 (2 ratings)

Daniel Kang offers an overview of BlazeIt, an exploratory video analytics engine that provides FrameQL, a declarative SQL-like language for querying video, and a query optimizer for executing these queries. You'll see how FrameQL can capture a large set of real-world queries ranging from aggregation to scrubbing and how BlazeIt can execute certain queries up to 2,000x faster than a naive approach.

1:15pm–1:55pm Wednesday, 09/12/2018

Harnessing and customizing state-of-the-art recommendation solutions with OpenRec

Location: 1A 15/16 Level: Intermediate

Longqi Yang (Cornell Tech, Cornell University)

State-of-the-art recommendation algorithms are increasingly complex and no longer one size fits all. Current monolithic development practice poses significant challenges to rapid, iterative, and systematic experimentation. Longqi Yang explains how to use OpenRec to easily customize state-of-the-art solutions for diverse scenarios.

1:15pm–1:55pm Wednesday, 09/12/2018

Document vectors in the wild: Building a content recommendation system for Reuters.com

Location: 1A 06/07 Level: Intermediate

James Dreiss (Reuters)

Average rating: 3.67 (3 ratings)

James Dreiss discusses the challenges in building a content recommendation system for one of the largest news sites in the world, Reuters.com. The particularities of the system include developing a scrolling newsfeed and the use of document vectors for semantic representation of content.

1:15pm–1:55pm Wednesday, 09/12/2018

Correlation analysis on live data streams

Location: 1A 12/14 Level: Intermediate

Arun Kejariwal (Independent), Francois Orsini (MZ)

Average rating: 4.00 (1 rating)

The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.

2:05pm–2:45pm Wednesday, 09/12/2018

Diversification in recommender systems: Using topical variety to increase user satisfaction

Location: 1A 06/07 Level: Intermediate

Ahsan Ashraf (Pinterest)

Online recommender systems often rely heavily on user engagement features. This can cause a bias toward exploitation over exploration, overoptimizing on users' interests. Content diversification is important for user satisfaction, but measuring and evaluating impact is challenging. Ahsan Ashraf outlines techniques used at Pinterest that drove ~2–3% impression gains and a ~1% time-spent gain.

2:55pm–3:35pm Wednesday, 09/12/2018

Executive Briefing: Managing successful data projects—Technology selection and team building

Location: 1E 14 Level: Intermediate

Ted Malaska (Capital One), Jonathan Seidman (Cloudera)

Average rating: 4.00 (3 ratings)

Creating a successful big data practice in your organization presents new challenges in managing projects and teams. Ted Malaska and Jonathan Seidman share guidance and best practices to help technical leaders deliver successful projects from planning to implementation.

2:55pm–3:35pm Wednesday, 09/12/2018

Perverse incentives in metrics: Inequality in the like economy

Location: 1A 06/07 Level: Intermediate

Bonnie Barrilleaux (LinkedIn)

Average rating: 4.50 (4 ratings)

As LinkedIn encouraged members to join conversations, it found itself in danger of creating a "rich get richer" economy in which a few creators got an increasing share of all feedback. Bonnie Barrilleaux explains why you must regularly reevaluate metrics to avoid perverse incentives: situations where efforts to increase the metric cause unintended negative side effects.

4:35pm–5:15pm Wednesday, 09/12/2018

When Tiramisu meets online fashion retail

Location: 1A 15/16 Level: Intermediate

Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft)

Average rating: 5.00 (1 rating)

Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it's not unusual that some items have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries? Patty Ryan, CY Yam, and Elena Terenzi explain how they applied deep learning for image segmentation and background removal.

4:35pm–5:15pm Wednesday, 09/12/2018

Executive Briefing: Most data-driven cultures aren’t

Location: 1E 14 Level: Non-technical

Cassie Kozyrkov (Google)

Average rating: 4.30 (10 ratings)

Many organizations aren’t aware that they have a blind spot with respect to their lack of data effectiveness, and hiring experts doesn’t seem to help. Cassie Kozyrkov examines what it takes to build a truly data-driven organizational culture and highlights a vital yet often neglected job function: the data science manager.

5:25pm–6:05pm Wednesday, 09/12/2018

If you thought politics was dirty, you should see the analytics behind it.

Location: 1E 12/13 Level: Non-technical

John Thuma (Arcadia Data)

Average rating: 5.00 (1 rating)

Forget about the fake news; data and analytics in politics are what drive elections. John Thuma shares ethical dilemmas he faced while proposing analytical solutions to the RNC and DNC. Not only did he help causes he disagreed with, but he also armed politicians with real-time data to manipulate voters.

11:20am–12:00pm Thursday, 09/13/2018

Applying petabyte-scale analytics and machine learning to billions of news reading sessions

Location: 1A 06/07 Level: Intermediate

Andrew Montalenti (Parse.ly)

Average rating: 5.00 (1 rating)

What can we learn from a one-billion-person live poll of the internet? Andrew Montalenti explains how Parse.ly has gathered a unique dataset of news reading sessions of billions of devices, peaking at over two million sessions per minute on thousands of high-traffic news and information websites, and how the company uses this data to unearth the secrets behind online content.

11:20am–12:00pm Thursday, 09/13/2018

Data discovery and lineage: Integrating streaming data in the public cloud with on-prem, classic data stores, and heterogeneous schema types

Location: 1E 09 Level: Advanced

Barbara Eckman (Comcast)

Average rating: 4.33 (6 ratings)

Comcast’s streaming data platform comprises ingest, transformation, and storage services in the public cloud, with Apache Atlas for data discovery and lineage. Barbara Eckman explains how Comcast recently integrated on-prem data sources, including traditional data warehouses and RDBMSs, which required its data governance strategy to include relational and JSON schemas in addition to Apache Avro.

1:10pm–1:50pm Thursday, 09/13/2018

Augmented reality: Going beyond plots in 3D

Location: 1A 12/14 Level: Beginner

Bob Levy (Virtual Cove, Inc.)

Average rating: 3.00 (1 rating)

Augmented reality opens a completely new lens on your data through which you see and accomplish amazing things. Bob Levy explains how to use simple Python scripts to leverage completely new plot types. You'll explore use cases revealing new insight into financial markets data as well as new ways of interacting with data that build trust in otherwise “black box” machine learning solutions.

2:00pm–2:40pm Thursday, 09/13/2018

Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

Location: 1A 15/16 Level: Intermediate

Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)

Can the talent industry make the job search/match more relevant and personalized for a candidate by leveraging deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to leverage distributed deep learning framework BigDL on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé.

3:30pm–4:10pm Thursday, 09/13/2018

Scaling data infrastructure in the fashion world; or, “What is this? Business intelligence for ants?”

Location: 1E 10/11 Level: Non-technical

Francesco Mucio (Francescomuc.io)

Average rating: 3.50 (2 ratings)

Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned in the process and the challenges that still lie ahead.

4:20pm–5:00pm Thursday, 09/13/2018

Building turnkey recommendations for 5% of internet video

Location: 1A 21/22 Level: Intermediate

Nir Yungster (JW Player), Kamil Sindi (JW Player)

JW Player (the world’s largest network-independent video platform, representing 5% of global internet video) provides on-demand recommendations as a service to thousands of media publishers. Nir Yungster and Kamil Sindi explain how the company is systematically improving model performance while navigating the many engineering challenges and unique needs of the diverse publishers it serves.
