Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Media, Marketing, Advertising sessions

11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 08 Level: Advanced
Daniel Kang (Stanford University)
Average rating: 4.00 (2 ratings)
Daniel Kang offers an overview of the exploratory video analytics engine BlazeIt, which provides FrameQL, a declarative SQL-like language for querying video, and a query optimizer for executing these queries. You'll see how FrameQL can capture a large set of real-world queries ranging from aggregation to scrubbing and how BlazeIt can execute certain queries up to 2,000x faster than a naive approach.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Longqi Yang (Cornell Tech, Cornell University)
State-of-the-art recommendation algorithms are increasingly complex and no longer one size fits all. Current monolithic development practices pose significant challenges to rapid, iterative, and systematic experimentation. Longqi Yang explains how to use OpenRec to easily customize state-of-the-art solutions for diverse scenarios.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
James Dreiss (Reuters)
Average rating: 3.67 (3 ratings)
James Dreiss discusses the challenges in building a content recommendation system for one of the largest news sites in the world, Reuters.com. The particularities of the system include developing a scrolling newsfeed and the use of document vectors for semantic representation of content.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Intermediate
Arun Kejariwal (Independent), Francois Orsini (MZ)
Average rating: 4.00 (1 rating)
The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Ahsan Ashraf (Pinterest)
Online recommender systems often rely heavily on user engagement features. This can cause a bias toward exploitation over exploration, overoptimizing on users' interests. Content diversification is important for user satisfaction, but measuring and evaluating impact is challenging. Ahsan Ashraf outlines techniques used at Pinterest that drove ~2–3% impression gains and a ~1% time-spent gain.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 14 Level: Intermediate
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Average rating: 4.00 (3 ratings)
Creating a successful big data practice in your organization presents new challenges in managing projects and teams. Ted Malaska and Jonathan Seidman share guidance and best practices to help technical leaders deliver successful projects from planning to implementation.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Bonnie Barrilleaux (LinkedIn)
Average rating: 4.50 (4 ratings)
As LinkedIn encouraged members to join conversations, it found itself in danger of creating a "rich get richer" economy in which a few creators got an increasing share of all feedback. Bonnie Barrilleaux explains why you must regularly reevaluate metrics to avoid perverse incentives: situations where efforts to increase the metric cause unintended negative side effects.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft)
Average rating: 5.00 (1 rating)
Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it's not unusual that some items have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries? Patty Ryan, CY Yam, and Elena Terenzi explain how they applied deep learning for image segmentation and background removal.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1E 14 Level: Non-technical
Cassie Kozyrkov (Google)
Average rating: 4.30 (10 ratings)
Many organizations aren’t aware that they have a blind spot when it comes to data effectiveness, and hiring experts doesn’t seem to help. Cassie Kozyrkov examines what it takes to build a truly data-driven organizational culture and highlights a vital yet often neglected job function: the data science manager.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Non-technical
John Thuma (Arcadia Data)
Average rating: 5.00 (1 rating)
Forget about the fake news; data and analytics are what drive elections. John Thuma shares ethical dilemmas he faced while proposing analytical solutions to the RNC and DNC. Not only did he help causes he disagreed with, but he also armed politicians with real-time data to manipulate voters.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
Andrew Montalenti (Parse.ly)
Average rating: 5.00 (1 rating)
What can we learn from a one-billion-person live poll of the internet? Andrew Montalenti explains how Parse.ly has gathered a unique dataset of news reading sessions from billions of devices, peaking at over two million sessions per minute across thousands of high-traffic news and information websites, and how the company uses this data to unearth the secrets behind online content.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 09 Level: Advanced
Barbara Eckman (Comcast)
Average rating: 4.33 (6 ratings)
Comcast’s streaming data platform comprises ingest, transformation, and storage services in the public cloud, with Apache Atlas for data discovery and lineage. Barbara Eckman explains how Comcast recently integrated on-prem data sources, including traditional data warehouses and RDBMSs, which required its data governance strategy to include relational and JSON schemas in addition to Apache Avro.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Beginner
Bob Levy (Virtual Cove, Inc.)
Average rating: 3.00 (1 rating)
Augmented reality opens a completely new lens on your data. Bob Levy explains how to use simple Python scripts to leverage completely new plot types. You'll explore use cases revealing new insight into financial markets data as well as new ways of interacting with data that build trust in otherwise “black box” machine learning solutions.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Intermediate
Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)
Can the talent industry make job search and matching more relevant and personalized for a candidate by leveraging deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to use the distributed deep learning framework BigDL on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 10/11 Level: Non-technical
Francesco Mucio (Zalando SE)
Average rating: 3.50 (2 ratings)
Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned and the challenges that still lie ahead.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 21/22 Level: Intermediate
Nir Yungster (JW Player), Kamil Sindi (JW Player)
JW Player, the world’s largest network-independent video platform (representing 5% of global internet video), provides on-demand recommendations as a service to thousands of media publishers. Nir Yungster and Kamil Sindi explain how the company is systematically improving model performance while navigating the many engineering challenges and unique needs of the diverse publishers it serves.