Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Social media conference sessions

11:00am–11:40am Wednesday, 03/30/2016
Eric Tschetter (Yahoo)
Yahoo uses Druid to provide visibility into the actions of its billions of users and developed a new type of sketch called a Theta Sketch to enable this analysis. Eric Tschetter discusses how Yahoo leverages Druid and Theta Sketches together to enable user-level understanding of its billions of users.
1:50pm–2:30pm Wednesday, 03/30/2016
John Berryman (Eventbrite)
At Eventbrite, users can serendipitously discover events they will love. But making this possible isn't easy. Events are short-lived, and by the time Eventbrite can build an adequate collaborative-filtering model, the event is already over. John Berryman explains how Eventbrite overcomes these technical challenges with a combination of collaborative-filtering and content-based methods.
2:40pm–3:20pm Thursday, 03/31/2016
Sijie Guo (StreamNative)
DistributedLog is a high-performance replicated log service built on top of Apache BookKeeper that is the foundation of publish-subscribe at Twitter, serving traffic from transactional databases to real-time data analytic pipelines. Sijie Guo offers an overview of DistributedLog, detailing the technical decisions and challenges behind its creation and how it is used at Twitter.
5:10pm–5:50pm Wednesday, 03/30/2016
Moderated by:
Michael Dauber (Amplify Partners)
Yael Garten (LinkedIn), Monica Rogati (Data Natives), Daniel Tunkelang (Various)
We’ve all heard that rare breed the data scientist described as a unicorn. In building your DS team, should you hold out for that unicorn or create groups of specialists who can work together? Michael Dauber, Yael Garten, Monica Rogati, and Daniel Tunkelang discuss the pros and cons of various team models to help you decide what works best for your particular situation and organization.
11:00am–11:40am Thursday, 03/31/2016
Chi-Yi Kuan (LinkedIn), Weidong Zhang (LinkedIn), Tiger Zhang (LinkedIn)
Chi-Yi Kuan, Weidong Zhang, and Yongzheng Zhang explain how LinkedIn has built a "voice of member" platform to analyze hundreds of millions of text documents. Chi-Yi, Weidong, and Yongzheng illustrate the critical components of this platform and showcase how LinkedIn leverages it to derive insights such as customer value propositions from an enormous amount of unstructured data.
10:00am–10:30am Tuesday, 03/29/2016
Yael Garten (LinkedIn)
You’ve decided you need data scientists. You know who to hire. Now, what do you do with them? Yael Garten offers examples of how companies like LinkedIn use data to make business and product decisions. Yael reviews the spectrum of data science and discusses the culture, process, and tools needed to transform companies into data-driven organizations.
11:50am–12:30pm Thursday, 03/31/2016
Sumeet Singh (Yahoo), Mridul Jain (Yahoo)
Building a real-time monitoring service that handles millions of custom events per second while satisfying complex rules, varied throughput requirements, and numerous dimensions simultaneously is a complex endeavor. Sumeet Singh and Mridul Jain explain how Yahoo approached these challenges with Apache Storm Trident, Kafka, HBase, and OpenTSDB and discuss the lessons learned along the way.
9:30am–10:00am Tuesday, 03/29/2016
Xavier Amatriain (Quora)
Xavier Amatriain explains the lessons learned building real-life machine-learning systems at Quora.
1:30pm–2:00pm Tuesday, 03/29/2016
Michael Conover (LinkedIn)
Michael Conover presents the structure and dynamics of LinkedIn's Economic Graph in all its exquisite detail, exploring breathtaking visualizations, sophisticated machine-learning algorithms, and actionable insights that will improve the quality of your professional network.