Presented by O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Hardcore Data Science conference sessions

Tuesday, September 29

9:00am–5:00pm Tuesday, 09/29/2015
Location: 1 E10/1 E11
Ben Lorica (O'Reilly), Reza Zadeh (Matroid | Stanford), David Blei (Columbia University), Anima Anandkumar (UC Irvine), Hussein Mehanna (Facebook), Jennifer Chayes (Microsoft Research), Ben Recht (University of California, Berkeley), Tanzeem Choudhury (Cornell and HealthRhythms), Jenn Wortman Vaughan (Microsoft Research), Adam Marcus (B12), Stefanie Jegelka (M.I.T.), Mikhail Bilenko (Microsoft), Reynold Xin (Databricks)
Average rating: 4.00 (4 ratings)
All-Day: Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting...
9:05am–9:30am Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Intermediate
Reza Zadeh (Matroid | Stanford)
Average rating: 3.33 (3 ratings)
We present the design decisions and extensive benchmarks for distributed matrix computations on Spark.
9:30am–10:00am Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
David Blei (Columbia University)
Average rating: 4.44 (9 ratings)
I will review the basics of topic modeling and describe our recent research on collaborative topic models, models that simultaneously analyze a collection of texts and its corresponding user behavior.
10:00am–10:30am Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
Anima Anandkumar (UC Irvine)
Average rating: 2.50 (4 ratings)
I will demonstrate how tensor methods can yield rich discriminative features for classification tasks and can serve as an alternative method for training neural networks.
11:00am–11:30am Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
Hussein Mehanna (Facebook)
Average rating: 3.75 (8 ratings)
FBLearner Flow is Facebook's machine learning platform, used by over a dozen teams, including Search, Ads, and News Feed, to train models that deliver relevant content to users. Tens of thousands of models are trained every week, using trillions of training examples. The system spans several areas, including ML infrastructure, algorithms, and applications built on top of the platform.
11:30am–12:00pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
Jennifer Chayes (Microsoft Research)
Average rating: 3.75 (4 ratings)
I will show how to use the theory of graph limits, developed over the last decade, to give consistent estimators for machine learning on massive sparse networks, and how to do so in a way that protects the privacy of individuals in the network.
12:00pm–12:30pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
Ben Recht (University of California, Berkeley)
Average rating: 4.00 (9 ratings)
KeystoneML: building large-scale machine learning pipelines on Apache Spark.
1:30pm–2:00pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
Tanzeem Choudhury (Cornell and HealthRhythms)
Average rating: 2.75 (4 ratings)
How ubiquitous computing is transforming the treatment of mental health disorders.
2:00pm–2:30pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
Jenn Wortman Vaughan (Microsoft Research)
Average rating: 3.83 (6 ratings)
Crowdsourcing techniques allow us to harness human computation and the "wisdom of crowds" to make predictions or complete tasks that would be difficult for computers alone. In this talk, I will discuss ways in which crowdsourcing can be used to generate novel sources of data, and some of the unique challenges that arise when doing so.
2:30pm–3:00pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Intermediate
Adam Marcus (B12)
Average rating: 3.33 (6 ratings)
We'll explore active learning, in which we identify small questions we can ask the crowd (e.g., workers on CrowdFlower or Amazon's Mechanical Turk) to improve the accuracy of machine-powered classifiers. Classifiers are just the beginning of the story: human-in-the-loop computing enables us to move up the knowledge work stack to interesting problems like design and analysis.
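A minimal sketch of one common way to pick those "small questions": uncertainty sampling, which sends the crowd the items on which a binary classifier is least confident. The helper name `select_uncertain` and the 0.5 decision boundary are assumptions for illustration; the talk may use a different query strategy.

```python
from typing import List, Sequence


def select_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k items whose predicted positive-class
    probability is closest to 0.5, i.e. where the classifier is least sure.

    Hypothetical helper for illustration: these are the items one would
    send to crowd workers for labeling before retraining the classifier.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Distance from the assumed 0.5 decision boundary; smaller = more uncertain.
    # Ties are broken by index so the result is deterministic.
    ranked = sorted(range(len(probs)), key=lambda i: (abs(probs[i] - 0.5), i))
    return ranked[:k]


if __name__ == "__main__":
    scores = [0.95, 0.52, 0.10, 0.45, 0.70]
    print(select_uncertain(scores, 2))  # → [1, 3]
```

Once the crowd labels the selected items, they join the training set and the classifier is retrained; the loop repeats until the labeling budget runs out.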
3:30pm–4:00pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Advanced
Stefanie Jegelka (M.I.T.)
Average rating: 5.00 (2 ratings)
Submodularity in Machine Learning.
4:00pm–4:30pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Intermediate
Mikhail Bilenko (Microsoft)
Average rating: 4.50 (2 ratings)
Learning with Counts is a simple yet powerful machine learning technique that is widely used in practice, yet it has received little attention in the literature and remains a "trick of the trade." This talk will introduce the technique through real-world examples, provide intuition and analysis explaining its power, and describe two extensions that yield significant improvements in accuracy and robustness.
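A minimal sketch of the basic idea, assuming a binary label: each value of a high-cardinality categorical feature (for example, an ad or user ID) is replaced by its per-value label counts plus a smoothed log-odds. The helper names and the additive `prior` are assumptions; this is a generic illustration, not the speaker's implementation, and it omits the two extensions the talk describes.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def fit_count_table(values: Iterable[Hashable], labels: Iterable[int]) -> Dict[Hashable, List[int]]:
    """Hypothetical helper: for each categorical value, count rows with label 0 and label 1."""
    table: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # value -> [n_neg, n_pos]
    for value, label in zip(values, labels):
        if label not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        table[value][label] += 1
    return dict(table)


def count_features(value: Hashable, table: Dict[Hashable, List[int]], prior: float = 1.0) -> List[float]:
    """Hypothetical helper: map a raw value to [n_neg, n_pos, smoothed log-odds].

    Unseen values fall back to zero counts; `prior` is an assumed additive
    smoothing constant that keeps the log-odds finite.
    """
    n_neg, n_pos = table.get(value, [0, 0])
    log_odds = math.log((n_pos + prior) / (n_neg + prior))
    return [float(n_neg), float(n_pos), log_odds]


if __name__ == "__main__":
    table = fit_count_table(["ad1", "ad1", "ad2", "ad1", "ad3"], [1, 0, 1, 1, 0])
    # "ad1" has 1 negative and 2 positives, so its log-odds is log(3/2).
    print(count_features("ad1", table))
    print(count_features("never_seen", table))  # zero counts, log-odds 0.0
```

The downstream model then sees a few dense numeric columns instead of a huge one-hot encoding. The count table is commonly built on data held out from the downstream model's training set, so the features do not simply echo the labels the model is fit on.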
4:30pm–5:00pm Tuesday, 09/29/2015
Location: 1 E10/1 E11 • Level: Intermediate
Reynold Xin (Databricks)
Average rating: 4.00 (4 ratings)
In this talk, we introduce a recent effort in Spark to employ randomized algorithms for a number of common but expensive operations: membership testing, cardinality estimation, stratified sampling, frequent items, and quantile estimation.
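To illustrate the membership-testing case, here is a minimal plain-Python Bloom filter, one standard randomized structure for that problem. It is not Spark's implementation; the class name, the bit-array size, and the SHA-256 double-hashing scheme are assumptions chosen for clarity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Approximate set membership: a bit array plus k hash positions per item.

    Queries may return false positives, but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing (assumed scheme): derive k positions from two
        # 64-bit values taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if every one of the item's k bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(num_bits=1024, num_hashes=3)
    for user in ["alice", "bob"]:
        seen.add(user)
    print(seen.might_contain("alice"))  # True: added items are always found
    print(seen.might_contain("carol"))  # usually False, but a false positive is possible
```

The trade-off is memory for certainty: with m bits, k hash functions, and n inserted items, the false-positive rate is approximately (1 - e^(-kn/m))^k, so a larger bit array lowers the error at the cost of space.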