Hardcore Data Science

In this track, we push the envelope of data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. We’ll cover topics such as machine learning, natural language parsing, crowdsourcing and algorithm design.
Who should attend: Data scientists, statisticians, data modellers, and analysts with a strong understanding of data science fundamentals; CTOs, Chief Scientists, and academic researchers.

Gramercy Suite
Brandon Ballinger (Cardiogram)
Average rating: ***..
(3.57, 7 ratings)
Deep learning has overturned the previous best results in speech recognition, computer vision, and other fields. How do deep neural nets work? What makes them different from the classical neural nets of the 1970s? How is deep learning getting us closer to the original dream of AI: machines that can think?
Gramercy Suite
Tutorial. Please note: to attend, your registration must include Tutorials on Monday.
Average rating: ***..
(3.14, 7 ratings)
Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting, and academia.
Gramercy Suite
Ted Dunning (MapR, now part of HPE)
Average rating: ****.
(4.25, 12 ratings)
Machine learning constructs such as recommendation engines take a simplistic approach to data modeling: a single kind of user interaction with a single kind of item is used to suggest the same kind of interaction with the same kind of item. We will cover why this approach is flawed and present an easily implemented recommendation architecture and implementation style that addresses these flaws.
Gramercy Suite
Bahman Bahmani (Rakuten)
Average rating: ****.
(4.10, 10 ratings)
We will show how scalable algorithm design can enable big data applications that would otherwise be infeasible, even on the most modern big data architectures. We will then present effective techniques for designing such algorithms and explain the tradeoffs governing them. We will illustrate these techniques with concrete examples ranging from machine learning to social network and text analytics.
Gramercy Suite
Jacqueline Kazil (Capital One)
Average rating: *....
(1.33, 6 ratings)
Qualitative data can often be lost or disregarded in the data analysis process because of its simplicity. At the other extreme, some systems can be so complex that information is ignored. This talk will introduce agent-based modeling, an approach that addresses both of these issues by looking at data and interactions from the bottom up.
Gramercy Suite
Brian Dalessandro (Capital One)
Average rating: ****.
(4.17, 6 ratings)
A common data science problem is that we have access to a lot of data but not enough of the right data. In many applications the right data is either impossible to collect or prohibitively expensive to obtain. This talk will cover the basic strategies of transfer learning and will show how they can be leveraged to get the most out of the data you have rather than the data you want.
Gramercy Suite
Fangjin Yang (Imply), Nelson Ray (Metamarkets)
Average rating: ****.
(4.00, 8 ratings)
Many exact queries require computation and storage that scale linearly or superlinearly in the data. However, many classes of problems exist for which exact query results are not necessary. We describe the roles of various approximation algorithms that allow Druid, a distributed datastore, to increase query speeds and minimize data volume while maintaining rigorous error bounds on the results.
Gramercy Suite
Robert Grossman (Open Data Group)
Average rating: **...
(2.67, 9 ratings)
Many analytic and data science problems are about understanding the data well enough to build a model with good predictive power, but some analytic problems are best understood as involving an adversary. In this talk, we give an introduction to adversarial analytics, with examples from fraud, real-time bidding systems, high momentum trading, and cybersecurity.
Gramercy Suite
Vijay Agneeswaran (Walmart Labs), Pranay Tonpay (Impetus)
Average rating: **...
(2.71, 7 ratings)
We will talk about how we have implemented machine learning algorithms over Spark Streaming to allow real-time processing, namely Naïve Bayes and logistic regression for classification and k-means for clustering. We have also implemented PMML support for the ML algorithms in Spark, to provide a flexible means of importing models and evaluating their performance.


Contact Us

View a complete list of Strata + Hadoop World 2013 contacts