Schedule: Data Science sessions

Leading data practitioners share their experience and techniques. With talks on machine learning, Hadoop, behavioral modeling and more, this track covers the successes, failures and the human side of working with data.

Location: King's Suite
Doug Cutting (Cloudera)
Average rating: 3.15 (33 ratings)
As technology further pervades enterprises, each generates more data. Once harnessed, this data can enhance business, enabling growth. A new home for data has arrived to better support this: the Enterprise Data Hub, with Apache Hadoop at its center. Doug will discuss the trends that drive this and speculate on where they lead.
Location: King's Suite
Max Ogden (Independent)
Average rating: 3.65 (26 ratings)
Dat aims to bring a distributed collaboration flow to big data. Git and GitHub have done this for source code, but we don't yet have a social data solution.
Location: King's Suite Level: Intermediate
Francine Bennett (Mastodon C)
Average rating: 3.93 (27 ratings)
The NHS produces an amazing amount of detailed raw data about health, prescribing, doctors, hospitals, and so on. The data is a great resource for data scientists to experiment and learn with: it's very rich, interesting, and important to society. This session will discuss the available datasets and work through some example analyses of the data from different perspectives.
Location: Palace Suite - Blenheim Room Level: Intermediate
Jan Overgoor (Airbnb)
Average rating: 3.00 (7 ratings)
For a two-sided marketplace like Airbnb, the search engine is the main driver of the health of the business. We developed an open-source technology stack and a set of analytical methods to optimize the search experience for our users and search conversion for our business. We’ll discuss the tools we use for data crunching, analysis and reporting, as well as our thoughts on experimental design.
Location: Palace Suite - Buckingham Room Level: Non-technical
Average rating: 3.50 (2 ratings)
We're getting better all the time. See how the Cato Institute used responsive design and D3.js to show how human development indicators improve as economic freedom spreads.
Location: King's Suite - Sandringham Level: Non-technical
Patrick Wendell (Databricks)
Average rating: 4.67 (12 ratings)
As big data analytics evolves beyond simple batch jobs, there is a need for both lower-latency processing (interactive queries and stream processing) and more complex analytics (e.g., machine learning and graph algorithms). This talk will introduce Spark and Shark, popular open source projects from Berkeley that address this need through an optimized runtime engine and in-memory computing capabilities.
Location: King's Suite - Balmoral Level: Intermediate
Ian Hegerty (Facebook)
Average rating: 3.67 (9 ratings)
In January, Facebook launched Graph Search in the US, which allows users to search their social graph. Ian Hegerty will describe how the Graph Search corpus was built from Facebook's entity graph, and how big data is used to understand users' queries and provide relevant results with minimal initial user behavioral data.
Location: King's Suite - Balmoral Level: Non-technical
Average rating: 3.31 (16 ratings)
How Stuff Spreads looks at how two recent memes spread online: Gangnam Style vs Harlem Shake. The talk dissects the memes through the lens of big data to show what made them go viral and what they have in common, how quantitative and qualitative analysis have to come together to craft insights and tell a story, and finally how to predict future memes and create a data-driven content strategy.
Location: Palace Suite - Buckingham Room Level: Intermediate
Andrew Hill (Set), Robin Kraft (World Resources Institute), Javier de la Torre (Vizzuality)
Average rating: 4.25 (8 ratings)
Maps are powerful tools for people to learn from data. In this project, we combine large-scale data processing with Hadoop and data visualization through CartoDB to make over six years of bi-monthly deforestation data accessible in an interactive map on the web. This talk will tell the story of how large-scale data paired with visualization can make data accessible in important new ways.
Location: King's Suite - Balmoral Level: Non-technical
Average rating: 1.58 (12 ratings)
How do we know what we know? Increasingly, discoveries are made from computed data, possibly sourced from the internet. If we are to trust these discoveries, how their conclusions are reached is critical. Examples from work on Big Data analytics infrastructure for life sciences and on social media analysis will illustrate the key issues.
Location: Palace Suite - Buckingham Room Level: Intermediate
Mano Marks (Google, Inc.), Kurt Schwehr (Google, Inc.)
Average rating: 2.30 (10 ratings)
Many big data solutions focus on large-scale data analysis in data centers; others focus on data visualization in the browser. Combining the two gives you amazing expressive power. This talk will show how to use the Google Maps API with WebGL and Google BigQuery, Cloud Storage, App Engine and Compute Engine to deliver amazing, responsive visualizations.
Location: Palace Suite - Blenheim Room Level: Non-technical
Simon Williams (QuantumBlack)
Average rating: 4.33 (9 ratings)
Crossrail will help deliver a new London. It is one of the largest civil engineering projects, and it is taking place literally under the feet of Strata London. We'll present how data science is being deployed at Crossrail to fundamentally change the way decisions are made and the operation is run, from the CEO to the engineers monitoring ground movement in the tunnels.
Location: King's Suite - Balmoral Level: Intermediate
Yodit Stanton (opensensors.io)
Average rating: 3.38 (8 ratings)
Medical treatments have come a long way in the last couple of decades. On the other hand, we could be doing a lot better at using sensors to monitor people in their own homes between hospital visits. Sensors combined with Big Data technologies are set to bring about profound changes for the future of health and social care.
Location: Palace Suite - Buckingham Room Level: Non-technical
Claire Miller (Trinity Mirror Regionals)
Average rating: 2.71 (7 ratings)
How do you do data journalism when you are not the Guardian, the New York Times or the Washington Post? You don't need a data team, developers, much time or any funding to get started and produce data journalism that grabs headlines and engages readers. This workshop will focus on quick-start techniques for getting going and making the most of limited resources.
Location: King's Suite - Sandringham Level: Intermediate
Arshak Navruzyan (Argyle Data)
Average rating: 3.57 (7 ratings)
The fast read and write performance and scalability of distributed in-memory clusters are making it possible to retrain machine learning models in real time. Applying such models to risk, infrastructure security and other areas can be transformative.
Location: Palace Suite - Blenheim Room Level: Non-technical
Stian Westlake (Nesta), Louise Marston (Nesta), Hasan Bakhshi (Nesta)
Average rating: 3.11 (9 ratings)
The economy is in a mess. But good data can help fix it. Timely analysis of large data sets is beginning to provide insight into what's really happening to business growth, employment and prosperity. We'll look at some of the most exciting examples of how Big Data is changing the way we look at the economy, and how governments and businesses can use it to their advantage.
Location: King's Suite - Balmoral
Alasdair Allan (Babilim Light Industries)
Average rating: 4.40 (5 ratings)
Everyday objects are becoming smarter. In ten years’ time, every piece of clothing you own, every piece of jewelry, and everything you carry with you will be measuring, weighing and calculating your life. In ten years, the world, your world, will be full of sensors.
Location: King's Suite - Balmoral Level: Advanced
Alexander Kagoshima (Pivotal), Noelle Sio (Pivotal)
Average rating: 3.75 (12 ratings)
In the future, we will see huge growth in the amount of traffic data generated by built-in car sensors. This talk presents a case study of analytics on traffic and traffic light data. It will present methods that yield a deep understanding of traffic and its characteristics by analyzing past traffic data. These methods could be extended to predict traffic jams and optimize routing systems.
Location: King's Suite
Julie Steele (Silicon Valley Data Science)
Average rating: 3.83 (23 ratings)
Data science may seem like a revolutionary new field, but it is merely the latest incarnation of a tradition as old as we are: storytelling. And because it is part of such an inherently human practice, it is most valuable when it takes humanity into account. This talk explores how to use data and the techniques associated with data to build things that matter, by looking back to look forward.
Location: King's Suite
Average rating: 4.75 (36 ratings)
Keynote by James Burke, science and technology historian, futurist, and author.
Location: King's Suite - Balmoral Level: Intermediate
Sean Owen (Cloudera)
Average rating: 3.17 (6 ratings)
To keep analyzing more data, and faster, we need a secret weapon: cheating. This brief survey shows how you may be doing too much work in your analytics and learning processes, and how giving up a little accuracy can gain a lot of performance, with examples from Apache Hadoop, Mahout, and ML tools from Cloudera.
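The abstract does not say which shortcuts the talk covers. As a generic illustration only, and not the speaker's method, the sketch below trades a little accuracy for a lot less work by estimating a fraction from a uniform random sample instead of a full scan. The function names and the normal-approximation error bar are assumptions of this example.

```python
import math
import random

def exact_fraction(data, predicate):
    """Full scan: exact answer, but touches every item."""
    return sum(1 for x in data if predicate(x)) / len(data)

def approximate_fraction(data, predicate, sample_size, seed=0):
    """Estimate the same fraction from a uniform random sample.

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence half-width from the normal approximation (an assumption
    that is reasonable when sample_size is large)."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)  # sampling without replacement
    p = sum(1 for x in sample if predicate(x)) / sample_size
    half_width = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, half_width

if __name__ == "__main__":
    data = list(range(1_000_000))
    is_small = lambda x: x % 10 < 3        # exactly 30% of items qualify
    print(exact_fraction(data, is_small))  # inspects 1,000,000 items
    est, hw = approximate_fraction(data, is_small, 10_000)  # inspects 10,000
    print(f"{est:.3f} +/- {hw:.3f}")
```

The error bar shrinks with the square root of the sample size, not the data size, which is why a fixed-size sample can stay cheap as the data grows.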
Location: King's Suite - Sandringham
Hitesh Shah (Hortonworks), Siddharth Seth (Hortonworks)
Average rating: 3.40 (5 ratings)
Apache Hadoop became popular because of its specialization in executing MapReduce programs. However, it has been hard to use existing Hadoop infrastructure for other processing paradigms such as real-time streaming, graph processing and message passing. That was true until the introduction of Apache Hadoop YARN in Apache Hadoop 2.0.
Location: King's Suite - Balmoral Level: Intermediate
Noel Welsh (Underscore Consulting)
Average rating: 3.83 (6 ratings)
Analytics is useless if it doesn't lead to action, and it is often desirable to put a computer in control of decision making. In this talk I'll discuss bandit algorithms, a class of decision-making algorithms that solve a simple but widely applicable decision problem and have found application in ad serving, content recommendation, and more.
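As a minimal sketch of the decision problem described above, here is epsilon-greedy, one of the simplest bandit algorithms. The abstract does not say which algorithms the talk covers; the class name, parameters and simulated click-through rates are hypothetical.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit: explore a random arm with probability epsilon,
    otherwise exploit the arm with the highest observed mean reward."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        best = max(self.values)
        return self.values.index(best)  # exploit; ties go to the lowest index

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: avoids storing every past reward.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def simulate(true_rates, rounds=5_000, epsilon=0.1, seed=0):
    """Simulate Bernoulli rewards (e.g. clicks) with hypothetical true rates."""
    env = random.Random(seed + 1)
    agent = EpsilonGreedy(len(true_rates), epsilon, seed)
    total_reward = 0
    for _ in range(rounds):
        arm = agent.select_arm()
        reward = 1 if env.random() < true_rates[arm] else 0
        agent.update(arm, reward)
        total_reward += reward
    return agent, total_reward
```

For example, `simulate([0.05, 0.2, 0.5])` should concentrate most pulls on the third arm. Because epsilon is fixed, this policy keeps exploring forever; that waste is what schemes such as UCB or Thompson sampling aim to reduce.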
Location: Palace Suite - Buckingham Room Level: Non-technical
Francine Bennett (Mastodon C), Duncan Ross (TES Global)
Average rating: 4.73 (11 ratings)
Being good is hard. Being evil is much more fun and gets you paid a lot more. We give a survey of the field of doing high-impact evil with data and analysis. We will look at some of the simplest things you can do to make the maximum (negative) impact on your friends, your business and the world. If you happen to learn something about doing good with data, that will be your problem.
Location: King's Suite - Balmoral Level: Intermediate
Ulrich Rueckert (Datameer)
Average rating: 4.00 (5 ratings)
Even if one has big data, sometimes there is a lack of key data. This is a problem for predictive analytics: if there is only a limited amount of training material (e.g. user ratings, categorized documents), then it is hard to generate accurate models. The talk introduces new semi-supervised learning methods to overcome this problem by utilizing the vast amount of unlabeled data.
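The abstract does not name the new methods. As an illustration of the general idea of learning from unlabeled data, and not the talk's approach, the sketch below implements classic self-training with a nearest-centroid classifier: points whose nearest class centroid is clearly closer than the runner-up are pseudo-labeled and added to the training set. The margin threshold and all helper names are hypothetical.

```python
import math

def centroids(points, labels):
    """Mean point of each class."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(cents, point):
    """Nearest-centroid classification."""
    return min(cents, key=lambda label: math.dist(point, cents[label]))

def self_train(labeled, labels, unlabeled, rounds=5, margin=1.0):
    """Self-training: repeatedly pseudo-label the unlabeled points the
    current model is confident about, then refit on the enlarged set.
    Returns the final centroids and the points left unlabeled."""
    X, Y, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(X, Y)
        confident, rest = [], []
        for p in pool:
            dists = sorted(math.dist(p, c) for c in cents.values())
            # Confident if the nearest centroid beats the runner-up by `margin`.
            if len(dists) == 1 or dists[1] - dists[0] >= margin:
                confident.append((p, predict(cents, p)))
            else:
                rest.append(p)
        if not confident:
            break  # nothing left that the model is sure about
        for p, label in confident:
            X.append(p)
            Y.append(label)
        pool = rest
    return centroids(X, Y), pool
```

Self-training can reinforce its own early mistakes, so the confidence threshold (here, the distance margin) matters a great deal in practice.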
Location: King's Suite - Sandringham Level: Intermediate
Tomer Shiran (Dremio)
Average rating: 3.78 (9 ratings)
Predictive analytics has emerged as one of the primary use cases for Hadoop, using various machine learning techniques to increase revenue or reduce costs. In this talk we present real-world use cases from several different industries, and then discuss the open source technologies available to companies wishing to implement predictive analytics with Hadoop.
Location: Palace Suite - Buckingham Room Level: Intermediate
Aurélie Pols (Mind Your Privacy)
Average rating: 3.00 (1 rating)
This session sets analytics best practices, and the data feeds and flows between tools and continents, side by side with legislation, showing which steps to take for legal compliance and how to train for data protection and minimize liability. It is not about security, and it goes beyond the cookie debate, highlighting how the EU Personal Data Protection Regulation will influence analytics and how Privacy by Design can help.
Location: Palace Suite - Buckingham Room Level: Intermediate
Francois Mercier (mgrafit)
Average rating: 2.75 (4 ratings)
To make the right decision, you need the right data. As data grows in complexity and abundance, communicating the results of data analysis becomes more challenging. Grounding our talk in pharma R&D, we illustrate how animated and interactive graphics can streamline communication about complex data analysis and inform decision making.
Location: King's Suite - Balmoral Level: Intermediate
Jurgen Van Gael (Rangespan, Ltd)
Average rating: 4.46 (13 ratings)
As data scientists, we are surrounded by uncertainty: data is noisy, missing, wrong or inherently uncertain. In this talk I want to introduce a branch of statistics called Bayesian reasoning, which is a unifying, consistent, logical and practically successful way of handling uncertainty. In short, I'd like to convince people that Bayes' rule is the E=mc^2 of data science.
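Bayes' rule states P(H | D) = P(D | H) P(H) / P(D), where the evidence P(D) is the sum of P(D | H) P(H) over all hypotheses. A minimal sketch of that update over a discrete set of hypotheses, using exact fractions; the screening-test numbers in the usage section are invented for illustration and do not come from the talk.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Apply Bayes' rule over a discrete set of hypotheses.

    prior:      dict hypothesis -> P(H)
    likelihood: dict hypothesis -> P(D | H) for the observed data D
    returns:    dict hypothesis -> P(H | D)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D) = sum over H of P(D | H) P(H)
    if evidence == 0:
        raise ValueError("observed data is impossible under every hypothesis")
    return {h: p / evidence for h, p in unnormalized.items()}

if __name__ == "__main__":
    # Hypothetical screening test: 1% prevalence, 90% sensitivity,
    # 5% false-positive rate. What does one positive result tell us?
    prior = {"condition": Fraction(1, 100), "no_condition": Fraction(99, 100)}
    positive = {"condition": Fraction(9, 10), "no_condition": Fraction(1, 20)}
    posterior = bayes_update(prior, positive)
    print(posterior["condition"])  # 2/13, roughly 15%
    # The posterior becomes the prior for the next observation.
    print(bayes_update(posterior, positive)["condition"])  # 36/47, roughly 77%
```

Even an accurate test leaves the condition unlikely after one positive result when the prior is low; a second independent positive changes the picture substantially.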
Location: Palace Suite - Blenheim Room Level: Non-technical
Sheldon Monteiro (SapientNitro), John Cain (SapientNitro), Thomas John Mcleish (SapientNitro)
Average rating: 3.00 (2 ratings)
78% of consumers use their smartphone while shopping in-store. What are they doing? More importantly, why? For all the media buzz around showrooming (looking in-store, buying online), there is little insight on the issue. SapientNitro explains how key business questions drove hypotheses, data collection using novel instruments, and insights from analytic tools for testing and interpretive analysis.
Location: Palace Suite - Buckingham Room Level: Intermediate
James Stewart (Government Digital Service), James Abley (Government Digital Service)
Average rating: 4.67 (3 ratings)
The UK Government team behind the GOV.UK website talks about its work on the Performance Platform, a suite of services and a cultural shift that moves people away from immensely detailed value stream maps of a call-centre and paper process (which might inherently be a five-day journey) towards something that's digital, lightweight, fast and pleasant to use.
Location: King's Suite - Balmoral Level: Intermediate
Stefan Franczuk (Cognizant)
Average rating: 2.14 (7 ratings)
How do you identify duplicate data, and why is it important? What do you do with such data when you find it? Data matching using the mathematics of probability has been around since the 1950s. But how does it actually work? What is the mathematics behind it? How do probabilities allow us to identify duplicate entries?
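The abstract does not specify a model, but a common probabilistic formulation of record matching is the Fellegi-Sunter approach, which builds on Newcombe's work from the late 1950s. Assuming that model, each field has an m-probability (values agree given the records match) and a u-probability (values agree given they don't); agreements add log2(m/u) to a record pair's match weight and disagreements add log2((1-m)/(1-u)). The field names, m/u values and thresholds below are hypothetical.

```python
import math

# Hypothetical per-field parameters (often estimated from the data, e.g. with EM):
#   m = P(field values agree | records refer to the same entity)
#   u = P(field values agree | records refer to different entities)
FIELD_PARAMS = {
    "surname":    (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
    "postcode":   (0.85, 0.001),
}

def normalize(value):
    """Crude cleaning so 'Smith ' and 'smith' compare equal."""
    return str(value).strip().lower()

def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields (Fellegi-Sunter style)."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: no evidence either way
        if normalize(a) == normalize(b):
            total += math.log2(m / u)              # agreement weight (> 0)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (< 0)
    return total

def classify(weight, upper=10.0, lower=0.0):
    """Hypothetical thresholds; in practice chosen from tolerable error rates."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"
```

Pairs falling between the two thresholds are the ones typically sent for manual review. Agreement on a rare value (a low u) counts for much more than agreement on a common one, which is the key intuition behind the probabilistic approach.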
Location: King's Suite - Sandringham Level: Intermediate
Paul Lam (uSwitch)
Average rating: 2.71 (7 ratings)
What questions would you ask if you had a Facebook-like graph of what your customers like, what they bought, and what they viewed? This is what we built at uSwitch by transforming flat data from Hadoop into Neo4j. This talk will walk through how we bridged big data and linked data technologies, and the results of combining them.
Location: King's Suite - Balmoral Level: Intermediate
Adam Kocoloski (Cloudant)
Average rating: 3.75 (4 ratings)
This talk will discuss how particle physics research can inform the field of data science. The importance of blind analyses and machine learning algorithms will be discussed as tools for filtering growing bodies of data as the big data trend continues.
Location: Palace Suite - Blenheim Room
Roger Magoulas (O'Reilly Media)
Average rating: 2.00 (2 ratings)
How quantitative data analysis and qualitative social science work can complement each other, providing a deeper understanding of behavior and opening new doors of enquiry.
Location: Palace Suite - Buckingham Room Level: Intermediate
Piet Daas (Statistics Netherlands), Edwin De Jonge (Statistics Netherlands)
Average rating: 3.00 (1 rating)
Big Data are very interesting for official statistics. Results obtained by analyzing large amounts of Dutch traffic loop detection records, mobile phone data and Dutch social media messages are discussed to illustrate this.
Location: King's Suite - Sandringham Level: Intermediate
Markus Schmidberger (comSysto GmbH)
Average rating: 4.40 (5 ratings)
This tutorial gives a first introduction to running big data analyses in the statistical software R. R brings together the latest big data technologies and the latest high-level statistical methods. Bring your laptop, use your web browser to access an RStudio-based analysis platform in the cloud, and leave with plenty of new ideas for efficient big data analyses with R.

Sponsors

Sponsorship Opportunities

For exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email mediapartners@oreilly.com

Press & Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata contacts