Schedule: Tools & Technology sessions

Everything you need to know about big data and analytics in practice. From Hadoop and Hive to real-time data and predictive analytics, learn new tools and techniques first-hand from the developers at the cutting edge of data.

Location: King's Suite - Sandringham
Level: Non-technical
Patrick Wendell (Databricks)
Average rating: 4.67 (12 ratings)
As big data analytics evolves beyond simple batch jobs, there is a need both for lower-latency processing (interactive queries and stream processing) and for more complex analytics (e.g. machine learning and graph algorithms). This talk will introduce Spark and Shark, popular open source projects from Berkeley that address this need through an optimized runtime engine and in-memory computing capabilities.

Location: Palace Suite - Buckingham Room
Level: Non-technical
Average rating: 3.50 (2 ratings)
We're getting better all the time. See how the Cato Institute used responsive design and D3.js to show how human development indicators improve as economic freedom spreads.

Location: King's Suite - Balmoral
Level: Non-technical
Average rating: 3.31 (16 ratings)
How Stuff Spreads looks at how two recent memes spread online: Gangnam Style vs. the Harlem Shake. The talk dissects the memes through the lens of big data to show what made them go viral and what they have in common, how quantitative and qualitative analysis have to come together to craft insights and tell a story, and finally how to predict future memes and create a data-driven content strategy.

Location: King's Suite - Sandringham
Level: Advanced
Amy Unruh (Google), Felipe Hoffa (Google), Alasdair Allan (Babilim Light Industries)
Average rating: 2.19 (16 ratings)
In May 2013, the O'Reilly Data Sensing Lab collaborated with Google Cloud Platform and Device Cloud by Etherios to deploy a network of hundreds of environmental sensors at Google I/O. Learn how Google Cloud Platform was used to build an end-to-end, scalable, high-throughput pipeline for data collection, processing, and analysis.

Location: King's Suite - Sandringham
Level: Intermediate
Bruce Durling (Mastodon C)
Average rating: 2.80 (10 ratings)
It is often said that 80% of data science is scrubbing data. In this talk we'll cover how you can use Cascalog to scrub, transform, manipulate and mangle data into the formats you need, fix things that are wrong, and filter out things that are broken. Together, Clojure and Cascalog provide fantastic tools for this. Learn about using Hadoop with the messy data that exists in the real world.
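Cascalog is a Clojure library, so the following is not Cascalog code, only a plain-Python sketch of the "fix what is wrong, filter out what is broken" step described above. It assumes a hypothetical CSV with `user_id`, `country` and `amount` columns; the repair and rejection rules are illustrative, not taken from the talk.

```python
import csv
import io
from typing import Iterator


def scrub(raw_csv: str) -> Iterator[dict]:
    """Yield cleaned records, silently dropping rows that cannot be repaired.

    Assumes hypothetical columns: user_id, country, amount.
    """
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Filter out broken rows: a missing id or a non-numeric amount.
        user_id = (row.get("user_id") or "").strip()
        if not user_id:
            continue
        try:
            amount = float((row.get("amount") or "").strip())
        except ValueError:
            continue
        # Fix things that are wrong: trim and upper-case country codes,
        # and mark missing ones explicitly instead of leaving them blank.
        country = (row.get("country") or "").strip().upper() or "UNKNOWN"
        yield {"user_id": user_id, "country": country, "amount": amount}


raw = """user_id,country,amount
u1, gb ,10.5
,fr,3
u2,de,n/a
u3,,7
"""

for record in scrub(raw):
    print(record)
```

Because each row is cleaned independently of every other row, a step like this parallelizes naturally, which is one reason it maps well onto Hadoop.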
Location: Palace Suite - Buckingham Room
Level: Intermediate
Mano Marks (Google, Inc.), Kurt Schwehr (Google, Inc.)
Average rating: 2.30 (10 ratings)
Many big data solutions focus on large-scale data analysis that happens in data centers; others focus on data visualization in the browser. When you combine both of these techniques, you get remarkable expressive power. This talk will show how to use the Google Maps API with WebGL, together with Google BigQuery, Cloud Storage, App Engine and Compute Engine, to deliver compelling, responsive visualizations.

Location: King's Suite - Balmoral
Level: Non-technical
Average rating: 1.58 (12 ratings)
How do we know what we know? Increasingly, discoveries are made from computed data, possibly sourced from the internet. If we are to trust these discoveries, how the conclusions were reached is critical. Examples from work on Big Data analytics infrastructure for life sciences and social media analysis will illustrate the key issues.

Location: King's Suite - Balmoral
Level: Intermediate
Yodit Stanton (opensensors.io)
Average rating: 3.38 (8 ratings)
Medical treatments have come a long way in the last couple of decades. On the other hand, we could be doing a lot better at using sensors to monitor people in their own homes between hospital visits. Sensors combined with Big Data technologies are set to bring about profound changes in the future of health and social care.

Location: King's Suite - Sandringham
Level: Intermediate
Isabel Drost (Apache Software Foundation / Nokia Gate 5 GmbH)
Average rating: 1.88 (8 ratings)
"In order to classify documents, simply first convert them to vectors, train, test and finally apply the model." Sounds easy, in theory. Converting documents to vectors is usually the tricky part. This talk walks you through the steps necessary to convert your text documents into feature vectors that Mahout classifiers can use, including a few anecdotes on drafting domain-specific features.
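Mahout has its own tooling for this step; the following is not its API, only a plain-Python sketch of the general idea of turning documents into feature vectors. It assumes a bag-of-words representation weighted by TF-IDF (one common choice, not necessarily the one used in the talk) and uses made-up example documents.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case and split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term frequency within this document
        vectors.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return vocab, vectors


docs = ["Cheap flights to London", "London hotels", "cheap cheap hotels"]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
for vector in vectors:
    print([round(x, 3) for x in vector])
```

Terms that appear in every document get an IDF of log(1) = 0, so they contribute nothing to any vector; real pipelines usually add stop-word removal, n-grams and normalization on top of this.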
Location: King's Suite - Sandringham
Level: Intermediate
Rajappa Iyer (LinkedIn)
Average rating: 4.67 (3 ratings)
To feed LinkedIn's data-driven products, we need to run a complex graph of ETL workflows that deliver the right data to the right systems reliably on a 24x7 basis. To achieve this goal, we have developed a metadata system that captures process dependencies, data dependencies, and execution histories. This system also lays the foundation for a combined dataflow and workflow engine.
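The abstract does not describe LinkedIn's engine, but the core task of honoring process dependencies can be sketched as a topological sort. The following uses Python's standard `graphlib` module (Python 3.9+) with hypothetical job names; it covers only process dependencies, not the data dependencies or execution histories the metadata system also records.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ETL jobs, each mapped to the set of jobs whose output it consumes.
dependencies = {
    "load_profiles": set(),
    "load_connections": set(),
    "build_graph": {"load_profiles", "load_connections"},
    "score_recommendations": {"build_graph"},
    "push_to_serving": {"score_recommendations"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave can run in parallel once
    all earlier waves have finished. Raises graphlib.CycleError on cycles."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would call done() only after each job succeeds.
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(dependencies), start=1):
    print(number, wave)
```

Grouping ready jobs into waves shows which work can run concurrently; a production scheduler would also need retries, time-based triggers and persisted state.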
Location: King's Suite - Sandringham
Level: Intermediate
Tomer Shiran (Dremio)
Average rating: 3.78 (9 ratings)
Predictive analytics has emerged as one of the primary use cases for Hadoop, leveraging various machine learning techniques to increase revenue or reduce costs. In this talk we present real-world use cases from several different industries, and then discuss the open source technologies available to companies wishing to implement predictive analytics with Hadoop.

Location: Palace Suite - Blenheim Room
Level: Intermediate
Pascal Clarysse (TomTom)
Average rating: 4.00 (2 ratings)
Learn how Hadoop is helping TomTom make fresher maps by continuously processing incoming GPS data, and how HBase is used to present that data to an operator.

Location: King's Suite - Sandringham
Level: Intermediate
Alan Gates (Hortonworks)
Average rating: 4.40 (5 ratings)
People want more out of Hive. They want it to be fast and useful, and to connect to their tools. Work is being done to reduce start-up time, improve the optimizer, extend Hive to use Tez, process records 50x faster, add support for functions like RANK, add subqueries, and add standard SQL data types. We will review this work and show current benchmarks.
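For readers unfamiliar with RANK: in standard SQL, `RANK() OVER (ORDER BY score DESC)` gives tied rows the same rank and leaves a gap after the tie. The following small Python emulation, with made-up data, illustrates what the function computes; it says nothing about how Hive implements it.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def sql_rank(rows: Sequence[T], key: Callable[[T], float]) -> list[tuple[int, T]]:
    """Emulate RANK() OVER (ORDER BY key DESC) from standard SQL.

    Tied rows share a rank, and the row after a tie jumps to its own
    position, giving sequences such as 1, 1, 3, 4.
    """
    ordered = sorted(rows, key=key, reverse=True)  # stable: ties keep input order
    ranked: list[tuple[int, T]] = []
    rank = 0
    previous = None
    for position, row in enumerate(ordered, start=1):
        value = key(row)
        if position == 1 or value != previous:
            rank = position  # a new value takes the rank of its position
            previous = value
        ranked.append((rank, row))
    return ranked


sales = [("alice", 90), ("bob", 75), ("carol", 90), ("dave", 60)]
for rank, (name, score) in sql_rank(sales, key=lambda row: row[1]):
    print(rank, name, score)
```

Standard SQL's DENSE_RANK differs only in not leaving the gap, which would give 1, 1, 2, 3 for the same data.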
Location: Palace Suite - Buckingham Room
Level: Intermediate
Francois Mercier (mgrafit)
Average rating: 2.75 (4 ratings)
To make the right decision, you need the right data. As the complexity and abundance of data increase, communicating the results of data analysis becomes more challenging. Grounding our talk in the pharma R&D arena, we illustrate how animated and interactive graphics can streamline communication about complex data analyses and inform decision making.

Location: King's Suite - Sandringham
Level: Intermediate
Neil Ferguson (NICE Systems)
Average rating: 3.33 (6 ratings)
NICE Systems is a leading provider of Customer Experience Management software, providing real-time offer management and predictive analytics applications based on HBase. We recently migrated to HBase from our own custom-built data store, and in this session we will share the challenges we overcame in getting HBase to meet our demanding performance requirements.

Location: Palace Suite - Blenheim Room
Level: Non-technical
Sheldon Monteiro (SapientNitro), John Cain (SapientNitro), Thomas John Mcleish (SapientNitro)
Average rating: 3.00 (2 ratings)
78% of consumers use their smartphone while shopping in-store. What are they doing? More importantly, why? For all the media buzz around showrooming (looking in-store, buying online), there is little insight into the issue. SapientNitro explains how key business questions drove hypotheses, data collection using novel instruments, and insights from analytic tools for testing and interpretive analysis.

Location: King's Suite - Sandringham
Level: Intermediate
Paul Lam (uSwitch)
Average rating: 2.71 (7 ratings)
What questions would you ask if you had a Facebook-like graph of what your customers like, what they bought, and what they viewed? This is what we built at uSwitch by transforming flat data from Hadoop into Neo4j. This talk will walk through how we bridged big data and linked data technologies, and the results of combining them.
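The step from flat rows to a graph can be sketched without Neo4j. The following plain-Python example groups hypothetical `(customer, relationship, item)` rows, as a Hadoop job might emit them, into an adjacency structure, then answers a typical "customers who viewed X also bought" question. The row format, relationship names and item names are all assumptions for illustration; a graph database would store and index this structure for you.

```python
from collections import defaultdict

# Hypothetical flat rows: (customer, relationship, item).
rows = [
    ("c1", "VIEWED", "product-a"),
    ("c1", "BOUGHT", "product-b"),
    ("c2", "VIEWED", "product-a"),
    ("c2", "BOUGHT", "product-b"),
    ("c3", "LIKES", "product-c"),
]

# Adjacency structure: customer -> relationship type -> set of items.
graph = defaultdict(lambda: defaultdict(set))
for customer, rel, item in rows:
    graph[customer][rel].add(item)


def also_bought(graph, viewed_item):
    """Count items bought by customers who viewed `viewed_item`."""
    counts = defaultdict(int)
    for rels in graph.values():
        # .get() avoids creating empty entries in the inner defaultdict.
        if viewed_item in rels.get("VIEWED", set()):
            for item in rels.get("BOUGHT", set()):
                counts[item] += 1
    return dict(counts)


print(also_bought(graph, "product-a"))
```

Here the query scans every customer; the appeal of a graph store is that it can start from the item node and follow relationships directly instead.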
Location: King's Suite - Sandringham
Level: Intermediate
Markus Schmidberger (comSysto GmbH)
Average rating: 4.40 (5 ratings)
This tutorial will give a first introduction to running Big Data analyses in the statistical software R. R brings together the latest Big Data technologies and the latest high-level statistical methods. Bring your laptop, use your web browser to access an RStudio-based analysis platform in the cloud, and leave with plenty of new ideas for efficient Big Data analyses with R.

Location: King's Suite - Balmoral
Level: Intermediate
Tom White (Cloudera)
Average rating: 3.00 (9 ratings)
In this tutorial we'll use the Cloudera Development Kit (CDK) to build a Java web app that logs application events to Hadoop, and then run ad hoc and scheduled queries against the collected data.

Location: King's Suite - Balmoral
Level: Intermediate
Mischa Tuffield (PeerIndex), Davide Palmisano (PeerIndex Ltd.), Enno Shioji (PeerIndex)
Average rating: 3.50 (6 ratings)
This tutorial will describe how to process real-time streams using the open-source Storm framework. We will define Storm's core concepts while focusing on creating a simple topology that counts, in real time, keywords and hashtags seen in Twitter's public (1%) feed.
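Storm topologies are usually written against its Java API; the following is not Storm code, only a plain-Python sketch of the same dataflow (a spout feeding an extraction bolt feeding a counting bolt). Generators stand in for Storm components, and a hard-coded list of hypothetical tweets stands in for the Twitter feed.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

HASHTAG = re.compile(r"#\w+")


def tweet_spout(tweets: Iterable[str]) -> Iterator[str]:
    """Stand-in for a spout that reads tweets from a live feed."""
    yield from tweets


def extract_bolt(stream: Iterable[str], keywords: set[str]) -> Iterator[str]:
    """Emit every hashtag, plus every tracked keyword, found in each tweet."""
    for text in stream:
        for tag in HASHTAG.findall(text):
            yield tag.lower()
        for word in re.findall(r"\w+", text.lower()):
            if word in keywords:
                yield word


def count_bolt(stream: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Keep running totals and emit (term, new_count) after every update."""
    counts = Counter()
    for term in stream:
        counts[term] += 1
        yield term, counts[term]


tweets = [
    "Loving #Strata in London",
    "big data talk at #strata",
    "#hadoop and storm",
]
for term, count in count_bolt(extract_bolt(tweet_spout(tweets), {"storm", "hadoop"})):
    print(term, count)
```

Note that `#hadoop` also contains the tracked keyword `hadoop`, so it is counted under both terms; a real topology would need to decide how to treat such overlaps, and would run each component as parallel tasks rather than a single chain of generators.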

Sponsors

Sponsorship Opportunities

For exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email mediapartners@oreilly.com

Press & Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com
