Schedule: Full Listing

Below are the confirmed and scheduled talks at Strata + Hadoop World in Barcelona 2014
(schedule subject to change).

Wednesday, 19-11-2014

8:30

Wednesday, 19-11-2014
Location: P1 Foyer
Coffee Break (1h)

9:30

Wednesday, 19-11-2014
Hadoop & Beyond
Location: 211
Paco Nathan (O'Reilly Media), Hossein Falaki (Databricks Inc.), Aaron Davidson (Databricks), Sameer Farooqui (Databricks), Alex Sicoe (Elsevier), Olivier Girardot (Lateral Thoughts)
Average rating: ***..
(3.00, 30 ratings)
Spark Camp: An Introduction to Apache Spark with Hands-on Tutorials. Read more.
SOLD OUT
Wednesday, 19-11-2014
Data-Driven Business
Location: 118-119
Average rating: ****.
(4.14, 7 ratings)
In a day of thought-provoking presentations and fast-paced panels, we’ll tackle how big data is challenging some of the underlying models of business strategy. Read more.
Wednesday, 19-11-2014
Design
Location: 120-121
Average rating: ****.
(4.05, 20 ratings)
Communicating Data Clearly describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. Graphs for one, two, three, and many variables are covered as well as general principles for creating effective graphs. Read more.
Wednesday, 19-11-2014
Data Science
Location: 122-123
Garrett Grolemund (RStudio)
Average rating: ****.
(4.21, 14 ratings)
This tutorial will teach you how to streamline your code and your thinking when doing data science. Analysts often spend over 80% of their time preparing and exploring data sets before they begin more formal analysis work. In this tutorial, I will introduce a set of principles -- and R packages -- that make this work easier and faster. Read more.

14:00

Wednesday, 19-11-2014
Design
Location: 120-121
Sebastian Gutierrez (DashingD3js.com)
Average rating: ****.
(4.53, 15 ratings)
D3.js has a very steep learning curve. However, there are three main concepts that, once you get your head around them, will make the climb much easier. Focusing on these three main concepts, we will walk through many examples to teach the fundamental building blocks of D3.js. Read more.
Wednesday, 19-11-2014
Hadoop & Beyond
Location: 122-123
Mark Grover (Cloudera), Gwen Shapira (Confluent), Ted Malaska (Blizzard), Jonathan Seidman (Cloudera)
Average rating: ***..
(3.77, 13 ratings)
Are you looking for a deeper understanding of how to integrate components in the Apache Hadoop ecosystem to implement data management and processing solutions? Then this tutorial is for you. We'll provide a clickstream analytics example illustrating how to architect solutions with Apache Hadoop along with providing best practices and recommendations for using Hadoop and related tools. Read more.

17:30

Wednesday, 19-11-2014
Location: 116
Average rating: ****.
(4.00, 10 ratings)
If you had five minutes on stage what would you say? Would you talk about your latest passion? Describe the trip of a lifetime? Teach a hack? We’ll find out in this high-energy, fast-paced, technology show-and-tell. Read more.

19:30

Wednesday, 19-11-2014
Location: Ocana
Average rating: ****.
(4.60, 5 ratings)
Come celebrate and meet other data enthusiasts with our combined Data Science Spain and Strata + Hadoop World Conference mixer. Read more.

Thursday, 20-11-2014

9:30

Thursday, 20-11-2014
Location: 211-212
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Edd Wilder-James (Silicon Valley Data Science)
Average rating: ***..
(3.82, 17 ratings)
Program Chairs Roger Magoulas, Doug Cutting, and Edd Dumbill welcome you to the first day of keynotes. Read more.

9:40

Thursday, 20-11-2014
Location: 211-212
Mike Olson (Cloudera)
Average rating: ***..
(3.43, 28 ratings)
Mike Olson, CSO and Chairman, Cloudera Read more.

9:55

Thursday, 20-11-2014
Location: 211-212
Geoff McGrath (McLaren Applied Technologies)
Average rating: ***..
(3.62, 42 ratings)
McLaren Applied Technologies capitalises on the convergence of real-time data management, predictive analytics and simulation to produce high-performance design of products and processes. In this talk we will describe how the approach of data-driven design can transform the way we go about creating and using products that are intrinsically intelligent and capable of adaptation. Read more.

10:10

Thursday, 20-11-2014
Location: 211-212
Rod Smith (IBM Emerging Internet Technologies)
Average rating: **...
(2.92, 26 ratings)
Big Data & Analytics continues to be a disruptive business force. Are we entering a new phase – Big Data & Analytics 3.0? Read more.

10:20

Thursday, 20-11-2014
Location: 211-212
Alicia Asin (Libelium)
Average rating: ****.
(4.03, 38 ratings)
“Welcome to the era of big, bad, open information.” Analysts have predicted huge numbers of Internet-connected devices in our future for years now. We may dispute the number, but it is clear that the Internet of Things (IoT) will produce a colossal amount of data. Read more.

10:35

Thursday, 20-11-2014
Location: 211-212
Camille Fournier (Independent)
Average rating: ****.
(4.07, 44 ratings)
Camille Fournier, Head of Engineering, Rent the Runway Read more.

10:50

Thursday, 20-11-2014
Location: 211-212
David Richards (WANdisco, Inc.)
Average rating: **...
(2.77, 22 ratings)
WANdisco CEO and Co-Founder David Richards will explore ‘mission critical’ applications of Big Data across industry sectors, and highlight the importance of continuous availability, performance, and scalability in its application. Read more.

10:55

Thursday, 20-11-2014
Location: 211-212
Tim O'Reilly (O'Reilly Media, Inc.)
Average rating: ****.
(4.39, 41 ratings)
The network, new data capabilities, and mobile devices rich in sensors have created fresh and unconventional possibilities to rethink workflows and processes in the real world. To succeed in creating totally new services and rethinking old ones, we must first adopt fresh thinking about the design process, and how sensors and algorithms are driving significant changes in what is possible. Read more.

11:50

Thursday, 20-11-2014
Hadoop & Beyond
Location: 212
Average rating: ***..
(3.67, 6 ratings)
SAMOA is an open-source platform for mining big data streams that runs on several distributed stream processing engines (such as S4 and Storm), and includes streaming algorithms for the most common machine learning tasks such as classification and clustering. More info at http://samoa-project.net Read more.
Thursday, 20-11-2014
Sponsored
Location: 118-119
Rod Smith (IBM Emerging Internet Technologies)
Average rating: ***..
(3.00, 4 ratings)
Analytics 3.0 is all about exploiting big data for just-in-time results to impact business outcomes. But what's really changing? Read more.
Thursday, 20-11-2014
Business & Industry
Location: 120-121
Uwe Weiss (Blue Yonder)
Average rating: ****.
(4.00, 8 ratings)
While many companies are struggling to adopt big data and to unlock its potential, facing challenges of visualization and democratization of insight, a number of industry leaders are leapfrogging big data adoption and circumventing the analyst bottleneck by going straight to automation of core business processes. This requires overcoming a set of tough cultural, technical and scientific challenges. Read more.
Thursday, 20-11-2014
Internet of Things
Location: 116
Marco Puts (Statistics Netherlands), Martijn Tennekes (Statistics Netherlands), Piet Daas (Statistics Netherlands)
Average rating: ***..
(3.25, 8 ratings)
We show how to use road sensor data to produce reliable statistics on traffic intensity across the 3,000 km of Dutch motorways. To use the data from 20,000 road sensors, dimension reduction is applied to the highly redundant sensor data to compensate for its poor quality. Read more.
Thursday, 20-11-2014
Data Science
Location: 113
Simon Worgan (Jagex Ltd), Samuel Kerrien (RESEREC)
Average rating: ***..
(3.29, 14 ratings)
We will detail the development of a bi-directional event stream recommendation system in RuneScape, a massively multiplayer online game. By capturing a feature-rich relationship between player and content, we were able to train different 'flavours' of recommendation. Delivered in real time, these 'flavours' balance engagement, monetisation and enjoyment according to shifting business needs. Read more.
Thursday, 20-11-2014
Sponsored
Location: 127-128
Frank Saeuberlich (Teradata)
How can big data make your journey to work better? In this case study we’ll explore how! Trains today are complex systems consisting of many embedded subsystems, which operate together with the overall goal of delivering a high quality transportation service... Read more.
Thursday, 20-11-2014
Hadoop Platform
Location: 114
Lars George (Cloudera), Jonathan Hsieh (Cloudera, Inc)
Average rating: *****
(5.00, 5 ratings)
This talk will show how HBase use cases vary significantly, from write-once, read-many workloads storing events to updatable entity workloads that use it as a random read and write backing store. A discussion of how these use cases can be classified, along with examples, concludes the session. Read more.
Thursday, 20-11-2014
Design
Location: 115
Jesús Gorriti (Fjord)
Average rating: ***..
(3.17, 6 ratings)
A lot of decisions are made for us based on data – but are we at risk of crossing over into the ‘uncanny valley’ of over-familiar personalisation? Designers need to focus on human elements, rather than allowing tech to lead the way. Jesús Gorriti will discuss SMART, a collaboration with the Harvard Medical School where the pediatric growth chart was reinvented using big data and design thinking. Read more.

12:30

Thursday, 20-11-2014
Location: Sponsor Pavilion (Banquet Room)
Average rating: ****.
(4.00, 4 ratings)
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

13:45

Thursday, 20-11-2014
Hadoop Platform
Location: 212
Guy Ernest (Amazon Web Services)
Average rating: ***..
(3.62, 8 ratings)
How to extend your toolbox to solve more big data problems with less effort. AWS provides a set of big data services that are elastic, scalable and highly available out of the box. Learning best practices and tips for integrating them with each other and with your architecture adds to your ability to provide fast and reliable big data solutions. Read more.
Thursday, 20-11-2014
Sponsored
Location: 118-119
Konstantin Shvachko (WANdisco)
This presentation addresses the geographic scalability of HDFS. It describes unique techniques implemented at WANdisco, which allow scaling HDFS over multiple geographically distributed data centers for continuous availability... Read more.
Thursday, 20-11-2014
Business & Industry
Location: 120-121
Melissa Santos (Big Cartel)
Average rating: ****.
(4.14, 21 ratings)
By having understandable abstractions for important data objects, Etsy has enabled employees across the whole company to actively take part in the collection and analysis of data. Converting data to objects allows us to more naturally convert analysis questions into code, and enforce business rules and definitions consistently. Read more.
Thursday, 20-11-2014
Internet of Things
Location: 116
Assaf Araki (Intel)
Average rating: ***..
(3.33, 3 ratings)
IoT analytics brings engineering and analytic complexity to new market solutions. In this session we will share what we learned from developing Intel's Cloud IoT Analytics Platform, which is based on open source software, and present a use case in Parkinson's disease research that leverages wearable sensors to monitor PD patients’ activities 24/7. Read more.
Thursday, 20-11-2014
Data Science
Location: 113
Hossein Falaki (Databricks Inc.)
Average rating: ****.
(4.07, 14 ratings)
Apache Spark enables interactive analysis of big data by reducing query latency to the range of human interactions through caching. Additionally, Spark’s unified programming model and diverse programming interfaces enable smooth integration with popular visualization tools, such as ggplot and matplotlib. We can use these to perform visual exploratory big data analysis with Spark. Read more.
Thursday, 20-11-2014
Sponsored
Location: 127-128
Matt Casters (Pentaho)
Average rating: ***..
(3.00, 2 ratings)
Learn how Pentaho's data integration and business analytics platform accelerates value from blended big data. * Leverage analytics, from data access and integration through visualisation and predictive analytics, to deliver near real-time business insights. * Empower users to architect big data blends at the source AND stream for more complete and accurate analytics... Read more.
Thursday, 20-11-2014
Hadoop Platform
Location: 114
Uri Laserson (Cloudera)
Average rating: *****
(5.00, 2 ratings)
The advent of next-generation DNA sequencing technologies is revolutionizing life sciences research by routinely generating extremely large data sets. Big data tools developed to handle large-scale internet data (like Hadoop) will help scientists effectively manage this new scale of data, and also enable addressing a host of questions that were previously out of reach. Read more.
Thursday, 20-11-2014
Design
Location: 115
Juliette Melton (New York Times)
Average rating: ***..
(3.67, 12 ratings)
Making meaning and value from data is not only a job for data scientists. Ethnographic researchers, subject matter experts, visual communication designers, and behavioral scientists all play key roles in the data journey. In this talk, we'll explore the data value chain, and share opportunities for how all of us -- whether data scientists or not -- can create and use data for insight and impact. Read more.

14:35

Thursday, 20-11-2014
Hadoop & Beyond
Location: 212
Claudiu Barbura (Ubix), David Talby (Atigeo)
Average rating: ***..
(3.31, 13 ratings)
Live demo of building an intelligent big data application from a web console. The tools and APIs behind it are built on top of Spark, Shark, Tachyon, Mesos, Aurora, Cassandra, and iPython, and include: an ELT pipeline (ingestion and transformation), data warehouse explorer, export to NoSQL and generated APIs, predictive model building, training and publishing, dashboard UI, and monitoring and instrumentation. Read more.
Thursday, 20-11-2014
Sponsored
Location: 118-119
Michael Hausenblas (Red Hat)
We will discuss requirements for IoT data processing platforms, including stream processing, dealing with raw device data, ensuring business continuity, and enforcing security and privacy. We will dissect a number of IoT applications, such as a manufacturer offering proactive maintenance, optimisation of waste management, and streamlining of a supply chain. Read more.
Thursday, 20-11-2014
Business & Industry
Location: 120-121
Aaron Frazer (Seeking Alpha)
Average rating: ****.
(4.40, 5 ratings)
Demonstrating how to use Google Docs for a flexible, extensible, self-service front-end for your data warehouse. A simple, cheap, stable, flexible, user-friendly alternative to traditional tools. Read more.
Thursday, 20-11-2014
Internet of Things
Location: 116
Jodok Batlogg (CRATE Technology GmbH)
Creating a backend for data intensive apps requires gluing several technologies together, which isn’t always simple, cheap or scalable. The world of sensor and IoT data, together with privacy concerns (mostly European), and the need to make contextual sense of it all, presents an opportunity to bring in the post-Hadoop era and democratise data stores. Read more.
Thursday, 20-11-2014
Data Science
Location: 113
Sean Owen (Cloudera)
Average rating: ****.
(4.00, 20 ratings)
Apache Spark is a popular new paradigm for computation on Hadoop. It's particularly effective for iterative algorithms relevant to data science like clustering, which can be used to detect anomalies in data. Curious? Get a taste of Spark MLlib, Scala and k-means clustering in this walkthrough of anomaly detection as applied to network intrusion, using the KDD Cup '99 data set. Read more.
Thursday, 20-11-2014
Sponsored
Location: 127-128
James Kinley (Cloudera)
Key takeaways: the business drivers and objectives; multi-tenancy concepts and architecture; multi-tenancy features in EDH; multi-tenancy configuration in EDH. Read more.
Thursday, 20-11-2014
Hadoop Platform
Location: 114
Garry Turkington (Improve Digital), Gabriele Modena (Improve Digital)
Average rating: **...
(2.29, 7 ratings)
Improve Digital is an ad tech company with large data volumes. This talk will explore our learnings from enhancing our established batch infrastructure with streaming near-realtime capabilities. In addition to discussing the impact on our architecture we will also describe how the work changed our approach to data lifecycle management. Read more.
Thursday, 20-11-2014
Design
Location: 115
Kim Rees (Periscopic)
Average rating: **...
(2.75, 8 ratings)
We have the unfortunate tendency to fit our problems to the technology at hand. We should be looking for ways to bend technology to our problems...our big problems. Kim will take a long look into the future of data covering the controversial and hopeful areas of privacy, open data, hacking, ETL relief, latent machines, M2M, and mass crowdsourcing. Read more.

16:05

Thursday, 20-11-2014
Hadoop & Beyond
Location: 212
Paco Nathan (O'Reilly Media)
Average rating: ****.
(4.00, 10 ratings)
Apache Spark: Streaming case studies based on interviews with the dev teams, compared and contrasted with alternative open source projects, plus an open source example that demonstrates integration of Spark Streaming, Spark SQL, and Tachyon within a single app. Read more.
Thursday, 20-11-2014
Sponsored
Location: 118-119
Marcello Bianchetti (UniCredit SPA)
Average rating: **...
(2.00, 1 rating)
This session will show the evolution of big data at UniCredit, from troubleshooting and application monitoring to the real-time analytics of ATMs, mobile banking, transactions and card usage. It will go under the hood of technical decisions in setting up a scalable and reliable architecture and dealing with a heterogeneous, geographically distributed and multi-layered environment. Read more.
Thursday, 20-11-2014
Business & Industry
Location: 120-121
Alistair Croll (Solve For Interesting)
Average rating: ****.
(4.58, 12 ratings)
In this session, Alistair Croll, author of the best-selling Lean Analytics and chair of O’Reilly Strata, will share what he’s learned in a year of working with and interviewing intrapreneurs all over the world. Read more.
Thursday, 20-11-2014
Internet of Things
Location: 116
Vincent Spruyt (Argus Labs), Ann Wuyts (Sentiance), Filip Maertens
Average rating: ****.
(4.67, 6 ratings)
We’ll explain how we’re automatically deriving a person’s mood and personality from mobile sensor data, and how we map and quantify these so that it becomes possible for technology to understand and work with ‘how we feel’. We'll cover the technical details of the data gathering setup, our data-mining and machine learning approaches, and the big-data processing architecture developed. Read more.
Thursday, 20-11-2014
Data Science
Location: 113
Jeroen Janssens (Data Science Workshops)
Average rating: ***..
(3.73, 11 ratings)
The Data Science Toolbox is a new, open source virtual environment for data science. Its mission is to: (1) get data scientists started in a matter of minutes, (2) enable teachers and authors to offer a custom virtual environment for their students and readers, and (3) encourage researchers to set up reproducible experiments. We'll discuss its importance, its technology, and its future. Read more.
Thursday, 20-11-2014
Sponsored
Location: 127-128
Bob Middleton (Tableau Software)
Understanding the balance between Availability, Risk and Trust when dealing with big data analytics. As we approach the end of 2014 more people are talking “big data” than ever before, but what we are now calling big data is just a drop in the ocean. The danger we all face is that as we step back to consider just how beautifully BIG our data is getting, we start to lose control. Read more.
Thursday, 20-11-2014
Hadoop Platform
Location: 114
Ameya Kantikar (Groupon)
Average rating: ***..
(3.55, 11 ratings)
Relevance and personalization are crucial to building a personalized local commerce experience at Groupon. This talk gives an overview of the real-time analytics infrastructure built using open source technologies such as Kafka, Storm, HBase, and Redis, which handles over 1 million data points per second in real time. It also covers various solution choices, different techniques and strategies, and more. Read more.
Thursday, 20-11-2014
Design
Location: 115
Håkan Jonsson (Sony Mobile Communications)
Average rating: ***..
(3.50, 2 ratings)
Experiences from the development of contextual applications, especially regarding data, design, and privacy issues. Read more.

16:55

Thursday, 20-11-2014
Hadoop & Beyond
Location: 212
Tim Berglund (DataStax)
Average rating: ****.
(4.25, 4 ratings)
An exploration of Apache Spark, an in-memory analytics framework that applies functional programming paradigms to provide ad-hoc analysis for distributed databases like Cassandra. Read more.
Thursday, 20-11-2014
Sponsored
Location: 118-119
Joe Goldberg (BMC Software Inc.)
Enterprise IT Management tools play a key role in helping IT organizations deliver a high level of service to their customers and manage the ongoing operation of production and mission critical systems according to regulatory requirements and to meet the goals of the business... Read more.
Thursday, 20-11-2014
Business & Industry
Location: 120-121
Roy Sasson (Outbrain)
Average rating: ****.
(4.27, 15 ratings)
Outbrain serves 150 billion content recommendations to more than 500 million monthly users. Data tells us what’s driving the mindset of the crowd. But how do you analyze whether the individual user finds value in recommendations? Why is being satisfied with click-focused metrics dangerous for growth? We outline a 3-layer framework for Data Scientists to analyze user engagement when facing such challenges. Read more.
Thursday, 20-11-2014
Government/Open Data
Location: 116
Robert Kaye (MusicBrainz)
Average rating: ****.
(4.25, 4 ratings)
Too many big data sets live in walled gardens and thus limit innovation to a few players. Creating open data sets levels the playing field and allows open source hackers to participate. Read more.
Thursday, 20-11-2014
Data Science
Location: 113
Aaron Davidson (Databricks)
Average rating: ****.
(4.73, 11 ratings)
Apache Spark lets users build unified data analytic pipelines that combine diverse processing types. In this talk, we will leverage the versatility of Spark to combine SQL, machine learning, and real-time stream processing to build a complete data pipeline in a single, short program which we will build up throughout the session. Read more.
Thursday, 20-11-2014
Data Science
Location: 127-128
Get certified as a Spark Developer at Strata + Hadoop World in Barcelona. Read more.
Thursday, 20-11-2014
Hadoop Platform
Location: 114
Marcel Kornacker (Cloudera)
Average rating: ***..
(3.50, 4 ratings)
Find out how to run real-time analytics over raw data without requiring a manual ETL process targeted at an RDBMS. This talk describes Impala’s approach to on-the-fly data transformation and its support for nested data; examples demonstrate how this can be used to query raw data feeds in formats such as text, JSON and XML, at a performance level commonly associated with specialized engines. Read more.
Thursday, 20-11-2014
Data Science, Design
Location: 115
Garrett Grolemund (RStudio)
Average rating: ****.
(4.78, 18 ratings)
The ggvis package makes it easy to create interactive data graphics with R, with a declarative syntax similar to that of ggplot2. Like ggplot2, ggvis uses concepts from the grammar of graphics, but it also adds the ability to create interactive graphics and deliver them over the web. Read more.

17:45

Thursday, 20-11-2014
Hadoop & Beyond
Location: 212
Costin Leau (Elastic)
Average rating: **...
(2.89, 9 ratings)
A practical exploration of anomaly detection (from credit card fraud to incorrectly tagged movies) through harnessing the power of the 'inverted index' - the foundation of information retrieval systems. Use Hadoop, Elasticsearch and Spark to gain insights into your big data and discover 'what stands out' at scale. Read more.
Thursday, 20-11-2014
Location: 118-119
TBC
Thursday, 20-11-2014
Business & Industry
Location: 120-121
Kim Nilsson (Pivigo)
Average rating: ***..
(3.60, 10 ratings)
A data strategy is only as good as its execution. In the world of Data Science it has become increasingly apparent that business leaders focus on the technical aspects for success in data projects, when in fact the quality of the data team is key. In this talk I will share my experiences training data scientists, and give some key insights into how to build a high-performing Data Science team. Read more.
Thursday, 20-11-2014
Business & Industry
Location: 116
Average rating: ***..
(3.75, 4 ratings)
We recently moved our entire data infrastructure to AWS: we now use Elastic MapReduce, Redshift and S3 for storage and processing. The talk describes the benefits and challenges of running in the cloud, how treating storage and processing as a utility allowed our small team to work on tools that democratized access to business analytics across the company and made us happier in general. Read more.
Thursday, 20-11-2014
Data Science
Location: 113
Alex Dorman (Magnetic), Michal Laclavik (Magnetic)
Average rating: ***..
(3.00, 2 ratings)
The need to categorize short text strings arises in many domains: online advertising, search engines, social networking, etc. In this session, we will share strategies for categorizing large volumes of queries and keywords in the advertising space, our successes with open document collections (Wikipedia, DBPedia, Freebase), and details on our solution using Hadoop and Solr. Read more.
Thursday, 20-11-2014
Hadoop Platform
Location: 114
Average rating: ***..
(3.00, 2 ratings)
This session presents details on Cisco’s enterprise Hadoop architecture, including roadmap details, the centralized funding model that helped it get deployed quickly, and its logical and physical views. Prominent use cases already in use at Cisco will also be covered. Read more.
Thursday, 20-11-2014
Design
Location: 115
Michael Freeman (University of Washington)
Average rating: ****.
(4.47, 17 ratings)
Complex relationships in big data require involved graphical displays which can be intimidating to users. This talk uses real world examples to identify confusing elements in online visualizations, and articulates a framework for using animation and story-telling to amplify their impact and usability. Tangible and generalizable techniques applicable across fields will be presented. Read more.

18:25

Thursday, 20-11-2014
Location: Sponsor Pavilion (Banquet Room)
Average rating: ****.
(4.67, 3 ratings)
Join your fellow big data enthusiasts at the Strata + Hadoop World Sponsor Hall Reception on Thursday, 20 November. Read more.

Friday, 21-11-2014

9:30

Friday, 21-11-2014
Location: 211-212
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Edd Wilder-James (Silicon Valley Data Science)
Average rating: ***..
(3.00, 2 ratings)
Strata Barcelona Program Chairs Roger Magoulas, Doug Cutting, and Edd Dumbill welcome you to the second day of keynotes. Read more.

9:35

Friday, 21-11-2014
Location: 211-212
Doug Cutting (Cloudera)
Average rating: ***..
(3.00, 19 ratings)
In this presentation Doug Cutting, Cloudera's Chief Architect, will discuss how we might reap the benefits of data while avoiding its perils. Read more.

9:45

Friday, 21-11-2014
Location: 211-212
Foster Provost (NYU | Stern)
Average rating: ****.
(4.00, 26 ratings)
As we've moved from simple statistical analyses of big data to decision-making based on big data and data-science models, we face an ironic "dirty secret." It is becoming increasingly difficult to understand why particular decisions have been made. In many applications, data-driven models now take as input massive numbers of "signals", including words in text, locations frequented... Read more.

10:00

Friday, 21-11-2014
Location: 211-212
Martin Willcox (Teradata)
Average rating: ***..
(3.67, 33 ratings)
Drinking from the data lake is tempting, but what is it really? How did we get here, and what lessons can we learn from previous technologies? It’s tempting to see this as the solution to data silos, but what are the costs? Martin Willcox provides a practical guide to help you understand the realities… Read more.

10:10

Friday, 21-11-2014
Location: 211-212
Majken Sander (TimeXtender)
Average rating: ***..
(3.58, 38 ratings)
Open data isn't just about waste pickup schedules and reporting pot holes—it can hold real monetary value for everyday business. Whether it's supply chain enhancement or improved customer segmentation, open data holds unexpected value for everyone. Read more.

10:25

Friday, 21-11-2014
Location: 211-212
Jordan Tigani (Google)
Average rating: ****.
(4.31, 36 ratings)
How can you turn raw data into predictions? How can you take advantage of both cloud scalability and state-of-the-art Open Source Software? This talk shows how we built a model that correctly predicted the outcome of 14 of 16 games in the World Cup using Google's Cloud Platform and tools like iPython and StatsModels. Read more.

10:40

Friday, 21-11-2014
Location: 211-212
Rodney Mullen (Almost Skateboards)
Average rating: ***..
(3.22, 32 ratings)
Ever do something perfectly in practice, only to have it blow up as soon as you try it when it really counts? This little phenomenon sends skaters to the hospital on a regular basis, mainly because controlled environments usually can’t evoke the depths of human responses. Read more.

10:50

Add to your personal schedule
Friday, 21-11-2014
Location: 211-212
Francine Bennett (Mastodon C)
Average rating: ***..
(3.30, 23 ratings)
Exploiting big data and analytics through the whole organisation is now business as usual for retail and online businesses. But cities and buildings also create a whole lot of data, which could change lives for better or for worse. This talk explores what’s happening right now in big data and analytics for cities and buildings, where it might head, and what we might want from it all. Read more.

11:00

Add to your personal schedule
Friday, 21-11-2014
Location: 211-212
Ben Okri (Self)
Average rating: ***..
(3.22, 36 ratings)
This talk explores the critical importance of storytelling to science and what we can learn from that relationship. Read more.

11:50

Add to your personal schedule
Friday, 21-11-2014
Government/Open Data
Location: 212
Francine Bennett (Mastodon C), Duncan Ross (TES Global)
Average rating: ****.
(4.20, 5 ratings)
The data philanthropy movement is growing in Europe. DataKind is actively working to expand its presence, and DataKind UK is now in its second year, running successful events and projects. This is the story of the last two events - highlighting how charities have joined the data revolution. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop & Beyond
Location: 120-121
Shay Banon (Elasticsearch)
Average rating: ****.
(4.67, 3 ratings)
Thanks to technologies like NoSQL and Hadoop, organizations can store massive amounts of data that’s important to their business. Now the challenge is how to extract actionable insights from it. This session will explore why search is the foundation to gain value from “big data” across your business - from marketing, to product, to backend infrastructure - highlighting a few real-world examples. Read more.
Add to your personal schedule
Friday, 21-11-2014
Privacy, Law & Ethics
Location: 116
Aurélie Pols (Mind Your Group)
Borrowing from Spanish information security best practices and in the light of increasing data breach regulations, the presentation examines how data flows should ideally be defined and secured in order to assure accountability through an entire data lifecycle. Read more.
Add to your personal schedule
Friday, 21-11-2014
Business & Industry
Location: 113
Daniel Waisberg (Google)
Average rating: **...
(2.82, 22 ratings)
In this presentation Daniel will discuss a process that can be used to go from data to stories. He will talk about ways to define the audience, create hypotheses, sketch data, analyze and build a story around it. The presentation includes the connecting dots game, Hulk, comics, architecture and other stories. Read more.
Add to your personal schedule
Friday, 21-11-2014
Sponsored
Location: 127-128
Greg Kleiman (Red Hat)
It’s been twenty years since Red Hat first launched Linux. Since then Red Hat has fueled the rapid adoption of open source technologies. As Big Data transitions into enterprise mode, Red Hat is again poised to facilitate the innovation and communities needed to empower multiple data stakeholders across your organization so you can truly open the possibilities of your data. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop Platform
Location: 114
Abed Ajraou (Solocal)
Average rating: **...
(2.83, 6 ratings)
Solocal, the French company behind PagesJaunes.fr, recently put Big Data and Hadoop into action to replace its traditional BI infrastructure. In this session, you will learn why and how that was done. Read more.
Add to your personal schedule
Friday, 21-11-2014
Data Science
Location: 115
Average rating: ****.
(4.73, 11 ratings)
Linking data to create broader data sets can dramatically improve analysis results, but what if the data sets lack common identifiers? Similarly, duplicates in data are very common and can seriously skew analysis results. This talk covers common techniques from record linkage research for solving this, as well as an open source tool implementing those techniques, and real-world examples. Read more.

12:30

Add to your personal schedule
Friday, 21-11-2014
Location: Sponsor Pavilion (Banquet Room)
Average rating: *****
(5.00, 1 rating)
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

13:45

Add to your personal schedule
Friday, 21-11-2014
Government/Open Data
Location: 212
Lisa Green (Common Crawl), Peter Adolphs (Neofonie)
Average rating: **...
(2.00, 1 rating)
The Web in itself forms a versatile dataset capable of powering the most diverse applications. In our joint talk, we will present Common Crawl, an immense collection of Web data made freely available to anyone. We will then introduce MIA and show how this Cloud-based analysis platform and marketplace for data and algorithms enables users to perform analytical tasks on datasets at Web scale. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop & Beyond
Location: 120-121
Kathleen Ting (Cloudera)
Average rating: ****.
(4.50, 4 ratings)
The next generation of MapReduce, YARN, has widely touted job throughput and Apache Hadoop cluster utilization benefits. Less known are the pitfalls littering the migration path to YARN. Learn from our extensive field experience to avoid those pitfalls and get your YARN cluster configured right the first time. Read more.
Add to your personal schedule
Friday, 21-11-2014
Privacy, Law & Ethics
Location: 116
Yves-Alexandre de Montjoye (Imperial College London | MIT Media Lab)
Average rating: ****.
(4.91, 11 ratings)
We're living in an age of big data, a time when most of our movements and actions are collected and stored in real time. These data offer unprecedented insights on how we behave as a species. Mathematical analysis of location data however reveals how unique our individual behavior is and how this behavior puts fundamental constraints on our privacy. Read more.
Add to your personal schedule
Friday, 21-11-2014
Business & Industry
Location: 113
Marcelo Soria-Rodriguez (BBVA Data & Analytics)
Average rating: ***..
(3.90, 10 ratings)
In this talk we will present practical cases on innovating with data in retail banking, a conservative industry. From initial idea to embracing open as a fundamental culture change, the talk will walk the audience through insights, lessons learned and practical examples on how to change the way value is delivered to customers. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop & Beyond
Location: 127-128
Paco Nathan (O'Reilly Media), Aaron Davidson (Databricks), Sameer Farooqui (Databricks), Hossein Falaki (Databricks Inc.), Alex Sicoe (Elsevier), Olivier Girardot (Lateral Thoughts)
Average rating: ****.
(4.00, 2 ratings)
Join the Spark Team for an informal question and answer session. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop Platform
Location: 114
Tod Davis (Children's Healthcare of Atlanta)
Average rating: *****
(5.00, 3 ratings)
Children’s Healthcare of Atlanta in the US implemented Hadoop to capture and analyze vital sign sensor data in the ICU. Its goal is to understand the impact of stressful procedures, to reduce pain, and to improve outcomes in its most fragile patients. This session will highlight the challenges of pediatric healthcare data management and the strategies used to make this project a success. Read more.
Add to your personal schedule
Friday, 21-11-2014
Data Science
Location: 115
Mikio Braun (Zalando SE)
Average rating: ****.
(4.19, 16 ratings)
Processing high-volume event streams in real time poses considerable challenges. Based on our work with social media data and realtime user interaction data, we discuss our experience with building such systems, starting with a single computer. We have distilled this experience into a number of realtime data analysis patterns, which solve key aspects of such systems. Read more.

14:35

Add to your personal schedule
Friday, 21-11-2014
Business & Industry
Location: 212
Amir Halfon (ScalingData)
This session will examine the challenges and opportunities associated with Big Data in a regulated environment, and the use of a new generation of data management technology to address them. Several case studies will be presented based on real-life production deployments. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop & Beyond
Location: 120-121
Nick Dimiduk (Hortonworks, Inc)
Average rating: ****.
(4.17, 6 ratings)
Your application is outgrowing its database, and you've started shopping for NoSQL options. Maybe you've adopted Hadoop into your Data Warehouse. You've heard HBase might be an appropriate technology, but you need to know more. This talk is for you. To understand its use, first understand how it works. This talk explores the design of HBase and its critical paths to ground an understanding of its use. Read more.
Add to your personal schedule
Friday, 21-11-2014
Privacy, Law & Ethics
Location: 116
Jeremy Heffner (Azavea)
Average rating: ****.
(4.50, 8 ratings)
We often face the need to analyze the count of discrete events which occur at a specific time and place whether they be crime events, taxi requests, or phone calls. Forecasting these space-time events brings particular challenges: finding suitable tools for geographic processing and techniques for modeling the data. The session will cover the lessons learned in building such a system. Read more.
Add to your personal schedule
Friday, 21-11-2014
Business & Industry
Location: 113
David Boyle (BBC Worldwide), Amanda Hill (BBC Worldwide), Dan Jabry (CrowdEmotion)
Average rating: ***..
(3.89, 9 ratings)
Emotions are messy and complicated. That meant we had to develop new data science and research methods to understand emotional engagement with our TV shows. But it also meant we had to be careful about how we brought that data to bear in a creative business like BBC Worldwide. Hear about how data science is making a big difference to how we build brands around the world. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop Platform
Location: 114
Neil Martin (comparethemarket.com), Rob Siwicki (comparethemarket.com)
Average rating: ***..
(3.00, 1 rating)
The talk will provide insight into how to achieve coordinated technological change in a highly agile IT organization; an organisational function that supports one of the UK’s most recognisable brands. Discover valuable lessons learned and begin to understand how your organization may want to take first steps in its engagement proving and implementing Big Data technology. Read more.
Add to your personal schedule
Friday, 21-11-2014
Data Science
Location: 115
Shawn Scully (Dato)
Average rating: ***..
(3.95, 19 ratings)
One of the most exciting areas in Big Data is the development of new data products; predictive applications used to drive product recommendations, predict machine failures, forecast airfare, social match-make, identify fraud, predict disease outbreaks, and repurpose pharmaceuticals. In this talk, I’ll share the trends we’re seeing in predictive application development, show how to.... Read more.

16:05

Add to your personal schedule
Friday, 21-11-2014
Government/Open Data
Location: 212
Daniele Quercia (Bell Labs)
Average rating: ****.
(4.89, 9 ratings)
How can we change architecture to design more for the people and less for the architects? We present crowd-based solutions with which urban planners can get valuable information about what kind of urban design is attractive to the people. This leads to GPS systems that show you the "most beautiful" path to your destination and to indicators about the beauty of a city. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop & Beyond
Location: 120-121
Cindy Lamm (comSysto GmbH), Michael Hausenblas (Red Hat)
Average rating: ***..
(3.89, 9 ratings)
We will describe our experiences in implementing a full-scale, data-driven application applied to a large anonymised dataset from the mobile operator Telefonica using Map-Reduce. Our project was unusual in the breadth of techniques used and also in the diversity of our goals. We will provide our perspective based on our project and examine how upcoming technologies would have impacted our efforts. Read more.
Add to your personal schedule
Friday, 21-11-2014
Privacy, Law & Ethics
Location: 116
Joshua Koran (Turn)
Average rating: ***..
(3.67, 3 ratings)
Before Edward Snowden disclosed the US intelligence services’ digital surveillance, marketers had been collecting, aggregating and inferring behavioral profiles on consumers around the world. This talk describes the chief technologies firms use to transform online activities into target audience segments, as well as the current and proposed regulations and public policies being considered. Read more.
Add to your personal schedule
Friday, 21-11-2014
Business & Industry
Location: 113
Average rating: ***..
(3.67, 3 ratings)
Everyone knows that creating value from big data requires the right skills, but what does this mean in practice? We present findings of a research project where we measure the skills needs of data-driven companies in 6 sectors, quantify the impact of data talent on company performance, and identify good practices to find, create value from and retain data talent. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop Platform
Location: 114
Ankit Tharwani (Barclays Bank)
Average rating: ****.
(4.00, 7 ratings)
With traditional revenue sources maturing and new entrants at the gate, data can be a powerful differentiator. This session will present the challenges involved in deploying the right technologies and the change management culture at the foundations of new info-led propositions. Read more.
Add to your personal schedule
Friday, 21-11-2014
Data Science
Location: 115
Ofer Ron (LivePerson)
Average rating: ***..
(3.69, 13 ratings)
Many people assume that researching/designing a predictive modeling algorithm is the hard part of building a predictive modeling system over Big Data. We will focus on the far less romantic infrastructure needed to support a system, by reviewing the necessary components and the common pitfalls encountered when trying to automate both horizontally and vertically scalable systems. Read more.

16:55

Add to your personal schedule
Friday, 21-11-2014
Government/Open Data
Location: 212
Alex Priem (Statistics Netherlands), Edwin De Jonge (Statistics Netherlands)
Average rating: ***..
(3.00, 4 ratings)
Histograms and heatmaps are often used to summarize large data sets. We provide guidelines for using them effectively and efficiently. We illustrate this using the complete Dutch income tax data by looking at distributions in wealth and income. Analysis of this data set is complicated by the large number of variables. We use clustering techniques to automatically find relevant patterns. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop & Beyond
Location: 120-121
John Akred (Silicon Valley Data Science)
Average rating: ***..
(3.75, 4 ratings)
Creating a data architecture involves many moving parts. By examining the data value chain, from ingestion through to analytics, we will explain how the various parts of the Hadoop and big data ecosystem fit together to support batch, interactive and realtime analytical workloads. Read more.
Add to your personal schedule
Friday, 21-11-2014
Privacy, Law & Ethics
Location: 116
Joerg Blumtritt (Datarella)
Average rating: ***..
(3.83, 6 ratings)
Smartphones carry mighty sensors: GPS, wifi, acceleration, gyroscope, microphone, magnetic field, etc., tracking behavior and environment and answering complex questions like "is the user driving in a car or riding on a train?" We will show cases from the travel industry, sports retail, and health. We will propose how to use such intrusive technology in an ethically correct way. Read more.
Add to your personal schedule
Friday, 21-11-2014
Business & Industry
Location: 113
Carme Artigas (Synergic Partners)
Average rating: ****.
(4.80, 5 ratings)
Datafication is a new term used to describe the process of turning an existing business into a "data business". This is affecting many industry and services sectors. For this, data monetization strategies must be in place. New data sources (open data..) have a key role as well as the need to protect data privacy. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop Platform
Location: 114
Georgos Siganos (Qatar Computing Research Institute)
Average rating: ***..
(3.00, 1 rating)
Graph mining of large highly dynamic graphs is a challenging algorithmic and programming task requiring custom algorithms. Additionally, existing graph mining architectures are designed for batch workloads. The RT-Giraph open source project simplifies online graph mining by maintaining the programming and algorithmic simplicity of Apache Giraph, while supporting dynamic graphs. Read more.
Add to your personal schedule
Friday, 21-11-2014
Data Science
Location: 115
Ted Dunning (MapR Technologies)
Average rating: ****.
(4.88, 17 ratings)
Computing various quantities such as medians or the number of unique elements requires a lot of time or a lot of memory or both. It is, however, possible to get really close to the right answer with much less time and much less memory. Such algorithms can be simpler than you might expect. I will describe these and show how they can be applied to applications like anomaly detection. Read more.

17:45

Add to your personal schedule
Friday, 21-11-2014
Government/Open Data
Location: 212
Bart van Leeuwen (Netage)
Average rating: ****.
(4.00, 1 rating)
It is 2:30 at night, you are barely awake and racing through the city center of Amsterdam while you hear a 120 dB horn screaming overhead. You are in a fire truck. Within 4 minutes you will be facing a potentially life-threatening situation. How do you deal with all the data that can make your work safer in an environment like that? Learn about how we started solving these problems in an agile way. Read more.
Add to your personal schedule
Friday, 21-11-2014
Hadoop & Beyond
Location: 120-121
Jim Scott (MapR Technologies, Inc.)
Average rating: ****.
(4.25, 8 ratings)
Apache Mesos, Apache Hadoop, Apache Spark + Custom Enterprise Applications: This stack combined is greater than the sum of each of the pieces of this stack. Couple all of that with custom enterprise applications, and the data center turns into a well-oiled machine. When combined, this software stack delivers unlimited flexibility for the entire data center. Read more.
Add to your personal schedule
Friday, 21-11-2014
Privacy, Law & Ethics
Location: 116
Anne-Lise Bouyer (Journalism++)
Average rating: ****.
(4.00, 2 ratings)
Breaking news from data that's already published: that's efficient Open Source Intelligence applied to journalism. The tools and methodologies available today make it possible to go big on a budget. Read more.
Friday, 21-11-2014
Location: 113
TBC
Add to your personal schedule
Friday, 21-11-2014
Tomas Petricek (University of Cambridge)
Average rating: ****.
(4.33, 6 ratings)
The world of data is inherently diverse and "messy". Wouldn't it be nice if your programming language was aware of the external data sources that you are accessing? In this talk, we look at doing data science with F#, which provides a unique way of integrating external data sources and libraries. You can access data, but also Matlab scripts or R packages, all from a single environment. Read more.
Friday, 21-11-2014
Location: 115
TBC