Strata + Hadoop World in Barcelona Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World in Barcelona 2014
(schedule subject to change).

Room: 212
11:50 Using Data for GOOD Francine Bennett (Mastodon C), Duncan Ross (TES Global)
13:45 Crunching Common Crawl with the Cloud-Based MIA Platform Lisa Green (Common Crawl), Peter Adolphs (Neofonie)
14:35 Data Governance for Regulated Industries Amir Halfon (ScalingData)
16:55 Patterns and Metapatterns in Income Tax Data Alex Priem (Statistics Netherlands), Edwin De Jonge (Statistics Netherlands)
Room: 113
11:50 Telling Meaningful Stories With Data Daniel Waisberg (Google)
13:45 The Unlikely Match between Financial Data and Open Innovation Marcelo Soria-Rodriguez (BBVA Data & Analytics)
14:35 Welcoming Machine Learning into the Heart of a Creative Business David Boyle (BBC Worldwide), Amanda Hill (BBC Worldwide), Dan Jabry (CrowdEmotion)
16:55 How to Datafy Your Business Carme Artigas (Synergic Partners)
17:45 TBC
Room: 114
11:50 From BI to Big Data at Solocal Abed Ajraou (Solocal)
13:45 Hadoop and Pediatric Healthcare: Bedside Vitals and Better Babies Tod Davis (Children's Healthcare of Atlanta)
14:35 Leading Change in Data Engineering Neil Martin (comparethemarket.com), Rob Siwicki (comparethemarket.com)
16:05 How Big Data is Redefining Banking Ankit Tharwani (Barclays UK)
16:55 RT-Giraph: Online Graph Mining Simplified Georgos Siganos (Qatar Computing Research Institute)
17:45 Integrating Big Data into a Programming Language Tomas Petricek (University of Cambridge)
Room: 115
11:50 Linking Data Without Common Identifiers Lars Marius Garshol (Bouvet)
13:45 Realtime Data Analysis Patterns Mikio Braun (Zalando SE)
17:45 TBC
Room: 120-121
11:50 How Search Can Save Your Hadoop Investment and More Shay Banon (Elasticsearch)
13:45 Yarns about YARN: Migrating to MapReduce v2 Kathleen Ting (Cloudera)
14:35 HBase for Architects Nick Dimiduk (Hortonworks, Inc)
16:05 Data, Data Everywhere and Only Map-Reduce to Drink Cindy Lamm (comSysto GmbH), Michael Hausenblas (Red Hat)
16:55 From Pilot to Production: Building a Data Infrastructure John Akred (Silicon Valley Data Science)
Room: 127-128
11:50 For Red Hat, it's 1994 all over again Greg Kleiman (Red Hat)
13:45 Apache Spark: Ask Us Anything Paco Nathan (O'Reilly Media), Aaron Davidson (Databricks), Sameer Farooqui (Databricks), Hossein Falaki (Databricks Inc.), Alex Sicoe (Elsevier), Olivier Girardot (Lateral Thoughts)
Room: 116
13:45 Big but Personal Data: How Human Behavior Bounds Privacy Yves-Alexandre de Montjoye (Imperial College London | MIT Media Lab)
14:35 Forecasting Space-time Events Jeremy Heffner (Azavea)
16:55 Behavioral Analytics with Smartphone Data Joerg Blumtritt (Datarella)
17:45 Open Source Intelligence in Data Journalism Anne-Lise Bouyer (Journalism++)
9:30 Plenary
Room: 211-212
Friday Keynote Welcome Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Edd Wilder-James (Silicon Valley Data Science)
9:35 Plenary
Room: 211-212
Pax Data Doug Cutting (Cloudera)
10:00 Plenary
Room: 211-212
What is a Data Lake, Anyway? Martin Willcox (Teradata)
10:10 Plenary
Room: 211-212
Yes, Open Data Has Value! Majken Sander (TimeXtender)
10:25 Plenary
Room: 211-212
Predictive Analytics in the Cloud: Predicting Football Jordan Tigani (Google)
10:40 Plenary
Room: 211-212
Embracing the Human Element Rodney Mullen (Almost Skateboards)
10:50 Plenary
Room: 211-212
What's the Big Deal About City Data? Francine Bennett (Mastodon C)
11:00 Plenary
Room: 211-212
Storytelling and Science Ben Okri (Self)
11:20 Morning Break
Room: Sponsor Pavilion (Banquet Room)
12:30 Lunch sponsored by Teradata
Room: Sponsor Pavilion (Banquet Room)
Lunch / Friday Birds of a Feather
15:15 Break
Room: Sponsor Pavilion (Banquet Room)
8:30 Coffee Break
Room: P2 211-212 Foyer
11:50-12:30 (40m) Government/Open Data
Using Data for GOOD
Francine Bennett (Mastodon C), Duncan Ross (TES Global)
The data philanthropy movement is growing in Europe. DataKind is actively working to expand its presence, and DataKind UK is now in its second year, running successful events and projects. This is the story of the last two events, highlighting how charities have joined the data revolution.
13:45-14:25 (40m) Government/Open Data
Crunching Common Crawl with the Cloud-Based MIA Platform
Lisa Green (Common Crawl), Peter Adolphs (Neofonie)
The Web itself forms a versatile dataset capable of powering highly diverse applications. In our joint talk, we will present Common Crawl, an immense collection of Web data made freely available to anyone. We will then introduce MIA and show how this Cloud-based analysis platform and marketplace for data and algorithms enables users to perform analytical tasks on datasets at Web scale.
14:35-15:15 (40m) Business & Industry
Data Governance for Regulated Industries
Amir Halfon (ScalingData)
This session will examine the challenges and opportunities associated with Big Data in a regulated environment, and the use of a new generation of data management technology to address them. Several case studies will be presented based on real-life production deployments.
16:05-16:45 (40m) Government/Open Data
Happy City: Shortest Urban Paths or Shortcuts to Happiness?
Daniele Quercia (Bell Labs)
How can we change architecture to design more for the people and less for the architects? We present crowd-based solutions with which urban planners can get valuable information about what kind of urban design is attractive to the people. This leads to GPS systems that show you the "most beautiful" path to your destination and to indicators about the beauty of a city.
16:55-17:35 (40m) Government/Open Data
Patterns and Metapatterns in Income Tax Data
Alex Priem (Statistics Netherlands), Edwin De Jonge (Statistics Netherlands)
Histograms and heatmaps are often used to summarize large data sets. We provide guidelines for using them effectively and efficiently. We illustrate this using the complete Dutch income tax data by looking at distributions in wealth and income. Analysis of this data set is complicated by the large number of variables. We use clustering techniques to automatically find relevant patterns.
17:45-18:25 (40m) Government/Open Data
Making the Work of Fire Fighters Safer with Information Awareness
Bart van Leeuwen (Netage)
It is 2:30 in the morning, you are barely awake and racing through the center of Amsterdam while a 120 dB horn screams overhead. You are in a fire truck. Within 4 minutes you will be facing a potentially life-threatening situation. How do you deal with all the data that can make your work safer in an environment like that? Learn how we started solving these problems in an agile way.
11:50-12:30 (40m) Business & Industry
Telling Meaningful Stories With Data
Daniel Waisberg (Google)
In this presentation Daniel will discuss a process for going from data to stories. He will talk about ways to define the audience, create hypotheses, sketch data, analyze it and build a story around it. The presentation includes the connect-the-dots game, Hulk, comics, architecture and other stories.
13:45-14:25 (40m) Business & Industry
The Unlikely Match between Financial Data and Open Innovation
Marcelo Soria-Rodriguez (BBVA Data & Analytics)
In this talk we will present practical cases of innovating with data in retail banking, a conservative industry. From initial idea to embracing openness as a fundamental cultural change, the talk will walk the audience through insights, lessons learned and practical examples of how to change the way value is delivered to customers.
14:35-15:15 (40m) Business & Industry
Welcoming Machine Learning into the Heart of a Creative Business
David Boyle (BBC Worldwide), Amanda Hill (BBC Worldwide), Dan Jabry (CrowdEmotion)
Emotions are messy and complicated. That meant we had to develop new data science and research methods to understand emotional engagement with our TV shows. But it also meant we had to be careful about how we brought that data to bear in a creative business like BBC Worldwide. Hear about how data science is making a big difference to how we build brands around the world.
16:05-16:45 (40m) Business & Industry
Model Workers: How leading companies are securing and creating value from their data talent
Juan Mateos Garcia (Nesta)
Everyone knows that creating value from big data requires the right skills, but what does this mean in practice? We present findings of a research project where we measure the skills needs of data-driven companies in 6 sectors, quantify the impact of data talent on company performance, and identify good practices to find, create value from and retain data talent.
16:55-17:35 (40m) Business & Industry
How to Datafy Your Business
Carme Artigas (Synergic Partners)
Datafication is a new term used to describe the process of turning an existing business into a "data business." This is affecting many industry and service sectors. For this, data monetization strategies must be in place. New data sources (open data, etc.) have a key role, as does the need to protect data privacy.
17:45-18:25 (40m)
Session
To be confirmed
11:50-12:30 (40m) Hadoop Platform
From BI to Big Data at Solocal
Abed Ajraou (Solocal)
Solocal, the French company behind PagesJaunes.fr, recently put Big Data and Hadoop into action to replace its traditional BI infrastructure. In this session, you will learn why and how that was done.
13:45-14:25 (40m) Hadoop Platform
Hadoop and Pediatric Healthcare: Bedside Vitals and Better Babies
Tod Davis (Children's Healthcare of Atlanta)
Children’s Healthcare of Atlanta in the US implemented Hadoop to capture and analyze vital-sign sensor data in the ICU. Its goal is to understand the impact of stressful procedures, reduce pain, and improve outcomes in its most fragile patients. This session will highlight the challenges of pediatric healthcare data management and the strategies used to make this project a success.
14:35-15:15 (40m) Hadoop Platform
Leading Change in Data Engineering
Neil Martin (comparethemarket.com), Rob Siwicki (comparethemarket.com)
The talk will provide insight into how to achieve coordinated technological change in a highly agile IT organization that supports one of the UK’s most recognisable brands. Discover valuable lessons learned and begin to understand how your organization might take its first steps in proving and implementing Big Data technology.
16:05-16:45 (40m) Hadoop Platform
How Big Data is Redefining Banking
Ankit Tharwani (Barclays UK)
With traditional revenue sources maturing and new entrants at the gate, data can be a powerful differentiator. This session will present the challenges involved in deploying the right technologies and the change management culture at the foundations of new info-led propositions.
16:55-17:35 (40m) Hadoop Platform
RT-Giraph: Online Graph Mining Simplified
Georgos Siganos (Qatar Computing Research Institute)
Graph mining of large, highly dynamic graphs is a challenging algorithmic and programming task that requires custom algorithms. Additionally, existing graph mining architectures are designed for batch workloads. The RT-Giraph open source project simplifies online graph mining by keeping the programming and algorithmic simplicity of Apache Giraph while supporting dynamic graphs.
17:45-18:25 (40m) Data Science, Open Data, Tools & Technology
Integrating Big Data into a Programming Language
Tomas Petricek (University of Cambridge)
The world of data is inherently diverse and "messy". Wouldn't it be nice if your programming language were aware of the external data sources you are accessing? In this talk, we look at doing data science with F#, which provides a unique way of integrating external data sources and libraries. You can access not only data but also MATLAB scripts and R packages, all from a single environment.
11:50-12:30 (40m) Data Science
Linking Data Without Common Identifiers
Lars Marius Garshol (Bouvet)
Linking data to create broader data sets can dramatically improve analysis results, but what if the data sets lack common identifiers? Similarly, duplicates in data are very common and can seriously skew analysis results. This talk covers common techniques from record linkage research for solving these problems, as well as an open source tool implementing those techniques, and real-world examples.
13:45-14:25 (40m) Data Science
Realtime Data Analysis Patterns
Mikio Braun (Zalando SE)
Processing high-volume event streams in real time poses considerable challenges. Based on our experience with social media data and real-time user interaction data, we discuss building such systems, starting with a single computer. We have distilled this experience into a number of real-time data analysis patterns that solve key aspects of such systems.
14:35-15:15 (40m) Data Science
Get Productive with Predictive Applications. Unleash Your Inner Data Scientist
Shawn Scully (Dato)
One of the most exciting areas in Big Data is the development of new data products: predictive applications used to drive product recommendations, predict machine failures, forecast airfares, make social matches, identify fraud, predict disease outbreaks, and repurpose pharmaceuticals. In this talk, I’ll share the trends we’re seeing in predictive application development, show how to...
16:05-16:45 (40m) Data Science
Automating Machine Learning Systems: Lessons Learned
Ofer Ron (LivePerson)
Many people assume that researching/designing a predictive modeling algorithm is the hard part of building a predictive modeling system over Big Data. We will focus on the far less romantic infrastructure needed to support a system, by reviewing the necessary components and the common pitfalls encountered when trying to automate both horizontally and vertically scalable systems.
16:55-17:35 (40m) Data Science
Doing the Impossible, Almost (A survey of approximation algorithms that make queries vastly faster)
Ted Dunning (MapR Technologies)
Computing various quantities such as medians or the number of unique elements requires a lot of time or a lot of memory or both. It is, however, possible to get really close to the right answer with much less time and much less memory. Such algorithms can be simpler than you might expect. I will describe these and show how they can be applied to applications like anomaly detection.
17:45-18:25 (40m)
Session
To be confirmed
11:50-12:30 (40m) Hadoop & Beyond
How Search Can Save Your Hadoop Investment and More
Shay Banon (Elasticsearch)
Thanks to technologies like NoSQL and Hadoop, organizations can store massive amounts of data that’s important to their business. Now the challenge is extracting actionable insights from it. This session will explore why search is the foundation for gaining value from “big data” across your business (from marketing, to product, to backend infrastructure), highlighting a few real-world examples.
13:45-14:25 (40m) Hadoop & Beyond
Yarns about YARN: Migrating to MapReduce v2
Kathleen Ting (Cloudera)
The next generation of MapReduce, YARN, has widely touted job throughput and Apache Hadoop cluster utilization benefits. Less known are the pitfalls littering the migration path to YARN. Learn from our extensive field experience to avoid those pitfalls and get your YARN cluster configured right the first time.
14:35-15:15 (40m) Hadoop & Beyond
HBase for Architects
Nick Dimiduk (Hortonworks, Inc)
Your application is outgrowing its database, and you've started shopping for NoSQL options. Maybe you've adopted Hadoop into your data warehouse. You've heard HBase might be an appropriate technology, but you need to know more. This talk is for you. To understand how to use HBase, you first need to understand how it works. This talk explores the design of HBase and its critical paths to ground an understanding of its use.
16:05-16:45 (40m) Hadoop & Beyond
Data, Data Everywhere and Only Map-Reduce to Drink
Cindy Lamm (comSysto GmbH), Michael Hausenblas (Red Hat)
We will describe our experiences implementing a full-scale, data-driven application applied to a large anonymised dataset from the mobile operator Telefonica using Map-Reduce. Our project was unusual in the breadth of techniques used and in the diversity of our goals. We will provide our perspective based on this project and examine how upcoming technologies would have affected our efforts.
16:55-17:35 (40m) Hadoop & Beyond
From Pilot to Production: Building a Data Infrastructure
John Akred (Silicon Valley Data Science)
Creating a data architecture involves many moving parts. By examining the data value chain, from ingestion through to analytics, we will explain how the various parts of the Hadoop and big data ecosystem fit together to support batch, interactive and realtime analytical workloads.
17:45-18:25 (40m) Hadoop & Beyond
Resistance is Futile: The Next Generation Big Data Architecture
Jim Scott (MapR Technologies)
Apache Mesos, Apache Hadoop, Apache Spark and custom enterprise applications: combined, this stack is greater than the sum of its parts, and the data center turns into a well-oiled machine. Together, this software stack delivers unlimited flexibility for the entire data center.
11:50-12:30 (40m) Sponsored
For Red Hat, it's 1994 all over again
Greg Kleiman (Red Hat)
It’s been twenty years since Red Hat first launched its Linux distribution. Since then Red Hat has fueled the rapid adoption of open source technologies. As Big Data transitions into enterprise mode, Red Hat is again poised to facilitate the innovation and communities needed to empower multiple data stakeholders across your organization so you can truly open the possibilities of your data.
13:45-14:25 (40m) Hadoop & Beyond
Apache Spark: Ask Us Anything
Paco Nathan (O'Reilly Media), Aaron Davidson (Databricks), Sameer Farooqui (Databricks), Hossein Falaki (Databricks Inc.), Alex Sicoe (Elsevier), Olivier Girardot (Lateral Thoughts)
Join the Spark Team for an informal question and answer session.
11:50-12:30 (40m) Privacy, Law & Ethics
A Framework of Purpose and Consent for Data Security and Consumer Privacy
Aurélie Pols (Mind Your Privacy)
Borrowing from Spanish information security best practices and in the light of increasing data breach regulations, the presentation examines how data flows should ideally be defined and secured in order to assure accountability through an entire data lifecycle.
13:45-14:25 (40m) Privacy, Law & Ethics
Big but Personal Data: How Human Behavior Bounds Privacy
Yves-Alexandre de Montjoye (Imperial College London | MIT Media Lab)
We're living in an age of big data, a time when most of our movements and actions are collected and stored in real time. These data offer unprecedented insights into how we behave as a species. Mathematical analysis of location data, however, reveals how unique our individual behavior is and how this behavior puts fundamental constraints on our privacy.
14:35-15:15 (40m) Privacy, Law & Ethics
Forecasting Space-time Events
Jeremy Heffner (Azavea)
We often need to analyze counts of discrete events that occur at a specific time and place, whether they be crime events, taxi requests, or phone calls. Forecasting these space-time events brings particular challenges: finding suitable tools for geographic processing and techniques for modeling the data. The session will cover the lessons learned in building such a system.
16:05-16:45 (40m) Privacy, Law & Ethics
Unraveling Myths of Digital Privacy & Advertising
Joshua Koran (Turn)
Before Edward Snowden disclosed the US intelligence services’ digital surveillance, marketers had been collecting, aggregating and inferring behavioral profiles on consumers around the world. This talk describes the chief technologies firms use to transform online activities into target audience segments, as well as the current and proposed regulations and public policies being considered.
16:55-17:35 (40m) Privacy, Law & Ethics
Behavioral Analytics with Smartphone Data
Joerg Blumtritt (Datarella)
Smartphones carry powerful sensors (GPS, Wi-Fi, accelerometer, gyroscope, microphone, magnetic field sensor, etc.) that track behavior and environment, answering complex questions like "is the user driving a car or riding a train?" We will show cases from the travel industry, sports retail, and health, and propose how to use such intrusive technology in an ethically correct way.
17:45-18:25 (40m) Privacy, Law & Ethics
Open Source Intelligence in Data Journalism
Anne-Lise Bouyer (Journalism++)
Breaking news from data that's already been published: that's Open Source Intelligence efficiently applied to journalism. The tools and methodologies available today make it possible to go big on a budget.
9:30-9:35 (5m)
Friday Keynote Welcome
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Edd Wilder-James (Silicon Valley Data Science)
Strata Barcelona Program Chairs Roger Magoulas, Doug Cutting, and Edd Wilder-James welcome you to the second day of keynotes.
9:35-9:45 (10m)
Pax Data
Doug Cutting (Cloudera)
In this presentation Doug Cutting, Cloudera's Chief Architect, will discuss how we might reap the benefits of data while avoiding its perils.
9:45-10:00 (15m)
Understanding Decisions Driven by Big Data: From Analytics Management to Privacy-friendly Cloaking Devices
Foster Provost (NYU | Stern)
As we've moved from simple statistical analyses of big data to decision-making based on big data and data-science models, we face an ironic "dirty secret." It is becoming increasingly difficult to understand why particular decisions have been made. In many applications, data-driven models now take as input massive numbers of "signals", including words in text, locations frequented...
10:00-10:10 (10m) Sponsored
What is a Data Lake, Anyway?
Martin Willcox (Teradata)
Drinking from the data lake is tempting, but what is it really? How did we get here, and what lessons can we learn from previous technologies? It’s tempting to see this as the solution to data silos, but what are the costs? Martin Willcox provides a practical guide to help you understand the realities…
10:10-10:25 (15m)
Yes, Open Data Has Value!
Majken Sander (TimeXtender)
Open data isn't just about waste pickup schedules and reporting potholes: it can hold real monetary value for everyday business. Whether for supply chain enhancement or improved customer segmentation, open data holds unexpected value for everyone.
10:25-10:40 (15m)
Predictive Analytics in the Cloud: Predicting Football
Jordan Tigani (Google)
How can you turn raw data into predictions? How can you take advantage of both cloud scalability and state-of-the-art Open Source Software? This talk shows how we built a model that correctly predicted the outcome of 14 of 16 games in the World Cup using Google's Cloud Platform and tools like IPython and StatsModels.
10:40-10:50 (10m)
Embracing the Human Element
Rodney Mullen (Almost Skateboards)
Ever do something perfectly in practice, only to have it blow up as soon as you try it when it really counts? This little phenomenon sends skaters to the hospital on a regular basis, mainly because controlled environments usually can’t evoke the depths of human responses.
10:50-11:00 (10m)
What's the Big Deal About City Data?
Francine Bennett (Mastodon C)
Exploiting big data and analytics through the whole organisation is now business as usual for retail and online businesses. But cities and buildings also create a whole lot of data, which could change lives for better or for worse. This talk explores what’s happening right now in big data and analytics for cities and buildings, where it might head, and what we might want from it all.
11:00-11:20 (20m)
Storytelling and Science
Ben Okri (Self)
This talk explores the critical importance of storytelling to science and what we can learn from that relationship.
11:20-11:50 (30m)
Break: Morning Break
12:30-13:45 (1h 15m)
Lunch / Friday Birds of a Feather
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics.
15:15-16:05 (50m)
Break
8:30-9:30 (1h)
Break: Coffee Break