Make Data Work
Oct 15–17, 2014 • New York, NY

Strata + Hadoop World 2014 Schedule


Wednesday, 10/15/2014

7:00am

7:00am–9:00am Wednesday, 10/15/2014
Location: Hall E
Coffee Break (2h)

9:00am

9:00am–12:30pm Wednesday, 10/15/2014
Data Science
Location: 1 E8/1 E9
Jeroen Janssens (Data Science Workshops)
Average rating: ***..
(3.96, 27 ratings)
The command line, although invented decades ago, remains an amazing environment for doing data science. By combining small, yet powerful, command-line tools you can quickly obtain, scrub, explore, visualize, and model your data. In this hands-on tutorial you will gain a solid understanding of how to leverage the power of the command line and integrate it into your existing data science workflow. Read more.
SOLD OUT
9:00am–12:30pm Wednesday, 10/15/2014
Hadoop in Action
Location: 1 E10/1 E11
Mark Grover (Lyft), Jonathan Seidman (Cloudera), Gwen Shapira (Confluent), Ted Malaska (Blizzard Entertainment)
Average rating: ***..
(3.50, 36 ratings)
Are you looking for a deeper understanding of how to integrate components in the Apache Hadoop ecosystem to implement data management and processing solutions? Then this tutorial is for you. We'll provide a clickstream analytics example illustrating how to architect solutions with Apache Hadoop along with providing best practices and recommendations for using Hadoop and related tools. Read more.
9:00am–5:00pm Wednesday, 10/15/2014
Data Science
Location: 1 E12/1 E13
Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory), Brian Granger (Cal Poly San Luis Obispo), Andy Terrel (NumFOCUS), Peter Wang (Anaconda), Jake Vanderplas (eScience Institute, University of Washington), Olivier Grisel (Inria & scikit-learn), Travis Oliphant (Anaconda), Wes McKinney (Two Sigma Investments), Trent Nelson (Continuum Analytics), Kayur Patel (Google), Kester Tong (Google)
Average rating: ****.
(4.43, 14 ratings)
Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib for visualization, SciPy, scikit-learn, and ways to scale Python performance, such as handling large, distributed data sets.
SOLD OUT
9:00am–5:00pm Wednesday, 10/15/2014
Hardcore Data Science
Location: 1 E14/1 E15
Ben Lorica (O'Reilly Media), Ted Dunning (MapR Technologies), Tim Kraska (Brown University), Alice Zheng (Amazon), Anna Gilbert (University of Michigan), Jon Kleinberg (Cornell University), Kira Radinsky (eBay | Technion), Rob Fergus (New York University and Facebook), Ben Recht (University of California, Berkeley), Brian Whitman (Spotify), Hanna Wallach (Microsoft Research NYC & University of Massachusetts Amherst), Dafna Shahaf (The Hebrew University of Jerusalem)
Average rating: ****.
(4.27, 15 ratings)
All-Day: Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting...
SOLD OUT
9:00am–5:00pm Wednesday, 10/15/2014
Hadoop & Beyond
Location: Hall A 23/24
Paco Nathan (O'Reilly Media), Michael Armbrust (Databricks), Tathagata Das (Databricks), Matei Zaharia (Databricks), Reynold Xin (Databricks), Ameet Talwalkar (Determined AI), Holden Karau (Google), Joseph Bradley (Databricks), Sameer Farooqui (Databricks), Patrick Wendell (Databricks)
Average rating: ***..
(3.75, 20 ratings)
Spark Camp, organized by the creators of the Apache Spark project at Databricks, will be a day-long, hands-on introduction to the Spark platform, including Spark Core, the Spark Shell, Spark Streaming, Spark SQL, MLlib, and more.
SOLD OUT
9:00am–5:00pm Wednesday, 10/15/2014
Data-Driven Business Day
Location: 1 E20/1 E21
Alistair Croll (Solve For Interesting), Farrah Bostic (The Difference Engine), Edd Wilder-James (Google), Jennifer Zeszut (Beckon), Brian Dalessandro (Zocdoc), Jana Eggers (Nara Logics), Joe Caserta (Caserta Concepts), Joy Beatty (Seilevel), Kim Rees (Periscopic), Peter Ferns (Goldman Sachs & Co), Brigitte Piniewski (nonaffiliated), Nellwyn Thomas (Etsy), Michael Rosenbaum (Pegged Software), Merici Vinton (OI Engine @ IDEO), Mary Ann Wayer (Premier Inc), Rohit Jain (Esgyn), Amy Gaskins (Panopticon), Jen van der Meer (Reason Street), Mark Doms (United States Department of Commerce), Halle Tecco (Rock Health)
Average rating: ***..
(3.33, 12 ratings)
All-Day: For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world. Read more.
9:00am–12:30pm Wednesday, 10/15/2014
Business & Industry, Law, Ethics & Open Data, Security
Location: 1 C03/1 C04
Ari Gesher (Palantir Technologies), John Grant (Palantir Technologies), Courtney Bowman (Palantir Technologies)
Average rating: ***..
(3.58, 12 ratings)
Technologists focused on privacy and civil liberties will run through the material in their book. The workshop will cover how to think about privacy, privacy protection properties that a system can have and the architectures that implement them, related issues in information security, and privacy issues in data collection. Read more.
9:00am–12:30pm Wednesday, 10/15/2014
Design & Interfaces
Location: 1 E6/1 E7
Sebastian Gutierrez (DashingD3js.com)
Average rating: ****.
(4.55, 11 ratings)
D3.js has a very steep learning curve. However, there are three main concepts that, once you get your head around them, will make the climb much easier. Focusing on these three main concepts, we will walk through many examples to teach the fundamental building blocks of D3.js. Read more.
9:00am–5:00pm Wednesday, 10/15/2014
Data Science
Location: 1 E16/1 E17
Hadley Wickham (Rice University / RStudio), Winston Chang (RStudio), Garrett Grolemund (RStudio), Joseph Allaire (RStudio, Inc.), Yihui Xie (RStudio, Inc.)
Average rating: *****
(5.00, 10 ratings)
From advanced visualization, collaboration, and reproducibility to data manipulation, R Day at Strata covers a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading luminaries and R committers, the folks keeping the R ecosystem abreast of the challenges facing analysts and others who work with data.
9:00am–5:00pm Wednesday, 10/15/2014
Business & Industry
Location: 1 D03/1 D04
Jon Bruner (O'Reilly Media), Daniel Koffler (Rio Tinto Alcan), Ami Daniel (Windward), David Simchi-Levi (MIT), Victor Fang (Pivotal), Yu Cao (EMC), Nathan Oostendorp (Sight Machine), Alasdair Allan (Babilim Light Industries), Cameron Turner (The Data Guild), Leo Spiegel (Pivotal), Edy Liongosari (Accenture), Mark Grabb (General Electric Global Research Center)
Average rating: *****
(5.00, 3 ratings)
Big Data is reaching beyond the Internet and into the machines that drive our world. Visit Industrial Internet day to gain insights from the way that power plants, factories, cars, and airplanes make use of sensors and software intelligence to improve operations and help managers make good decisions. Read more.
SOLD OUT
9:00am–12:30pm Wednesday, 10/15/2014
Hadoop & Beyond
Location: 1 E05
Patrick McFadin (Datastax), Helena Edelson (Apple)
Average rating: **...
(2.80, 5 ratings)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. Add in Apache Spark and Kafka, and you have an amazing time series solution. We will talk data models and go through deployment and code to build a functional, real-time application. Languages used: Java, Scala.

1:30pm

1:30pm–5:00pm Wednesday, 10/15/2014
Hadoop in Action
Location: 1 E8/1 E9
Sridhar Reddy (MapR Technologies), Carol McDonald (MapR Technologies)
Average rating: ***..
(3.80, 10 ratings)
This tutorial will help you get a jump start on HBase development. We’ll start with a quick overview of HBase, the HBase data model, and architecture, and then we’ll dive directly into code to help you understand how to build HBase applications. We will also offer guidelines for good schema design, and will cover a few advanced concepts such as using HBase for transactions. Read more.
SOLD OUT
1:30pm–5:00pm Wednesday, 10/15/2014
Hadoop Platform
Location: 1 E10/1 E11
Stephen O'Sullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science)
Average rating: ***..
(3.09, 23 ratings)
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
SOLD OUT
1:30pm–5:00pm Wednesday, 10/15/2014
Data Science
Location: 1 C03/1 C04
Carlos Guestrin (Apple | University of Washington), Alice Zheng (Amazon), Shawn Scully (Dato)
Average rating: **...
(2.50, 10 ratings)
This tutorial focuses on hands-on data science skills from prototyping to production. Using GraphLab tools, we walk through multiple case studies such as fraud detection, social network analysis, and building personalized recommendation services. Read more.
1:30pm–5:00pm Wednesday, 10/15/2014
Business & Industry
Location: 1 E6/1 E7
Paco Nathan (O'Reilly Media), Allen Day (MapR Technologies)
Average rating: ***..
(3.46, 13 ratings)
Advanced math for business people: “just enough math” to take advantage of new classes of open source frameworks. Many take college math up to calculus, but never learn how to approach sparse matrices, complex graphs, or supply chain optimizations. This tutorial ties these pieces together into a conceptual whole, with use cases and simple Python code, as a new approach to computational thinking. Read more.

5:00pm

5:00pm–6:30pm Wednesday, 10/15/2014
Events
Location: North Hall Mezzanine
Average rating: ****.
(4.00, 7 ratings)
Don't miss Startup Showcase, Strata Conference + Hadoop World's live demo program and competition for startups and early-stage companies. The judges will pick winners from 10 finalist companies selected to present at the showcase. This event is part of NYC Data Week. Read more.

Thursday, 10/16/2014

6:30am

6:30am–7:30am Thursday, 10/16/2014
Events
Location: Central Park
Average rating: **...
(2.33, 6 ratings)
Cloudera invites you to join our 1st annual Hadoop Hustle during Strata + Hadoop World 2014. This event is part of NYC Data Week. Read more.

8:45am

8:45am–8:55am Thursday, 10/16/2014
Keynotes
Location: 1D
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.22, 27 ratings)
Strata Program Chairs, Roger Magoulas, Doug Cutting, and Alistair Croll, welcome you to the first day of keynotes. Read more.

8:55am

8:55am–9:10am Thursday, 10/16/2014
Keynotes
Location: 1D
Mike Olson (Cloudera)
Average rating: ***..
(3.76, 42 ratings)
Mike Olson, CSO and Chairman, Cloudera.

9:10am

9:10am–9:20am Thursday, 10/16/2014
Keynotes, Sponsored
Location: 1D
M. C. Srivas (Uber)
Average rating: **...
(2.88, 68 ratings)
If you want to know what's coming next in big data, just ask yourself, "What would Google do?"

9:20am

9:20am–9:30am Thursday, 10/16/2014
Keynotes
Location: 1D
Miriah Meyer (University of Utah)
Average rating: ****.
(4.07, 71 ratings)
Miriah Meyer, Assistant Professor of Computer Science, University of Utah.

9:30am

9:30am–9:35am Thursday, 10/16/2014
Keynotes, Sponsored
Location: 1D
Ron Kasabian (Intel)
Average rating: ***..
(3.42, 52 ratings)
This talk introduces how Intel is working with scientists and physicians to help improve research, treatment, and drug development for Parkinson’s Disease using data science and enabling the Parkinson's research community to build upon an open platform for big data analytics. Read more.

9:35am

9:35am–9:40am Thursday, 10/16/2014
Keynotes, Sponsored
Location: 1D
Sharmila Mulligan (ClearStory Data)
Average rating: **...
(2.67, 61 ratings)
Data is an evolving story, not a static snapshot of point-in-time insight. With data from internal and external sources constantly updating, we are evolving from rear-view-mirror dashboard views into an era of interactive storytelling.

9:40am

9:40am–9:50am Thursday, 10/16/2014
Keynotes
Location: 1D
Amanda Cox (The New York Times)
Average rating: ****.
(4.85, 115 ratings)
Amanda Cox, Graphics Operator, The New York Times.

9:50am

9:50am–9:55am Thursday, 10/16/2014
Keynotes, Sponsored
Location: 1D
Ben Werther (Platfora)
Average rating: **...
(2.42, 53 ratings)
Spark represents the next step-function leap in what is possible with Hadoop, but what does that mean for business analysts who are swimming in multi-structured data? This presentation discusses the new workflow required so that business analysts can work with massive volumes of multi-structured data to find new insights today, instead of continually having to wait for IT to make big data small.

9:55am

9:55am–10:05am Thursday, 10/16/2014
Keynotes
Location: 1D
Tags: fashion
John Rauser (Snapchat)
Average rating: ****.
(4.82, 104 ratings)
There are two essential skills for the data scientist: engineering and statistics. A great many data scientists are very strong engineers but feel like impostors when it comes to statistics. In this talk John will argue that the ability to program a computer gives you special access to the deepest and most fundamental ideas in statistics. Read more.

10:05am

10:05am–10:20am Thursday, 10/16/2014
Keynotes
Location: 1D
Bob Mankoff (The New Yorker Magazine)
Average rating: ***..
(3.80, 51 ratings)
Bob Mankoff, The New Yorker's cartoon editor, will analyze the lessons we learn from crowdsourced humor. Along the way, he'll explore how cartoons work (and sometimes don't); how he makes decisions about what cartoons to include; and what crowds can tell us about a good joke. Read more.

11:00am

11:00am–11:40am Thursday, 10/16/2014
Design & Interfaces
Location: 1 E8/1 E9
Jeffrey Heer (Trifacta | University of Washington)
Average rating: ***..
(3.74, 23 ratings)
Interaction and visual design are exacting exercises. Designing for data -- especially in messy and massive forms -- brings a new set of challenges. How can we help people of varying backgrounds effectively transform and understand data at scale? Read more.
11:00am–11:40am Thursday, 10/16/2014
Law, Ethics & Open Data
Location: 1 E10/1 E11
Tags: finance
Merici Vinton (OI Engine @ IDEO), Micheál Keane (Civis Analytics)
Average rating: **...
(2.00, 2 ratings)
An open data in government love story / case study - how a team of techies overcame political and procedural hurdles to change the financial marketplace. Read more.
11:00am–11:40am Thursday, 10/16/2014
Connected World
Location: 1 E12/1 E13
Tags: geo_local
Bradley Voytek (UC San Diego and Uber, Inc.)
Average rating: ****.
(4.12, 8 ratings)
Uber has created an AI city simulation framework to optimize its dispatching system, minimize user wait times, and maximize driver partner earnings. Based on agent-based and swarm intelligence models, this framework generates plausible optimizations across many interacting, dynamic, non-linear parameters on a city-by-city basis. Read more.
11:00am–11:40am Thursday, 10/16/2014
Business & Industry
Location: 1 E14/1 E15
Max Shron (Warby Parker), Sasha Laundy (Warby Parker)
Average rating: ****.
(4.00, 17 ratings)
Business problems don’t reveal themselves neatly as data problems. The data community is obsessed with tools and techniques, but the real challenge is understanding how to solve problems with data. How do we bridge the gap? In this talk, we will teach you a methodology for figuring out the right problems to solve and making sure that the work stays smart. Read more.
11:00am–11:40am Thursday, 10/16/2014
Hadoop Platform
Location: Hall A 23/24
Marcel Kornacker (Cloudera), Lenni Kuff (Facebook)
Average rating: **...
(2.50, 26 ratings)
Find out how to run real-time analytics over raw data without requiring a manual ETL process targeted at an RDBMS. This talk describes Impala’s approach to on-the-fly data transformation and its support for nested data; examples demonstrate how this can be used to query raw data feeds in formats such as text, JSON and XML, at a performance level commonly associated with specialized engines. Read more.
11:00am–11:40am Thursday, 10/16/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Michael Stonebraker (Tamr, Inc.)
Average rating: ***..
(3.67, 12 ratings)
The explosion of internal data sources, external public data sources and feeds from the Internet of Things is causing a tsunami of diverse data sources for enterprises. Top-down data-integration tools and data scientist tools won’t scale to meet the demands of the modern enterprise. Learn how a scalable data curation platform can help enterprises connect and enrich their data to leverage it all. Read more.
11:00am–11:40am Thursday, 10/16/2014
Hadoop in Action
Location: 1 C03/1 C04
Tags: finance
Peter Ferns (Goldman Sachs & Co)
Average rating: ***..
(3.21, 14 ratings)
Goldman Sachs is a leading global investment banking, securities, and investment management firm that provides a wide range of financial services. Goldman executes hundreds of millions of financial transactions per day, across nearly every market in the world. Learn how Goldman is harnessing knowledge, data, and compute power to maintain and increase its competitive edge.
11:00am–11:40am Thursday, 10/16/2014
Sponsored
Location: 1 E6/1 E7
Vin Sharma (Intel)
Average rating: *....
(1.75, 4 ratings)
This session will outline Intel’s vision of an E2E Data Analytics Architecture for IoT as well as how we are enabling companies to elevate and transform the way they interact with their customers. Read more.
11:00am–11:40am Thursday, 10/16/2014
Sponsored
Location: 1 E16/1 E17
Jim Scott (MapR Technologies)
Average rating: ***..
(3.20, 5 ratings)
Learn from a veteran of three successful Hadoop projects what the critical factors for organizational success with Hadoop are, and how to build the right team and skill sets for high-performance Hadoop.
11:00am–11:40am Thursday, 10/16/2014
Sponsored
Location: 1 D03/1 D04
Vaibhav Nivargi (ClearStory Data)
Average rating: **...
(2.20, 5 ratings)
In this session, you will learn why it’s powered by Spark, hear key business use cases from customers across various industries using it, and gain an understanding of the five fundamentals of speeding disparate data analysis.
11:00am–11:40am Thursday, 10/16/2014
Data Science
Location: 1D
Claudia Perlich (Dstillery)
Average rating: ****.
(4.60, 5 ratings)
There is a symbiotic relationship between predictive modeling and Big Data. Performance gets better with more data, and predictive models demonstrate like few other techniques the value of Big Data. However, there is a surprising paradox: when you need models most, even all the data is not enough or just not suitable. So in this day and age of Big Data, there remains an art to predictive modeling.

11:50am

11:50am–12:30pm Thursday, 10/16/2014
Design & Interfaces
Location: 1 E8/1 E9
Bob Mankoff (The New Yorker Magazine)
Average rating: ****.
(4.60, 5 ratings)
Bob Mankoff, The New Yorker's cartoon editor, will analyze the lessons we learn from crowdsourced humor. Along the way, he'll explore how cartoons work (and sometimes don't); how he makes decisions about what cartoons to include; and what crowds can tell us about a good joke. Read more.
11:50am–12:30pm Thursday, 10/16/2014
Law, Ethics & Open Data
Location: 1 E10/1 E11
Jim Adler (Metanautix)
Average rating: ****.
(4.50, 6 ratings)
Bad press, FTC consent decrees, and White House reports have all put a spotlight on bad data practices. Data scientists and designers have become increasingly aware of how privacy principles should guide their work. So, the geeks have met the wonks. Now, it’s time for the wonks to meet the geeks and use data analytics to keep pace with burgeoning data volumes, velocities, and innovations. Read more.
11:50am–12:30pm Thursday, 10/16/2014
Connected World
Location: 1 E12/1 E13
Tags: ngo
Brett Goldstein (University of Chicago)
Average rating: ****.
(4.50, 4 ratings)
How far can we take open data--and where can it take us? Brett Goldstein, who helped pioneer Chicago’s cutting-edge efforts in open data and analytics as CIO and CDO, will speak on how these act as a force multiplier on government efforts and can lead to smarter and more inclusive policy-making, while enhancing the government’s ability to anticipate and react to the needs of the public. Read more.
11:50am–12:30pm Thursday, 10/16/2014
Business & Industry
Location: 1 E14/1 E15
Denise Asplund (Cisco Systems, Inc)
Average rating: ***..
(3.00, 8 ratings)
This talk highlights William's successes, challenges, and experiences bringing a data-driven operations model into Cisco’s engineering services organization. William highlights the role of data, the need for scale and security, the opportunity for new technology to accelerate business, the role of IT as a guide and partner, and the mind shift and cultural changes along the journey.
11:50am–12:30pm Thursday, 10/16/2014
Hadoop Platform
Location: Hall A 23/24
Julian Hyde (Hortonworks)
Average rating: ***..
(3.25, 8 ratings)
Hyde shows how to quickly build a SQL interface to a NoSQL system using Optiq. He shows how to add rules and operators to Optiq to push down processing to the source system, and how to automatically build materialized data sets in memory for blazing-fast interactive analysis. Read more.
11:50am–12:30pm Thursday, 10/16/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Joe Hellerstein (UC Berkeley), Sean Kandel (Trifacta)
Average rating: ***..
(3.83, 12 ratings)
Data transformation — traditionally the domain of IT specialists — is emerging as a critical, widespread problem in data analytics. In this session we discuss the advantages of using a domain-specific language for data transformation tasks. We illustrate these issues with Wrangle, a DSL designed for interactive data transformation. Read more.
11:50am–12:30pm Thursday, 10/16/2014
Hadoop in Action
Location: 1 C03/1 C04
Tags: finance
Stephen Lloyd (Transamerica), Vishal Bamba (Transamerica), David Beaudoin (Transamerica)
Average rating: ***..
(3.25, 8 ratings)
Transamerica is a financial services company moving to a more customer-centric model using Big Data. Our approach to this effort spans our Insurance, Annuity, and Retirement divisions. We went from a simple proof of concept to establishing Hadoop as a viable element of our enterprise data strategy. We cover core components of our solution and focus on lessons learned from our experience.
11:50am–12:30pm Thursday, 10/16/2014
Sponsored
Location: 1 E6/1 E7
Peter Schlampp (Platfora), Ed Smith (AutoTrader)
Average rating: **...
(2.80, 5 ratings)
Up to 90% of your data is coming in new forms, in greater size, and at increasing speed. This multi-structured data requires a new workflow, putting the power of Hadoop and Spark into the hands of business analysts. In this session, we will share how Fortune 500 analysts have transformed their workflow and gained insights into their business that were never before possible.
11:50am–12:30pm Thursday, 10/16/2014
Sponsored
Location: 1 E16/1 E17
Todd Papaioannou (Splunk)
Average rating: *....
(1.94, 32 ratings)
In this session, you will hear from big data experts with real-world experience on the architectural patterns and platform integrations used to solve real business problems with data.
11:50am–12:30pm Thursday, 10/16/2014
Sponsored
Location: 1 D03/1 D04
Jorge A. Lopez (Amazon Web Services)
Average rating: ***..
(3.00, 1 rating)
Shifting workloads from the enterprise data warehouse (EDW) to Hadoop reduces costs, enables you to keep that data longer, and frees up EDW capacity for fast analytics. Check out our live demo and learn a proven framework for offloading workloads from the EDW to Hadoop: Identify & prioritize what to offload; Shift workloads to Hadoop; Optimize & secure your environment; and Visualize new insights. Read more.
11:50am–12:30pm Thursday, 10/16/2014
Data Science
Location: 1D
Joseph Adler (Confluent), Hilary Mason (Fast Forward Labs), Scott Nicholson (Poynt), Lucian Lita (Intuit), Roger Magoulas (O'Reilly Media)
Average rating: **...
(2.91, 11 ratings)
In this debate, two teams of the world's best data scientists will debate the following proposition: "If you can't code, you can't be a data scientist." Read more.

12:30pm

12:30pm–1:45pm Thursday, 10/16/2014
Events
Location: North Hall and Hall 1A
Average rating: **...
(2.78, 9 ratings)
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics. NOTE: BoFs are happening during lunch, which is not accessible to Expo Plus and Expo Only pass holders. Read more.

1:45pm

1:45pm–2:25pm Thursday, 10/16/2014
Data Science
Location: 1 E8/1 E9
Laurie Skelly (Datascope Analytics)
Average rating: **...
(2.71, 14 ratings)
Data scientists wear many hats -- how do you train a ready-for-prime-time data scientist in twelve weeks? We'll share some of the choices and models we used to create the Metis Data Science Bootcamp and select its first cohort of students. Read more.
1:45pm–2:25pm Thursday, 10/16/2014
Law, Ethics & Open Data
Location: 1 E10/1 E11
Gilad Rosner (Internet of Things Privacy Forum)
Average rating: ***..
(3.54, 13 ratings)
While the inexorable march of technology does threaten historical notions of privacy, privacy IS very much alive – a shifting, vital conversation society has with itself and its machines. This talk explores the principles of transparency, unlinkability, and intervenability to build a foundation for a design ethos for technologists. Read more.
1:45pm–2:25pm Thursday, 10/16/2014
Connected World
Location: 1 E12/1 E13
Tags: geo_local
Mansour Raad (ESRI)
Average rating: ****.
(4.86, 21 ratings)
GeoSpatial big data and types are special "animals" when it comes to storage, discovery, and processing. This session will explore the various non-traditional ways to stream, extract, batch, and visualize GeoSpatial information for deeper geo-insight, such as "Where are the 3 nearest facilities to each of my customers based on current traffic conditions...nationwide?"
1:45pm–2:25pm Thursday, 10/16/2014
Business & Industry
Location: 1 E14/1 E15
Tags: fashion
Andrea Burbank (Pinterest)
Average rating: ***..
(3.80, 15 ratings)
Over two years of running A/B testing at Pinterest on millions of users each day, Andrea learned about the nuances that can make or break an experimentation platform. Andrea will discuss how her approach to testing has adjusted over time to avoid critical errors at all levels, from organizational to analytical. Read more.
1:45pm–2:25pm Thursday, 10/16/2014
Hadoop Platform
Location: Hall A 23/24
Guy Harrison (Dell Software), David Robson (Dell Software), Kathleen Ting (Cloudera)
Average rating: ***..
(3.71, 7 ratings)
When people think of big data processing, they think of Apache Hadoop, but that doesn't mean traditional databases don't play a role. In most cases users will still draw from data stored in RDBMS systems. Apache Sqoop can be used to unlock that data and transfer it to Hadoop, enabling users with information stored in existing SQL tables to use new analytic tools. Read more.
1:45pm–2:25pm Thursday, 10/16/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Fangjin Yang (Imply), Xavier Léauté (Confluent)
Average rating: ***..
(3.17, 6 ratings)
Organizations often showcase the virtues of their data platforms, but rarely share the challenges and decisions faced along the way. Our session describes how we architected our analytics stack around Druid, an open source distributed data store, and how we overcame the challenges around scaling the system, balancing features with cost, and making performance consistent. Read more.
1:45pm–2:25pm Thursday, 10/16/2014
Hadoop in Action
Location: 1 C03/1 C04
Tags: finance
Sastry Durvasula (American Express), Kevin Murray (American Express)
Average rating: ***..
(3.71, 7 ratings)
American Express is transforming for the digital age! Learn how we unleashed Big Data into our ecosystem and built on the strength of our core capabilities to remain relevant in a rapidly changing environment. New commerce opportunities and innovative products are being delivered, and the chance to provide actionable insights, social analysis, and predictive modeling is growing exponentially. Read more.
Add to your personal schedule
1:45pm–2:25pm Thursday, 10/16/2014
Sponsored
Location: 1 E6/1 E7
Nenshad Bardoliwalla (Paxata), Uday Hegde (Useready Inc.), Julia Bardmesser (Citi), O'Reilly Speaker Management (O'Reilly Media)
Average rating: ***..
(3.00, 1 rating)
Today’s unstructured data is raw and complex, but everyone agrees it can provide context and hidden insights when it is easily accessed during the business intelligence lifecycle. . . Read more.
Add to your personal schedule
1:45pm–2:25pm Thursday, 10/16/2014
Sponsored
Location: 1 E16/1 E17
Jagane Sundar (WANdisco)
Average rating: ****.
(4.50, 2 ratings)
This session will examine the distribution and storage of data in HDFS across multiple datacenters in a single coordinated, Paxos-based file system over a WAN. Efficient use of compute resources in a globally distributed HDFS cluster is also discussed. Read more.
Add to your personal schedule
1:45pm–2:25pm Thursday, 10/16/2014
Sponsored
Location: 1 D03/1 D04
Mike Hoskins (Actian Corporation)
Average rating: *....
(1.38, 16 ratings)
Big Data and Analytics is still a young space, but new methods are on the way. Prominent among them is graph analytics. Actian will show radical and innovative graph analytic capabilities from its investment in SPARQL City. Founded by database legend Barry Zane, SPARQL City and Actian are committed to delivering the industry’s highest-performing in-memory graph analysis engine. Read more.
Add to your personal schedule
1:45pm–2:25pm Thursday, 10/16/2014
Data Science
Location: 1D
Chris Harland (Microsoft)
Average rating: ****.
(4.17, 6 ratings)
An increasingly common task for data science is the measurement and attribution of experimental impact. Using examples from healthcare.gov, Microsoft advertising, and Bing experimentation, we will explore the strengths, weaknesses, and pitfalls of techniques for dealing with impact and attribution in scenarios/data in which control experiments were not possible or otherwise not performed. Read more.
Add to your personal schedule
1:45pm–2:25pm Thursday, 10/16/2014
Sponsored
Location: 1 E05
Moderated by:
Alex Gorelik (Waterline Data)
Panelists:
Suresh Srinivas (Hortonworks), Mike Sutten (Kaiser Permanente), John Mount (Win-Vector LLC), Clark Farrey (Capital One), Sunil Soares (Information Asset)
Average rating: ***..
(3.33, 3 ratings)
Companies are deploying Hadoop “data lakes” to provide unprecedented access to data for data science and analytics. However, the advantages of frictionless ingest, flexible schema on read, and lack of data governance turn into increasingly insurmountable challenges to enabling true data self-service, and create a barrier to the enterprise adoption of Hadoop. Read more.

2:35pm

Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Enterprise Adoption
Location: 1 E8/1 E9
Barry Devlin (9sight Consulting)
Average rating: ***..
(3.27, 15 ratings)
“Leave the over-structured, complex Data Warehouse behind. Dive into the pure, sparkling waters of the Data Lake!” I suggest you enjoy the Instagram, but beware the hidden depths. The Data Lake is a misleading metaphor; it will become a watery grave for context, governance, and value. In reality, today's intricate information ecosystem demands a careful blend of architectures and technologies. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Law, Ethics & Open Data
Location: 1 E10/1 E11
Tags: ngo
Stefan Heeke (SumAll.org), Adeen Flinker (SumAll.org)
Average rating: ****.
(4.25, 4 ratings)
The story of using predictive analytics for homelessness prevention in New York City. SumAll.org is currently piloting this approach with the city’s department of homeless services. Predicting at-risk families in a timely manner and micro-targeting social services is a game-changer. SumAll.org is a data analytics nonprofit, dedicated to leveraging the power of data for social innovation. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Connected World
Location: 1 E12/1 E13
Tags: ngo
Pramod Varma (UIDAI)
Average rating: ****.
(4.85, 13 ratings)
Aadhaar, India's Unique Identity Project, is the largest biometric identity system in the world with more than 600 million people. Its strength lies in its design simplicity, sound strategy, and technology backbone issuing 1 million identity numbers and doing 600 trillion biometric matches every day! Pramod Varma, who is the Chief Architect of Aadhaar, shares his experience from this project. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Business & Industry
Location: 1 E14/1 E15
Eugene Kolker (Seattle Children's)
Average rating: **...
(2.50, 8 ratings)
This discussion touches on the human response to analysis results, especially when they do not support long-held beliefs, and how this affects organizational change. This discussion also focuses on Predictive Analytics best practices, team skills, and a review of what it takes to build a sustainable Predictive Analytics program. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Hadoop Platform
Location: Hall A 23/24
Mithun Radhakrishnan (Yahoo! Inc.)
Average rating: ****.
(4.67, 6 ratings)
The past year has seen the advent of various "low latency" solutions for querying big data such as Shark, Impala, and Presto. The Hive team at Yahoo has spent the past several months benchmarking several versions of Hive (and Tez), with several permutations of file-formats, compression, and query engine features, at various data sizes. In this talk, we present our tests, the results, and findings. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Lior Abraham (Interana Inc)
Average rating: **...
(2.33, 6 ratings)
Leveraging our experience from working on some of the largest-scale high-growth applications at Facebook and other companies, including building the most popular data analysis tool Scuba, this talk outlines 10 lessons learned, along with best practices towards extracting the most value out of data, while avoiding common pitfalls. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Hadoop in Action
Location: 1 C03/1 C04
Tags: fashion
Chris Wilson (L.L.Bean), Doug Bryan (RichRelevance)
Average rating: ***..
(3.00, 8 ratings)
The accumulation, access and analysis of customer data (“the original Big Data”) are ingrained for L.L.Bean, which has been doing customer modeling since the 1960s. In line with today’s omnichannel imperative, however, the retailer has embraced a “new Big Data”-driven culture, democratizing data access and tools, in order to sustain its customer-centric philosophy. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Sponsored
Location: 1 E6/1 E7
Eric Frenkiel (MemSQL)
Average rating: ***..
(3.43, 7 ratings)
This session will cover how MemSQL’s hybrid transactional and analytic data processing capabilities and Apache Spark integration enable businesses to build real-time platforms for applications like operational analytics, position monitoring, and anomaly detection. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Sponsored
Location: 1 E16/1 E17
Michael O'Connell (TIBCO Software Inc.)
Average rating: ***..
(3.00, 2 ratings)
Join TIBCO Software, an industry leader in infrastructure and analytics software, for a thought leadership discussion to learn how your organization can redefine its data strategy. Transition from a company of Big Data to Fast Data and convert your customers into fans while achieving a competitive advantage. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Sponsored
Location: 1 D03/1 D04
Sanjay Radia (Hortonworks)
Average rating: ***..
(3.67, 3 ratings)
In this talk Arun Murthy will share the very latest innovation from the community aimed at accelerating the interactive and realtime capabilities of enterprise Hadoop. Read more.
Add to your personal schedule
2:35pm–3:15pm Thursday, 10/16/2014
Data Science
Location: 1D
Vitaly Gordon (LinkedIn)
Average rating: ***..
(3.82, 11 ratings)
A talk about how the largest professional social network in the world is digitally mapping the global economy to connect talent with opportunity at massive scale. Read more.

4:15pm

Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Enterprise Adoption
Location: 1 E8/1 E9
Monte Zweben (Splice Machine Inc.)
Average rating: ****.
(4.50, 4 ratings)
There is a wave of challengers in the database world focused on the scaling costs of traditional RDBMSs. These potential giant killers have capitalized on explosive data growth and disruptive technologies like distributed computing (e.g., Hadoop and NoSQL). We’ll discuss the new breed of database buyers, the redefinition of “enterprise,” and apply lessons from past database wars. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Law, Ethics & Open Data
Location: 1 E10/1 E11
Tricia Wang (Constellate Data), Matt LeMay (Constellate Data)
Average rating: *****
(5.00, 1 rating)
This session examines the risks of over-reliance on big data and the need to bring in Thick Data—qualitative methods used by ethnographers. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Connected World
Location: 1 E12/1 E13
Tags: health, care
Brigitte Piniewski (nonaffiliated)
Average rating: *****
(5.00, 1 rating)
This session will help data scientists support healthcare leaders in harmonizing health data with Open Source community data commons approaches. This enhances the value of mandated EMR adoption beyond Meaningful Use requirements by creating evidence-based community health intelligence at the pace and point of change, the everyday lives and activities of community members. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Business & Industry
Location: 1 E14/1 E15
Tags: retail
Michael Abbott (Kleiner Perkins Caufield & Byers), Will Moss (Airbnb), Geoff Guerdat (Gilt Groupe), Emil Ong (Lookout)
Average rating: ****.
(4.20, 10 ratings)
In this session, Kleiner Perkins Caufield & Byers General Partner Michael Abbott speaks with Geoff Guerdat of the Gilt Groupe, Will Moss of Airbnb, and Emil Ong of Lookout, to unbox their respective companies and examine the technology, architecture, and innovations they’ve harnessed to deliver superior products and services. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Hadoop Platform
Location: Hall A 23/24
P. Taylor Goetz (Hortonworks)
Average rating: ****.
(4.33, 6 ratings)
We will discuss the basics of scaling, common mistakes and misconceptions, how different technology decisions affect performance, and how to identify and scale around the bottlenecks in a Storm deployment. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Haoyuan Li (Alluxio)
Average rating: ****.
(4.36, 11 ratings)
An introduction to Tachyon, a memory-centric storage system started at UC Berkeley. It enables different frameworks to share data at memory speed. It is also a major component of the Berkeley Data Analytics Stack (BDAS). The project is open source and is deployed at multiple companies. It has more than 30 contributors from over 10 institutions, including Yahoo, Intel, Red Hat, and Alibaba. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Hadoop in Action
Location: 1 C03/1 C04
Matthias Braeger (CERN), Manish Devgan (Software AG)
Average rating: ***..
(3.86, 7 ratings)
CERN, home to the Large Hadron Collider (LHC), is at the forefront of science and technology. Come to this session to learn how projects at CERN are leveraging in-memory data management and Hadoop to derive real-time insights from sensor data, helping to manage the technical infrastructure of the LHC. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Sponsored
Location: 1 E6/1 E7
Steve McPherson (Amazon Web Services)
Average rating: **...
(2.50, 6 ratings)
Learn how you can architect Amazon Kinesis and Amazon Elastic MapReduce together to create a highly scalable real-time analytics solution which can ingest and process terabytes of data per hour from hundreds of thousands of different concurrent sources. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Sponsored
Location: 1 E16/1 E17
Don Pinto (Couchbase)
This session provides a brief overview of Couchbase Server, a document database, and its underlying distributed architecture. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Sponsored
Location: 1 D03/1 D04
Dan McClary (Oracle)
Average rating: *****
(5.00, 1 rating)
SQL is the natural language for querying data, but data lives in many places. We discuss the importance of SQL not only on Hadoop, but also on relational databases and NoSQL stores. Additionally, we dive deep into the architecture of Big Data SQL, which can access all of these sources in a single query. Read more.
Add to your personal schedule
4:15pm–4:55pm Thursday, 10/16/2014
Data Science
Location: 1D
Brian Granger (Cal Poly San Luis Obispo), Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory)
Average rating: *****
(5.00, 4 ratings)
The IPython Notebook is an open-source, web-based interactive computing environment. The Notebook enables users to author documents that combine live code, descriptive text, mathematical equations, images, videos, and arbitrary HTML. This talk will describe how IPython is evolving to support a wide range of programming languages relevant in data science, including Python, Julia, and R. Read more.

5:05pm

Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Enterprise Adoption
Location: 1 E8/1 E9
Eddie Garcia (Cloudera)
Average rating: ***..
(3.60, 5 ratings)
Recent studies show the vast majority of Hadoop projects are stuck in development, with very few ever reaching production status. And those programs that do convert from pilot to production often view Hadoop as little more than an ETL tool. This session looks at why Hadoop implementations often stall out in the development phase and what companies can do to make Hadoop “production ready.” Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Business & Industry
Location: 1 E10/1 E11
Tags: ngo
Joel Gurin (Center for Open Data Enterprise), Laura Manley (The GovLab at NYU)
Average rating: *****
(5.00, 1 rating)
Open government data on healthcare, finance, education, energy, and other areas has become a major business resource. Joel Gurin, author of Open Data Now and director of the Open Data 500 study, will show how both startups and established companies are putting open data to work. He'll cover Open Data and Big Data, business models for open-data companies, and lessons from a range of case studies. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Business & Industry
Location: 1 E12/1 E13
Tags: retail
Average rating: ****.
(4.75, 4 ratings)
At Etsy, we run dozens of experiments simultaneously and we have terabytes of data generated by the tens of millions of members of our community. We've worked hard to establish a product development process informed by -- and often driven by -- data. In this talk, Nell will discuss the tensions that arise in a data-driven product culture. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Business & Industry
Location: 1 E14/1 E15
Michael Abbott (Kleiner Perkins Caufield & Byers), Michael Stoppelman (Yelp), Siva Subramanian (Box)
Average rating: ****.
(4.00, 2 ratings)
In this session, Kleiner Perkins Caufield & Byers General Partner Michael Abbott speaks with Michael Stoppelman of Yelp and Siva Subramanian of Box to unbox their respective companies and examine the technology, architecture, and innovations they’ve harnessed to deliver superior products and services. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Hadoop Platform
Location: Hall A 23/24
Martin Kleppmann (University of Cambridge)
Average rating: ****.
(4.71, 14 ratings)
Apache Samza is a framework for processing high-volume real-time event streams. In this session we will walk through our experiences of putting Samza into production at LinkedIn, discuss how it compares to other stream processing tools, and share the lessons we learnt about dealing with real-time data at scale. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Sean Owen (Cloudera)
Average rating: ****.
(4.73, 11 ratings)
Apache Spark is a popular new paradigm for computation on Hadoop. It's particularly effective for iterative algorithms relevant to data science like clustering, which can be used to detect anomalies in data. Curious? Get a taste of Spark MLlib, Scala and k-means clustering in this walkthrough of anomaly detection as applied to network intrusion, using the KDD Cup '99 data set. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Hadoop in Action
Location: 1 C03/1 C04
Tags: finance
Lelanie Moll (FICO), Deb Brooks (FICO), Silaphet Mounkhaty (FICO)
Average rating: ***..
(3.80, 5 ratings)
FICO has been delivering analytic solutions, such as their renowned credit scores, for nearly 60 years. Big data technologies like Hadoop promise FICO analysts the ability to build models much faster, and with greater accuracy than before, but this new generation of tools challenges them to think differently. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Hadoop & Beyond
Location: 1 E6/1 E7
Additional, informal work session with the Spark Team. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Sponsored
Location: 1 E16/1 E17
George Corugedo (RedPoint Global)
Average rating: ***..
(3.50, 2 ratings)
Deriving value from data depends on how well companies capture and manage that data. Learn how to create a centralized processing pool where data can be captured, cleansed, linked, and structured in a consistent way. Use the scalability and flexibility of Hadoop to create a powerful processing and refinement engine to drive usable information across enterprise databases and data marts. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Sponsored
Location: 1 D03/1 D04
Average rating: ***..
(3.00, 3 ratings)
Join us for a panel discussion that includes customers, industry experts and partners who are ready to explore the latest advances in Hadoop, from affordability and appliances, to Apache Spark, simplification and security. Read more.
Add to your personal schedule
5:05pm–5:45pm Thursday, 10/16/2014
Data Science
Location: 1D
Juan Miguel Lavista (Microsoft)
Average rating: ***..
(3.43, 7 ratings)
In the US alone, we make over 40 billion search queries every month. From the time we wake up, using search engines is one of the top things we do online. This talk will show some examples of how this data can be used, from funny things like determining which city wakes up earlier to more complex scenarios like finding adverse drug interactions. Read more.

5:45pm

Add to your personal schedule
5:45pm–7:15pm Thursday, 10/16/2014
Events
Location: Expo Hall (1C)
Average rating: ***..
(3.67, 3 ratings)
Join your fellow big data enthusiasts at the Strata Conference & Hadoop World Expo Hall Reception on Thursday, October 16. Read more.

8:00pm

Add to your personal schedule
8:00pm–11:00pm Thursday, 10/16/2014
Events
Location: Off Site
Average rating: ****.
(4.60, 15 ratings)
Come join us for an eclectic taste of Hell’s Kitchen cuisine and entertainment. Mix and mingle with fellow attendees at six distinctly different places within a few blocks of each other, including a piano bar, swing dancing, Memphis BBQ, Cajun Creole, Southeast Asian, and a rock & roll lounge. Read more.

Friday, 10/17/2014

8:45am

Add to your personal schedule
8:45am–8:50am Friday, 10/17/2014
Keynotes
Location: 1D
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.91, 11 ratings)
Strata Program Chairs, Roger Magoulas, Doug Cutting, and Alistair Croll, welcome you to the second day of keynotes. Read more.

8:50am

Add to your personal schedule
8:50am–9:00am Friday, 10/17/2014
Keynotes
Location: 1D
Eli Collins (Cloudera)
Average rating: ***..
(3.12, 16 ratings)
In this presentation Eli Collins, Cloudera’s Chief Technologist, will discuss how we might both reap the benefits of data while avoiding its perils. Read more.

9:00am

Add to your personal schedule
9:00am–9:15am Friday, 10/17/2014
Keynotes
Location: 1D
Rana el Kaliouby (Affectiva)
Average rating: ****.
(4.22, 37 ratings)
This keynote will share insights from the world’s largest repository of consumer emotions and present the challenges and opportunities that this data presents for machine learning as well as data mining and visualization. Read more.

9:15am

Add to your personal schedule
9:15am–9:25am Friday, 10/17/2014
Keynotes, Sponsored
Location: 1D
Joseph Sirosh (Microsoft)
Average rating: ***..
(3.00, 31 ratings)
Software and the rise of cloud services have given rise to revolutionary new economies – creating new markets for everything from self-published books, music and videos to mobile apps. Only a few years ago, it would have been hard to imagine developers authoring a million apps for smartphones. But that’s history. Read more.

9:25am

Add to your personal schedule
9:25am–9:35am Friday, 10/17/2014
Keynotes
Location: 1D
Tags: fashion
Karen Moon (Trendalytics)
Average rating: ***..
(3.62, 45 ratings)
Karen Moon will discuss the characteristics of unstructured data that make identifying and synthesizing fashion trends particularly challenging and how getting it right can be a competitive advantage. Read more.

9:35am

Add to your personal schedule
9:35am–9:45am Friday, 10/17/2014
Keynotes
Location: 1D
George Legendre (IJP Architects London)
Average rating: ***..
(3.93, 44 ratings)
In this presentation, George L. Legendre, principal of IJP Architects and faculty at Harvard Graduate School of Design, will show how the mathematical equations of pasta define the ultimate taxonomy of the genre. Read more.

9:45am

Add to your personal schedule
9:45am–9:50am Friday, 10/17/2014
Keynotes, Sponsored
Location: 1D
Average rating: **...
(2.68, 34 ratings)
The world is a rapidly changing place, where time flies and technological innovations batter us fast and furiously. Hadoop is just nine years old; and just five years ago had nowhere near the audience, ecosystem, or impact it has now . . . Read more.

9:50am

Add to your personal schedule
9:50am–9:55am Friday, 10/17/2014
Keynotes, Sponsored
Location: 1D
Average rating: **...
(2.57, 28 ratings)
Becoming an organization that can make agile decisions from agile data requires agile analytics. Read more.

9:55am

Add to your personal schedule
9:55am–10:05am Friday, 10/17/2014
Keynotes
Location: 1D
Average rating: ****.
(4.78, 50 ratings)
Shankar Vedantam, NPR Science Desk, NPR Read more.

10:05am

Add to your personal schedule
10:05am–10:10am Friday, 10/17/2014
Keynotes, Sponsored
Location: 1D
Paul Zikopoulos (IBM CANADA)
Average rating: **...
(2.97, 29 ratings)
In this session you'll see an application that builds on in-place existing technologies like Hadoop to deliver understandable results. You'll hear a story where analytics at rest was applied to unstructured data using a simple SQL-like development environment, and findings were promoted to the frontier of the business to score, in real time, monetizable intent, assess reputations, and more. Read more.

10:10am

Add to your personal schedule
10:10am–10:25am Friday, 10/17/2014
Keynotes
Location: 1D
Julia Angwin (ProPublica)
Average rating: ****.
(4.73, 44 ratings)
Julia Angwin discusses how much she has spent trying to protect her privacy, and raises the question of whether we want to live in a society where only the rich can buy their way out of ubiquitous surveillance. Read more.

11:00am

Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Design & Interfaces
Location: 1 E8/1 E9
Nathan Shetterley (Accenture), Joshua Patterson (NVIDIA), Allan Enemark (Accenture), Kathleen Moynahan (Accenture Technology Labs)
Average rating: ****.
(4.67, 3 ratings)
Nathan Shetterley, Josh Patterson, and their team set out to change the visual identity of the world's largest IT consulting firm. From grassroots public visualization to a global visual literacy curriculum, see how they made Accenture more focused on data visualization. In addition, they will share insights into the business value of data visualization to their firm and clients. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Security
Location: 1 E10/1 E11
Tags: finance
Mike Armstrong (ZestFinance)
Average rating: **...
(2.80, 5 ratings)
Last year, Douglas Merrill, CEO of ZestFinance and former Google CIO, discussed how success in big data analysis requires not just machines and algorithms, but also human analysis, or “data artists.” Building on this notion, Mike Armstrong, CMO of ZestFinance, will discuss how companies can find, identify, and correct data inaccuracies. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Machine Data
Location: 1 E12/1 E13
Tags: iot
Alisher Maksumov (GE Software), Jean Lau (GE Software)
Average rating: **...
(2.80, 5 ratings)
Industrial systems produce large volumes of real-time data that can be analyzed using Big Data technologies in data center environments. In many cases, such data needs to be analyzed at the edge before leaving industrial machines or the systems that control them. This is possible if machines have the intelligence to process data and make decisions. GE will share such use cases and experience. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Business & Industry
Location: 1 E14/1 E15
Michael Dauber (Amplify Partners), Sunil Dhaliwal (Amplify Partners), Shivon Zilis (Bloomberg Beta), Matthew Ocko (Data Collective), Sam Pullara (Sutter Hill Ventures)
Average rating: ***..
(3.25, 8 ratings)
To anticipate who will succeed and invest wisely, investors spend a lot of time trying to understand the longer-term trends within an industry. In this panel discussion, we’ll consider the big trends in Big Data, asking top-tier VCs to look over the horizon and discuss the visions they have for two or more years in the future. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Hadoop Platform
Location: Hall A 23/24
Nick Dimiduk (Hortonworks, Inc), Nicolas Liochon (Scaled Risk)
Average rating: ****.
(4.40, 5 ratings)
This talk examines sources of latency in HBase, detailing steps along the read and write paths. We'll examine the entire request lifecycle, from client to server and back again. We'll also look at the different factors that impact latency, including GC, cache misses, and system failures. Finally, the talk will highlight some of the work done in 0.96+ to improve the reliability of HBase. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Philip (Flip) Kromer (CSC), Q McCallum (@qethanm)
Average rating: ***..
(3.54, 13 ratings)
What is the lambda architecture, and how do you put it to use for your streaming data? Flip Kromer and Q Ethan McCallum will explain how this works, using a live-updating recommendation engine as the supporting example. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Hadoop in Action
Location: 1 C03/1 C04
Praveen Neppalli Naga (LinkedIn Corp), Chi-Yi Kuan (LinkedIn), Jonathan Wu (LinkedIn)
Average rating: ***..
(3.88, 8 ratings)
LinkedIn processes an enormous number of events each day. In this talk, you will learn the background of the data challenges that LinkedIn faced, how the teams came together to construct the solution, and the underlying stack structure powering this solution, including an interactive analytics infrastructure and a self-serve data visualization frontend solution at fast scale. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Sponsored
Location: 1 E6/1 E7
Joseph Sirosh (Microsoft)
Average rating: ****.
(4.00, 4 ratings)
The cloud is an amazing game changer for Data Science. This talk will show with demos and real world customer examples the magic that every data scientist can now perform in the cloud... Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Sponsored
Location: 1 E16/1 E17
Sid Sipes (SAP)
Studies are showing the vast majority of Big Data projects involve 2 or more data platforms. Moving data is costly and must be carefully considered. . . Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Sponsored
Location: 1 D03/1 D04
Adam Pilz (SAS)
Average rating: ***..
(3.00, 1 rating)
Predictive modeling is as much art as it is science. The art is in matching your business questions to available data, and then pairing that data with the appropriate statistical techniques. Next comes model refinement, comparison and interpretation. We’ll demonstrate how SAS® and Hadoop work together to turn raw data into valuable information – and how you can visualize it for better decisions. Read more.
Add to your personal schedule
11:00am–11:40am Friday, 10/17/2014
Data Science
Location: 1D
Beau Cronin (Embedding.js)
Average rating: **...
(2.83, 6 ratings)
What does AI mean in 2014, and where is it headed? Every day brings news of purported breakthroughs, and some of the new applications are certainly impressive, but the field has witnessed boom/bust cycles before. What are the challenges that lie ahead this time? This talk will provide an overview of the state of the field, as well as a critical framework for thinking about the years ahead. Read more.

11:50am

Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Design & Interfaces
Location: 1 E8/1 E9
Andrew Hill (Set)
Average rating: ****.
(4.29, 7 ratings)
An important skill of today's data scientists is data communication. Mapping and other types of data visualization have been sufficient to try and demonstrate the trends and patterns these professionals find in data. However, there is an important shift happening in the way we consume data that means, as a community, we need to think about our ability to turn data into stories. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Security
Location: 1 E10/1 E11
Jesse Shaw (LexisNexis)
Average rating: ****.
(4.40, 5 ratings)
This session will cover the value that linking algorithms bring to identity risk management, and how to apply linking algorithms, data, and super compute capability to the challenge of identity risk management and identity fraud. We will also look at patterns of identity fraud, namely those (stolen) identities that have come back from the dead, and how to differentiate those from real, live identities. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Machine Data
Location: 1 E12/1 E13
Alasdair Allan (Babilim Light Industries)
Average rating: *****
(5.00, 7 ratings)
The trend towards cloud architectures we've seen over the last few years isn't sustainable. With tens of billions more Internet-connected devices arriving over the next few years, far faster than any predicted increase in bandwidth to the outside world, data is increasingly going to become a local problem, rather than a cloud problem. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Business & Industry
Location: 1 E14/1 E15
Tags: fashion
Igor Elbert (Gilt.com)
Average rating: *****
(5.00, 2 ratings)
For a long time Internet retailers have been trying to move items they sell closer to customers. Flash sale site Gilt.com takes it to the extreme: we apply machine learning to predict customers' cravings for fashion products in different geographic regions without purchase history to draw from. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Hadoop Platform
Location: Hall A 23/24
Jonathan Hsieh (Cloudera, Inc), Lars George (Cloudera)
Average rating: ***..
(3.40, 5 ratings)
Today, there are hundreds of production Apache HBase clusters running either entity-centric or event-based applications. Drawing on known clusters and a survey conducted by Cloudera's development, product, and services teams, based on their experience with the nearly 20,000 HBase nodes under management, this talk categorizes the gamut of use cases into a compact set of application archetypes. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Michael Armbrust (Databricks)
Average rating: ****.
(4.53, 15 ratings)
In this talk Michael will describe Spark SQL, the newest component of the Apache Spark stack. A key feature of Spark SQL is the ability to blur the lines between relational tables and RDDs, making it easy for developers to intermix SQL commands that query structured data with complex analytics in imperative or functional languages. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Hadoop in Action
Location: 1 C03/1 C04
Ryan Goldman (Cloudera), Ryan Brush (Cerner Corporation), Sabrina Dahlgren (Kaiser Permanente), Aashima Gupta (Kaiser Permanente), Michael Thompson (Children's Healthcare of Atlanta)
Average rating: *****
(5.00, 1 rating)
In this panel discussion, individuals representing key stakeholders across the healthcare ecosystem will share the ways they're applying Hadoop to solve big data challenges that will ultimately improve the quality of patient care while driving better healthcare affordability. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Sponsored
Location: 1 E6/1 E7
Tina Groves (IBM)
This session will describe the kinds of tools and solutions available in the market to tap into text sources. Two use cases will be discussed, with short demos illustrating a tools approach and a solution approach. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Sponsored
Location: 1 E16/1 E17
Altan Khendup @madmongol (Teradata Corporation), Ron Bodkin (Google)
Average rating: ****.
(4.00, 1 rating)
Developing Big Data applications for real-world business processes can be complex: method of processing, variety of systems, number of data sources. Large web companies have implemented a generic, scalable, fault-tolerant data processing architecture: Lambda. We’ll explore this evolving architecture, its design principles and layers/components, and use cases and lessons learned from real-world implementations. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Sponsored
Location: 1 D03/1 D04
Bharath Aleti (Cisco), Samuel Kommu (Cisco Systems)
Average rating: ***..
(3.00, 1 rating)
In this session we talk about how to design, build, and manage large-scale enterprise Big Data deployments, from high disk I/O apps to in-memory solutions, for both on-premise and multi-tenant cloud environments, taking a holistic view of all the components including compute, network, and the software stack. Read more.
Add to your personal schedule
11:50am–12:30pm Friday, 10/17/2014
Data Science
Location: 1D
Douglas Moore (Think Big Analytics)
Average rating: **...
(2.92, 12 ratings)
We debunk some popular approaches and attitudes we have encountered over the course of more than 50 real-world Big Data implementations. We will describe each anti-pattern and its appeal, but also why it fails and how to do it right. Read more.

12:30pm

Add to your personal schedule
12:30pm–1:45pm Friday, 10/17/2014
Events
Location: North Hall and Hall 1A
Average rating: *....
(1.67, 3 ratings)
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics. NOTE: BoFs are happening during lunch, which is not accessible to Expo Plus and Expo Only pass holders. Read more.

1:45pm

Add to your personal schedule
1:45pm–2:05pm Friday, 10/17/2014
Design & Interfaces
Location: 1 E8/1 E9
Leo Meyerovich (Graphistry)
Average rating: ****.
(4.33, 3 ratings)
Shoving 1MM rows of query results into a chart or graph returns illegible results and kills interactivity. Smarter designs, however, will achieve data visibility. Furthermore, running on GPUs turns static designs into interactive tools. We will show how Graphistry does this in production with (a) new client/cloud GPU infrastructure and (b) GPU-accelerated languages like Superconductor. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Security
Location: 1 E10/1 E11
Michelle Dennedy (McAfee, an Intel Company)
Average rating: *****
(5.00, 3 ratings)
People living in the Information Age are faced with a conundrum. They wish to be connected on a series of global, interconnected networks but they also wish to protect their privacy and to be left alone…sometimes. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Machine Data
Location: 1 E12/1 E13
Adi Krishnan (Amazon Web Services)
Average rating: ****.
(4.67, 6 ratings)
A lot of stationary, big data begins its life as small data in rapid motion - think logs, sensors, social data. The pressure is on architects, infra devops, and app developers to harness real-time data, and expose it to the right data processing paradigm. Learn how on AWS, services like Amazon Kinesis, Redshift, and Elastic MapReduce can be composed to deliver a smarter big data infrastructure. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Business & Industry
Location: 1 E14/1 E15
Michael Dauber (Amplify Partners), Renee DiResta (Haven), Matt Turck (FirstMark Capital), James Cham (Bloombergdata), Jake Flomenberg (Accel Partners)
Average rating: ****.
(4.60, 5 ratings)
The Big Data market is busy, with sky-high valuations and a rapid pace of innovation. This panel of data-focused Venture Capitalists will look at how they think about investing in and around the Big Data space—from the kind of deals they’re after, to how they like to work with entrepreneurs and founders. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Hadoop Platform
Location: Hall A 23/24
Chris Nauroth (Hortonworks), Suresh Srinivas (Hortonworks)
Average rating: ****.
(4.25, 4 ratings)
Are you taking advantage of all of Hadoop’s features to operate a stable and effective cluster? Inspired by real-world support cases, this talk discusses best practices and new features to help improve incident response and daily operations. Chances are that you’ll walk away from this talk with some new ideas to implement in your own clusters. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Hossein Falaki (Databricks Inc.)
Average rating: ***..
(3.92, 24 ratings)
We will demonstrate how to combine visual tools with Spark to visually explore big data using three specific techniques: a) summarize and visualize, b) sample and visualize, and c) model and visualize. We will use a real big dataset, such as Wikipedia traffic logs, to demonstrate these techniques in a live demo. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Hadoop in Action
Location: 1 C03/1 C04
Allen Day (MapR Technologies)
Average rating: ***..
(3.75, 4 ratings)
Medicine is undergoing a renaissance made possible by analyzing and creating insights from the huge and growing number of genomes. This session will showcase how ETL and MapReduce can be applied in a clinical setting. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Sponsored
Location: 1 E6/1 E7
James Dixon (Pentaho)
Average rating: ****.
(4.00, 1 rating)
Visualizations can be easy on the eyes, until you need to view data at scale. In this session, James Dixon, CTO and Co-Founder of Pentaho, will talk about ways of presenting large-scale datasets. Using data from the City of Chicago, James will present practical examples that help distill large amounts of data in ways that are easier for users to comprehend. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Sponsored
Location: 1 E16/1 E17
Sunil Venkayala (HP), Indrajit Roy (HP Labs)
Join us to learn how to leverage the new Distributed R open source technology from HP Labs and HP Vertica. The Distributed R platform introduces a new, easy-to-use distributed programming model and infrastructure for the R language. Distributed R includes out-of-the-box open source parallel R algorithms that can scale to terabytes of data. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Sponsored
Location: 1 D03/1 D04
Stephanie McKinley (Independent Consultant), Xavier Quintuna (Orange), Shirshanka Das (LinkedIn), Charlie Crocker (Autodesk), Anna Dorofiyenko (MarketShare)
Average rating: *****
(5.00, 1 rating)
Agile data transformation uses Hadoop’s schema-on-read capability to manipulate raw data as needed for business purposes. Transforming data can be a barrier to data access and agility, consuming up to 80% of business analyst time. Hear directly from LinkedIn, Autodesk, MarketShare, and Orange about how predictive interactions make agile data transformation a reality on Hadoop. Read more.
Add to your personal schedule
1:45pm–2:25pm Friday, 10/17/2014
Data Science
Location: 1D
Vishal Chowdhary (Microsoft)
Average rating: ***..
(3.67, 6 ratings)
Microsoft Translator currently supports 100+ languages. We constantly improve the translation quality and add new scenarios, all with a constant team size. This session describes a production-scale ML architecture using MS Translator as a case study. You will learn a mental model for approaching your ML problem and concrete Do’s and Don’ts for the various components of the ML system architecture. Read more.

2:05pm

Add to your personal schedule
2:05pm–2:25pm Friday, 10/17/2014
Connected World, Data Science, Design & Interfaces
Location: 1 E8/1 E9
Lauro Lins (AT&T Labs)
Average rating: **...
(2.75, 4 ratings)
Nanocubes is an open source project that can be used to visually explore large spatiotemporal datasets at interactive rates using a web browser. Read more.

2:35pm

Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Design & Interfaces
Location: 1 E8/1 E9
Average rating: **...
(2.83, 6 ratings)
IDEO's Hybrid team brings all the design tools from IDEO's product design process to work with clients on data-oriented projects. The team will share elements of their process and case studies to show how incorporating human-centered techniques from design can improve data as an input to decision making. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Data Science, Security
Location: 1 E10/1 E11
Bahman Bahmani (Stanford University)
Average rating: ***..
(3.60, 5 ratings)
As in a game of chess, successful use of machine learning techniques against adaptive adversaries, such as spammers and intruders, requires designing the learning algorithms to anticipate the opponent’s response to them. In this talk, we present techniques to design robust machine learning algorithms for adversarial environments and provide clarifying attack-defense examples. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Machine Data
Location: 1 E12/1 E13
Tags: iot
Jodok Batlogg (CRATE Technology GmbH)
Average rating: *****
(5.00, 2 ratings)
After babysitting Hadoop clusters for many years and getting to know their limitations really well, we had the chance to design and implement the cloud infrastructure for a large connected home platform from scratch. We’ll show how we’ve built that backend with Crate Data and Twitter Storm and why this is a perfect match for this workload. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Business & Industry
Location: 1 E14/1 E15
John Akred (Silicon Valley Data Science), Karim Qazi (Edmunds.com)
Average rating: *****
(5.00, 1 rating)
PDFs are the bane of data science, a jail from which machine-readable data struggles to escape. We'll explain how at Edmunds.com we freed data from diverse auto manufacturer PDFs, applied NLP and entity recognition, and integrated the results into the expert-driven process of defining vehicle models. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Hadoop Platform
Location: Hall A 23/24
Anubhav Dhoot (Cloudera)
Average rating: ****.
(4.88, 8 ratings)
This talk will cover resource management using YARN, the new resource management platform introduced in Hadoop 2.0. It will cover how YARN achieves effective cluster utilization and fair sharing of resources, and how it allows different types of applications to utilize the cluster. We will go over the architecture, recent improvements, and things coming down the pipeline. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Hadoop & Beyond
Location: 1 E20/1 E21
Anil Madan (PayPal)
Average rating: ***..
(3.71, 14 ratings)
Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Hadoop in Action
Location: 1 C03/1 C04
Ailey Crow (Pivotal)
Average rating: ****.
(4.29, 7 ratings)
Automated image processing improves efficiency for a diverse range of applications from defect detection in manufacturing to tumor detection in medical images. We’ll go beyond traditional approaches to image processing, which fail for large image datasets, by leveraging Hadoop for processing a vast number of arbitrarily large images. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Sponsored
Location: 1 E6/1 E7
Average rating: **...
(2.67, 3 ratings)
Leveraging Hadoop data, served to users with advanced visualization in MicroStrategy, Netflix delivers effective, responsive insights quickly. This puts advanced analytics in the hands of business users who make the decisions that help the online entertainment network outperform its rivals by serving consumers the content they want, how they want it. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Sponsored
Location: 1 E16/1 E17
Donald Farmer (Qlik)
Data continues to drive innovation, yet it’s how we interpret and use that data that becomes imperative to success. Using a design approach called Natural Analytics, technology can leverage the way our human curiosity searches and processes information. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Sponsored
Location: 1 D03/1 D04
Pravin Darbare (Western Union), Sumeet Agrawal (Informatica)
Average rating: *****
(5.00, 4 ratings)
Learn how Western Union uses Hadoop with Informatica to parse and integrate Omniture web log files, XML data, and relational transaction data to meet their current and future data analysis needs. Read more.
Add to your personal schedule
2:35pm–3:15pm Friday, 10/17/2014
Data Science
Location: 1D
Tags: fashion
Karen Moon (Trendalytics), Vijay Subramanian (Rent the Runway), Liza Kindred (Lullabot)
Average rating: **...
(2.50, 2 ratings)
Karen Moon, Co-founder and CEO, Trendalytics Read more.

4:15pm

Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Design & Interfaces
Location: 1 E8/1 E9
Trina Chiasson (Tableau Software)
Average rating: ***..
(3.00, 3 ratings)
Text adds clarity to visualizations and helps authors communicate. There are many text elements to consider when making a chart: axis titles, category and data labels, gridline labels, legends, citations, and annotations, to name a few. This talk will dive into the specifics of typography and text placement in information design. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Security
Location: 1 E10/1 E11
Adam Fuchs (Sqrrl)
Average rating: ****.
(4.10, 10 ratings)
The Internet is a warzone. Any business with a digital presence needs to protect itself from threats that exist in cyberspace. In this presentation, we’ll show you how to build a real-time anomaly detection system using Sqrrl Enterprise and Apache Spark GraphX to monitor and surface advanced persistent threats and malicious actor attacks. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Machine Data
Location: 1 E12/1 E13
Tags: iot
Damian Black (SQLstream Inc)
Average rating: *****
(5.00, 1 rating)
Born as a solution built for RMS (the Australian Government agency managing and regulating the use of roads in New South Wales), this Internet of Things application for smarter transportation services provides a real-time data hub for transportation sensor networks, network information and traveler information, offering actionable insight into network performance, congestion, and incidents. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Business & Industry
Location: 1 E14/1 E15
Tags: fashion
Liza Kindred (Third Wave Fashion), David Whittemore (Clothes Horse), Gina Mancuso (LoveThatFit™ (LTF)), Rasmus Thofte (Virtusize)
Panel discussion, moderated by Liza Kindred, Founder, Third Wave Fashion. Panelists include David Whittemore, founder of Clothes Horse; Gina Mancuso, founder of Love That Fit; and Rasmus Thofte, head of North America at Virtusize. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Hadoop Platform
Location: Hall A 23/24
Jean-Daniel Cryans (Cloudera)
Average rating: ****.
(4.80, 5 ratings)
This presentation will show you how to get your Big Data into Apache HBase as fast as possible. These 40 minutes will save you hours of debugging and tuning, with the added bonus of a better understanding of how HBase works. You will learn about the write path, bulk loading, HFiles, and more. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Hadoop Platform
Location: 1 E20/1 E21
Greg Rahn (Cloudera)
Average rating: ****.
(4.80, 5 ratings)
In the last two years we've seen the introduction of several open-source SQL engines for Hadoop. There have been numerous marketing claims around SQL-on-Hadoop performance, but what should you believe? How do these different engines compare on functionality? This talk will compare and contrast Hive, Impala, and Presto, all from a non-vendor, unsponsored, independent point of view. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Hadoop in Action
Location: 1 C03/1 C04
Daniel Weeks (Netflix)
Average rating: ****.
(4.80, 5 ratings)
Netflix continues to evolve its big data architecture in the cloud with performance enhancements and updated OSS offerings. We will share our experiences and selections in file formats, interactive query engines, and instance types. Genie emerges with updates to support YARN applications, and we will unveil a new performance visualization tool, Inviso. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Sponsored
Location: 1 E6/1 E7
It's been twenty years since Red Hat first launched Linux. Since then Red Hat has fueled the rapid adoption of open source technologies. As Big Data transitions into enterprise mode, Red Hat is again poised to facilitate the innovation and communities needed to empower multiple data stakeholders across your organization so you can truly open the possibilities of your data. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Sponsored
Location: 1 E16/1 E17
Moderated by:
Barry Devlin (9sight Consulting)
Panelists:
Sunil Soares (Information Asset), Joseph Dossantos (EMC Consulting), Jay Zaidi (Fannie Mae)
Average rating: ****.
(4.50, 2 ratings)
If the allure of Big Data is that you can throw it all in the data lake and process it cheaply and quickly, then the catch is how do you know what's in there and how do you govern it? A Big Data lake needs data governance to create trusted data, ensure consistency, and secure information appropriately. This session will discuss how to start putting a Big Data governance framework in place. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Sponsored
Location: 1 D03/1 D04
Sid Probstein (Attivio)
Average rating: ****.
(4.00, 1 rating)
Many big data instances overlook human-created content. We will discuss the business value of this content and the technology that can be used to tap into its power, and showcase real-life use cases. This content makes up the majority of the information produced by most organizations. Despite this fact, it has been under-used, under-valued, and under-analyzed because of legacy technology limitations. Read more.
Add to your personal schedule
4:15pm–4:55pm Friday, 10/17/2014
Data Science
Location: 1D
Cliff Click (0xdata)
Average rating: ****.
(4.50, 4 ratings)
H2O presents the world's fastest Distributed Parallel GBM. GBM is an ML algorithm used to win many recent Kaggle competitions, and is well known for its high-quality results. Read more.

5:05pm

Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Business & Industry
Location: 1 E8/1 E9
Tags: fashion
Rachel Kalmar (Sensored)
Average rating: *****
(5.00, 2 ratings)
Whether you're lining up for an Apple Watch, using the heads-up display of Google Glass, or sporting one of the hundreds of activity and sleep trackers, it's clear that wearable technology is exploding. No longer bulky and cumbersome gadgets, today's wearables are fashionably... data-chic. Read more.
Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Security
Location: 1 E10/1 E11
Tags: finance
Roy Singh (Guavus)
Average rating: **...
(2.40, 5 ratings)
In this session, Guavus’ Chief Technology Officer, Roy Singh, will present a framework, built on an operational intelligence platform based on Apache Spark, that provides a pipeline for anomaly detection, causality analysis, anomaly prediction, and actionable alerts. Read more.
Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Machine Data
Location: 1 E12/1 E13
Daniel Koffler (Rio Tinto Alcan)
Average rating: *****
(5.00, 1 rating)
Rio Tinto is one of the world’s leading mining companies. Our current technology focus is on using innovation to realize our vision of the Mine of The Future. Join us as we explore how natural resource companies are using Big Data techniques to visualize resource deposits and enable fully autonomous railroad systems and global system monitoring. Read more.
Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Business & Industry
Location: 1 E14/1 E15
Tags: retail
Majken Sander (TimeXtender)
A look at how we use public governmental data to answer questions about our customers and their behavior; this data is used by marketing, space management, and product managers. Other government data is used to support the company's sales forecast, decide which items to purchase, predict the amount to be purchased, and determine which items to phase out. All data-driven management, made even easier with Hadoop. Read more.
Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Hadoop Platform
Location: Hall A 23/24
Uri Laserson (Cloudera)
Average rating: *****
(5.00, 1 rating)
Impala provides the ability to easily analyze large, distributed data sets. This talk will cover the impyla package, which aims to make data science easier with Impala by integrating with Python. The impyla package currently supports programmatically interacting with Impala, running distributed machine learning in Impala, and compiling Python UDFs into assembly instructions via LLVM. Read more.
Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Hadoop & Beyond
Location: 1 E20/1 E21
David Jonker (Uncharted Software Inc.), Rob Harper (Uncharted)
Average rating: *****
(5.00, 4 ratings)
The widespread adoption of web-based maps provides a familiar set of interactions for exploring large data spaces. Building on these techniques, tile-based visual analytics provides interactive visualization of billions of points of data or more. This session provides an overview of the technical challenges and promise of this approach, using applications created with the open source Aperture Tiles framework on GitHub. Read more.
Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Hadoop in Action
Location: 1 C03/1 C04
Sabrina Dahlgren (Kaiser Permanente), Rajiv Synghal (Kaiser Permanente)
Average rating: **...
(2.50, 2 ratings)
Kaiser Permanente is dedicated to improving the quality of healthcare, and big data presents numerous opportunities to drive this mission forward. Read more.
Add to your personal schedule
5:05pm–5:45pm Friday, 10/17/2014
Data Science
Location: 1 E6/1 E7
Josh Levy (Vast)
Average rating: ***..
(3.00, 1 rating)
By reducing friction from deploying models and comparing competing models, data scientists can focus on high-value efforts. At Vast we've experimented with tools and strategies for this while shipping a suite of data products for consumers and agents in the midst of some of life’s biggest purchases. I'll share best practices and lessons learned, and help you free up time for the fun stuff. Read more.