Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Monday, 09/28/2015

9:00am

9:00am–5:00pm Monday, 09/28/2015
Cultivate
Location: Hall B
We’re at the cusp of a new network age. The companies defining it are fast, flat, and flexible. They devour data and focus obsessively on their customers. “Analyze and adapt” is their Standing Operating Procedure. At Cultivate, they’ll tell you how they do it—and how you can, too. Read more.

Tuesday, 09/29/2015

7:00am

7:00am–9:00am Tuesday, 09/29/2015
Location: Break
8:00am–9:00am Coffee Break | 10:30am–11:00am Morning Break, sponsored by SAS (3D, 1E)

9:00am

9:00am–5:00pm Tuesday, 09/29/2015
Hardcore Data Science
Location: 1 E10 / 1 E11
Ben Lorica (O'Reilly Media), Reza Zadeh (Matroid | Stanford), David Blei (Columbia University), Anima Anandkumar (UC Irvine), Hussein Mehanna (Facebook), Jennifer Chayes (Microsoft Research), Ben Recht (University of California, Berkeley), Tanzeem Choudhury (Cornell and HealthRhythms), Jenn Wortman Vaughan (Microsoft Research), Adam Marcus (B12), Stefanie Jegelka (M.I.T.), Mikhail Bilenko (Microsoft), Reynold Xin (Databricks)
Average rating: ****.
(4.00, 4 ratings)
All-Day: Strata's regular data science track has great talks with real-world experience from leading edge speakers. But we didn't just stop there—we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting... Read more.
9:00am–5:00pm Tuesday, 09/29/2015
Data-driven Business
Location: 1 E14 / 1 E15
Alistair Croll (Solve For Interesting), Farrah Bostic (The Difference Engine), Mark Madsen (Think Big Analytics), Krish Venkataraman (Syncsort), Amy O'Connor (Cloudera), Bill Franks (Teradata Corporation), Jake Kendall (Bill & Melinda Gates Foundation), Tricia Wang (Constellate Data), Cécile Barbaroux (Schibsted Classified Media), Kristi Marotta (Allstate), Adam Devine (WorkFusion), Rahel Jhirad (Hearst), Alexander White (Next Big Sound), Jana Eggers (Nara Logics), Vincent Dell'Anno (Accenture), Fredrik Backner (Telia Company), Bill Moschella (Evariant), Florin Trandafir (Nokia)
Average rating: ****.
(4.20, 5 ratings)
All-day: For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with big data. It's the missing MBA for a data-driven, always-on business world. Read more.
9:00am–5:00pm Tuesday, 09/29/2015
Data Science & Advanced Analytics
Location: 1 E16 / 1 E17
Garrett Grolemund (RStudio), Yihui Xie (RStudio, Inc.), Nathan Stephens (RStudio, Inc.), Randall Pruim (Calvin College)
Average rating: ****.
(4.20, 15 ratings)
From advanced visualization, collaboration, and reproducibility to data manipulation, R Day at Strata covers a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading luminaries and R committers, the folks keeping the R ecosystem abreast of the challenges facing analysts and others who work with data. Read more.
9:00am–12:30pm Tuesday, 09/29/2015
Hadoop Internals & Development
Location: 3D 02/11 Level: Intermediate
Gwen Shapira (Confluent), Jonathan Seidman (Cloudera), Ted Malaska (Capital One), Mark Grover (Lyft)
Average rating: ***..
(3.72, 29 ratings)
Looking for a deeper understanding of how to architect real-time data processing solutions? Then this tutorial is for you. In Part 1 of "Architecture Day," we will build a fraud-detection system and use it as an example to discuss considerations for building such a system, how you’d integrate various technologies, and why those choices make sense for the use case in question. Read more.
9:00am–12:30pm Tuesday, 09/29/2015
Data Science & Advanced Analytics
Location: 3D 03/10 Level: Advanced
Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Clover Health)
Average rating: **...
(2.96, 24 ratings)
In this tutorial, attendees will get a taste of how large-scale data science techniques and technologies developed for the consumer internet can be applied in the world of finance. We will guide an exploration of the relationship between the traffic on Wikipedia pages and the movement of stock prices. Read more.
9:00am–12:30pm Tuesday, 09/29/2015
Data Innovations
Location: 3D 04/09 Level: Intermediate
Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent)
Average rating: ***..
(3.25, 12 ratings)
This is a hands-on workshop where you’ll learn how to leverage the capabilities of Kafka to collect, manage, and process stream data for big data projects and general-purpose enterprise data integration needs alike. When your data is captured in real time and available as real-time subscriptions, you can start to compute new datasets in real time off these original feeds. Read more.
9:00am–12:30pm Tuesday, 09/29/2015
Business & Innovation
Location: 3D 05/08
Marie Beaugureau (O'Reilly Media, Inc.), Paco Nathan (derwen.ai), Tim Berglund (Confluent), Edd Wilder-James (Google), Matthew Gee (Impact Lab/University of Chicago), Yael Garten (LinkedIn), Katie Kent (Galvanize)
Average rating: ***..
(3.78, 9 ratings)
Whether starting a data science program, reaching the breaking point with your current data technology, or figuring out what the competition is up to, these sessions will give you a bird's-eye view of data technologies, techniques, and data-driven organizations. Read more.
9:00am–5:00pm Tuesday, 09/29/2015
Data Science & Advanced Analytics
Location: 1 E6 / 1 E7 Level: Intermediate
Average rating: ***..
(3.63, 19 ratings)
This hands-on, beginner-friendly tutorial provides a quick start to building intelligent business applications using machine learning. Learn about machine learning basics, feature engineering, recommender systems, and deep learning. The program includes hands-on portions to build and deploy large-scale machine learning applications. Read more.
9:00am–5:00pm Tuesday, 09/29/2015
Data Science & Advanced Analytics
Location: 1 E12 / 1 E13
Travis Oliphant (Anaconda), Peter Wang (Anaconda), Kyle Kelley (Netflix), Andrew Odewahn (O'Reilly Media), Paige Bailey (Microsoft), Jeff Reback (Continuum Analytics), Andy Terrel (NumFOCUS), Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate), James Powell (NumFOCUS), Phil Cloud (Continuum), Jason Grout (Bloomberg LP), Chris Colbert (Anaconda Powered by Continuum Analytics), Owen Zhang (DataRobot), Peter Prettenhofer (DataRobot), Damon McDougall (UT Austin), Michael Droettboom (Space Telescope Science Institute), Jim Crist (Continuum Analytics), Benjamin Zaitlen (Anaconda), Andreas Mueller (NYU, scikit-learn)
Average rating: ***..
(3.50, 10 ratings)
Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib for visualization, SciPy, scikit-learn, and how to scale Python performance, including how to handle large, distributed data sets. Read more.
9:00am–12:30pm Tuesday, 09/29/2015
Design, User Experience, & Visualization
Location: 3D 06/07 Level: Intermediate
Brian Suda (optional.is)
Average rating: ****.
(4.29, 7 ratings)
The term data visualization can mean anything from charts and graphs to infographics to big data and everything in between. In this tutorial, we’ll look at the basics of how to design with data, specifically using the industry-standard D3 library. By the end, you'll be able to create data visualizations with your own data sets. Read more.
SOLD OUT
9:00am–5:00pm Tuesday, 09/29/2015
Training
Location: 3D 01/12
Laurent Weichberger (OmPoint Innovations, LLC)
Average rating: ***..
(3.86, 7 ratings)
This three-day curriculum features advanced lectures and hands-on technical exercises for Spark usage in data exploration, analysis, and building big data applications. Read more.
SOLD OUT
9:00am–5:00pm Tuesday, 09/29/2015
Training
Location: 1B 03
Brandon MacKenzie (IBM), John Rollins (IBM), Jacques Roy (IBM), Chris Fregly (PipelineAI), Mokhtar Kandil (IBM)
Average rating: **...
(2.50, 12 ratings)
In this three-day course, you will learn how to use machine learning, text analysis, and real-time analytics to solve frequently encountered, high-value business problems; understand data science methodology and the end-to-end workflow of problem solution, including data preparation, model building and validation, and model deployment; and use Apache Spark and other tools for analytics. Read more.
9:00am–5:00pm Tuesday, 09/29/2015
Training
Location: 1B 04
Nathan Neff (Cloudera)
Average rating: ****.
(4.25, 4 ratings)
Cloudera University’s three-day course for designing and building big data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). Read more.
9:00am–5:00pm Tuesday, 09/29/2015
Cultivate
Location: Hall B
We’re at the cusp of a new network age. The companies defining it are fast, flat, and flexible. They devour data and focus obsessively on their customers. “Analyze and adapt” is their Standing Operating Procedure. At Cultivate, they’ll tell you how they do it—and how you can, too. Read more.
SOLD OUT
9:00am–5:00pm Tuesday, 09/29/2015
Spark & Beyond
Location: 1 E19 / 1 E20 / 1 E21 Level: Intermediate
Anthony D. Joseph (UC Berkeley | Databricks)
Average rating: ***..
(3.32, 50 ratings)
Spark Camp provides a day-long, hands-on intro to the Spark platform, including the core API, Spark SQL, Spark Streaming, MLlib, GraphX, and more. We will cover each Spark component through a series of technical talks targeted at developers who are new to Spark, intermixed with hands-on lab work. Read more.
9:00am–5:00pm Tuesday, 09/29/2015
Data-driven Business
Location: 1 E18
Roger Magoulas (O'Reilly Media), Roger Chen (Computable Labs), Ari Gesher (Kairos Aerospace), Hilary Mason (Cloudera Fast Forward Labs), Eva Ho (Susa Ventures), Matthew Tamayo-Rios (Kryptnostic), Ann Johnson (Interana), Gary Marcus (Geometric Intelligence), Shivon Zilis (Bloomberg Beta), Jacomo Corbo (QuantumBlack), Peter Brodsky (HyperScience), Cack Wilhelm (Scale Venture Partners), Alex Rice (HackerOne), Chris Wake (Spire Global, Inc.), Harper Reed (Modest), Dennis Mortensen (x.ai), Rajiv Maheswaran (Second Spectrum), Jessica Stauth (Quantopian)
Average rating: ***..
(3.93, 14 ratings)
This is a day to learn about the data innovations that have the potential to blindside even the most careful organizations. Aimed at decision makers, the Innovation + Growth program focuses on how data-oriented startups, academics, and venture capitalists approach innovation and the potential to disrupt incumbent business models. Read more.

1:30pm

1:30pm–5:00pm Tuesday, 09/29/2015
Production Ready Hadoop
Location: 3D 02/11 Level: Intermediate
Tom White (Cloudera), Ryan Blue (Cloudera)
Average rating: ****.
(4.40, 5 ratings)
In the second (afternoon) half of the Architecture Day tutorial, attendees will build a data application from the ground up. As a part of the tutorial, we will demonstrate how Kite codifies the best practices from the Hadoop Architecture Day morning session. Read more.
SOLD OUT
1:30pm–5:00pm Tuesday, 09/29/2015
Spark & Beyond
Location: 3D 03/10 Level: Intermediate
Stephen O'Sullivan (Data Whisperers), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)
Average rating: ***..
(3.38, 24 ratings)
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads. Read more.
1:30pm–5:00pm Tuesday, 09/29/2015
IoT & Real-time
Location: 3D 04/09 Level: Advanced
Patrick McFadin (DataStax)
Average rating: ****.
(4.53, 15 ratings)
This tutorial is all about managing large volumes of data coming at your data center fast and continuously. If you don't have a strategy, then allow me to help. Amazing Apache Project software can make this problem a lot easier to deal with. Spend a few hours and learn about how each part works, and how they work together. Your users will thank you. Read more.
1:30pm–5:00pm Tuesday, 09/29/2015
Data-driven Business
Location: 3D 05/08 Level: Intermediate
Scott Kurth (Silicon Valley Data Science), Edd Wilder-James (Google)
Average rating: ***..
(3.53, 17 ratings)
Big data and data science have great potential for accelerating business, but how do you reconcile the opportunity with the sea of possible technologies? Conventional data strategy has little to guide us, focusing more on governance than on creating new value. In this tutorial, we explain how to create a modern data strategy that powers data-driven business. Read more.
1:30pm–5:00pm Tuesday, 09/29/2015
Spark & Beyond
Location: 3D 06/07 Level: Intermediate
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Average rating: ***..
(3.88, 17 ratings)
Apache Drill is an open source distributed SQL engine for Hadoop, NoSQL databases, and other services. Drill's unique schema-free JSON data model enables self-service data exploration and analysis by eliminating the need to define/maintain schemas and transform data. This is a comprehensive hands-on tutorial that will enable you to start exploring and analyzing your data in place, wherever it is. Read more.

5:00pm

5:00pm–6:30pm Tuesday, 09/29/2015
Events
Location: 3E
Average rating: *****
(5.00, 3 ratings)
Grab a drink, mingle with fellow Strata + Hadoop World participants, and see the latest technologies and products from leading companies in the data space. Read more.

6:30pm

6:30pm–8:00pm Tuesday, 09/29/2015
Events
Location: Javits North
Average rating: ****.
(4.00, 1 rating)
What new companies are at the leading edge of the data space? Meet some of the best, most innovative founders as they demonstrate their game-changing ideas at the Startup Showcase. Read more.

Wednesday, 09/30/2015

6:30am

6:30am–7:30am Wednesday, 09/30/2015
Events
Location: Hudson River Park
Average rating: *****
(5.00, 2 ratings)
Please join Cloudera and O'Reilly Media for the Data Dash run / walk, held in conjunction with Strata + Hadoop World in New York 2015. Read more.

8:45am

8:45am–8:50am Wednesday, 09/30/2015
Location: Javits North
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ****.
(4.05, 60 ratings)
Program Chairs Roger Magoulas, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes. Read more.

8:50am

8:50am–9:05am Wednesday, 09/30/2015
Location: Javits North
Mike Olson (Cloudera)
Average rating: ***..
(3.98, 60 ratings)
Mike Olson, CSO and Chairman, Cloudera. Read more.

9:00am

SOLD OUT
9:00am–5:00pm Wednesday, 09/30/2015
Training
Location: 3D 01/12
Laurent Weichberger (OmPoint Innovations, LLC)
Average rating: ***..
(3.67, 0 ratings)
This three-day curriculum features advanced lectures and hands-on technical exercises for Spark usage in data exploration, analysis, and building big data applications. Read more.
SOLD OUT
9:00am–5:00pm Wednesday, 09/30/2015
Training
Location: 1B 03
Brandon MacKenzie (IBM), John Rollins (IBM), Jacques Roy (IBM), Chris Fregly (PipelineAI), Mokhtar Kandil (IBM)
Average rating: **...
(2.60, 0 ratings)
In this three-day course, you will learn how to use machine learning, text analysis, and real-time analytics to solve frequently encountered, high-value business problems; understand data science methodology and the end-to-end workflow of problem solution, including data preparation, model building and validation, and model deployment; and use Apache Spark and other tools for analytics. Read more.
9:00am–5:00pm Wednesday, 09/30/2015
Training
Location: 1B 04
Nathan Neff (Cloudera)
Average rating: ****.
(4.50, 0 ratings)
Cloudera University’s three-day course for designing and building big data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). Read more.

9:05am

9:05am–9:15am Wednesday, 09/30/2015
Location: Javits North
AnnMarie Thomas (School of Engineering and Schulze School of Entrepreneurship, University of St. Thomas)
Average rating: ***..
(3.64, 72 ratings)
Unusual collaborations can often lead to new ways of taking and analyzing data. This talk looks at lessons learned from working with chefs, circus performers, and preschoolers. Read more.

9:15am

9:15am–9:25am Wednesday, 09/30/2015
Sponsored
Location: Javits North
Joseph Sirosh (Microsoft)
Average rating: ****.
(4.46, 85 ratings)
Join Microsoft’s Joseph Sirosh for a behind-the-scenes sneak peek into the creation of the viral phenomenon How-Old.net. He'll cover how it got to 50 million users in 7 days, the unexpected big data challenges that came with it, and the surprising learnings they had about people and systems. Read more.

9:25am

9:25am–9:30am Wednesday, 09/30/2015
Sponsored
Location: Javits North
Ron Kasabian (Intel), Michael Draugelis (Penn Medicine)
Average rating: ***..
(3.63, 79 ratings)
Even in this era of intense medical breakthroughs, many illnesses still evade accurate and timely diagnosis. Clinicians must often rely on static diagnostic guidelines that result in late care and too many false alarms. Half of all heart failure patients can go undiagnosed. Read more.

9:30am

9:30am–9:35am Wednesday, 09/30/2015
Sponsored
Location: Javits North
Tim Howes (ClearStory Data)
Average rating: ***..
(3.42, 67 ratings)
This keynote unveils why rapid modernization of BI is taking place, the business use cases driving it, and what’s essential in next-generation solutions. Read more.

9:35am

9:35am–9:40am Wednesday, 09/30/2015
Sponsored
Location: Javits North
Jim McHugh (Cisco)
Average rating: ***..
(3.72, 61 ratings)
IoE, IoT, and big data – three topics you hear and read about often in our various industries. Let’s quickly look at these market and technology dynamics, and see how they are each in their own way ’democratizing’ data access and analysis, resulting in new businesses, technologies, and improved community solutions throughout the world. Read more.

9:40am

9:40am–9:50am Wednesday, 09/30/2015
Location: Javits North
Joy Johnson (AudioCommon)
Average rating: ***..
(3.98, 84 ratings)
Joy Johnson, VP, Mobile, AudioCommon. Read more.

9:50am

9:50am–10:00am Wednesday, 09/30/2015
Location: Javits North
David Boyle (MasterClass)
Average rating: ***..
(3.76, 76 ratings)
Are creative businesses the last battleground for data-driven decision making? Drawing lessons from successes and failures in the music industry, book publishing, and TV, David Boyle will argue for a negotiated settlement in the war between data and creative, and show how long-term and mutually beneficial peace can work. Read more.

10:00am

10:00am–10:10am Wednesday, 09/30/2015
Location: Javits North
DJ Patil (White House Office of Science and Technology Policy)
Average rating: ***..
(3.95, 100 ratings)
DJ Patil, U.S. Chief Data Scientist at White House Office of Science and Technology Policy. Read more.

10:10am

10:10am–10:25am Wednesday, 09/30/2015
Location: Javits North
Katherine Milkman (Wharton School at the University of Pennsylvania)
Average rating: ****.
(4.09, 75 ratings)
Katherine will discuss recent behavioral science research suggesting how a number of simple, inexpensive tools can be used to encourage improved decisions. Read more.

10:25am

10:25am–10:30am Wednesday, 09/30/2015
Location: Javits North
Ben Lorica (O'Reilly Media)
Average rating: ***..
(3.66, 58 ratings)
Ben Lorica, Program Director, O'Reilly Media. Read more.

10:30am

10:30am–10:45am Wednesday, 09/30/2015
Location: Javits North
Jeff Jonas (IBM)
Average rating: ****.
(4.44, 84 ratings)
Jeff Jonas, IBM Fellow; Chief Scientist, Context Computing. Read more.

11:20am

11:20am–12:00pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Wes McKinney (Two Sigma Investments)
Average rating: ***..
(3.70, 10 ratings)
Many data science and data analytics applications are written in Python or R, but developing and deploying these applications at scale or in production is a pain point for many users. We will discuss our new efforts to bridge the gap between familiar in-memory data tools and distributed data management systems using Python and Impala. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Billy Newport (Goldman Sachs)
Average rating: ***..
(3.85, 34 ratings)
The combination of data, technology, and analytics creates previously impossible business intelligence opportunities. How well companies can capture and manage their data so that it can be easily and consistently queried will be a key differentiator in deriving commercial value from data. Learn how Goldman is developing an enterprise platform to unify and manage data across the firm. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E16 / 1 E17 Level: Intermediate
Greg Rahn (Cloudera)
Average rating: ***..
(3.80, 5 ratings)
The flexibility and simplicity of JSON have made it one of the most common formats for data. Data engines need to be able to load, process, and query JSON and nested data types quickly and efficiently. There are multiple approaches to processing JSON data, each with trade-offs. In this session, we’ll compare and contrast the approaches taken by systems such as Hive, Drill, BigQuery, and others. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Joe Hellerstein (UC Berkeley)
Average rating: ****.
(4.22, 18 ratings)
As the Hadoop ecosystem grows more complex, there is widespread desire for open metadata solutions: common ground for collaboration across users, and interoperability across software solutions. We motivate a new class of open metadata services for big data, via science and enterprise use cases. We also set out challenges for a new class of "meta-on-use" approaches fit for agile analytics. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Patrick Wendell (Databricks)
Average rating: ***..
(3.86, 22 ratings)
In the last year Spark has seen substantial growth in adoption as well as the pace and scope of development. This talk will look forward and discuss both technical initiatives and the evolution of the Spark community. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
IoT & Real-time
Location: 3D 02/11 Level: Intermediate
Gwen Shapira (Confluent), Jeff Holoman (Cloudera)
Average rating: ****.
(4.33, 21 ratings)
Kafka provides the low latency, high throughput, high availability, and scale that financial services firms require. But can it also provide complete reliability? In this session, we will go over everything that happens to a message - from producer to consumer, and pinpoint all the places where data can be lost - if you are not careful. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Richard Brath (Uncharted Software), Rob Harper (Uncharted)
Average rating: ****.
(4.31, 13 ratings)
Direct visual exploratory analysis of big data yields insights that are otherwise overlooked. By plotting all the data, patterns that can be obscured by traditional visualization methods are preserved. This presentation highlights the power of visualizing whole data sets through examining a market order book and identifying pricing strategies. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Law, Ethics, & Open Data
Location: 3D 04/09 Level: Non-technical
Evan Selinger (Rochester Institute of Technology), Jules Polonetsky (Future of Privacy Forum)
Average rating: ***..
(3.75, 4 ratings)
Ethical concerns about the use of personal information in new ways have led to calls for the creation of consumer subject review boards, which could evaluate, approve, or monitor out-of-context uses of information absent user consent. This conversation between a philosopher and a lawyer will address how organizations can use existing ethical frameworks to create practical accountability mechanisms. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Production Ready Hadoop
Location: 3D 05/08 Level: Intermediate
Jairam Ranganathan (Cloudera)
Average rating: ***..
(3.77, 13 ratings)
Apache Hadoop was designed when cloud models were in their infancy. Despite this fact, Hadoop has proven remarkably adept at migrating its architecture to work well in the context of the cloud, as production workloads migrate to a cloud environment. This talk will cover several topics on adapting Hadoop to the cloud. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Sponsored
Location: 1 E6 / 1 E7
Robert Novak (Cisco)
Average rating: ***..
(3.67, 3 ratings)
Big data has moved beyond the bleeding-edge, early-adopter stage. If you're not using it now, you will be soon. But big data deployments are not a cookie-cutter, one-size-fits-all effort. Cisco Big Data Consulting Systems Engineer Robert Novak will present real-world deployment stories and use cases for big data on Cisco UCS, especially (but not exclusively) around Hadoop environments. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Sponsored
Location: 1 E14
Matt Winkler (Microsoft)
Average rating: ***..
(3.57, 7 ratings)
At Microsoft, we process exabytes of data to run our own businesses. Learn how you can process big data in the cloud at massive scale with no hardware to deploy, no software to tune or configure, and no infrastructure to manage. We’ll also talk about overcoming common obstacles in big data adoption, such as a steep learning curve, cost of implementation, tuning infrastructure, and providing security. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Sponsored
Location: 1 E15
Vin Sharma (Intel)
Average rating: ***..
(3.50, 2 ratings)
To accelerate enterprise deployment of big data analytics, Intel and partners introduced an open source trusted analytic platform-as-a-service for data scientists and app developers to build and deploy advanced analytics applications at cloud scale. Join us and discover how you can customize and develop your own big data solutions with this platform. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E12 / 1 E13 Level: Non-technical
Melissa Santos (Big Cartel)
Average rating: ***..
(3.14, 14 ratings)
Over the last year, my team has gone from being a Hadoop Infrastructure team that was constantly fixing problems and cleaning up messes, to declaring ourselves to be a Data Platform team, expanding into investigating new tools, teaching coworkers about big data, and consulting with other teams about how to meet their data needs. Read more.
11:20am–12:00pm Wednesday, 09/30/2015
Sponsored
Location: 3D 06/07
Ali Tore (ClearStory Data)
Average rating: ***..
(3.20, 5 ratings)
In this session, you will learn why organizations are embarking on a mission to understand the “now” of their businesses, what they are doing with their internal and external data to drive continuous insights, and how their businesses benefit from these insights. Read more.

12:00pm

12:00pm–1:15pm Wednesday, 09/30/2015
Events
Location: 3A & 3B
Average rating: **...
(2.60, 5 ratings)
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

1:15pm

1:15pm–1:35pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Non-technical
Russell Jurney (Data Syndrome)
Average rating: ***..
(3.50, 2 ratings)
The talk covers the development of the O'Reilly Media Report, "Mapping big data: A data driven market report." Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Tags: media
Claudia Perlich (Dstillery)
Average rating: ****.
(4.15, 20 ratings)
This talk takes a provocative stand: many metrics we cherish lose their value because the granularity of modern data collection enables us to identify and optimize toward hidden signals that used to be noise, and now come to the forefront. One such metric is the click-through rate in advertising, but the mechanism is ubiquitous and we should pay close attention to the mechanism at work. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Intermediate
Lenni Kuff (Facebook), Nong Li (Cloudera), Stephen Romanoff (Capital One)
Average rating: ****.
(4.05, 21 ratings)
Hadoop is supremely flexible, but with that flexibility comes integration challenges. In this talk, we introduce a new service that eliminates the need for components to support individual file formats, handle security, perform auditing, and implement sophisticated IO scheduling and other common processing that is at the bottom of any computation. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Advanced
Roy Ben-Alta (Amazon Web Services)
Average rating: ***..
(3.77, 13 ratings)
Amazon Kinesis is a fully managed service for real-time streaming big data ingestion and processing. This talk explores Kinesis concepts in detail, including best practices for scaling your core streaming data ingestion pipeline. We then discuss building and deploying Kinesis processing applications using capabilities like Kinesis Client Libraries, AWS Lambda, and Amazon EMR (via Spark). Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Hossein Falaki (Databricks Inc.)
Average rating: ***..
(3.65, 26 ratings)
R is the favorite language of many data scientists. In addition to a language and runtime, R is a rich ecosystem of libraries for a wide range of use cases, from statistical inference to data visualization. However, handling large or distributed data with R is challenging. Hence, most data scientists use R along with other frameworks and languages. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
IoT & Real-time
Location: 3D 02/11 Level: Intermediate
Charles Givre (Deutsche Bank)
Average rating: ****.
(4.40, 5 ratings)
Many people are acquiring smart devices, and yet do not have an understanding of the data these devices gather about them and what can be done with this data if it is aggregated over time. The talk will demonstrate what data several popular devices—including the Nest Thermostat and a few others—gather and show what can be learned about an individual from this data. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Alex Kelly (General Motors), Kim Le (General Motors)
Average rating: ***..
(3.00, 3 ratings)
This session will demonstrate how data enables people to overcome their disabilities and live to their fullest. We will also point out critical underlying flaws of data interpretation (due to human bias), and offer action items for us to make the data world more inclusive, efficient, and connected. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Law, Ethics, & Open Data
Location: 3D 04/09 Level: Intermediate
Jake Porway (DataKind), Cathy O'Neil (Weapons of Math Destruction), Vladimir Dubovskiy (DonorsChoose.org), Kamalesh Rao (DataKind)
Average rating: *****
(5.00, 2 ratings)
No matter how good the intentions, ethical questions are inherent in the work of using data for social good. How are organizations navigating ethical pitfalls in order to make an impact? The key is protecting the humanity behind the numbers. In this series of talks, we'll learn how organizations are dealing with ethical considerations inherent in projects that aim to use data for good. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Production Ready Hadoop
Location: 3D 05/08 Level: Intermediate
Jonathan Hsieh (Cloudera, Inc), Dima Spivak (StreamSets)
Average rating: ***..
(3.79, 14 ratings)
With the number of production Apache HBase clusters increasing, there is greater demand for running multiple applications on single clusters, for data reliability and availability, and for developers to better test their applications. We’ll lay out how these new demands can be addressed using multi-tenant, multi-cluster, or multi-container deployments, including the use of Docker. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Sponsored
Location: 1 E6 / 1 E7
Tags: iot
Sarah Aerni (Pivotal)
Average rating: ****.
(4.11, 9 ratings)
The promise of IoT is that it will forever change the way people and businesses interact with the world. Using illustrative use cases, Pivotal will demonstrate the fundamental concepts required to drive true impact from these connected devices. We will cover which models are most appropriate, what considerations around data access and processing are critical, and which tools are available. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Sponsored
Location: 1 E14
Moderated by:
Andrew Brust (Datameer)
Panelists:
Jeff Jarrell (American Airlines), Ryan Wright (Kelley Blue Book), Kendell Timmers
Average rating: **...
(2.91, 11 ratings)
Beyond the euphoria of what big data can do, and the stress that comes from feeling that you’re not doing enough, how can you really get started? What are some concrete things you can do and some reasonable results you can expect? This panel, featuring real customers who are technology implementation leaders, will help you answer these questions. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Sponsored
Location: 1 E15
Anthony Dina (Dell)
Average rating: ***..
(3.50, 2 ratings)
The only guarantee in life is change. That’s exactly what makes the world interesting and innovative, and that’s exactly what the large internet properties are counting on: to disrupt traditional businesses with an always-on, data-centric business model. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E12 / 1 E13 Level: Non-technical
Tags: health
Aaron Kimball (Zymergen, Inc.)
Average rating: ***..
(3.82, 11 ratings)
Zymergen has industrialized the process of genome engineering to build microbes that produce chemicals at scale. High-throughput microbe development is driven by integrating machine learning and open source software for complex data storage, search, and bioinformatics. See how we built this futuristic vision for synthetic biology, and learn how NoSQL can power massive scale experimentation. Read more.
1:15pm–1:55pm Wednesday, 09/30/2015
Sponsored
Location: 3D 06/07
Moderated by:
Robert Eve (Cisco)
Panelists:
Robert Novak (Cisco), Nenshad Bardoliwalla (Paxata)
Average rating: ***..
(3.33, 6 ratings)
As big data becomes a pervasive force in the enterprise, many of our fundamental ideas around how to optimize compute, storage, network, and resource management are being stretched. Read more.

1:35pm

1:35pm–1:55pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Non-technical
Lauralea Banks Edwards (Washington State University)
Average rating: ****.
(4.00, 2 ratings)
This presentation identifies some of the areas in data creation and analytics where we perpetuate the simplistic representation of the world. It uses queer theory to demonstrate alternative ways of creating and analyzing data to take non-normative cases into consideration. Read more.

2:05pm

2:05pm–2:25pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Joy Thomas (Apigee), Jagdish Chand (Apigee)
Average rating: ***..
(3.30, 10 ratings)
Customer journey analytics systems of large corporations must handle a great volume of events on a daily basis. A priori aggregation used by early systems often caused signal loss due to ever-changing customer activity rates. We will present a new method that identifies paths inherent in raw cross-channel data, and that captures traffic patterns via nodes of interest across all channels of data. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Joe Caserta (Caserta Concepts), Elliott Cordo (Caserta Concepts, LLC)
Average rating: ****.
(4.00, 7 ratings)
A global record company and a force in the music business partnered with award-winning data innovation consulting firm Caserta Concepts to re-architect its core data platform, with a data framework based on AWS, EMR, Redshift, and other big data technologies. This session presents the architecture, technologies, and techniques used to achieve an agile data ingestion and analytics platform. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Intermediate
Todd Lipcon (Cloudera)
Average rating: ***..
(3.44, 18 ratings)
This session will investigate the trade-offs between real-time transactional access and fast analytic performance in Hadoop, from the perspective of storage engine internals. We will discuss recent advances, evaluate benchmark results from current generation Hadoop technologies, and propose potential ways ahead for the Hadoop ecosystem to conquer its newest set of challenges. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Haoyuan Li (Alluxio)
Average rating: ***..
(3.94, 17 ratings)
Tachyon is a memory-centric fault-tolerant distributed storage system, which enables reliable file sharing at memory-speed. It is open source and is deployed at multiple companies. In addition, Tachyon has more than 80 contributors from over 30 institutions. In this talk, we present Tachyon's architecture, performance evaluation, and several use cases we have seen in the real world. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Tags: health
Timothy Danford (Tamr, Inc.)
Average rating: ***..
(3.52, 23 ratings)
A revolution in DNA sequencing technology has led to exponential growth in the genomics data available to discover new drugs, diagnose patients, and understand the fundamental biology of human disease. Existing bioinformatics tools will have difficulty scaling to meet the challenges posed by this growth. Learn about next-generation tools for bioinformatics and genomics using Spark and Parquet. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
IoT & Real-time
Location: 3D 02/11
Karthik Ramasamy (Streamlio)
Average rating: ***..
(3.93, 14 ratings)
This talk will present the design and implementation of a new system, called Heron, that is now the de facto stream data processing engine inside Twitter. We will also share our experiences running Heron in production. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Alan Hannaway (7digital)
Average rating: ***..
(3.50, 6 ratings)
7digital power a variety of music services with a diverse range of territories, devices and access models. They have been helping services transform the listening experience through visualising their data. Paul will demonstrate visualisations on listening bounce rate and content classification, giving examples of how these creative solutions to conveying information have helped engage people... Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Law, Ethics, & Open Data
Location: 3D 04/09 Level: Intermediate
Jake Porway (DataKind), Bob Filbin (Crisis Text Line), danah boyd (Microsoft Research | Data & Society)
Average rating: *****
(5.00, 4 ratings)
No matter how good the intentions, ethical questions are inherent in the work of using data for social good. How are organizations navigating ethical pitfalls in order to make an impact? The key is protecting the humanity behind the numbers. In this series of talks, we'll hear from four speakers on how they are dealing with ethical considerations inherent in projects that aim to use data for good. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Production Ready Hadoop
Location: 3D 05/08 Level: Intermediate
Siwei Zhu (Scribd), Kevin Perko (Scribd)
Average rating: ***..
(3.17, 12 ratings)
With the explosion of big data open source technologies, companies can now build a powerful data warehouse. But as they reach scale, they’ll find that patching together numerous projects requires building their own tools to manage the data pipeline. In this presentation we will talk about the tools you’ll likely need to build in-house to make your data infrastructure manageable. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Sponsored
Location: 1 E6 / 1 E7
Eric Frenkiel (MemSQL), Noah Zucker (Novus Partners), Ian Hansen (Digital Ocean), Michael DePrizio (Akamai Technologies)
Average rating: **...
(2.25, 4 ratings)
In-memory is no longer just a trend: it’s an imperative for high-volume, real-time data workloads. With the relational, distributed MemSQL database, modern enterprises are unlocking value from gigabytes and terabytes of data. Learn about some of the latest applications and deployments of in-memory technology from Akamai Technologies, Novus, and Digital Ocean. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Sponsored
Location: 1 E14
Vishal Bamba (Transamerica), Murthy Mathiprakasam (Informatica)
Average rating: ****.
(4.25, 4 ratings)
In this session, learn how leading customers have built a unified big data fabric on top of Hadoop, using technologies like Informatica to repeatably deliver trusted data assets to a large community of data consumers, for a multi-dimensional view of customers. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Sponsored
Location: 1 E15
Peter Schlampp (Platfora), Chris Kudelka (Riot Games)
Average rating: ***..
(3.67, 6 ratings)
League of Legends has more than 67 million players per month. The company needed an analytics solution that would work well with their push-model data pipeline. In this session, data engineer Chris Kudelka will discuss how their game designers use Riot's data pipeline and Platfora to measure and validate player-focused changes like improvements to game servers and client performance. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E12 / 1 E13 Level: Intermediate
Jaipaul Agonus (FINRA)
Average rating: ***..
(3.71, 14 ratings)
This presentation is a real-world case study about moving a large portfolio of batch analytical programs that process 30 billion or more transactions every day, from a proprietary MPP database appliance architecture to the Hadoop ecosystem in the cloud, leveraging Hive, Amazon EMR, and S3. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
Sponsored
Location: 3D 06/07
Robby Dick (BMC Software)
Average rating: ***..
(3.00, 5 ratings)
This session describes how organizations are managing Hadoop and big data workflows with an enterprise workflow solution that provides a graphical user interface for managing all the complex components of the enterprise application fabric. They gain SLA management, forecasting and change impact analysis, auditing, reporting, and self-service via mobile devices. Read more.
2:05pm–2:45pm Wednesday, 09/30/2015
DJ Patil (White House Office of Science and Technology Policy)
Average rating: **...
(2.81, 16 ratings)
DJ Patil, U.S. Chief Data Scientist at the White House Office of Science and Technology Policy. Read more.

2:25pm

2:25pm–2:45pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Albert Bifet (Télécom ParisTech), Silviu Maniu (Huawei)
Average rating: ***..
(3.62, 16 ratings)
Real-time analytics are becoming increasingly important due to the large amount of data that is being created continuously. Drawing from our experiences at Huawei Noah's Ark Lab, we present StreamDM, a new open source data mining and machine learning library designed on top of Spark Streaming. We will show its advanced methods, and how easily it can be used and extended. Read more.

2:55pm

2:55pm–3:35pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Non-technical
Tags: media
Juan Huerta (Dow Jones)
Average rating: ****.
(4.25, 20 ratings)
In this presentation I will describe the way in which Data Science is helping the Wall Street Journal produce better journalism strategies, personalize our subscribers’ experience, and optimize revenue and overall customer engagement. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Data-driven Business
Location: 1 E10 / 1 E11
Michael Dauber (Amplify Partners), Shivon Zilis (Bloomberg Beta), Matthew Ocko (Data Collective), Roger Chen (Computable Labs), Jerry Chen (Greylock)
Average rating: ***..
(3.78, 9 ratings)
To anticipate who will succeed and invest wisely, investors spend a lot of time trying to understand the longer-term trends within an industry. In this panel discussion, we’ll consider the big trends in big data, asking top-tier VCs to look over the horizon and discuss the visions they have two or more years in the future. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Advanced
Zhe Zhang (LinkedIn), Weihua Jiang (Intel)
Average rating: ****.
(4.29, 7 ratings)
In this session, attendees will learn how erasure coding (HDFS-7285) can greatly reduce the storage overhead of HDFS without sacrificing data reliability. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Advanced
Eric Schmidt (Google)
Average rating: ***..
(3.62, 16 ratings)
Big data processing is challenged by four conflicting desires: latency, accuracy, simplicity, and cost. Google Cloud Dataflow reconciles these desires with a unified, open source programming model backed by a fully managed cloud service. Dataflow enables developers to answer questions with the right level of latency and accuracy, with low operational overhead regardless of size or complexity. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Non-technical
Tags: iot
Håkan Jonsson (Sony Mobile Communications)
Average rating: ***..
(3.33, 6 ratings)
In this talk we will show how Sony Mobile uses large-scale analytics on Spark to generate insights for Lifelog users about themselves and the population, and how we use analytics to build a user lifecycle model that allows us to take actions toward increased user engagement and retention. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
IoT & Real-time
Location: 3D 02/11 Level: Intermediate
Jim Scott (MapR Technologies)
Average rating: ***..
(3.50, 6 ratings)
With the move to real-time data analytics and machine learning, streaming applications are relied upon more than ever before. Discover how to build and deploy a globally scalable streaming system. This includes producing messages in one data center and consuming them in another, as well as how to guarantee that nothing is ever lost. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Tags: geospatial
Andrew Hill (Textile)
Average rating: ****.
(4.80, 5 ratings)
You no longer need to be a remote sensing specialist to leverage real-time geospatial data from space. You don't need to be an expert to harvest social media on the cheap. Geospatial data analysis is a mixing pot that brings together your private data and streams of data from all over. We will talk about how we are bringing this mixing pot together for the future of understanding data. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Law, Ethics, & Open Data
Location: 3D 04/09 Level: Non-technical
Mike Lee Williams (Cloudera Fast Forward Labs)
Average rating: ****.
(4.85, 13 ratings)
Because of the way sentiment analysis algorithms are trained, they systematically amplify the voices of those who express themselves unsubtly and aggressively. I will extrapolate from this observation to show the ways in which supervised machine learning has the potential to amplify social and economic privilege. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Production Ready Hadoop
Location: 3D 05/08 Level: Intermediate
Michael Segel (Segel & Associates)
Average rating: ***..
(3.62, 8 ratings)
Today's Hadoop cluster has multiple single points of failure. This talk focuses on identifying these failure points and how to mitigate them. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Sponsored
Location: 1 E6 / 1 E7
Alex Loffler (TELUS)
Average rating: ****.
(4.00, 4 ratings)
Security teams study many months and years of data for baselining and incident forensics, but IT operations may only want to store weeks or months of data to analyze for operational insights. And the two different needs can be difficult to reconcile. Learn how TELUS's security analysts provide value to both teams. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Sponsored
Location: 1 E14
Bill Porto (RedPoint Global)
Average rating: **...
(2.80, 10 ratings)
This session covers why continual, adaptive optimization is a key to success with real world machine learning models. Bill will detail the applicability of machine learning tools with the pros/cons of each. Learn how to optimize processes to drive more predictable outcomes from business decisions. Tools for automating access to changing data and removal of noise and error will also be reviewed. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Sponsored
Location: 1 E15
Jonathan Gray (Cask)
Average rating: ****.
(4.25, 4 ratings)
Data lakes represent a new data architecture that provides enterprises with the scale and flexibility required for big data: unbounded storage for unbounded questions. While Hadoop is the de facto standard for implementing data lakes today, significant time and effort are still required. This talk introduces Cask Hydrator, a new open source data lake framework and drag-and-drop UI built on CDAP. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E12 / 1 E13 Level: Intermediate
Arvind Prabhakar (StreamSets)
Average rating: ***..
(3.67, 9 ratings)
Modern data infrastructures operate on vast volumes of continuously produced data generated by independent channels. Enterprises such as consumer banks that have many such channels are starting to implement a single view of customers that can power all customer touchpoints. In this session we present an architectural approach for implementing such a solution using a customer event hub. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
Sponsored
Location: 3D 06/07
Sheetal Dolas (Hortonworks)
Businesses are moving from large-scale batch data analysis to large-scale real-time data analysis. Apache Storm has emerged as one of the most popular platforms for this purpose. This talk covers proven design patterns for real-time stream processing. They have been vetted in large-scale production deployments that process tens of billions of events/day and tens of terabytes of data/day. Read more.
2:55pm–3:35pm Wednesday, 09/30/2015
DJ Patil (White House Office of Science and Technology Policy)
Average rating: **...
(2.71, 7 ratings)
DJ Patil, U.S. Chief Data Scientist at the White House Office of Science and Technology Policy. Read more.

4:35pm

4:35pm–5:15pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Marcel Kornacker (Cloudera), Josh Wills (Cloudera), Alexander Behm (Cloudera)
Average rating: ***..
(3.25, 12 ratings)
In this talk, we will explain how data scientists use nested data structures to increase analytic productivity. We will use two well-known relational schemas - TPC-H and Twitter - to demonstrate how to simplify data science workloads with nested schemas. Also, we will outline best practices for converting flat relational schemas into nested ones, and give examples of data science-style analysis. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Tags: media
Adam Kelleher (BuzzFeed)
Average rating: ***..
(3.20, 10 ratings)
At BuzzFeed, a technology and media company, the question of “virality of content via sharing” dominates. Now, for the first time since the company was founded in 2006, data scientists can identify ways pieces of content spread across multiple social networks. In this paper, we present a close look into the way BuzzFeed defines and analyzes the virality of content. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Advanced
Alan Gates (Hortonworks)
Average rating: ***..
(3.64, 11 ratings)
Hadoop gives the ability to keep all data together for shared use and analysis. People use Apache HBase for fast updates and low latency data access and Apache Hive for analytics. To improve sharing of this data, users need to be able to access their transactional and analytic data through one tool. This talk will cover work in the Hive, HBase, and Phoenix communities to deliver on this promise. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Martin Kleppmann (University of Cambridge)
Average rating: ****.
(4.14, 14 ratings)
Even the best data scientist can't do anything if they cannot easily get access to the necessary data. Simply making the data available is Step 1 toward becoming a data-driven organization. In this talk, we'll explore how Apache Kafka can replace slow, fragile ETL processes with real-time data pipelines, and discuss best practices for data formats and integration with existing systems. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Holden Karau (Google)
Average rating: ****.
(4.17, 18 ratings)
This session explores best practices of creating both unit and integration tests for Spark programs as well as acceptance tests for the data produced by our Spark jobs. We will explore the difficulties with testing streaming programs, options for setting up integration testing with Spark, and also examine best practices for acceptance tests. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
IoT & Real-time
Location: 3D 02/11 Level: Intermediate
Hari Shreedharan (Cloudera), Anand Iyer (Cloudera)
Average rating: ***..
(3.17, 6 ratings)
Over the past year, Spark Streaming has emerged as the leading platform to implement IoT and similar real-time use cases. This session includes a brief introduction to Spark Streaming’s micro-batch architecture for real-time stream processing, as well as a live demo of an example use case that includes processing and alerting on time series data (such as sensor data). Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Moderated by:
Sean Power (Watching Websites)
Panelists:
Joy Johnson (AudioCommon), Mike Rosenthal (Mick Management), Rishi Malhotra (Saavn)
Average rating: ****.
(4.00, 2 ratings)
This panel brings together founders and technologists who live on the cutting edge of music science. We’ll look at the “Turing problems” of digital entertainment, as well as how providers strike a balance between human curation and machine optimization. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Law, Ethics, & Open Data
Location: 3D 04/09 Level: Non-technical
Jay Margalus (MapR), Mike Emerick (MapR)
Average rating: ****.
(4.67, 6 ratings)
Who will watch the watchmen? This session will cover data integrity problems in open government introduced by the human element. We’ll then explore possible methodologies that will allow us to derive value from open government data, while still keeping a skeptical eye on the validity of the data itself. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Production Ready Hadoop
Location: 3D 05/08 Level: Intermediate
Prat Moghe (Cazena)
Average rating: ****.
(4.50, 2 ratings)
Hadoop’s ability to handle large amounts of varied data has been a driving force behind the explosion of big data. Many organizations’ ambitions to become more data-driven, however, are held back by a shortage of resources as well as the time and expense needed to purchase and set up hardware and software infrastructure. The cloud offers a natural alternative to overcome these barriers. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Sponsored
Location: 1 E6 / 1 E7
Matthew Derda (Pepsi), Douglas Stradley (Trifacta)
Average rating: ***..
(3.92, 13 ratings)
Pepsi analyst Matthew Derda and Trifacta Director of Customer Success Doug Stradley discuss why data wrangling is critical to empowering analysts to efficiently access and incorporate diverse big data sources for organizational analysis. Get first-hand examples where traditional ETL and scripting approaches fall short, and why “self-service” approaches are critical to big data initiatives. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Sponsored
Location: 1 E14
Jon Haddad (The Last Pickle)
Average rating: ***..
(3.00, 3 ratings)
Everyone knows that Python isn’t suitable for massive scale analytics, right? Wrong. Spark 1.3 introduced data frames, which allow for high performance Spark batch jobs, streaming, and machine learning over massive datasets. In this talk you’ll learn how to combine Cassandra, a highly scalable, always-on OLTP data store, with PySpark, a framework for distributed computation. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Sponsored
Location: 1 E15
Alex Gorelik (Waterline Data), Jim Kaskade (Janrain), David Tabacco (Merck & Co., Inc.), David Paige (Cox Automotive)
Average rating: ****.
(4.20, 5 ratings)
This talk is about the best practices approach to accelerate data discovery while complying with security and data governance needs. Learn how to implement an automated and governed inventory of your data assets. Open up your data lake with secure self-service to find and understand data quickly. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E12 / 1 E13 Level: Intermediate
Amar Arsikere (infoworks.io)
Average rating: **...
(2.56, 16 ratings)
Enterprise data warehouses have become a large cost center. As their data volumes grow, enterprises want to move their warehouses on to Hadoop. But it is not an easy task. How do you solve this problem? The speakers have designed and deployed large scale data warehouses on Hadoop. In this talk, they will examine the technical underpinnings of their solution with a real-world example. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Sponsored
Location: 3D 06/07
Bruce Reading (VoltDB)
Average rating: **...
(2.33, 3 ratings)
You have 10 milliseconds. Less than the blink of an eye, the beat of a heart – that’s how much time you have to ingest fast streams of data, perform analytics on the streams, and take action. Ten milliseconds to win a customer, 10 milliseconds to make a sale, 10 milliseconds to save a life – it’s not much time. Read more.
4:35pm–5:15pm Wednesday, 09/30/2015
Data-driven Business
Location: Hall B
David Boyle (MasterClass)
Average rating: ***..
(3.00, 4 ratings)
Drawing lessons from successes and failures in the music industry, book publishing and TV, David Boyle will share five lessons that are essential if you’re to use data to make a difference in creative businesses. Read more.

5:25pm

5:25pm–6:05pm Wednesday, 09/30/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Raphael Lee (Airbnb), Victor Vazquez (Airbnb)
Average rating: ****.
(4.00, 8 ratings)
More users than ever are accessing web applications from multiple devices. When logged-out users receive mixed experiment treatments, weird and wacky results can start appearing in your experiment analyses. Find out what we've learned about this problem at Airbnb and how our data scientists and engineers teamed up to solve it. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Moderated by:
Michael Abbott (Stanford University)
Panelists:
Jooseong Kim (Pinterest), Sven Junkergård (Zephyr Health), Calvin French-Owen (Segment), Peter Reinhardt (Segment), Andrew First (Lean Plum), Shiva Shivakumar (Urban Engines)
Average rating: ***..
(3.88, 8 ratings)
Most people are familiar with the basic principles driving today’s hottest big data and enterprise companies. But what’s really going on underneath the hood? In this session, Kleiner Perkins Caufield & Byers General Partner Michael Abbott unboxes a variety of startups in the space to examine the technology, architecture, and innovations they’ve harnessed to deliver superior products and services. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Intermediate
Monte Zweben (Splice Machine Inc.), John Leach (Splice Machine)
Average rating: ****.
(4.00, 8 ratings)
Even after 25 years, the TPC-C benchmark still sets the standard for online transaction processing (OLTP) database benchmarking. It has traditionally been the arena for RDBMSs like Oracle Database, IBM DB2, and Microsoft SQL Server to do battle. Now, for the first time, a Hadoop database has successfully completed TPC-C benchmarks. Can it change the equation for OLTP workload price/performance? Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Yonik Seeley (Cloudera)
Average rating: ****.
(4.27, 11 ratings)
This talk will cover how search and Solr have become a critical part of the Hadoop stack, and have also emerged as one of the highest performing solutions for analytics over big data. We'll also cover new analytics capabilities in Solr that marry full-text search, faceted search, statistics, grouping, and joining into a powerful engine for next-generation big data analytics applications. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Sandy Ryza (Clover Health)
Average rating: ***..
(3.80, 5 ratings)
How much can you expect to lose? The financial statistic Value at Risk seeks to answer this question, but is computationally intensive to estimate. At Cloudera, we’ve assisted several organizations in using Spark to compute VaR and other financial statistics. The talk, which walks through a basic VaR calculation, aims to give a feel for what it is like to approach financial modeling with Spark. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
IoT & Real-time
Location: 3D 02/11
Ian Eslick (VitalLabs)
Average rating: ***..
(3.50, 2 ratings)
Capturing and integrating device-based and other health data for research is frustratingly difficult. We explain the open source technology framework we developed, working with O’Reilly Media and with support from the Robert Wood Johnson Foundation, for capturing and routing device-based health data for use by healthcare providers and for access by researchers via a trusted analytic container. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Hugh McGrory (datavized)
Average rating: **...
(2.89, 9 ratings)
Data is all science, no art. Think of a film that inspired or moved you. Now imagine the filmmaker decided that instead of making the film, they would present the material to you in the form of a graph or a chart. That’s where we are with data. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Law, Ethics, & Open Data
Location: 3D 04/09 Level: Non-technical
Steven Totman (Cloudera), Sam Heywood (Cloudera), Nick Curcuru (Mastercard)
Average rating: ***..
(3.83, 6 ratings)
Technology offers amazing big data use cases, but according to Gartner it's important to avoid "crossing the creepy line." Governance and security experts from Cloudera and MasterCard discuss the legal and ethical usage of big data. Ethical behavior and trust are inseparably linked: for customers to trust us and continue to do business with us, we need an ethical data usage framework. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Production Ready Hadoop
Location: 3D 05/08 Level: Intermediate
Ted Dunning (MapR)
Average rating: ***..
(3.70, 10 ratings)
I will deconstruct a real-world database schema into the corresponding NoSQL design. Along the way, we will see how the number of tables drops by nearly 5x and the ease of understanding the design increases by a similar degree. In spite of radical changes, the resulting denormalized and nested data can still be queried with SQL by using Apache Drill. These methods are practical and easy to apply. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Sponsored
Location: 1 E6 / 1 E7
Anant Chintamaneni (BlueData)
Average rating: ****.
(4.29, 7 ratings)
Hadoop multi-tenancy is becoming a must-have – in order to accommodate multiple lines of business, multiple concurrent Hadoop jobs, multiple versions of Hadoop, multiple applications, security isolation, and more. This session will discuss these requirements and share recommendations on how to deploy a secure multi-tenant Hadoop environment with simplicity, agility, and low management overhead. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Sponsored
Location: 1 E14
Bill Schmarzo (EMC Consulting)
Average rating: ****.
(4.20, 5 ratings)
Bill Schmarzo, EMC CTO of Global Services and author of “Big Data: Understanding How Data Powers Big Business,” will use a workshop approach to help you identify where and how to integrate data and analytics into your business strategies. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Sponsored
Location: 1 E15
Samuel Cozannet (Canonical)
Average rating: ****.
(4.00, 2 ratings)
Whether you’re a large enterprise or a startup, successfully competing with modern, nimble, fast-moving companies like Uber or Airbnb can only be done with modern, model-driven development environments and big data solutions. Infrastructure shouldn’t restrict the interactions between relational data and big data. Development shouldn’t slow analytics. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E12/ 1 E13 Level: Intermediate
Alan Choi (Cloudera)
Average rating: ***..
(3.00, 16 ratings)
Many workloads are being migrated from data warehouses to Hadoop, but without a good methodology, the migration process can be challenging. In this talk, we’ll discuss such a methodology in detail: from cluster sizing, to query tuning, to production readiness. Read more.
5:25pm–6:05pm Wednesday, 09/30/2015
Sponsored
Location: 3D 06/07
Eric Brewer (Google)
Average rating: ****.
(4.00, 3 ratings)
In this talk, we will describe a Cloud-optimized deployment model for Spark and Hadoop, and explore how these tools and Cloud-native services complement each other to form the most productive and efficient data processing platform. Read more.

6:05pm

6:05pm–7:05pm Wednesday, 09/30/2015
Events
Location: 3E
Average rating: ****.
(4.50, 4 ratings)
Quench your thirst with vendor-hosted libations and snacks while you check out all the exhibitors in the Expo Hall. Read more.

8:00pm

8:00pm–10:30pm Wednesday, 09/30/2015
Events
Location: The High Line/Meatpacking District
Average rating: ****.
(4.33, 3 ratings)
LOCATIONS: Tao Downtown Nightclub: 389 W. 16th St. • Avenue: 116 10th Ave. • Gaslight: 400 W. 14th St. • Catch NYC, 3rd Floor: 21 9th Avenue • The Penthouse, Red Room, & The Garden at The Park NYC: 118 10th Avenue Read more.

Thursday, 10/01/2015

8:45am

8:45am–8:50am Thursday, 10/01/2015
Location: Javits North
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.68, 25 ratings)
Strata + Hadoop World Program Chairs Roger Magoulas, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

8:50am

8:50am–9:00am Thursday, 10/01/2015
Location: Javits North
Doug Wolfe (CIA)
Average rating: ***..
(3.65, 34 ratings)
In his ten-minute keynote, CIA Chief Information Officer Douglas Wolfe discusses how data science is a true team sport, and how the rapid evolution of this field continually improves the impact of the CIA mission. Read more.

9:00am

SOLD OUT
9:00am–5:00pm Thursday, 10/01/2015
Training
Location: 3D 01/12
Laurent Weichberger (OmPoint Innovations, LLC)
This three-day curriculum features advanced lectures and hands-on technical exercises for Spark usage in data exploration, analysis, and building big data applications. Read more.
SOLD OUT
9:00am–5:00pm Thursday, 10/01/2015
Training
Location: 1B 03
Brandon MacKenzie (IBM), John Rollins (IBM), Jacques Roy (IBM), Chris Fregly (PipelineAI), Mokhtar Kandil (IBM)
Average rating: *****
(5.00, 0 ratings)
In this three-day course, you will learn how to use machine learning, text analysis, and real-time analytics to solve frequently encountered, high-value business problems; understand data science methodology and the end-to-end workflow of solving a problem, including data preparation, model building and validation, and model deployment; and use Apache Spark and other tools for analytics. Read more.
9:00am–5:00pm Thursday, 10/01/2015
Training
Location: 1B 04
Nathan Neff (Cloudera)
Average rating: ****.
(4.00, 0 ratings)
Cloudera University’s three-day course for designing and building big data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). Read more.
9:00am–9:10am Thursday, 10/01/2015
Location: Javits North
Daniel Goroff (Alfred P. Sloan Foundation)
Average rating: ***..
(3.63, 35 ratings)
It is easy to make "false discoveries" when analyzing big data. It is harder to draw causal conclusions that are reliable and reproducible, especially when private or proprietary information is involved. Recent mathematical ideas, like differential privacy, offer new ways of reaching robust conclusions while provably protecting personal information. Read more.

9:10am

9:10am–9:20am Thursday, 10/01/2015
Location: Javits North
Jack Norris (MapR Technologies)
Average rating: ***..
(3.59, 32 ratings)
The big data dividend refers to the ongoing, significant profits that are derived by running data-driven applications. This session will include examples of applications by leading companies, and provide insights into how developers and organizations can realize big data dividends from a new class of scalable applications with continuous analytics. Read more.

9:20am

9:20am–9:25am Thursday, 10/01/2015
Sponsored
Location: Javits North
Ben Werther (Platfora)
Average rating: ***..
(3.73, 45 ratings)
The traditional BI and analytics tools of the last decade have made it difficult for users to work directly with their data. With the latest innovations in big data discovery platforms, a new role has emerged: the citizen data scientist. In this keynote, Ben will share Platfora’s research behind the importance of this emerging role so that companies can become truly data-driven. Read more.

9:25am

9:25am–9:30am Thursday, 10/01/2015
Sponsored
Location: Javits North
Paul Kent (SAS)
Average rating: ***..
(3.70, 40 ratings)
Imagine the possibilities of having all of your data in one place – at a reasonable cost – with the computing potential to learn from relationships between data in all domains. Advanced analytics and Hadoop are changing the way organizations approach big data. Hear tips from the future and learn about key patterns emerging from a wide cross section of Hadoop journeys. Read more.

9:30am

9:30am–9:45am Thursday, 10/01/2015
Location: Javits North
Farrah Bostic (The Difference Engine)
Average rating: ***..
(3.54, 57 ratings)
Farrah Bostic, Founder, The Difference Engine Read more.

9:45am

9:45am–9:50am Thursday, 10/01/2015
Sponsored
Location: Javits North
Average rating: ***..
(3.60, 45 ratings)
IBM fellow and director, Watson Content Services, IBM Read more.

9:50am

9:50am–10:00am Thursday, 10/01/2015
Location: Javits North
Jake Porway (DataKind)
Average rating: ****.
(4.35, 63 ratings)
Jake Porway, founder and executive director of DataKind, unveils five keys for successful data science for good projects, based on the organization's three years of work rallying thousands of volunteers worldwide to give back. Read more.

10:00am

10:00am–10:20am Thursday, 10/01/2015
Location: Javits North
Maciej Ceglowski (Pinboard.in)
Average rating: ****.
(4.60, 94 ratings)
Big data is a bit like nuclear energy: while full of promise, it generates residue that is difficult to dispose of, poses risks for those who store it, and leaves the industry one major incident away from scaring the public off the technology entirely. Read more.

10:20am

10:20am–10:40am Thursday, 10/01/2015
Location: Javits North
Maria Konnikova (The New Yorker | Mastermind)
Average rating: ****.
(4.05, 78 ratings)
What do you do when you find a momentary break in your otherwise endless barrage of tasks? In this talk, Maria argues for the vital importance of recapturing the seeming nothingness of boredom, of harnessing the pauses of life for their creative potential. It is in boredom that the truly deep questions and discoveries lie. Read more.

10:40am

10:40am–10:45am Thursday, 10/01/2015
Location: Javits North
Roger Magoulas (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.56, 36 ratings)
Program Chairs Roger Magoulas, Doug Cutting, and Alistair Croll close out the Strata + Hadoop World keynotes. Read more.

11:20am

11:20am–12:00pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Robert Grossman (University of Chicago)
Average rating: ****.
(4.25, 16 ratings)
Large datasets have large numbers of anomalies, and the challenge is not just identifying anomalies but rank ordering them to create alerts, so that data scientists can examine the most interesting ones. We discuss three case studies that integrate machine learning and data engineering, and extract six techniques for identifying anomalies and rank ordering them by their potential significance. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Non-technical
Jeremy King (Walmart Global eCommerce)
Average rating: ****.
(4.33, 15 ratings)
Two years ago Walmart eCommerce moved from a small Hadoop cluster to a big one (250 nodes) and has since used Hadoop to consolidate 10 different websites, including Sam’s Club online, into one website. Walmart eCommerce stores use all the incoming data in one central Hadoop cluster, which is driving the company’s focus to provide personalized, best-in-class customer experiences. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Intermediate
Henry Robinson (Cloudera), Zuo Wang (Wanda), Arthur Peng (Intel)
Average rating: ***..
(3.71, 7 ratings)
Columnar data formats such as Apache Parquet promise much in terms of performance, but need help from modern CPUs to fully realize all the benefits. In this talk we'll show how the combination of the newest SIMD instruction sets, and an open-source columnar file format, can provide an enormous performance advantage. Our example system will be Impala, Parquet, and Intel's AVX2 instruction set. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Tags: media, featured
Kurt Brown (Netflix)
Average rating: ****.
(4.87, 31 ratings)
The Netflix Data Platform is a constantly evolving, large scale infrastructure running in the (AWS) cloud. We are especially focused on performance and ease of use, with initiatives including Presto integration, Spark, and our big data portal and API. This talk will dive into the various technologies we use, the motivations behind our approach, and the business benefits we get. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Advanced
Tathagata Das (Databricks)
Average rating: ****.
(4.20, 15 ratings)
As the adoption of Spark Streaming in the industry is increasing, so is the community's demand for more features. Since the beginning of this year, we have made significant improvements in performance, usability, and semantic guarantees. In this talk, I discuss these improvements, as well as the features we plan to add in the near future. Read more.
11:20am–12:00pm Thursday, 10/01/2015
IoT & Real-time
Location: 3D 02/11 Level: Non-technical
Michael Hausenblas (Red Hat)
Average rating: *....
(1.00, 1 rating)
By 2020, researchers estimate there will be 100 million internet-connected devices. Processing this data in real time, whether from mobile phones or jet engines, will be the new normal. How are companies today adapting to this new real-time stream of data? Read more.
11:20am–12:00pm Thursday, 10/01/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Jeffrey Heer (Trifacta | University of Washington), Jock Mackinlay (Tableau)
Average rating: ****.
(4.17, 12 ratings)
The talk will focus on considerations for designing data visualizations for data profiling required in data preparation; and considerations for designing data visualizations for later exploratory analysis and consumption phases of the overall analysis process. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Security & Governance
Location: 3D 04/09 Level: Intermediate
Jenelle Bray (LinkedIn)
Average rating: *****
(5.00, 1 rating)
LinkedIn’s Security Data Science group uses various reputation systems as input to models designed to stop fraud and abuse. This session will discuss how we build these reputation systems and compare instantaneous online reputation scores to more complex offline systems. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Ask Me Anything
Location: 3D 05/08 Level: Advanced
Gwen Shapira (Confluent), Jonathan Seidman (Cloudera), Ted Malaska (Capital One), Mark Grover (Lyft)
Average rating: ****.
(4.67, 6 ratings)
Join the authors of Hadoop Application Architectures for an open Q/A session on considerations and recommendations for architecture and design of applications using Hadoop. Talk to us about your use-case and its big data architecture, or just come to listen in. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Sponsored
Location: 1 E6 / 1 E7
Alexander Barclay (UnitedHealthcare Shared Services)
Average rating: ***..
(3.83, 6 ratings)
UnitedHealth Group has long been defined by our innovative approach to health care, and our approach to IT and analytics is no different. With the goal of making health care more affordable by identifying fraud, waste, and abuse activities, this session will provide details on how we leveraged Hadoop for payment integrity analytics to identify thousands of high-risk providers and claims. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Sponsored
Location: 1 E14
Michele Goetz (Forrester Research), Chuck Yarbrough (Pentaho)
Forrester Research Principal Analyst Michele Goetz discusses findings from Delivering Governed Data for Analytics at Scale, a June 2015 commissioned study conducted by Forrester Consulting on behalf of Pentaho on the topic of data governance and delivery. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Sponsored
Location: 1 E15
Paul Kent (SAS)
Average rating: ***..
(3.33, 3 ratings)
Imagine the possibilities of having all of your data in one place – at a reasonable cost – with the computing potential to learn from relationships between data in all domains. Advanced analytics and Hadoop are changing the way organizations approach big data. Hear tips from the future and learn about key patterns emerging from a wide cross section of Hadoop journeys. Perhaps they’ll inspire yours. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Hadoop Use Cases
Location: 1 E12/ 1 E13 Level: Intermediate
Haden Land (Lockheed Martin IS&GS), Jason Loveland (Lockheed Martin)
Average rating: ****.
(4.75, 4 ratings)
Lockheed Martin builds unmanned and manned human space systems, which require systems that are tested for all possible conditions – even for unforeseen situations. We present a test system that is a learning system built on big data technologies, that supports the testing of the Orion Multi-Purpose Crew Vehicle being designed for long-duration, human-rated deep space exploration. Read more.
11:20am–12:00pm Thursday, 10/01/2015
Sponsored
Location: 3D 06/07
Average rating: **...
(2.00, 1 rating)
Financial institutions use data such as streaming news feeds and proprietary data for insight. One company is taking filings from 130 countries and data from 500,000 equity instruments to create real-time applications. Data integration is essential for information to be trusted in these applications. Explore an architecture designed to capture all data and ensure it is trusted. Read more.

12:00pm

12:00pm–1:15pm Thursday, 10/01/2015
Events
Location: 3A & 3B
Average rating: ***..
(3.67, 3 ratings)
Birds of a Feather (BoF) discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

1:15pm

1:15pm–1:35pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Average rating: ***..
(3.18, 11 ratings)
Reaching 100,000,000 antivirus users was a big challenge for Avira, but we managed to achieve the goal. The challenge that arises now is to convince our users to stay with us, by offering the best possible experience to each one of them. In this presentation we will share the entire flow of the user churn prevention, from building custom surveys to using machine learning algorithms. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Tags: health
Joe Klobusicky (Geisinger Health System), Ali Habib (Northwestern Feinberg School of Medicine), Ekaterina Volkova (Cornell University)
Average rating: **...
(2.80, 5 ratings)
Pharmaceutical companies follow a highly structured process for the approval of medications. From a financial viewpoint, the binary occasion of a drug’s passage offers a rare scientific opportunity: a well-defined, recurrent, and critical event spanning multiple companies. We will show that integrating multiple data types uncovers how drug passage influences the market, and vice versa. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Advanced
Thomas Phelan (BlueData)
Average rating: ****.
(4.25, 12 ratings)
This session will delve into the multiple different meanings of "virtualized HDFS." It will lead an investigation into the abstraction of the HDFS protocol in order to permit any storage device to deliver data to a Hadoop application in a performance critical environment. It will include a discussion and assessment of the work in this area done by projects such as Tachyon and MemHDFS. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Neha Narkhede (Confluent)
Average rating: ****.
(4.50, 14 ratings)
Often the hardest step in processing streams is being able to collect all your data in a structured way. We present Copycat, a framework for data ingestion that addresses some common impedance mismatches between data sources and stream processing systems. Copycat uses Kafka as an intermediary, making it easy to get streaming, fault-tolerant data ingestion across a variety of data sources. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Tags: media, featured
Daniel Weeks (Netflix)
Average rating: ****.
(4.52, 23 ratings)
The Big Data Platform team at Netflix continues to push big data processing in the cloud with the addition of Spark to our platform. Recent enhancements to Spark allow us to effectively leverage it for processing against a 10+ petabyte warehouse backed by S3. We will share our experiences and performance of production jobs along with the pains and gains of deploying Spark at scale on YARN. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
IoT & Real-time
Location: 3D 02/11 Level: Non-technical
Tags: iot
Yan Zhang (Microsoft)
Average rating: ***..
(3.50, 4 ratings)
This talk introduces the landscape and challenges of predictive maintenance applications in the industry, illustrates how to formulate (data labeling and feature engineering) the problem with three machine learning models (regression, binary classification, multi-class classification), and showcases how the models can be conveniently trained and compared with different algorithms. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Tags: featured
Michael Freeman (University of Washington)
Average rating: ****.
(4.61, 23 ratings)
Data-driven decision-making can only be properly executed when the decision makers understand both the underlying data, and the types of manipulations that have been applied to it. In this session, we’ll explore what exactly we "do" to data (aggregation, "cleaning," statistical modeling, machine learning), and how to visually communicate about the processes and implications of our work. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Security & Governance
Location: 3D 04/09 Level: Intermediate
Sam Heywood (Cloudera), Nick Curcuru (Mastercard), Ritu Kama (Intel)
Average rating: ****.
(4.29, 7 ratings)
Hadoop is widely used thanks to its ability to handle volume, velocity, and variety of data. However, this flexibility and scale present challenges for securing and governing this data. To avoid your company making the front pages over a data breach, experts from MasterCard, Intel, and Cloudera share the Hadoop Security Maturity Model (phases 0-4) and steps to get your cluster ready for a PCI audit. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Ask Me Anything
Location: 3D 05/08 Level: Intermediate
Patrick Wendell (Databricks), Reynold Xin (Databricks)
Average rating: ***..
(3.00, 1 rating)
Join the Spark team for an informal question and answer session. Spark committers from Databricks will be on hand to field a wide range of detailed questions. Even if you don’t have a specific question, join in to hear what others are asking. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Sponsored
Location: 1 E6 / 1 E7
Average rating: ***..
(3.50, 2 ratings)
If you’re struggling with determining which implementation of SQL on Hadoop can meet your analytics needs, you’re not alone. Join us for a discussion on how YP.com, a leading local marketing solutions provider in the U.S. dedicated to helping local businesses and communities grow, uses HP Vertica for SQL on Hadoop to solve their organization’s big data challenges. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Sponsored
Location: 1 E14
Emma McGrattan (Actian)
Average rating: ****.
(4.33, 6 ratings)
Can Hadoop now handle your enterprise analytic workloads? Actian SVP of Engineering Emma McGrattan will describe the various solutions that comprise the SQL on Hadoop landscape, identify the features that are important for those modernizing their enterprise analytic workloads on Hadoop, and describe the successes that Actian customers have had in moving their BI and Analytic workloads to Hadoop. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Sponsored
Location: 1 E15
Nidhi Aggarwal (Tamr, Inc.)
Average rating: ****.
(4.00, 2 ratings)
Enterprises find it far too costly and time-consuming to locate all of the data relevant to analysis. Data is so fragmented that most enterprises lack even a basic inventory of all sources and attributes -- an enormous constraint on getting return on your big data investment. Tamr Catalog solves this by creating an inventory of all enterprise metadata in a central, platform-neutral place. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Hadoop Use Cases
Location: 1 E12/ 1 E13 Level: Intermediate
Sriranjan Manjunath (Saavn Inc), Rahul Saxena (Saavn)
Average rating: **...
(2.00, 1 rating)
Saavn is the leading music streaming service in the South Asian market. This talk will focus on how we are leveraging data to adapt to very specific demands on the market. We will demonstrate how Hadoop, Kafka, and Storm came together to help us solve some of the challenges. Read more.
1:15pm–1:55pm Thursday, 10/01/2015
Sponsored
Location: 3D 06/07
Jeff Pollock (Oracle), Chris Lynskey (Oracle)
Average rating: ****.
(4.00, 1 rating)
In this session you’ll learn how Oracle has leveraged Spark-based machine learning (ML), natural language processing (NLP), and data graph semantics (Linked Open Data) to create the simplest and most powerful big data discovery and big data preparation tools in the market. Read more.

1:35pm

1:35pm–1:55pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Thomas Wiecki (Quantopian)
Average rating: ***..
(3.94, 16 ratings)
Probabilistic programming has already revolutionized machine learning and will have a similar impact on the emerging field of data science. By automating the inference process, it dramatically increases the number of people who can build complex Bayesian models custom-made to the specific problem at hand; and makes experts vastly more effective in devising new machine learning methods. Read more.

2:05pm

2:05pm–2:25pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Ihab Ilyas (University of Waterloo | Tamr)
Average rating: ***..
(3.50, 8 ratings)
Machine learning tools offer promise in helping solve data curation problems. While the principles are well-understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Leveraging data semantics and domain-specific knowledge is key in delivering the optimizations necessary for truly scalable ML curation solutions. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Vivek Farias (Celect)
Average rating: ***..
(3.33, 9 ratings)
How can a retailer discover that expensive handbags have a large upside in Lancaster, PA, a fact that doesn't fit demographic stereotypes? The answer lies in understanding customer choice: what a customer buys is constrained and influenced by what they're offered. Explore a new approach to machine learning, which models customer choice patterns and preferences from sparse transactional data. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Intermediate
Ravi Prakash (Altiscale)
Average rating: ***..
(3.83, 6 ratings)
The HDFS File Browser now has improved accessibility and is easier to use! Hadoop 2.4.0 introduced a new UI for file browsing with WebHDFS. This feature set has been expanded to include write operations and file uploads. Authentication issues have been addressed and the file browser is now configured with HttpFS. We'll present a demonstration and overview of possible configuration requirements. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Non-technical
Carlos Guestrin (Apple | University of Washington)
Average rating: ***..
(3.78, 9 ratings)
As companies increase the number of deployments of machine learning-based applications, the number of models that need to be monitored grows at a tremendous pace. In this talk, we outline some of the key challenges in large-scale deployments of machine learning models, then describe a methodology to manage such models in production to mitigate the technical debt. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Christopher Nguyen (Arimo), Vu Pham (Adatao, Inc.), Michael Bui (Adatao, Inc.)
Average rating: ****.
(4.14, 7 ratings)
Deep learning algorithms have been used in many real-world applications, such as computer vision, machine translation, and fraud detection. We'll present an overview of the system architecture, the training and running of Deep Learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) on Spark with Tachyon, including the use of GPUs to improve execution time. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
IoT & Real-time
Location: 3D 02/11 Level: Intermediate
Ankur Gupta (Bitwise Inc.)
Average rating: ***..
(3.44, 9 ratings)
Using an open source technology stack, we implemented a solution for real-time analysis of sensor data from mining equipment. We will share the technical architecture used to show the tools we implemented for real-time complex event processing, why we implemented Spark instead of Storm, some of the challenges faced, benchmarks achieved, and tips for easy integration. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Margit Zwemer (LiquidLandscape)
Average rating: ***..
(3.40, 5 ratings)
Linked Immersive Visualization Environments (LIVE) is a framework that my startup, LiquidLandscape, has developed for combining multiple, high-volume data visualizations (d3, WebGL, WebVR) to provide comprehensive situational awareness for financial markets. We will discuss architecture and design challenges of visualizing real-time data at speed and scale, with lots of visual examples. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Security & Governance
Location: 3D 04/09 Level: Intermediate
Peter Guerra (Booz Allen Hamilton)
Average rating: ***..
(3.10, 10 ratings)
Combining data in Hadoop for the purpose of data discovery often runs into barriers from the security group because of legal or corporate policy. This talk will discuss the challenges with implementing data governance in big data systems, a design pattern for addressing those challenges within an organization, and a recent case study. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Ask Me Anything
Location: 3D 05/08 Level: Advanced
Miklos Christine (Databricks), Kathleen Ting (Cloudera), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera)
Average rating: ****.
(4.25, 4 ratings)
Join the instructors of the all-day tutorial "Apache Hadoop operations for production systems," as they field a wide range of detailed questions. Even if you don’t have a specific question, join in to hear what others are asking. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Sponsored
Location: 1 E6 / 1 E7
Ron Bodkin (Google)
Average rating: ***..
(3.80, 5 ratings)
While schema on read is powerful, it’s just a first step on the journey to understanding effective ways of working with data in new big data systems. In this talk we highlight new patterns of working with data. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Sponsored
Location: 1 E14
Jagane Sundar (WANdisco)
Average rating: ***..
(3.50, 2 ratings)
This talk explores the actual behavior of eventually consistent systems (aka mostly inconsistent systems) while presenting a Paxos algorithm alternative. We’ll highlight the Amazon use case and various fixes made to S3 in order to enable Hadoop workflows, and alternatives offered by Cassandra, then explore Paxos as an alternative to such inconsistent systems for Hadoop storage and HBase solutions. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Sponsored
Location: 1 E15
Nirmal Ranganathan (Rackspace)
Average rating: ***..
(3.40, 5 ratings)
All of us involved in big data are working to decrease time to insights. We're building Spark on YARN clusters with Hadoop ecosystem components, and there are clear benefits to this implementation. However, there are other use cases that may benefit from a more streamlined stack. Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Hadoop Use Cases
Location: 1 E12/ 1 E13 Level: Non-technical
Raymond Collins (TE Connectivity), Scott Sokoloff (Orderup)
Average rating: ***..
(3.00, 2 ratings)
Scott and Ray will discuss a real-life use case from a large manufacturing company, where data was produced in remote factories faster than it could be sent through the internet. This session is an interactive discussion around how to resolve the issue of "big data, small internet." Read more.
2:05pm–2:45pm Thursday, 10/01/2015
Sponsored
Location: 3D 06/07
Charlie Crocker (Autodesk)
Average rating: ***..
(3.83, 6 ratings)
Building design software for industries from engineering to construction, manufacturing to media, meant Autodesk needed to architect its analytics platform to handle massive amounts of data. Learn how Autodesk uses open-source technologies like Kafka and Hadoop and integrates them with solutions like Splunk, Google BigQuery, and Tableau to achieve data insights at scale. Read more.

2:25pm

2:25pm–2:45pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Non-technical
Tags: featured
Allen Downey (Olin College of Engineering)
Average rating: ****.
(4.69, 13 ratings)
Bayesian methods are well-suited for business applications because they provide concrete guidance for decision-making under uncertainty. But many data science teams lack the background to take advantage of these methods. In this presentation I will explain the advantages and suggest ways for teams to develop skills and add Bayesian methods to their toolkit. Read more.

2:55pm

2:55pm–3:35pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Non-technical
Tags: geospatial
Brett Goldstein (University of Chicago)
Average rating: ****.
(4.80, 5 ratings)
Spatial analytics is often hampered by the arbitrary choice of units, allowing local heterogeneity to obscure true patterns. A new “smart clustering” technique lets us use large quantities of open municipal data to literally redraw city maps to reflect facts on the ground, not administrative boundaries. This talk will explain what smart clusters are and the promise they hold for urban science. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Intermediate
Karen Rubin (Quantopian)
Average rating: ****.
(4.20, 5 ratings)
Karen Rubin has spent the last nine months exploring “What would happen if you invested in women CEOs?” In doing so, she has developed an investment algorithm that invests in the women-led companies of the Fortune 1000. Based on a simulation run from 2002-2014, this algorithm would have outperformed the S&P 500 by more than 200%. In this talk she will share her algorithm and results. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Business & Innovation
Location: 1 E16 / 1 E17
Alistair Croll (Solve For Interesting), Joseph Adler (Facebook), Margaret Dawson (Red Hat), Joseph Sirosh (Microsoft), Evan Prodromou (Fuzzy.io)
Average rating: ****.
(4.00, 5 ratings)
Data has gravity. Jim Gray once said that, “compared to the cost of moving bytes around, everything else is free,” and because of what this means for the economics of computing, the more data you have, the more it wants to be near other data. That means all big data systems, eventually, will live in centralized cloud environments. On the other hand, different data is processed in different ways. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Tags: geospatial
Ryan Smith (DigitalGlobe)
Average rating: ***..
(3.75, 4 ratings)
MrGeo is a geospatial toolkit designed to provide raster-based geospatial capabilities that can be performed at scale by leveraging the Hadoop ecosystem. This session will provide an overview of the MrGeo design for storing and processing large-scale raster datasets in the cloud, highlight core operations, and present performance benchmarks for some example operations on open data sets. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Dean Wampler (Lightbend)
Average rating: ****.
(4.33, 6 ratings)
Apache Spark is often seen as a replacement for MapReduce in Hadoop systems, but Spark clusters can also be deployed and managed by Mesos. This talk explains how to use Mesos for Spark applications. We'll examine the pros and cons of using Mesos vs. Hadoop YARN as a data platform, and discuss practical issues when running Spark on Mesos. We'll even discuss how to combine the two with Myriad. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
IoT & Real-time
Location: 3D 02/11 Level: Intermediate
Fangjin Yang (Imply), Gian Merlino (Imply)
Average rating: ***..
(3.75, 8 ratings)
The maturation and development of open source technologies has made it easier than ever for companies to derive insights from vast quantities of data. In this session, we will cover how to build a real-time analytics stack using Kafka, Samza, and Druid. This combination of technologies can power a robust data pipeline that supports real-time ingestion and flexible, low-latency queries. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Peter Olson (IDEO), David Boardman (IDEO)
Average rating: ****.
(4.33, 6 ratings)
The experience of data extends beyond capturing, storing, and presenting it. Data can help shape customer journeys through products, change the way organizations communicate, and be either a source of confusion or tool for communication. This talk will focus on how design thinking can be applied to data, and how data design can be applied to a wide array of consumer and organizational experiences. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Security & Governance
Location: 3D 04/09 Level: Intermediate
Andrew Wang (Cloudera)
Average rating: ****.
(4.10, 10 ratings)
Encryption is a requirement for many business sectors dealing with confidential information. To meet these requirements, transparent, end-to-end encryption was added to HDFS. This protects data both in flight and at rest, and is compatible with existing Hadoop apps. We will cover the design and implementation of transparent encryption in HDFS, as well as performance results. Read more.
SOLD OUT
2:55pm–3:35pm Thursday, 10/01/2015
Ask Me Anything
Location: 3D 05/08 Level: Intermediate
John Akred (Silicon Valley Data Science), Julie Steele (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science)
Average rating: ****.
(4.00, 5 ratings)
Join the team behind the tutorial “Developing a modern enterprise data strategy,” as they field a wide range of detailed questions. Even if you don’t have a specific question, join in to hear what others are asking. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Sponsored
Location: 1 E6 / 1 E7
Reiner Kappenberger (HP Security Voltage)
Average rating: ***..
(3.67, 3 ratings)
Building a strategy and methodology that protects sensitive data is vital in securing your big data systems and enterprise assets. Learn how people protect big data in Hadoop, and understand how protecting the information is possible without removing the value of the data, or paying a performance penalty. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Sponsored
Location: 1 E14
Ashish Verma (Deloitte)
Average rating: ****.
(4.50, 8 ratings)
The Internet of Everything (IoT) continues to give rise to new business models in the retail, industrial manufacturing, healthcare, insurance, medical device manufacturing, telecommunications, and technology industries. Learn what those efforts are and how to capitalize on these opportunities for your clients. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Sponsored
Location: 1 E15
Matt Yanchyshyn (Amazon Web Services)
Average rating: ****.
(4.33, 9 ratings)
Want to get ramped up on how to use Amazon's big data web services and launch your first big data application on AWS? Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Hadoop Use Cases
Location: 1 E12 / 1 E13 Level: Intermediate
Jonathan Gray (Cask)
Average rating: ****.
(4.40, 5 ratings)
Hadoop has evolved into a rich collection of technologies that enable a broad range of use cases. However, the technology innovation has outpaced the skills of most developers. The open-source Cask Data Application Platform (CDAP) project was initiated to close this developer gap. In this session, we will show how three different organizations utilized CDAP to deliver solutions on Hadoop. Read more.
2:55pm–3:35pm Thursday, 10/01/2015
Sponsored
Location: 3D 06/07
Phil Kim (Capital One Labs)
Average rating: ***..
(3.83, 6 ratings)
Capital One is on a mission to Change Banking for Good. Join Capital One as we take you through the journey of the Data Lab. How did we get started? What have we learned about mingling disciplines such as human centered design, full stack engineering, and data science? And how are we taking an entrepreneurial approach to develop successful solutions that deliver real impact? Read more.

4:35pm

4:35pm–5:15pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E8 / 1 E9 Level: Intermediate
Bar Ifrach (Airbnb)
Average rating: ****.
(4.00, 5 ratings)
This talk describes the development of a machine learning model that infers Airbnb host preferences for accommodation requests based on their past behavior. The model is used to surface likely matches more prominently on Airbnb’s search results. In our A/B testing the model showed about a 3.75% increase in booking conversion, resulting in many more trips on Airbnb. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Data-driven Business
Location: 1 E10 / 1 E11 Level: Non-technical
Micha Gorelick (Fast Forward Labs)
Average rating: **...
(2.71, 7 ratings)
It's 2015. We understand the technology: how to build functional data pipelines, analytics, and reporting. We have algorithms. We understand the culture issues of how to build a data-driven organization. This talk is about how to use these assets to imagine and create previously impossible products. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Data Science & Advanced Analytics
Location: 1 E16 / 1 E17 Level: Non-technical
Vasant Dhar (NYU)
Average rating: ***..
(3.50, 2 ratings)
Financial markets emanate massive amounts of data from which machines can, in principle, learn to invest with minimal initial guidance from humans. I contrast human and machine strengths and weaknesses in making investment decisions. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Data Innovations
Location: 1 E18 / 1 E19 Level: Intermediate
Venky Ganti (Alation)
Average rating: ***..
(3.00, 1 rating)
Recommendation engines are cognitive computing applications. Their algorithms “learn” from experience. What if a recommendation engine could help analysts sort through big data? Building a query recommendation engine is complex. We’ll share some of the technical challenges and learnings from building a cognitive application in daily use today, by analyst teams from eBay to Square. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21 Level: Intermediate
Tags: media
Sridhar Alla (Comcast), Jan Neumann (Comcast)
Average rating: ***..
(3.67, 12 ratings)
Comcast uses Hadoop as the big data platform in several areas of its business. Its use cases have evolved in recent years and include personalization, click-through analytics, modeling, and customer support initiatives, all adding up to billions of dollars in revenue. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
IoT & Real-time
Location: 3D 02/11 Level: Intermediate
Susanna Pirttikangas (University of Oulu)
Average rating: *****
(5.00, 2 ratings)
Oulu Smart City has a lively living lab tradition; we continuously collect data and expand our ecosystem of companies, research institutes, city officials, and citizens, and develop data-intensive services on top of the ecosystem. We present real use cases of implementing big data platforms and developing higher-level distributed reasoning and machine learning to exploit our data lake. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Design, User Experience, & Visualization
Location: 3D 03/10 Level: Non-technical
Pamela Pavliscak (SoundingBox)
Average rating: ***..
(3.40, 5 ratings)
Our understanding of happiness is becoming more nuanced, and much of that new knowledge relies on data from social media, quantified self apps, and large datasets. This session will look at the lessons we can learn from happiness data to design positive experiences with technology. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Security & Governance
Location: 3D 04/09 Level: Intermediate
Steven Totman (Cloudera), Mark Donsky (Okera), Kristi Cunningham (Capital One), Ben Harden (CapTech Consulting)
Average rating: ****.
(4.14, 7 ratings)
Moderator: Steve Totman, Big Data Evangelist at Cloudera. Panelists: Kristi Cunningham, VP Enterprise Data Management at Capital One; Susan Meyer, Business Leader - Fraud Management Solutions at MasterCard Worldwide; Ben Harden, Managing Director at CapTech; Mark Donsky, Navigator Product Manager at Cloudera. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Ask Me Anything
Location: 3D 05/08 Level: Intermediate
Todd Lipcon (Cloudera), JD Cryans (Cloudera), David Alves (Cloudera), Mike Percy (Cloudera), Dan Burkert (Cloudera), Michael Crutcher (Cloudera)
Average rating: ****.
(4.50, 2 ratings)
Ask the panel questions about Kudu and the tradeoffs between real-time transactional access and fast analytic performance. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Sponsored
Location: 1 E6 / 1 E7
Tim Estes (Digital Reasoning)
Average rating: ****.
(4.67, 3 ratings)
Cognitive computing has made the transition from a theoretical technology into one that is having a transformative impact on business and our daily lives. In this session, Tim Estes, CEO and founder of Digital Reasoning, will explore how key enabling technologies, such as artificial intelligence and natural language processing, have made this possible. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Sponsored
Location: 1 E14
Join us to learn how SAP HANA Vora can be used as a stand-alone solution or in concert with the SAP HANA platform to extend enterprise-grade analytics to Hadoop clusters and provide enriched, interactive analytics on Hadoop. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Sponsored
Location: 1 E15
Michał Iwanowski (DeepSense.io), Piotr Niedźwiedź (DeepSense.io)
With Spark becoming the rising star of cluster computing comes the prospect of putting it to use as a platform for end-to-end data science. At DeepSense.io we have built an intuitive interface to take Spark to the next level of usability. By introducing a layer that provides code-free UX and simplified resource management, Spark is brought even closer to the concepts known in data science. Read more.
4:35pm–5:15pm Thursday, 10/01/2015
Hadoop Use Cases
Location: 1 E12 / 1 E13 Level: Intermediate
Rosaria Silipo (KNIME.com AG)
Average rating: *....
(1.00, 2 ratings)
In this project, we re-engineered a few barely-usable legacy solutions from the past, and made them viable again by exploiting the speed and performance of Hadoop platform-based execution. Read more.

5:15pm

5:15pm–6:15pm Thursday, 10/01/2015
Events
Location: South Concourse
Average rating: *****
(5.00, 1 rating)
Join attendees, speakers, and exhibitors as we end the conference on a sweet note with some ice cream. Read more.
