Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Speaker Slides & Video

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for—it might be available later! (However, please note some speakers choose not to share their presentations.)

In a landmark partnership, IBM and Twitter are combining advances in analytics, cloud and cognitive computing in a manner that has the potential to transform how institutions understand customers, markets and trends. Adam Kocoloski, CTO of IBM Cloud Data Services and co-founder of Cloudant, will explain how, when it comes to gaining insights from Big Data, the future is brighter than we know.
Moderated by:
Roman Shaposhnik (Pivotal Inc.)
In the wake of the Open Data Platform initiative announced earlier this week, Roman Shaposhnik, Director of Open Source strategy at Pivotal and a VP of the Apache Software Foundation Incubator, will talk about how a well-defined, fully validated ODP common core platform is going to address some of the biggest customer pain points around rapid evolution and standardization in the big data area.
Alan Gates (Hortonworks)
Slides:   external link
Starting in Hive 0.14, insert values, update, and delete have been added to Hive SQL. In addition, ACID compliant transactions have been added so that users get a consistent view of data while reading and writing. This talk will cover the intended use cases, architecture, and performance of insert, update, and delete in Hive.
Rosie Atkins (Groupon)
Slides:   1-PDF 
30% of restaurants fail in the first year, so why would anyone go into the business? Most restaurateurs will tell you that it’s an act of love. They love hospitality; they love sharing great food; they love creating a place where people come together to share something special. Almost none of them tell you that they go into business based on data.
Kathleen Ting (Cloudera), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera), Miklos Christine (Databricks)
Slides:   1-PDF    2-PDF    3-PDF 
Hadoop is emerging as the standard for big data processing & analytics. However, as usage of Hadoop clusters grows, so do the demands of managing and monitoring these systems. In this tutorial, attendees will get an overview of all phases for successfully managing Hadoop clusters, with an emphasis on production systems.
Mark Grover (Cloudera), Jonathan Seidman (Cloudera), Gwen Shapira (Confluent), Ted Malaska (Blizzard)
Slides:   1-PDF    external link
Are you looking for a deeper understanding of how to integrate components in the Apache Hadoop ecosystem to implement data management and processing solutions? Then this tutorial is for you. We'll provide a clickstream analytics example illustrating how to architect solutions with Apache Hadoop along with providing best practices and recommendations for using Hadoop and related tools.
Irina Borisova (Chegg), Asim Mathur (eBay)
Slides:   1-PPTX 
In this talk we address two aspects of machine translation development at eBay: leveraging huge amounts of transactional and behavioral data for development and evaluation of our MT systems; and adapting evaluation metrics to reflect the eBay buyer experience and measuring translation quality and impact on the shopping experience of our international users.
Slides:   1-PDF 
In far too many organizations, data scientists and designers work in silos, and quibble about who’s more important. This is a huge missed opportunity. At Intuit, we are reimagining how our data and design teams work together to fuel innovation and surpass Intuit’s business goals. I will walk through methods we are using to bridge these two wildly different groups and share stories of success.
Kurt Brown (Netflix)
Slides:   external link
The Netflix Data Platform is a constantly evolving, large scale infrastructure running in the (AWS) cloud. We are especially focused on performance and ease of use, with initiatives including Presto integration, Spark, and our Big Data Portal and API. This talk will dive into the various technologies we use, the motivations behind our approach, and the business benefits we get.
Reena Tiwari (Cisco Systems Inc.)
Slides:   1-PPTX 
In many organizations, Marketing may be the most impacted by the advent of big data with new data on prospects and customers. New channels, new data types and sources, and new technologies … how did Cisco bring these all together to see a different view of customers?
Eden Medina (Indiana University, Bloomington)
We are often told that the past holds lessons on how to approach the present, but we rarely look to older technologies for inspiration. Rarer still do we look at the historical experiences of less industrialized nations to teach us about the technological problems of today.
Ellen Friedman (Independent)
Slides:   1-PPTX 
Big data stories reveal fundamental concepts about emerging technologies, their potential impact on society and decisions that drive successful projects. Using real world examples, this talk shows key insights that inform critical choices about new technologies, including time series database tools and scalable machine learning algorithms, used to address important business and research problems.
Eric Frenkiel (MemSQL)
Slides:   1-PPTX 
This session will cover approaches to building real-time pipelines with MemSQL, Hadoop, and Spark, including: how Novus built the premier financial portfolio management platform using MemSQL as a real-time data store and query engine; an introduction to the MemSQL Spark connector; and strategies for integrating Spark and Hadoop with real-time systems for transaction processing and operational analytics.
Tom White (Cloudera), Joey Echeverria (Rocana), Ryan Blue (Cloudera)
Slides:   1-PDF    2-PDF 
In the second (afternoon) half of the Architecture Day tutorial, attendees will apply the best practices they learned in the morning session to build a data application for sessionizing user data.
Fangjin Yang (Imply), Vadim Ogievetsky (Imply)
Slides:   1-PDF 
The maturation of big data technologies has enabled numerous organizations to derive insights from vast quantities of data. The next set of challenges we face involve building applications that allow us to visualize, navigate, and interpret this data. Creating intuitive user interfaces is often a cumbersome process requiring complex data transformations, integrations, and queries.
Jonathan Dinu (Zipfian Academy)
Slides:   external link,   2-ZIP 
The best insight you produce is only as good as your ability to explain it. As data scientists and engineers, our task is not only to execute robust analyses, but also to convince decision-makers to act on data. Through an example-driven approach, attendees will examine features of great graphics, techniques of effective visualization, and learn to use D3.js to create their own data narrative.
Jeffrey Heer (Trifacta | University of Washington)
Keynote with Jeffrey Heer, Co-Founder, Trifacta
Eric Frenkiel (MemSQL)
MemSQL CEO Eric Frenkiel will discuss the need for simplicity in enterprise data architecture, the convergence of transactions and analytics, and what is required to operationalize Spark and Hadoop in the enterprise.
Joseph Sirosh (Microsoft)
Join Microsoft’s Joseph Sirosh for a surprising conversation about a farmer's dilemma, a professor's ingenuity and how cloud, data and devices came together to fundamentally re-imagine an age old way of doing business.
Eric Colson (Stitch Fix)
Slides:   1-PDF 
Even the most data-driven organizations still incorporate “art” into their decision-making process. Values, culture, social norms, and biases influence decisions as much as the data. This isn’t always a bad thing—data can sometimes fail to tell the whole story. And, by combining data with the intellectual assets that reside in the heads of employees we can create new capabilities.
DJ Patil (White House Office of Science and Technology Policy)
Data Science, where are we going? What impact can we expect?
Laura Fennell (Intuit), Bill Loconzolo (Intuit)
Slides:   1-PPTX 
When your company stores some of the most sensitive customer data that exists, how do you build game changing big data innovations while maintaining customer trust and loyalty? Combine the two groups responsible for that vision--legal and data science--and unite them toward a common goal! We'll discuss how Intuit turned the typical data-legal model on its head to boost data-driven innovation.
Ajit Gaddam (VISA)
Slides:   1-PDF 
Vendors and pundits suggest plug-n-play options for Hadoop security - do this and in <20 mins, your petabytes of data is now secure. What happens when PowerPoint approaches fail in a real-world enterprise deployment? In this session, we will review techniques that worked, controls that completely failed, and the business processes we had to stand up.
Julie Rodriguez (Sapient Global Markets)
Slides:   1-ZIP 
Designing data visualizations presents us with unique and interesting challenges: how to tell a compelling story; how to deliver important information in a forthright, clear format; and how to make visualizations beautiful and engaging. In this talk, Julie will share a few disruptive designs and connect those back to vizipedia, her compiled data visualization library.
Poppy Crum (Dolby Laboratories | Stanford University)
Our experience of the sensory world does not need to be constrained by our physical limitations. When navigating the environment our senses interact to perceive a robust non-veridical experience. Understanding these interactions and being able to define them perceptually and algorithmically allows technological developments that can facilitate sensory enhancement and optimization.
Eddie Garcia (Cloudera)
Open data is quickly gaining momentum, and when applied as data for good, it becomes a much more powerful concept that we should all consider as good data stewards. Organizations and cities are starting to share data such as traffic conditions or climate sensor readings, allowing others to use this open data to improve quality of life.
Sheetal Dolas (Hortonworks)
Slides:   1-PPTX 
Businesses are moving from large-scale batch data analysis to large-scale real-time data analysis. Apache Storm has emerged as one of the most popular platforms for the purpose. This talk covers proven design patterns for real-time stream processing: patterns that have been vetted in large-scale production deployments that process 10s of billions of events/day and 10s of terabytes of data/day.
Lutz Finger (LinkedIn)
Slides:   1-PPTX 
Data is changing our world. Predictions using massive data have not only improved many products; in some industries, they have also disrupted business models and created new ones. What does an organization need to do to generate a new competitive advantage out of data?
Alonzo Canada (Interana)
Slides:   1-PDF 
Data products are poised to go mainstream, but only if they are designed well. Most data products are designed by developers for developers. This talk discusses methods from Stanford's D.School used by companies like Yahoo!, Samsung, and Audi to design break-out products. These principles can help developers get beyond technology and design products for everyday users.
Prith Banerjee (Schneider Electric)
Slides:   1-PPTX 
Dr. Prith Banerjee, Managing Director of Global Technology Research and Development, Accenture, will present the Accenture Tech Vision 2015 and discuss how organizations are driving value from big data.
Mark Madsen (Third Nature)
Slides:   1-PDF 
Storytelling is not about raising someone’s IQ, it’s about raising their blood pressure. Stories engage emotions rather than intellect, making “storytelling with data” a poor metaphor for data visualization when our goal is to communicate clearly.
Ryan Michaluk (Allstate), Alexander Gray (Skytree, Inc.)
Slides:   1-PPTX 
Allstate’s foundation is data. We extract value from our data by applying machine learning to make data-driven decisions. In this session, we discuss Allstate’s drive for better business results by using machine learning on Hadoop.
Kirk Borne (George Mason University)
Slides:   1-PPT 
I will introduce the USA’s next big astronomy project (LSST) and describe how this telescope requires massive data stream analytics – to discover and respond to exotic rapidly changing events in the Universe. I will discuss parallels between big data astronomy and Decision Science-as-a-Service for Business, Cybersecurity Information and Event Management, and Marketing Automation using Hadoop.
Vida Ha (Databricks), Holden Karau (IBM)
Slides:   external link
Writing efficient Spark programs requires a deeper understanding of Spark internals. In this talk, we present practical tips for writing better Spark programs for the beginner or intermediate Spark programmer.
Moderated by:
Arnab Chakraborty (Accenture)
Alexander Prinz (Lufthansa Airlines), Reena Tiwari (Cisco Systems Inc.)
Slides:   1-PPTX 
This panel discussion will focus on how organizations can find value, equity and business opportunities in their data supply chain. The modern enterprise data supply chain allows organizations to move, manage and mobilize an ever-increasing amount of data across the organization for consumption by people and things.
Eric Sammer (Rocana)
Slides:   1-ZIP 
While we frequently talk about how to build interesting products on top of machine and event data, the reality is that collecting, organizing, providing access to, and managing this data is where most people get stuck. In this session, we’ll describe one such system, in detail, handling terabytes an hour of event-oriented data, providing real time streaming, search, and SQL access to data.
John Russell (Cloudera), Alan Choi (Cloudera)
Slides:   1-ZIP    2-PDF    3-PDF 
Impala is the massively parallel analytic database delivering interactive performance on Hadoop. In this half-day tutorial, we'll walk you through hands-on exercises, taking you from zero to up and running with Impala.
Jay Kreps (Confluent)
Slides:   1-PPTX 
What happens if you take everything that is happening in your company--every click, every impression, every database change, every application log--and make it all available as a real-time stream of well structured data? Companies such as LinkedIn have done this experiment and this talk will describe how this changes the way data is thought about and put to use in an organization.
Amr Awadallah (Cloudera, Inc.)
As Hadoop and the surrounding projects & vendors mature, their impact on the data management sector is growing. Amr will talk about his views on how that impact will change over the next five years. How central will Hadoop be to the data center of 2020? What industries will benefit most? Which technologies are at risk of displacement or encroachment?
Ross Fubini (Canaan Partners), Ari Gesher (Palantir Technologies), Wei Zheng (Trifacta), Omer Trajman (ScalingData), Sylvain Le Borgne (Havas Media)
Slides:   1-ZIP 
Big Data is exiting its buzzword phase, and we are seeing applications that use big data infrastructure to power everyday lives. This is a discussion from the front lines with panelists from industry and startups describing real deployed applications that are powered by big data but are happy to hide the elephant behind beautiful interfaces.
Julien Le Dem (Dremio)
Slides:   1-PDF 
Parquet is a columnar format designed to be efficient and interoperable across the Hadoop ecosystem. Its integration in most processing frameworks and serialization models makes it easy to use in existing ETL and processing pipelines, while giving flexibility of choice on the query engine.
Anil Gadre (MapR)
To get value out of today’s big and fast data, organizations must evolve beyond traditional analytic cycles that are heavy with data transformation and schema management. . .
Michael Greene (Intel)
The exponential growth of digitally stored data and the transition of data science from academia to real world applications hold the promise of improving nearly every aspect of our lives.
Lisa Hammitt (Salesforce)
Wearables contribute to Big Data and the insights are already realizing significant gains in key industries.
Slides:   1-PPT 
Entirely new industries are forming as the result of business model innovations. But discovering these disruptive ideas is, still, largely a matter of trial and error. We need faster, more effective ways of testing out new business model designs.
Spencer Herath (Accenture), Aaron Benz (Accenture)
Slides:   1-PPTX 
HBase can be a good solution for hierarchical time series data. And we can access the data using both R and Python. This case study is a sanitized version of a solution we brought to a client that provided real business value—without requiring significant investment or time. We show how to move to a simple, scalable NoSQL solution without alienating the scientists who work with the data.
Matei Zaharia (Databricks)
As the Apache Spark userbase grows, the developer community is working to adapt it for ever-wider use cases. 2014 saw fast adoption of Spark in the enterprise and major improvements in its performance, scalability and standard libraries.
Tom White (Cloudera), Joey Echeverria (Rocana), Ryan Blue (Cloudera)
Slides:   1-PDF 
If you have Hadoop questions, bring them to Ryan, Joey, and Tom. They’ll explain the Hadoop ecosystem, as well as how to get started with Hadoop using the Kite SDK.
Randy Guck (Dell Software)
Slides:   1-PPTX 
Not all big data problems require big cluster solutions. Doradus OLAP compresses data into compact shards, yielding fast analytical queries using little disk even for big data sets. Learn how Doradus leverages OLAP techniques, columnar storage, and Cassandra to yield sophisticated query features while using amazingly little disk space.
Robert Grossman (University of Chicago)
Slides:   1-PDF 
Finding anomalies is essential for a wide range of applications, including cybersecurity, event detection and health and status monitoring. Anomaly techniques that scale successfully to large datasets tend to integrate machine learning with good data engineering. We discuss three case studies and extract eight techniques that have proved effective for detecting anomalies in large scale systems.
Adam Jorgensen (Pragmatic Works)
Slides:   1-PPT 
Retail buyers are the backbone of the industry’s profitability. These individuals drive organizational goals with their performance. Many decisions are made by intuition and “gut” feeling, where predictive analytics would have made significant improvements in outcomes. This session takes real world experiences and shows how to transform retail performance through data driven buying decisions.
Jike Chong (Simply Hired)
Slides:   1-PPTX 
Learn how tools based on nation-wide job market data can help both students and institutions improve outcomes from the job market level down to curriculum and course choice.
Andreas Mueller (NYU, scikit-learn), Jennifer Klay (Cal Poly San Luis Obispo), Peter Wang (Continuum Analytics), Travis Oliphant (Continuum Analytics, Inc.), Andy Terrel (Bold Metrics), Matthew Rocklin (Continuum), William McKinney (Cloudera), Stefan van der Walt (UC Berkeley), Jonathan Frederic (IPython), Kyle Kelley (Netflix)
Slides:   1-PDF 
Python has become an increasingly important part of the data engineer and analytic tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib for visualization, SciPy, scikit-learn, and how to scale Python performance, including how to handle large, distributed data sets.
Ted Dunning (MapR Technologies), Ellen Friedman (Independent)
Slides:   1-PDF 
What’s important about a technology is what you can use it to do. We’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and we’d like to relay what worked well for them and what did not. . .
Lance Olson (Microsoft)
Slides:   1-PPTX 
In this session, we will show you how easy it is to spin up a 32 node Storm cluster and give all attendees a free unlimited 30-day pass to deploy your own Hadoop cluster on Microsoft Azure.
Nasser Manesh (Altiscale, Inc.)
Slides:   1-PPTX 
In this from-the-trenches, DevOps-focused talk we explore operational issues in running Hadoop on top of Docker containers in a production, multi-tenant setup. With Hadoop's native Docker support still in the works and Docker being more of a development tool, a production deployment of the two together is like swimming in treacherous waters... Here's a lantern and a lifeboat to the rescue.
Slides:   1-PPTX    2-PDF 
Learn how SAS applications use YARN in order to be a good citizen in a busy Hadoop cluster. Best practices and customer examples for several different user scenarios will be shared and discussed.
Slides:   1-PPT 
This talk discusses how to do realtime analytics with a SQL-like query language. We will discuss the role of Complex Event Processing in realtime analytics, and then discuss a scalable CEP engine that lets users write their queries in a declarative, SQL-like CEP query language but execute those queries using a graph of CEP nodes deployed on top of Apache Storm.
Gary Davis (McAfee, a division of Intel Security)
Slides:   1-PPTX 
Consumers are widely adopting wearable technology – Deloitte predicts there will be 100 million wearable cameras, smartwatches, fitness trackers and other gadgets on the market by 2020. With this mass adoption of wearable devices comes a new data ecosystem that must be protected. Embracing the protection of this new, intricate data ecosystem is imperative to the success of the wearable industry.
Cait O'Riordan (Shazam)
Slides:   1-PDF 
As the number of ways to discover and listen to music increases, Shazam's data becomes even more powerful in predicting music tastes/fashions. Labels/artists/radio stations increasingly look to Shazam to predict what the next big hit or summer smash will be. Shazam also uses its usage data to create new product opportunities.
Pamela Peele (UPMC)
Slides:   1-PPTX 
Big data is the sexy new frontier for many businesses but it’s expensive to stand up in an organization and expensive to buy from an external vendor. What is the most fundamental way to demonstrate that data science matters to the organization? This session covers the meaningful data consumption metric that every data science group needs to track.
Joerg Blumtritt (Datarella)
Slides:   1-PDF 
Each smartphone generates huge heaps of data - up to hundreds of megabytes per day. Apart from location, all sorts of information on behavior and environmental conditions are seamlessly collected in the background of our devices. We will show how to harvest the data and how to tell the story of our everyday lives from the billions of data points that pile up continuously.
Brian Ulicny (Thomson Reuters)
Slides:   1-PPTX 
As the leading source of intelligent information, Thomson Reuters delivers must-have insight to the world’s financial and risk, legal, tax and accounting, intellectual property and science and media professionals, supported by the world’s most trusted news organization.
Emma McGrattan (Actian)
Slides:   1-PDF 
In this session you will hear of some of the fascinating use cases for SQL in Hadoop based on real-world customer examples. You will learn some of the innovative techniques that have emerged to overcome limitations of the Hadoop platform that enable features one expects in a proven mature database.
Alysa Z. Hutnik (Kelley Drye & Warren LLP), Lauri Mazzuchetti (Kelley Drye)
Slides:   1-PPT 
Privacy laws governing a company’s obligations on data collection, use, and disclosure are changing rapidly. Failing to understand how the laws affect a company’s personal data assets can result in media exposés, regulatory investigations, Congressional hearings and lawsuits. This session will provide guidance on “privacy by design” compliance and practical tips to avoid becoming a target of scrutiny.
Jim Scott (MapR Technologies, Inc.)
Slides:   1-PDF 
Processing data from social media streams and sensor devices in real time is becoming increasingly prevalent, and there are plenty of open source solutions to choose from. To help practitioners decide what to use when, we compare three popular Apache projects that support stream processing: Apache Storm, Apache Spark and Apache Samza.
Leah Hunter (Tech Journalist)
Slides:   1-PPTX 
People and startups altering the fabric of things through hardware, data science, and entrepreneurial vision. The shape and business of IoT is shifting. Learn about key startups making technological advances and surprising intellectual leaps. We aren't yet indistinguishable from magic. But these people are getting us there.
Solomon Hsiang (UC Berkeley)
Advances in data science empower leaders to make better decisions for society. By using new kinds of information unavailable during the last several millennia of government, we can avoid mistakes of the past. We will discuss how data and statistical inference are informing how we manage the global climate rationally, a defining policy challenge for our generation.
Josh Baer (Spotify), Rafal Wojdyla (Spotify)
Slides:   1-PDF 
There are many confusing and painful things about setting up and operating a 900 node Hadoop cluster used as the centerpiece of many of Spotify's Big Data initiatives. We'll go over a few interesting stories and frustrations that have influenced the direction of our architectural choices, and the lessons we've learned from them.
Slides:   1-PPTX    2-PPTX 
The most frustrating part of data science is when customers don’t “get it”: endless revisions, recommendations not implemented, or data products not adopted. Exciting new research in neurology, cognitive psychology, and behavioral economics has a lot to say about why. We’ll explore the findings and implications for designing more successful “human-data interfaces.”
Michelangelo D'Agostino (Civis Analytics)
Slides:   1-PDF 
If we want to use data to understand human behavior and to design successful interventions to change that behavior, social scientists and data scientists will need to work together. However, the two often approach problems differently and speak strikingly different languages. This talk will present success stories and tips for productive collaboration between social scientists and data scientists.
Monte Zweben (Splice Machine Inc.)
Slides:   1-PPTX 
Once just the realm of Java jockeys and data scientists, Hadoop has become a mainstream tool for business analysts with the rapid proliferation of SQL-on-Hadoop solutions. But there are pitfalls that can plague implementations as IT teams get their first exposure to production Hadoop environments. We’ll discuss the most common pitfalls companies face and how to get around them.
Anirudh Todi (Twitter Inc.)
Slides:   1-PDF 
Twitter's users generate tens of billions of tweet views per day. Aggregating these events in real time - in a robust enough way to incorporate into our products - presents a massive scaling challenge. In this talk I'll introduce TSAR (the TimeSeries AggregatoR), a robust, flexible, and scalable service for real-time event aggregation designed to solve this problem and a range of similar ones.
Patrick Wendell (Databricks)
Slides:   1-PPTX 
Apache Spark is a popular engine for large scale analytics. This talk will give insights into tuning and debugging a production Spark deployment. It will start with details about Spark internals and an overview of the runtime behavior of a Spark application. I'll explain how to diagnose performance bottlenecks and get the best performance out of Spark jobs.
India Swearingen (United Way of the Bay Area)
Slides:   external link
Social service organizations have a tough job when it comes to using data to drive social impact. With “world saving” goals and large scale impact, it's crucial these organizations leverage a variety of data streams and do more with less. But pulling multiple data streams and leveraging partners can be tricky. This session walks through some ins and outs using United Way as one example.
Richard Williamson (Silicon Valley Data Science)
Slides:   1-PPTX 
Getting the full value from data often requires combining stream processing on new events with large scale historical analysis. While both these activities are served by Spark’s execution framework, leveraging multiple persistence layers is key to efficiently and extensibly enabling these use cases.
Danyel Fisher (Microsoft Research), Miriah Meyer (University of Utah)
Slides:   external link
We call lots of things "data visualization," from a news interactive, to spreadsheets, to an infographic counting calories. These surface similarities hide deep differences in what it means to interact with data. In this talk, we cross disciplines, from data science to design, to enliven our techniques and encourage us to try new methods for creating visualizations.
Anne Johnson (Credit Suisse)
Slides:   1-PPTX 
As the Global Head of Investment Risk, Anne Johnson of Credit Suisse takes data quality very seriously. A single misplaced number can put billions of dollars of client assets at risk. Find out some of the challenges that Anne and her team face in governing the integrity of their data and the new ways they are thinking about data integration and quality.
Dean Wampler (Lightbend)
Slides:   1-PDF 
Spark is an open-source computation platform for Big Data. All the major Hadoop vendors have embraced Spark as a replacement for MapReduce, because Spark offers much better performance, a more powerful and productive API, and support for event processing. Spark's secrets for success are the Scala programming language and Functional Programming. We'll explore why.
Ted Dunning (MapR Technologies)
Slides:   1-PPTX 
YARN and Mesos are often positioned as competitors for managing datacenter resources, but in reality they work together to seamlessly share datacenter resources. Why force IT to choose between these two great technologies when we can show you how they work in concert?
Kathleen Ting (Cloudera), Miklos Christine (Databricks)
Slides:   1-PDF 
The next generation of MapReduce, YARN, has widely touted job throughput and Apache Hadoop cluster utilization benefits. Less known are the pitfalls littering the migration path to YARN. Learn from our extensive field experience to avoid those pitfalls and get your YARN cluster configured right the first time.