Presented by O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Sponsored conference sessions

Wednesday, March 30

9:25am–9:35am Wednesday, 03/30/2016
Location: Grand Ballroom 220
Tags: real-time
Jack Norris (MapR Technologies)
Average rating: 3.49 (77 ratings)
Big data is not limited to reporting and analysis; increasingly, companies are differentiating themselves by acting on data in real time. But what does "real time" really mean? Jack Norris discusses the challenges of coordinating data flows, analysis, and integration at scale to truly impact business as it happens.
9:35am–9:40am Wednesday, 03/30/2016
Location: Grand Ballroom 220
Ian Andrews (Pivotal)
Average rating: 3.76 (68 ratings)
Pivotal’s Ian Andrews explores why delivering information in context is the key to competitive differentiation in the digital economy.
11:00am–11:40am Wednesday, 03/30/2016
Location: LL20 B
Tags: real-time, iot
Yvonne Quacken (Siemens), Allen Hoem (Teradata)
Average rating: 4.50 (4 ratings)
Yvonne Quacken and Allen Hoem explore the business and technical challenges that Siemens faced capturing continuous data from millions of sensors across different areas and explain how Teradata Listener helped Siemens simplify this data-capture process with a single, central service to ingest multiple real-time data streams simultaneously in a reliable fashion.
11:00am–11:40am Wednesday, 03/30/2016
Location: LL21 A
Jagane Sundar (WANdisco)
Average rating: 3.00 (3 ratings)
Jagane Sundar discusses the unique challenges of hybrid big data deployments and outlines strategies to address them.
11:00am–11:40am Wednesday, 03/30/2016
Location: 210 B/F
Tags: real-time
Eric Frenkiel (MemSQL), JR Cahill (Kellogg)
Average rating: 2.86 (7 ratings)
To win in the on-demand economy, businesses must embrace real-time analytics. Eric Frenkiel demos an enterprise approach to data solutions for predictive analytics. Eric is joined by JR Cahill, who outlines Kellogg's approach to advanced analytics with MemSQL, including moving from overnight to intraday analytics and integrating directly with business intelligence tools like Tableau.
11:00am–11:40am Wednesday, 03/30/2016
Location: 230 B
Mario Inchiosa (Microsoft), Roni Burd (Microsoft)
Average rating: 3.50 (8 ratings)
Hadoop is famously scalable, as is cloud computing. R, the thriving and extensible open source data science software. . .not so much. Mario Inchiosa and Roni Burd outline how to seamlessly combine Hadoop, cloud computing, and R to create a scalable data science platform that lets you explore, transform, model, and score data at any scale from the comfort of your favorite R environment.
11:50am–12:30pm Wednesday, 03/30/2016
Location: LL20 B
Average rating: 3.00 (2 ratings)
An interactive panel, hosted by Dell's Armando Acosta, explores how business units have taken advantage of Hadoop's strengths to quickly identify and implement solutions that deal with massive amounts of data to deliver valuable results across the business.
11:50am–12:30pm Wednesday, 03/30/2016
Location: LL21 A
Wei Zheng (Trifacta), Mohan Sadashiva (Waterline Data), Mark Donsky (Okera)
Average rating: 3.00 (5 ratings)
Wei Zheng, Mohan Sadashiva, and Mark Donsky explain how data-wrangling tools not only enable users to work with a variety of new or complex sources of data in Hadoop but also ensure that the data lineage and metadata created through the process are appropriately catalogued and made available to others in the organization.
11:50am–12:30pm Wednesday, 03/30/2016
Location: 210 B/F
Nidhi Aggarwal (Tamr, Inc.)
Average rating: 3.00 (2 ratings)
Data scientists have career-making opportunities to use more diverse datasets to deliver bigger business returns. Nidhi Aggarwal demonstrates how Tamr, a machine-driven, human-guided approach to finding, integrating, and preparing data, enables deeper insight into corporate spend than previous analytics tools, in one case identifying new savings opportunities worth more than $100M.
11:50am–12:30pm Wednesday, 03/30/2016
Location: 230 B
Tags: iot
Chris Rawles (Pivotal)
Average rating: 3.80 (10 ratings)
The Internet of Things (IoT) continues to provide value and hold promise for consumers and enterprises alike. To succeed, an IoT project must address how to ingest data, build actionable models, and react in real time. Chris Rawles describes approaches to these concerns through a deep dive into an interactive demo centered on the classification of human activities.
1:50pm–2:30pm Wednesday, 03/30/2016
Location: LL20 B
Tags: real-time, iot
John Hugg (VoltDB)
Average rating: 4.00 (2 ratings)
In the race to pair streaming systems with stateful systems, the winners will be stateful systems that process streams natively. These systems remove the burden on application developers to be distributed systems experts and enable new applications to be both powerful and robust. John Hugg describes what’s possible when integrated systems apply a transactional approach to event processing.
1:50pm–2:30pm Wednesday, 03/30/2016
Location: LL21 A
Emma McGrattan (Actian)
Average rating: 3.80 (5 ratings)
Hadoop can bring great value to businesses but also big headaches. Some solutions that provide SQL access to Hadoop data mean changing your business processes to overcome limitations in the technologies. Emma McGrattan explains how users can unlock tremendous business value through SQL-driven Hadoop solutions. Emma outlines what should be on your checklist and the pitfalls to avoid.
1:50pm–2:30pm Wednesday, 03/30/2016
Location: 210 B/F
Keith Manthey (Dell EMC)
Average rating: 3.71 (7 ratings)
Many companies have created extremely powerful Hadoop use cases with highly valuable outcomes. The diverse adoption and application of Hadoop is producing an extremely robust ecosystem. However, teams often build silos around their Hadoop deployments, forgetting some of the hard-won lessons IT has learned over the years. Keith Manthey discusses one such often-overlooked concern: governance.
1:50pm–2:30pm Wednesday, 03/30/2016
Location: 230 B
Don Perigo (GE Power)
Average rating: 3.93 (15 ratings)
Applying big data to an internal business use case is challenging and requires expertise and focus. Even harder is scaling it out across a global enterprise. Don Perigo explains how GE Power Services has been able to deliver results in an uncertain world by leveraging big data and scaling its platform across a global employee base that spans more than 25 countries.
2:40pm–3:20pm Wednesday, 03/30/2016
Location: LL20 B
Carlos Guestrin (Dato Inc.)
Average rating: 4.20 (5 ratings)
Machine learning is a hot topic. Recommenders, sentiment analysis, churn and click-through prediction, image recognition, and fraud detection are at the core of intelligent applications. However, developing these models is laborious. Carlos Guestrin shares a new approach to leveraging massive amounts of data and applying machine learning at scale to create intelligent applications.
2:40pm–3:20pm Wednesday, 03/30/2016
Location: LL21 A
Wei Wang (Hortonworks), Scott Gnau (Hortonworks)
Average rating: 4.00 (1 rating)
Join Hortonworks to discuss transformational use cases from Hortonworks customers that manage data in motion and data at rest. Hortonworks's Wei Wang and Scott Gnau explore the modern data applications being built and deployed in 2016 that are driving new frontiers in information technology.
2:40pm–3:20pm Wednesday, 03/30/2016
Location: 210 B/F
Grega Kešpret (Celtra Inc.)
Average rating: 4.50 (2 ratings)
Celtra provides a platform for customers like Porsche and Fox to create, track, and analyze digital display advertising. Celtra's platform processes billions of ad events daily to give analysts fast and easy access to reports and ad hoc analytics. Grega Kešpret outlines Celtra’s data-pipeline challenges and explains how it solved them by combining Snowflake's cloud data warehouse with Spark.
2:40pm–3:20pm Wednesday, 03/30/2016
Location: 230 B
Dave Wells (Paxata), Nenshad Bardoliwalla (Paxata), Travis Ringger (PwC), Conrad Mulcahy (K2 Intelligence)
Average rating: 2.75 (8 ratings)
In a conversation moderated by Nenshad Bardoliwalla, analytic leaders Conrad Mulcahy, Travis Ringger, and Dave Wells share real-world data-preparation challenges and discuss new technologies, including Spark-powered machine learning, latent semantic indexing, statistical pattern recognition, and text analytics techniques, that accelerate the ability to transform data into usable information.
4:20pm–5:00pm Wednesday, 03/30/2016
Location: LL20 B
Amit Walia (Informatica), Badhrinath Krishnamoorthy (Cognizant)
Average rating: 2.50 (2 ratings)
Amit Walia, chief product officer of Informatica, hosts a discussion with industry experts on how big data management can enable organizations to deliver faster, more flexible, and more repeatable big data projects while ensuring security and governance. Learn how organizations are using big data management to be more successful with their big data initiatives.
4:20pm–5:00pm Wednesday, 03/30/2016
Location: LL21 A
Mok Choe (TD Bank Group), Paul Barth (Podium Data)
Average rating: 3.67 (9 ratings)
Learn how TD Bank is creating the bank of the future through IT 3.0. Central to this is business agility, fueled by secure, self-service access to enterprise and market data. Mok Choe and Paul Barth detail the fundamentals for success in this transformation, which started with rapid consolidation of hundreds of data sources onto a Hadoop enterprise data provisioning platform.
4:20pm–5:00pm Wednesday, 03/30/2016
Location: 210 B/F
Patrick Hall (SAS), Paul Kent (SAS)
Average rating: 3.60 (10 ratings)
Although it’s been around for decades, machine learning is currently thriving, and organizations are looking to benefit from it. Patrick Hall and Paul Kent offer 10 crucial tips to know before venturing into the mix: a personal survival guide from the creators of a solution that was there in the beginning and continues to drive the industry today.
4:20pm–5:00pm Wednesday, 03/30/2016
Location: 230 B
Bob Hansen (HPE)
Average rating: 3.00 (1 rating)
Bob Hansen outlines the latest innovations from HPE for SQL on Hadoop.
5:10pm–5:50pm Wednesday, 03/30/2016
Location: LL20 B
Partha Seetala (Robin Systems)
Average rating: 3.50 (2 ratings)
Containers have taken the world by storm by radically transforming the way applications are built and deployed. But many fail to appreciate how powerful containers can be for performance-sensitive data applications. Partha Seetala explains how containers can help you "virtualize" your mission-critical enterprise applications, simplify application life cycles, and increase data-center efficiency.
5:10pm–5:50pm Wednesday, 03/30/2016
Location: LL21 A
Sudipto Dasgupta (Infosys Limited), Ganesan Pandurangan (Infosys Limited)
Sudipto Dasgupta and Ganesan Pandurangan offer a case study of a large multinational imaging and electronics company that migrated accounts receivable reports to the Hadoop-based open source Infosys Information Platform, which implemented dynamic age bucketing capabilities and reduced the number of end-user views from over 400 to 50.
5:10pm–5:50pm Wednesday, 03/30/2016
Location: 210 B/F
Tags: real-time
Siva Raghupathy (Amazon Web Services), Manjeet Chayel (Amazon Web Services)
Average rating: 4.50 (6 ratings)
Analyzing real-time streams of data is becoming increasingly important to remain competitive. Siva Raghupathy and Manjeet Chayel guide attendees through some of the proven architectures for processing streaming data using a combination of cloud and open source tools such as Apache Spark. Watch a live demo and learn how you can easily scale your applications with Amazon Web Services.
5:10pm–5:50pm Wednesday, 03/30/2016
Location: 230 B
Sandy Steier (1010data), Dennis Gleeson (1010data)
Average rating: 4.00 (1 rating)
Sandy Steier and Dennis Gleeson explain how the promise of easy data sharing and collaborative analysis on petabyte-scale data can fundamentally change business culture in the same way that the Internet has changed our consumer culture.

Thursday, March 31

9:15am–9:25am Thursday, 03/31/2016
Location: Grand Ballroom 220
Joseph Sirosh (Microsoft), Kai Miller (Stanford University)
Average rating: 4.33 (94 ratings)
Joseph Sirosh offers a fascinating look into how connecting brains, via sensors, to the cloud and machine learning could revolutionize a field of medicine.
9:40am–9:45am Thursday, 03/31/2016
Location: Grand Ballroom 220
Bob Rogers (Intel)
Average rating: 3.35 (66 ratings)
Bob Rogers, Intel's chief data scientist for big data solutions, demonstrates the power of the question in analytics. Learn how different types of data, from cubes of structured data to live video streams from mobile systems, combine with analytical technology to inform the questions that can be answered.
9:55am–10:00am Thursday, 03/31/2016
Location: Grand Ballroom 220
Average rating: 3.49 (49 ratings)
As the volume and variety of data continue to grow, organizations have the opportunity to transform their industries and professions, but companies are grappling with how to deliver innovation. Adam Kocoloski shares his experience around this market shift and challenges attendees to join his mission of contributing to the community and investing in the power of open source and the cloud.
11:00am–11:40am Thursday, 03/31/2016
Location: LL20 B
Average rating: 4.00 (5 ratings)
Did you know Apache Spark is helping transform industries, companies, and your everyday life? David Taieb and Mythili Venkatakrishnan demonstrate two use cases of how Apache Spark is being used to harness valuable insights from complex data across cloud and hybrid environments.
11:00am–11:40am Thursday, 03/31/2016
Location: LL21 A
Kevin Goode (Inmar)
Inmar handles 3.7 billion transactions annually. Kevin Goode explains Inmar's transformation, starting in 2012, from a business-services company to a data-driven enterprise using Hadoop.
11:00am–11:40am Thursday, 03/31/2016
Location: 210 B/F
Tags: real-time
Steve Wooledge (MapR Technologies)
Average rating: 3.67 (3 ratings)
In order to remain competitive, you need to be able to respond to changing conditions in the moment. New stream-based technologies allow you to build applications that incorporate low-latency processing so you can stream data immediately or whenever you’re ready. Steve Wooledge explores how new streaming technologies make this approach work and how they can be applied in many industries.
11:00am–11:40am Thursday, 03/31/2016
Location: 230 B
Bob Rogers (Intel)
Average rating: 5.00 (1 rating)
Join Bob Rogers, Intel’s chief data scientist for big data solutions, and special guests to see how Intel’s open source Trusted Analytics Platform has accelerated and simplified the development of powerful analytics that are changing the game.
11:50am–12:30pm Thursday, 03/31/2016
Location: LL21 A
Ben Sharma (Zaloni)
Average rating: 4.00 (12 ratings)
When building your data stack, the architecture could be your biggest challenge. Yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma offers lessons learned from the field to get you started.
11:50am–12:30pm Thursday, 03/31/2016
Location: 210 B/F
Peter Prettenhofer (DataRobot), Owen Zhang (DataRobot)
Average rating: 4.44 (9 ratings)
Effective and efficient model selection and tuning is crucial for building machine-learning systems, but large-scale machine-learning problems require us to rethink the model-selection and tuning process. Peter Prettenhofer and Owen Zhang outline the tradeoffs we need to make and demonstrate how to efficiently search and tune complex machine-learning pipelines in MLlib.
11:50am–12:30pm Thursday, 03/31/2016
Location: 230 B
Chuck Yarbrough (Pentaho), Mark Burnette (Pentaho, a Hitachi Group Company)
Average rating: 2.75 (4 ratings)
A major challenge in today’s world of big data is getting data into the data lake in a simple, automated way. Coding scripts for disparate sources is time-consuming and difficult to manage. Developers need a process that supports disparate sources by detecting and passing metadata automatically. Chuck Yarbrough and Mark Burnette explain how to simplify and automate your data ingestion process.
1:50pm–2:30pm Thursday, 03/31/2016
Location: LL20 B
Kaz Sato (Google), Amy Unruh (Google)
Average rating: 4.21 (14 ratings)
Kazunori Sato and Amy Unruh explore how you can use TensorFlow to drive large-scale distributed machine learning against your analytic data sitting in Google BigQuery, with data preprocessing driven by Dataflow (now Apache Beam). Kazunori and Amy dive into practical examples of how these technologies can work together to enable a powerful workflow for distributed machine learning.
1:50pm–2:30pm Thursday, 03/31/2016
Location: LL21 A
Tags: real-time
TJ Potter (Lucidworks)
Average rating: 5.00 (2 ratings)
Solr has been adopted by all major Hadoop platform vendors as the de facto standard for big data search. Timothy Potter introduces an open source project that exposes Solr as a SparkSQL datasource. Timothy offers common use cases, access to open source code, and performance metrics to help you develop your own large-scale search and discovery solution.
1:50pm–2:30pm Thursday, 03/31/2016
Location: 210 B/F
Average rating: 2.00 (1 rating)
The Defense Advanced Research Projects Agency (DARPA) is synonymous with transformational change, turning the seemingly impossible into the practical. Matthew van Adelsberg demonstrates how collaborative teams of SMEs, data scientists, and engineers have been organized to achieve “DARPA hard” results for nearly a decade and offers insights into how companies can do the same.
1:50pm–2:30pm Thursday, 03/31/2016
Location: 230 B
Tags: real-time
Matt Olson (CenturyLink)
Software-defined networking (SDN) and network functions virtualization (NFV) hold tremendous potential to enable efficiency and flexibility in service delivery, but SDN/NFV environments are also highly complex and multilayered. Matt Olson explains why effective support for SDN/NFV services requires leveraging the vast amount of service data streaming from the platform.
2:40pm–3:20pm Thursday, 03/31/2016
Location: LL20 B
Tags: real-time
Average rating: 4.00 (1 rating)
Join the SAP team for a demonstration of how OLAP on Hadoop and real-time query federation help unify enterprise and big data, using SAP's new big data solution, SAP HANA Vora. Amit Satoor and Balaji Krishna explore real-world use cases where instant insights from a combination of operational and Hadoop data impact core business operations.
2:40pm–3:20pm Thursday, 03/31/2016
Location: LL21 A
Joe Goldberg (BMC Software)
Joseph Goldberg discusses the attributes required of a batch management platform that can accelerate development by enabling programmers to generate workflows as code, support continuous deployment with rich APIs and lightweight workflow-scheduling infrastructure, and optimize production with comprehensive enterprise operational capabilities like SLA management and full log and output management.
2:40pm–3:20pm Thursday, 03/31/2016
Location: 210 B/F
Martin Yip (VMware), Justin Murray (VMware)
Average rating: 3.00 (2 ratings)
Martin Yip and Justin Murray explore the benefits of virtualizing Hadoop on vSphere and delve into three real-world deployments, at small, medium, and large scales, to demonstrate the different ways enterprises are currently deploying Hadoop on virtual machines.
2:40pm–3:20pm Thursday, 03/31/2016
Location: 230 B
Jeff Pohlmann (Oracle)
Jeff Pohlmann explores the skills, challenges, and solutions necessary to turn big data into big results. Learn more effective ways to increase productivity and decrease costs, aid in the allocation of key personnel and resources, better determine the true sentiment of customers, determine the impact of changing processes on production, and help solve a host of other needs.