Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Sponsored sessions

11:00am–11:40am Wednesday, March 15, 2017
Location: LL20 B
Average rating: ***..
(3.67, 3 ratings)
James Burkhart explains how Uber supports millions of analytical queries daily across real-time data with Apollo.
11:00am–11:40am Wednesday, March 15, 2017
Location: LL21 A
Jason Slepicka (DataScience)
Average rating: ****.
(4.33, 3 ratings)
Apache Spark has become the go-to system for servicing ad hoc queries, but the Catalyst optimizer still lacks many of the pushdown optimizations necessary to take advantage of native database features. Jason Slepicka explains how DataScience replaced Catalyst with Apache Calcite to achieve performance improvements of two orders of magnitude when querying SQL and NoSQL databases with Spark.
11:00am–11:40am Wednesday, March 15, 2017
Location: 210 B/F
Erin Banks (Dell EMC)
Average rating: **...
(2.50, 2 ratings)
A recent study suggests that 44% of businesses are unsure what to do about big data. Erin Banks explains how big data analytics can help transform your business and ensure your data provides the greatest value to you, covering best business practices to help you achieve insights from your analytics, extract value from your data, and drive business change.
11:00am–11:40am Wednesday, March 15, 2017
Location: 230 B
Sasi Kuppannagari (Intel Corporation)
Sasi Kuppannagari explores the innovative sports analytics solutions Intel is creating, such as using computer vision and big data analytics for athlete performance optimization.
11:50am–12:30pm Wednesday, March 15, 2017
Location: LL21 A
Ben Sharma (Zaloni)
Average rating: ****.
(4.50, 8 ratings)
When building your data stack, architecture could be your biggest challenge, yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin when assembling a scalable data architecture? Ben Sharma shares real-world lessons and best practices to get you started.
11:50am–12:30pm Wednesday, March 15, 2017
Location: LL20 B
Darren Chinen (Malwarebytes), Sujay Kulkarni (Malwarebytes), Manjunath Vasishta (Malwarebytes)
Darren Chinen, Sujay Kulkarni, and Manjunath Vasishta demonstrate how to use a Lambda architecture to provide real-time views into big data by combining batch and stream processing, leveraging BMC’s Control-M as a critical component of both batch processing and ecosystem management.
11:50am–12:30pm Wednesday, March 15, 2017
Location: 210 B/F
Nitin Bandugula (MapR Technologies)
Average rating: **...
(2.67, 3 ratings)
Machine-learning algorithms can improve predictions and optimize business operations across industry verticals, but building and scoring models still presents a significant computational challenge requiring massive training data and complex pipelines. Nitin Bandugula outlines the benefits of implementing a microservices-based architecture to support a machine-learning model-scoring workflow.
11:50am–12:30pm Wednesday, March 15, 2017
Location: 230 B
Jonathan Gray (Cask)
Average rating: *****
(5.00, 1 rating)
Hadoop and Spark provide scale and flexibility at a low cost compared to data warehouses, but the messy and diverse nature of big data results in undesirable complexities and inefficiencies. Jonathan Gray explores the standardization, automation, and deep integration technologies that allow users to focus on application logic and insights rather than infrastructure and integration.
1:50pm–2:30pm Wednesday, March 15, 2017
Location: 210 B/F
Mark Burnette (Pentaho, a Hitachi Group Company)
Average rating: **...
(2.67, 3 ratings)
Mark Burnette outlines five keys to success with data lakes and explores several real-world data lake implementations that are changing the world.
1:50pm–2:30pm Wednesday, March 15, 2017
Location: 230 B
Ethan Zhang (VoltDB)
Continuous queries on streaming data play a vital role in fast data applications, providing always up-to-date results based on the most recent data. Ethan Zhang offers an overview of VoltDB, a NewSQL distributed database that supports continuous queries three orders of magnitude faster with materialized views, highlighting a transparent, automatic, and incremental-view maintenance approach.
1:50pm–2:30pm Wednesday, March 15, 2017
Location: LL20 B
Murthy Mathiprakasam (Informatica)
Average rating: ***..
(3.00, 1 rating)
Stuck with manual, siloed, inflexible, laborious practices for big data projects? Successful teams use machine-learning-based approaches to power self-service preparation, enterprise-wide data catalogs, and real-time stream processing with role-specific tools. Murthy Mathiprakasam explains how using Informatica atop Hadoop, Spark, and Spark Streaming maximizes teamwork, trust, and timeliness.
1:50pm–2:30pm Wednesday, March 15, 2017
Location: LL21 A
Ken Tsai (SAP), Michael Eacrett (SAP)
Ken Tsai and Michael Eacrett explore critical components of enterprise production environments that support day-to-day business processes while ensuring security, governance, and operational administration, and share best practices to ensure business value.
2:40pm–3:20pm Wednesday, March 15, 2017
Location: 210 B/F
Greg Michaelson (DataRobot)
Average rating: ****.
(4.50, 2 ratings)
Companies store tons of data in Hadoop in hopes of turning the data into actionable insights, but maximizing the value of this resource with artificial intelligence and machine learning eludes most organizations. Greg Michaelson defines analytic trends around Hadoop, separates fact from hype, and sets out a roadmap for fully optimizing the value of the data stored in Hadoop.
2:40pm–3:20pm Wednesday, March 15, 2017
Location: LL20 B
The massive shift of data to the cloud is exacerbating data preparation and transport complexities that slow data analytics to a crawl. Bill Dentinger explains how the deployment of FPGA/x86-based heterogeneous compute architectures by cloud vendors is giving all organizations the opportunity to speed their data analytics to unprecedented levels.
2:40pm–3:20pm Wednesday, March 15, 2017
Location: LL21 A
Reflecting the old horror gimmick "the call that comes from inside the house," an increasing number of data breaches are carried out by insiders. Charlotte Crain and Tyler Freckman share a unique, hybrid approach to insider threat deterrence that combines traditional detection methods and investigative methodologies with behavioral analysis to enable complete, continuous monitoring of activity.
2:40pm–3:20pm Wednesday, March 15, 2017
Location: 230 B
Average rating: ****.
(4.50, 2 ratings)
Thousands of companies have made their initial investments into next-generation data lake architecture, and they are on the verge of generating quality business returns. Chandhu Yalla and Neshad Bardoliwalla explain how enterprises have unlocked tangible value from their data lakes with adaptive information management and how their organizations are providing self-service to business units.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: 210 B/F
Serdar Sahin (Peak Games)
Peak Games, a leading online and mobile company, unites 30 million monthly unique players with free, culturally relevant, community-driven games. Serdar Sahin shares the company's journey evaluating MPP columnar databases against Hadoop to find the right data infrastructure to enable the company to handle the unpredictable popularity of newly launched games.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: 230 B
Eric Anderson (Beachbody), Shyam Konda (Beachbody)
Average rating: ***..
(3.50, 2 ratings)
Eric Anderson and Shyam Konda explain how the IT team at Beachbody (the makers of P90X and CIZE) successfully ingested all their enterprise data into Amazon S3 and delivered self-service access in less than six months with Talend.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: LL20 B
Justin Murray (VMware)
Justin Murray outlines the benefits of virtualizing Hadoop and Spark, covering the main architectural approaches at a technical level and demonstrating how the core Hadoop architecture maps into virtual machines and how those relate to physical servers. You'll gain a set of design approaches and best practices to make your application infrastructure fit well with the virtualization layer.
4:20pm–5:00pm Wednesday, March 15, 2017
Location: LL21 A
Scott Gnau (Hortonworks)
Average rating: ****.
(4.00, 5 ratings)
Big data is moving from science projects to mainstream, mission-critical deployments. Drawing on his interactions and conversations with business and IT leaders across the world, Scott Gnau outlines adoption trends and popular use cases.
5:10pm–5:50pm Wednesday, March 15, 2017
Location: 210 B/F
Siva Raghupathy (Amazon Web Services), Ben Snively (Amazon Web Services)
Average rating: ****.
(4.00, 3 ratings)
Siva Raghupathy and Ben Snively explore the concepts behind and benefits of serverless architectures for big data, looking at design patterns to ingest, store, process, and visualize your data. Along the way, they explain when and how you can use serverless technologies to streamline data processing and share a reference architecture using a combination of cloud and open source technologies.
5:10pm–5:50pm Wednesday, March 15, 2017
Location: 230 B
Xiatian Zhang (TalkingData)
Large-scale machine learning is a big challenge in industry due to the huge computing resources required and the difficulty of parameter tuning. Xiatian Zhang offers an overview of Fregata, TalkingData's open source machine-learning library based on Spark, which provides a lightweight, fast, memory-efficient, and parameter-free solution for large-scale machine learning.
11:00am–11:40am Thursday, March 16, 2017
Location: 230 B
Average rating: ***..
(3.25, 4 ratings)
Teradata joined the Presto community in 2015 and is now a leading contributor to this open source SQL engine, originally created by Facebook. Join Kamil Bajda-Pawlikowski to learn about Presto, Teradata's recent enhancements in query performance, security integrations, and ANSI SQL coverage, and its roadmap for 2017 and beyond.
11:00am–11:40am Thursday, March 16, 2017
Location: 210 B/F
Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
Average rating: *****
(5.00, 1 rating)
Wee Hyong Tok and Danielle Dean explain how the global, trusted, and hybrid Microsoft platform can enable you to do intelligence at scale, describing real-life applications where big data, the cloud, and AI are making a difference and how this is accelerating the digital transformation for these organizations at a lightning pace.
11:00am–11:40am Thursday, March 16, 2017
Location: LL20 B
Roger Rea (IBM Information Management), Jorge Castanon (IBM)
Roger Rea and Jorge Castanon outline the top enterprise use cases for streaming and machine learning.
11:50am–12:30pm Thursday, March 16, 2017
Location: 230 B
Luhui Hu (Futurewei Technologies)
Average rating: ****.
(4.00, 3 ratings)
With Huawei's big data cloud ecosystem, you can define and set up your data pipelines quickly and easily, whether you’re looking for batch processing or stream analytics. Luhui Hu shares best practices for designing a big data pipeline in the cloud and explains how to implement serverless big data solutions and intelligent data clouds.
11:50am–12:30pm Thursday, March 16, 2017
Location: 210 B/F
Vahid Fereydouny (VMware), Anjaneya Chagam (Intel Corporation)
Vahid Fereydouny and Anjaneya Chagam share the results of running Hadoop workloads on a standard all-flash vSAN cluster, unleashing the simplicity and power of big data in a hyperconverged environment.
11:50am–12:30pm Thursday, March 16, 2017
Location: LL20 B
Victoria Livschitz (Grid Dynamics)
Average rating: *****
(5.00, 1 rating)
Victoria Livschitz outlines key business drivers for real-time analytics applications in retail and describes the emerging architectures based on in-stream processing (ISP) technologies. Victoria shares a complete open blueprint for an ISP platform, including a demo application for real-time Twitter sentiment analytics, designed with 100% open source components and deployable to any cloud.
1:50pm–2:30pm Thursday, March 16, 2017
Location: 230 B
Jagane Sundar (WANdisco)
Jagane Sundar shares a strongly consistent replication service for replicating between cloud object stores, HDFS, NFS, and other S3- and Hadoop-compatible filesystems.
1:50pm–2:30pm Thursday, March 16, 2017
Location: 210 B/F
Rob Craft (Google)
Average rating: *****
(5.00, 1 rating)
Rob Craft explores machine learning and predictive analytics, explaining how you can leverage the power of ML whether you have a machine-learning team of your own or just want to use ML as a service.