Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Sponsored sessions

11:20am–12:00pm Wednesday, 09/28/2016
Location: 1 E 14
Cheryl Wiebe (Think Big, a Teradata Company)
Average rating: ****.
(4.80, 5 ratings)
The IoT is fundamentally transforming industries and reconfiguring the technology landscape, but challenges exist for enterprises to effectively realize the value from this next wave of information and opportunity. Cheryl Wiebe explores how leading companies harness the IoT by putting IoT data in context, fostering collaboration between IT and OT and enabling a new breed of scalable analytics. Read more.
11:20am–12:00pm Wednesday, 09/28/2016
Location: 1B 01/02
Carey James (EMC)
Average rating: ***..
(3.67, 3 ratings)
Big data and analytics is a team sport empowering companies of all kinds to achieve business outcomes faster and with greater levels of success. Carey James explains how the formation of Dell Technologies and Dell EMC can help you on your data analytics journey and how you can turn actionable insights into new business opportunities. Read more.
11:20am–12:00pm Wednesday, 09/28/2016
Location: 1 E 09
Crystal Valentine (MapR Technologies)
Average rating: ***..
(3.15, 20 ratings)
Crystal Valentine draws on lessons learned from companies like Uber and Ericsson to outline the key principles for developing a microservices application. Along the way, Crystal describes how certain next-gen application areas—such as machine learning—are particularly well suited to implementation in a microservices architecture rather than a legacy application paradigm. Read more.
11:20am–12:00pm Wednesday, 09/28/2016
Location: 1 C04 / 1 C05
Kyle Ambert (Intel)
Average rating: ***..
(3.00, 2 ratings)
Creating production-ready analytical pipelines can be a messy, error-prone undertaking. Kyle Ambert explores the Trusted Analytics Platform, an open source-based platform that enables data scientists to ask bigger questions of their data and carry out principled data science experiments—all while engaging in iterative, collaborative development of production solutions with application developers. Read more.
11:20am–12:00pm Wednesday, 09/28/2016
Location: 1B 03/04
Average rating: ****.
(4.50, 2 ratings)
Guy Levy-Yurista explains the unexpected consequences of making big data processing significantly more agile than ever before and the impact it's having on human insight consumption. Read more.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 1 E 09
Scott Anderson (ClearStory Data), Andrew Yeung (ClearStory Data)
Average rating: ***..
(3.50, 4 ratings)
More data exists than ever before and in more disparate silos. Getting the insights you need, sifting through data, and answering new questions have all been complex, hairy tasks that only data jocks have been able to do. Andrew Yeung and Scott Anderson explore new ways to challenge the status quo and speed insights on diverse sources and demonstrate real customer use cases. Read more.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 1B 01/02
Ingo Mierswa (RapidMiner)
The flux capacitor was the core component that made time travel possible in Back to the Future, processing garbage as a power source. Did you know that you can achieve the same effect in machine learning? Ingo Mierswa demonstrates how you can power through your analytics faster than ever before using the knowledge of 250K data scientists. Read more.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 1 E 14
Viral Shah (Asurion Services)
Average rating: **...
(2.50, 4 ratings)
Viral Shah explains how enterprises like Asurion Services are leveraging big data management solutions to accelerate enterprise data lake initiatives for business value. Read more.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 1B 03/04
Jack Gudenkauf (Hewlett Packard Enterprise)
Average rating: ****.
(4.00, 3 ratings)
Jack Gudenkauf explores how organizations have successfully deployed tiered hyperscale architecture for real-time streaming with Spark, Kafka, Hadoop, and Vertica and discusses how advancements in hardware technologies such as nonvolatile memory, SSDs, and accelerators are changing the role of big data and big analytics platforms in an overall enterprise-data-platform strategy. Read more.
1:15pm–1:55pm Wednesday, 09/28/2016
Location: 1 C04 / 1 C05
Darryl Smith (Dell)
Average rating: ***..
(3.50, 4 ratings)
Hear the Chief Data Platform Architect of Dell Technologies outline streaming principles. Read more.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 1 E 09
Ben Sharma (Zaloni)
Average rating: ****.
(4.23, 13 ratings)
When building your data stack, the architecture could be your biggest challenge. Yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma offers lessons learned from the field to get you started. Read more.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 1 E 14
Tags: iot
Reiner Kappenberger (HPE Security–Data Security)
Average rating: ***..
(3.00, 1 rating)
Reiner Kappenberger explores the new standards and innovations enabling architects and developers to take a “build it in” approach to security in early design phases for big data and IoT systems, explaining why emerging technologies such as format-preserving encryption are rapidly delivering more trusted big data and IoT ecosystems without altering application behavior or device functionality. Read more.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 1B 01/02
Chuck Yarbrough (Pentaho)
Average rating: **...
(2.50, 2 ratings)
It’s hard to get data into a data lake. Organizations hand-code their way through this, but with hundreds of data sources, it soon becomes unmanageable. Chuck Yarbrough offers a solution that uses metadata to autogenerate ingestion processes. Teams can drive hundreds of Hadoop onboarding processes through just a few templates, reducing development time and risk. Read more.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 1B 03/04
Connor Carreras (Trifacta)
Average rating: ***..
(3.67, 3 ratings)
Connor Carreras offers an in-depth review of the most popular use cases for data wrangling solutions among enterprise organizations, drawing on real customer deployments to explain how data wrangling has enabled them to accelerate analysis and uncover new sources of business value. Read more.
2:05pm–2:45pm Wednesday, 09/28/2016
Location: 1 C04 / 1 C05
Shankar Ganapathy (Paxata), Mark Nelson (Standard Chartered Bank), Veronica Liwak (Polaris)
Average rating: **...
(2.00, 7 ratings)
Join data experts from Citi, Standard Chartered Bank, and Polaris for a panel discussion moderated by Shankar Ganapathy. Learn about the principles, technologies, and processes they have used to design a highly efficient information management pipeline architected around the Hadoop ecosystem. Read more.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 1 C04 / 1 C05
Jeremy Achin (DataRobot), Tom de Godoy (DataRobot)
Average rating: ****.
(4.33, 6 ratings)
In today's world, executives need to be the drivers for data science solutions. Data analysis has moved from the domain of data scientists to the forefront of core strategic initiatives. Are you empowering your team to identify and execute on every opportunity to optimize business with machine learning? In this session, you will learn how executives are transforming business with machine learning. Read more.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 1 E 09
Jonathan Gray (Cask)
Average rating: ****.
(4.33, 3 ratings)
Building, running, and governing a data lake on Hadoop is often a difficult process filled with slow development cycles and painful operations. Jonathan Gray proposes a modern, unified integration architecture that helps IT mitigate these issues while enabling businesses to reduce time to insights and make decisions faster through a modern self-service environment. Read more.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 1 E 14
Jonathon Whitton (PRGX USA)
Jonathon Whitton details how PRGX is using Talend and Cloudera to load two million annual client flat files into a Hadoop cluster and perform recovery audit services in order to help clients detect, find, and fix leakage in their procurement and payment processes. Read more.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 1B 03/04
Johan Bjerke (Splunk Inc)
Average rating: ***..
(3.00, 3 ratings)
Machine data is growing at an exponential rate, and a key driver for this growth is the Internet of Things (IoT) revolution. Johan Bjerke explains how to find value in and make use of the unstructured machine data that plays an important role in the new connected world. Read more.
2:55pm–3:35pm Wednesday, 09/28/2016
Location: 1B 01/02
Anthony Dina (Dell)
Average rating: ***..
(3.00, 1 rating)
Mastercard's Nick Curcuru hosts an interactive fireside chat with Anthony Dina from Dell to explore how the flexibility, scalability, and agility of Hadoop big data solutions allow one of the world’s leading organizations to innovate, enable, and enhance the customer experience while still expanding emerging opportunities. Read more.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 1 E 09
Peter Wang (Anaconda)
Average rating: ***..
(3.33, 3 ratings)
Although Python and R promise powerful data science insights, they can also be complex to manage and deploy with Hadoop infrastructure. Peter Wang distills the vast array of Hadoop and data science tools and architectures down to the essentials that deliver a powerful and lightweight stack quickly so that you can accelerate time to value while meeting your data science, governance, and IT needs. Read more.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 1 E 14
Martin Yip (VMware)
Average rating: ****.
(4.00, 1 rating)
The trend of deploying Hadoop on virtual infrastructure is rapidly increasing. Martin Yip explores the benefits of virtualizing Hadoop through the lens of three real-world examples. You'll leave with the confidence to deploy your Hadoop clusters using virtualization. Read more.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 1 C04 / 1 C05
Moderated by:
Edd Wilder-James (Google)
Panelists:
Maksim Pecherskiy (City of San Diego), Robert Stratton (Neustar), Chris Kakkanatt (Pfizer)
Average rating: *....
(1.00, 1 rating)
Analytic discovery is a team sport; the lone hero data scientist is a thing of the past. John Akred of Silicon Valley Data Science leads a panel of analytics and data experts from Pfizer, the City of San Diego, and Neustar that explores how these businesses were changed through analytic collaboration. Read more.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 1B 03/04
Jake Dolezal (McKnight Consulting Group Global Services)
Average rating: *****
(5.00, 1 rating)
Jake Dolezal shares research into the performance of data quality and data management workloads on Hadoop clusters. Jake discusses a YARN-based approach to data management and outlines highly effective IT resource utilization techniques to achieve extreme agility for organizations and performance gains in Hadoop. Read more.
4:35pm–5:15pm Wednesday, 09/28/2016
Location: 1B 01/02
Antonio Rosales (Canonical)
Average rating: ****.
(4.00, 1 rating)
Antonio Rosales offers an overview of Juju, an open source method to distill the best practices and operations needed to use interconnected big data solutions. Because Juju provides an open source means to describe services and solutions, users can focus on using the science, and developers can focus on delivering best practices. Read more.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 1 C04 / 1 C05
Amit Vij (Kinetica), Mark Brooks (Kinetica DB, Inc.)
Data lakes provide large-scale data processing and storage at low cost but struggle to deliver real-time analytics without investment in large clusters. If you need subsecond analytic response on streaming data, consider a GPU database. Amit Vij and Mark Brooks outline the dramatic performance benefits a GPU database offers and explain how to integrate it with Hadoop. Read more.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 1 E 09
Matt Turck (FirstMark Capital), Einat Burshtine (Credit Suisse), Shui Cheung Yip (Pershing LLC (Bank of New York Mellon)), Alasdair Anderson (Nordea)
Average rating: ****.
(4.00, 4 ratings)
What's the point at which Hadoop tips from a Swiss-army knife of use cases to a new foundation that rearranges how the financial services marketplace turns data into profit and competitive advantage? This panel of expert practitioners looks into the near future to see if the inflection point is at hand. Read more.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 1 E 14
Thomas Place (First Data)
Average rating: ****.
(4.33, 3 ratings)
Thomas Place explores the big data journey of the world’s biggest payment processor, which came dangerously close to building a data swamp before pivoting to embrace governance and quality-first patterns. This case study includes patterns, partners, successes, failures, and lessons learned to date and reviews the journey ahead. Read more.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 1B 01/02
Jim McHugh (NVIDIA)
Average rating: ****.
(4.50, 2 ratings)
Customers are looking to extend the benefits beyond big data with the power of the deep learning and accelerated analytics ecosystems. Jim McHugh explains how customers are leveraging deep learning and accelerated analytics to turn insights into AI-driven knowledge and covers the growing ecosystem of solutions and technologies that are delivering on this promise. Read more.
5:25pm–6:05pm Wednesday, 09/28/2016
Location: 1B 03/04
Amar Arsikere (infoworks.io)
Average rating: *....
(1.00, 1 rating)
Current data warehouse technologies are increasingly challenged to handle the growth in data volume, new data types, and multiple analytics types. Hadoop has the potential to address these issues, but you need to solve several complexities before you can realize its full benefits. Amar Arsikere showcases the business and technical aspects of augmenting and modernizing data warehouses on Hadoop. Read more.
11:20am–12:00pm Thursday, 09/29/2016
Location: 1 C04 / 1 C05
John Hugg (VoltDB)
VoltDB promises full ACID with strong serializability in a fault-tolerant, distributed SQL platform, as well as higher throughput than other systems that promise much less. But why should users believe this? John Hugg discusses VoltDB's internal testing and support processes, its work with Kyle Kingsbury on the VoltDB Jepsen testing project, and where VoltDB will continue to improve. Read more.
11:20am–12:00pm Thursday, 09/29/2016
Location: 1 E 09
Tags: cloud
Rimma Nehme (Microsoft)
Average rating: *****
(5.00, 1 rating)
The amount of cutting-edge technology that Azure puts at your fingertips is incredible. Artificial intelligence is no exception. Azure enables sophisticated capabilities in artificial intelligence, machine learning, deep learning, cognitive services, and advanced analytics. Rimma Nehme explains why Azure is the next AI supercomputer and how this vision is being implemented in reality. Read more.
11:20am–12:00pm Thursday, 09/29/2016
Location: 1 E 14
Rajesh Shroff (Cisco Systems Inc)
Rajesh Shroff reviews the big data and analytics landscape, lessons learned in the enterprise over the last few years, and some of the key considerations when designing a big data system. Read more.
11:20am–12:00pm Thursday, 09/29/2016
Location: 1B 03/04
Douglas Liming (SAS Institute Inc.)
Average rating: ****.
(4.00, 1 rating)
Ready to take a deeper look at how Hadoop and its ecosystem have a widespread impact on analytics? Douglas Liming explains where SAS fits into the open ecosystem, why you no longer have to choose between analytics languages like Python, R, or SAS, and how a single, unified open analytics architecture empowers you to literally have it all. Read more.
11:20am–12:00pm Thursday, 09/29/2016
Location: 1B 01/02
Average rating: ****.
(4.50, 2 ratings)
With so much variance across Hadoop distributions, ODPi was established to create standards for both Hadoop components and testing applications on those components. Join John Mertic and Berni Schiefer to learn how application developers and companies considering Hadoop can benefit from ODPi. Read more.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 1 E 14
Average rating: *....
(1.00, 1 rating)
Big data is a critical part of the enterprise data fabric and must meet the critical enterprise criteria of correctness, quality, consistency, compliance, and traceability. Michael Eacrett explains how companies are using big data infrastructures, asynchronously and in real time, to actively solve information governance and data-quality challenges. Read more.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 1B 01/02
Joe Goldberg (BMC Software)
Average rating: **...
(2.00, 1 rating)
Joe Goldberg explores how companies like GoPro, Produban, Navistar, and others have taken a platform approach to managing their workflows; how they are using workflows to power data ingest, ETL, and data integration processing; how an end-to-end view of workflows has reduced issue resolution time; and how these companies are achieving success in their data warehouse modernization projects. Read more.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 1B 03/04
Sherri Adame (Cigna)
Average rating: ****.
(4.57, 7 ratings)
Launched in late 2015, Cigna's enterprise data lake project is taking the company on a data governance journey. Sherri Adame offers an overview of the project, providing insights into some of the business pain points and key drivers, how it has led to organizational change, and the best practices associated with Cigna’s new data governance process. Read more.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 1 C04 / 1 C05
Richard Langlois (IT Architecture & Strategy)
The self-service YP Analytics application allows advertisers to understand their digital presence and ROI. Richard Langlois explains how Yellow Pages used this expertise for an internal use case that delivers fast, real-time analytics and data exploration with Tableau, using OLAP on Hadoop and a stack that includes HDFS, Parquet, Hive, Impala, and AtScale. Read more.
1:15pm–1:55pm Thursday, 09/29/2016
Location: 1 E 09
Tags: cloud
Chad W. Jennings (Google)
BigQuery provides petabyte-scale data warehousing with consistently high performance for all users. However, users coming from traditional enterprise data warehousing platforms often have questions about how best to adapt their workloads for BigQuery. Chad Jennings explores best practices and integration with BigQuery, with special emphasis on loading and transforming data. Read more.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 1B 01/02
Scott Gnau (Hortonworks)
Average rating: ***..
(3.00, 1 rating)
Scott Gnau provides unique insights into the tipping point for data, how enterprises are now rethinking everything from their IT architecture and software strategies to data governance and security, and the cultural shifts CIOs must grapple with when supporting a business using real-time data to scale and grow. Read more.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 1 C04 / 1 C05
John Morrell (Datameer)
A panel of practitioners from Dell, National Instruments, and Citi—companies that are gaining real value from big data analytics—explore their companies' big data journeys, explaining how analytics can answer groundbreaking new questions about business and create a path to becoming a data-driven organization. Read more.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 1 E 09
Tags: cloud
Jonathan Fritz (Amazon Web Services)
Average rating: *****
(5.00, 1 rating)
Running Hadoop, Spark, and Presto can be as fast and inexpensive as ordering a latte at your favorite coffee shop. Jonathan Fritz explains how organizations are deploying these and other big data frameworks with Amazon Web Services (AWS) and how you too can quickly and securely run Spark and Presto on AWS. Jonathan shows you how to get started and shares best practices and common use cases. Read more.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 1 E 14
Steve Touw (Immuta)
Average rating: *****
(5.00, 3 ratings)
Sharing your valuable data internally or with third-party consumers can be risky due to data privacy regulations and IP considerations, but sharing can also generate revenue or help nonprofits succeed at world-changing missions. Steve Touw explores real-world examples of how a proper data architecture enables philanthropic missions and offers ideas for how to better share your data. Read more.
2:05pm–2:45pm Thursday, 09/29/2016
Location: 1B 03/04
Joe Caserta (Caserta Concepts)
Average rating: *****
(5.00, 1 rating)
Joe Caserta explores how a leading membership interest group is utilizing a data lake to track its members’ path-to-purchase touch points across multiple channels by matching and mastering individuals using Spark GraphFrames and stitching together website, marketing, email, and transaction data to discover the most effective way to attract new members and retain existing high-value members. Read more.
2:55pm–3:35pm Thursday, 09/29/2016
Location: 1B 03/04
Mariusz Gadarowski (deepsense.io)
Mariusz Gądarowski offers an overview of Neptune, deepsense.io’s new IT platform-based machine-learning experiment management solution for data scientists. Neptune enhances the management of machine-learning tasks such as dependent computational processes, code versioning, comparing achieved results, monitoring tasks and progress, sharing infrastructure among teammates, and many others. Read more.