Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Personal schedule for GAURAV MATHUR


9:00am–5:00pm Tuesday, 03/29/2016
T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS), Bryan Van de Ven (Continuum Analytics), Sarah Bird (Continuum Analytics), Jake VanderPlas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics)
Python has become an increasingly important part of the data-engineer and analytic-tool landscapes. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib, SciPy, and scikit-learn, and explores how to scale Python performance, including handling large, distributed datasets.
9:00am–5:00pm Tuesday, 03/29/2016
Hardcore Data Science
Location: 210 C/G
Ben Lorica leads a full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
11:00am–11:40am Wednesday, 03/30/2016
Security

Location: LL21 B
Tags: media
Ram Shankar Siva Kumar (Microsoft (Azure Security Data Science)), Cody Rioux (Netflix (Real-time Analytics))
In the era of large-volume security applications, false positives, as Gartner says, can make the difference between building an "indicator machine" and an "answering machine." Ram Shankar and Cody Rioux explore how to suppress false positives in security monitoring systems through use cases from Microsoft and Netflix.
11:00am–11:40am Wednesday, 03/30/2016
Law, Ethics, Governance

Location: 211 A-C
Jake Porway (DataKind), Rachel Quint (Hewlett Foundation), Sue-Ann Ma, Jeremy Anderson (IBM)
So many of the data projects making headlines—from a new app for finding public services to a new probabilistic model for predicting weather patterns for subsistence farmers—are great accomplishments but don’t seem to have end users in mind. Discover how organizations are designing with, not for, people, accounting for what drives them in order to make long-lasting impact.
11:50am–12:30pm Wednesday, 03/30/2016
Law, Ethics, Governance

Location: 211 A-C
Jake Porway (DataKind), Daniella Perlroth (Lyra Health), Tim Hwang (ROFLCon / The Web Ecology Project), Lucy Bernholz (Stanford University)
So many of the data projects making headlines—from a new app for finding public services to a new probabilistic model for predicting weather patterns for subsistence farmers—are great accomplishments but don’t seem to have end users in mind. Discover how organizations are designing with, not for, people, accounting for what drives them in order to make long-lasting impact.
1:50pm–2:30pm Wednesday, 03/30/2016
Law, Ethics, Governance

Location: 211 A-C
Mike Lee Williams (Cloudera Fast Forward Labs)
Machines are not objective, and big data is not fair. Michael Williams uses sentiment analysis to show that supervised machine learning has the potential to amplify the voices of the most privileged people in society, violate the spirit and letter of civil rights law, and make your product suck.
2:40pm–3:20pm Wednesday, 03/30/2016
Law, Ethics, Governance

Location: 211 A-C
Louis Suarez-Potts (Age of Peers, Inc.)
2015 saw an increased urgency in the ethics of big data, as the UN began to adopt civil-society partnerships with big data organizations. But what, if anything, are we supposed to do with the data we acquire, interpret, and label big data? Louis Suarez-Potts examines big data ethics to explain best practices for putting to use the information gained by big data methodology.
2:40pm–3:20pm Wednesday, 03/30/2016
Data-driven Business

Location: LL21 C/D
Jin Zhang (CA Technologies), Jerry Overton (DXC), Michele Chambers (Continuum Analytics)
Data has become a hot career choice, but some fear that a career in data is highly stressful or simply boring. Jin Zhang, Jerry Overton, and Michele Chambers give an overview of the field and its various specializations with the hope that this understanding will eliminate any fear and empower attendees to pursue a career in data.
2:40pm–3:20pm Wednesday, 03/30/2016
Sandy Ryza (Clover Health)
Want to build models over data every second from millions of sensors? Dig into the histories of millions of financial instruments? Sandy Ryza discusses the unique challenges of time series data and explains how to work with it at scale. Sandy then introduces the open source Spark-Timeseries library, which provides a natural way of munging, manipulating, and modeling time series data.
4:20pm–5:00pm Wednesday, 03/30/2016
Sponsored

Location: 210 B/F
Patrick Hall (SAS), Paul Kent (SAS)
Although it’s been around for decades, machine learning is currently thriving, and organizations are looking to benefit from it. Patrick Hall and Paul Kent offer 10 crucial tips to know before venturing into the mix—a personal survival guide from the creators of a solution that was there in the beginning and continues to drive the industry today.
4:20pm–5:00pm Wednesday, 03/30/2016
Law, Ethics, Governance

Location: 211 A-C
Tags: media, telecom
Jonathan King (Ericsson)
Jonathan King outlines ethical best practices for big data and explores the difficult questions emerging from missteps that have caused public outcry, as well as the legal, ethical, and regulatory frameworks that are just beginning to take shape around big data.
4:20pm–5:00pm Wednesday, 03/30/2016
Tags: real-time, ai
Alex Ingerman (Amazon Web Services)
Alex Ingerman explains how several AWS services, including Amazon Machine Learning, Amazon Kinesis, AWS Lambda, and Amazon Mechanical Turk, can be tied together to build a predictive application to power a real-time customer-service use case.
4:20pm–5:00pm Wednesday, 03/30/2016
Security

Location: LL21 B
Tags: real-time
Yinglian Xie (DataVisor)
Yinglian Xie describes the anatomy of modern online services, where large armies of malicious accounts hide among legitimate users and conduct a variety of attacks. Yinglian demonstrates how the Spark framework can facilitate early detection of these types of attacks by analyzing billions of user actions.
4:20pm–5:00pm Wednesday, 03/30/2016
Data-driven Business

Location: LL21 C/D
Andreas Schmidt (Blue Yonder)
While many companies struggle to adopt big data, a number of industry leaders are leapfrogging big data adoption by going straight to automating core business processes. Andreas Schmidt presents examples from leading European companies that have overcome cultural, technical, and scientific challenges and unlocked the potential of big data in an entirely different way.
5:10pm–5:50pm Wednesday, 03/30/2016
Moderated by:
Michael Dauber (Amplify Partners)
Panelists:
Yael Garten (LinkedIn), Monica Rogati (Data Natives), Daniel Tunkelang (Various)
We’ve all heard that rare breed the data scientist described as a unicorn. In building your DS team, should you hold out for that unicorn or create groups of specialists who can work together? Michael Dauber, Yael Garten, Monica Rogati, and Daniel Tunkelang discuss the pros and cons of various team models to help you decide what works best for your particular situation and organization.
11:00am–11:40am Thursday, 03/31/2016
Data Innovations

Location: LL21 E/F
Tags: media
Daniel Weeks (Netflix)
Netflix is exploring new avenues for data processing where traditional approaches fail to scale. Daniel Weeks explains how Netflix has enhanced its 25+ petabyte warehouse by combining Parquet's features with Presto and Spark to boost both ETL and interactive queries. Daniel explores how these approaches offer new ways to look at the relationship between storage and compute.
11:00am–11:40am Thursday, 03/31/2016
Enterprise Adoption

Location: LL21 B
Donald Miner (Miner & Kasch)
Figuring out Hadoop is daunting. However, understanding a set of basic yet important principles is all you need to cut through the hype and make intelligent enterprise decisions. Donald Miner breaks down modern Hadoop into 10 important principles you need to know to understand what Hadoop is and how it is different from the old way of doing things.
1:50pm–2:30pm Thursday, 03/31/2016
Marcel Kornacker (Cloudera), Alexander Behm (Cloudera)
Marcel Kornacker explains how to use nested data structures to increase analytic productivity. Marcel uses the well-known TPC-H schema to demonstrate how to simplify analytic workloads with nested schemas.
1:50pm–2:30pm Thursday, 03/31/2016
Silvia Oliveros (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers)
You have your Hadoop cluster, and you are ready to fill it up with data. But wait! Which format should you use to store your data? Should you store it in plain text, SequenceFile, Avro, or Parquet? (And should you compress it?) Silvia Oliveros and Stephen O'Sullivan cover the hows, whys, and whens of choosing one format over another and take a closer look at some of the tradeoffs each offers.
1:50pm–2:30pm Thursday, 03/31/2016
Aneesh Karve (Quilt)
Seemingly harmless choices in visualization, design, and content selection can distort your data and lead to false conclusions. Aneesh Karve presents a framework for identifying and overcoming these distortions by drawing upon research in human perception, focus and context, and mobile design.
2:40pm–3:20pm Thursday, 03/31/2016
Tags: science
Siddha Ganju (NVIDIA)
Siddha Ganju explains how CERN uses machine-learning models to predict which datasets will become popular over time. This helps to replicate the datasets that are most heavily accessed, which improves the efficiency of physics analysis in CMS. Analyzing this data leads to useful information about the physical processes.
2:40pm–3:20pm Thursday, 03/31/2016
Enterprise Adoption

Location: LL20 D
Jacques Nadeau (Dremio)
There are (too?) many options for BI on Hadoop. Some are great at exploration, some are great at OLAP, some are fast, and some are flexible. Understanding the options and how they work with Hadoop systems is a key challenge for many organizations. Jacques Nadeau provides a survey of the main options, both traditional (Tableau, Qlik, etc.) and new (Platfora, Datameer, etc.).
2:40pm–3:20pm Thursday, 03/31/2016
Tags: travel
Bill Hinderman (Vaystays)
With more than 1.4 billion smartphones and at least half that many tablets in use, there is a tremendous need for responsive web design in the data-visualization sphere. Bill Hinderman explains the principles of responsive data visualization, which allows you to respond to screen conditions as well as data conditions.
4:20pm–5:00pm Thursday, 03/31/2016
Enterprise Adoption

Location: 230 A
Krishnan Venkata (LatentView Analytics), Jose Abelenda (Hotwire)
While organizations understand the importance of customer satisfaction, quantifying its impact on future engagement is a surprisingly hard analytical problem (most rely on Net Promoter Scores). Krishnan Venkata and Jose Abelenda explain how Hotwire used big data to put a dollar figure on promoter/detractor behavior to help the organization objectively prioritize customer-engagement initiatives.