Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Schedule: Security and Privacy sessions

Recent regulations in Europe (GDPR) and California (Consumer Privacy Act) have placed concepts like “user control” and “privacy-by-design” at the forefront for companies wanting to deploy ML. The good news is that there are new privacy-preserving tools and techniques – including differential privacy – that are becoming available for both business intelligence and ML applications.

  • Data security and privacy: A recent white paper from the Hoover Institution observed that we are beginning to see the convergence of data privacy and security. This is an age when companies are guarding against the misuse of data, either by adversaries or by parties they presently trust but may no longer trust in the future: “Anyone, from a privacy perspective, can become an adversary, given enough time.”
  • The use of data, analytics, and machine learning in security and cybersecurity.
  • Secure and robust analytics, including secure machine learning and aspects of machine deception (such as machines deceiving machines, or people deceiving machines).
9:00am–12:30pm Tuesday, March 26, 2019
Iman Saleh (Intel), Cory Ilo (Intel), Cindy Tseng (Intel)
Average rating: *****
(5.00, 3 ratings)
From healthcare to smart homes to autonomous vehicles, new applications of autonomous systems are raising ethical concerns about a host of issues, including bias, transparency, and privacy. Iman Saleh, Cory Ilo, and Cindy Tseng demonstrate tools and capabilities that can help data scientists address these concerns and bridge the gap between ethicists, regulators, and machine learning practitioners.
1:30pm–5:00pm Tuesday, March 26, 2019
Andrew Burt (bnh.ai), Steven Touw (Immuta), Richard Geering (Immuta), Joseph Regensburger (Immuta), Alfred Rossi (Immuta)
Average rating: *****
(5.00, 2 ratings)
As ML becomes increasingly important for businesses and data science teams alike, managing its risks is quickly becoming one of the biggest challenges to the technology’s widespread adoption. Join Andrew Burt, Steven Touw, Richard Geering, Joseph Regensburger, and Alfred Rossi for a hands-on overview of how to train, validate, and audit machine learning (ML) models in practice.
9:40am–10:00am Wednesday, March 27, 2019
Location: Ballroom
David Sanger (The New York Times)
Average rating: ****.
(4.32, 50 ratings)
David Sanger explains how the rise of cyberweapons has transformed geopolitics like nothing since the invention of the atomic bomb. From crippling infrastructure to sowing discord and doubt, cyber is now the weapon of choice for democracies, dictators, and terrorists.
10:00am–10:20am Wednesday, March 27, 2019
Location: Ballroom
Shafi Goldwasser (UC Berkeley | MIT | Weizmann Institute of Science | Duality)
Average rating: ***..
(3.41, 22 ratings)
Keynote with Shafi Goldwasser.
11:00am–11:40am Wednesday, March 27, 2019
Alon Kaufman (Duality), Vinod Vaikuntanathan (MIT and Duality Technologies)
Average rating: ***..
(3.75, 4 ratings)
Alon Kaufman and Vinod Vaikuntanathan discuss the challenges and opportunities of machine learning on encrypted data and describe the state of the art in this space.
11:00am–11:40am Wednesday, March 27, 2019
Mike Olson (Cloudera)
Average rating: ***..
(3.80, 5 ratings)
It's easier than ever to collect data, but managing it securely and in compliance with regulations and legal constraints is harder. Mike Olson discusses the risks and the issues that matter most and explains how an enterprise data cloud that spans both your data center and the public cloud can address them, delivering real business results for your organization.
5:10pm–5:50pm Wednesday, March 27, 2019
Mike Lee Williams (Cloudera Fast Forward Labs)
Average rating: ****.
(4.00, 1 rating)
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. Training a model in this challenging setting is known as federated learning. Mike Lee Williams discusses the algorithmic solutions and the product opportunities.
10:10am–10:25am Thursday, March 28, 2019
Location: Ballroom
Peter Singer (New America)
Average rating: ****.
(4.80, 20 ratings)
Terrorists live-stream their attacks, “Twitter wars” sell music albums and produce real-world casualties, and viral misinformation alters not just the result of battles but the very fate of nations. The result is that war, tech, and politics have blurred into a new kind of battle space that plays out on our smartphones. P. W. Singer explains.
11:00am–11:40am Thursday, March 28, 2019
Thomas Phelan (HPE BlueData)
Average rating: ****.
(4.50, 2 ratings)
Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE). But TDE is difficult to configure and manage, particularly when run in Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them.
11:00am–11:40am Thursday, March 28, 2019
Ram Shankar Siva Kumar (Microsoft (Azure Security))
Average rating: ****.
(4.33, 3 ratings)
How can we guarantee that the ML system we develop is adequately protected from adversarial manipulation? Ram Shankar Siva Kumar shares a framework and corresponding best practices to quantitatively assess the safety of your ML systems.
11:00am–11:40am Thursday, March 28, 2019
Alex Ingerman (Google)
Average rating: ****.
(4.67, 12 ratings)
Federated learning is an approach for training ML models across a fleet of participating devices without collecting their data in a central location. Alex Ingerman offers an overview of federated learning, compares traditional and federated ML workflows, and explores the current and upcoming use cases for decentralized machine learning, with examples from Google's deployment of this technology.
11:00am–11:40am Thursday, March 28, 2019
Fang Yu (DataVisor)
Average rating: ***..
(3.75, 4 ratings)
Online fraud flourishes as online services become ubiquitous in our daily lives. Fang Yu explains how DataVisor leverages cutting-edge deep learning technologies to address the challenges in large-scale fraud detection.
11:00am–11:40am Thursday, March 28, 2019
Nick Curcuru (Mastercard)
Average rating: ****.
(4.50, 2 ratings)
Data (in part, harvested personal data) brings industries unprecedented insights about customer behavior. We know more about our customers and neighbors than at any other time in history, but we need to avoid crossing the "creepy" line. Nick Curcuru discusses how ethical behavior drives trust, especially in today's IoT age.
11:50am–12:30pm Thursday, March 28, 2019
David Rodriguez (Cisco Systems)
Average rating: ****.
(4.50, 2 ratings)
Malicious DNS traffic patterns are inconsistent and typically thwart anomaly detection. David Rodriguez explains how Cisco uses Apache Spark and Stripe’s Bayesian inference software, Rainier, to fit the underlying time series distribution for millions of domains and outlines techniques to identify artificial traffic volumes related to spam, malvertising, and botnets (masquerading traffic).
11:50am–12:30pm Thursday, March 28, 2019
Roger Chen (Computable)
Average rating: **...
(2.00, 1 rating)
Data remains a linchpin of success for machine learning, yet too often it is a scarce resource. And even when data is available, trust issues arise about the quality and ethics of its collection. Roger Chen explores new models for generating and governing training data for AI applications.
1:50pm–2:30pm Thursday, March 28, 2019
Animesh Singh (IBM), Tommy Li (IBM)
Average rating: ****.
(4.50, 2 ratings)
Animesh Singh and Tommy Li explain how to implement state-of-the-art methods for attacking and defending classifiers using the open source Adversarial Robustness Toolbox. The library provides AI developers with interfaces that support the composition of comprehensive defense systems using individual methods as building blocks.
1:50pm–2:30pm Thursday, March 28, 2019
Louis DiValentin (Accenture), Dillon Cullinan (Accenture)
Average rating: ***..
(3.00, 3 ratings)
Louis DiValentin and Dillon Cullinan explain how Accenture's Cyber Security Lab built security analytics models to detect attempted lateral movement in networks by transforming enterprise-scale security data into a graph format, generating graph analytics for individual users, and building time series detection models that visualize the changing graph metrics for security operators.
2:40pm–3:20pm Thursday, March 28, 2019
Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)
Average rating: ****.
(4.33, 3 ratings)
The implications of new privacy regulations for data management and analytics, such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Privacy Act (CCPA), can seem complex. Mark Donsky and Nikki Rouda highlight aspects of the rules and outline the approaches that will assist with compliance.
2:40pm–3:20pm Thursday, March 28, 2019
John Bennett (Netflix), Siamac Mirzaie (Netflix)
Average rating: ***..
(3.33, 3 ratings)
Data has become a foundational pillar for security teams operating in organizations of all shapes and sizes. This new norm has created a need for platforms that enable engineers to harness data for various security purposes. John Bennett and Siamac Mirzaie offer an overview of Netflix's internal platform for quickly deploying data-based detection capabilities in the corporate environment.
3:50pm–4:30pm Thursday, March 28, 2019
Václav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)
Average rating: ****.
(4.00, 1 rating)
Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.
3:50pm–4:30pm Thursday, March 28, 2019
Julien Delange (Twitter), Neng Lu (Twitter)
Average rating: **...
(2.67, 3 ratings)
Julien Delange and Neng Lu explain how Twitter uses the Heron stream processing engine to monitor and analyze its network infrastructure, implementing a new data pipeline that ingests multiple sources and processes about 1 billion tuples to detect network issues and generate usage statistics. Join in to learn the key technologies used, the architecture, and the challenges Twitter faced.
4:40pm–5:20pm Thursday, March 28, 2019
Michael Gregory (Cloudera)
Average rating: ****.
(4.25, 4 ratings)
The General Data Protection Regulation (GDPR) enacted by the European Union restricts the use of machine learning practices in many cases. Michael Gregory offers an overview of the regulations, important considerations for both EU and non-EU organizations, and tools and technologies to ensure that you're appropriately using ML applications to drive continued transformation and insights.
4:40pm–5:20pm Thursday, March 28, 2019
Ji Peng (Earnin)
Average rating: ****.
(4.50, 2 ratings)
As a customer-facing fintech company, Earnin has access to various types of valuable customer data, from bank transactions to GPS location. Ji Peng shares how Earnin uses unique datasets to build machine learning models and navigates the challenges of prioritizing and applying machine learning in the fintech domain.