Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Schedule: Security and Privacy sessions

Recent regulations in Europe (GDPR) and California (Consumer Privacy Act) have placed concepts like “user control” and “privacy-by-design” at the forefront for companies wanting to deploy ML. The good news is that there are new privacy-preserving tools and techniques – including differential privacy – that are becoming available for both business intelligence and ML applications.

Data security and privacy: A recent white paper from the Hoover Institution observed that we are beginning to see the convergence of data privacy and security. This is an age when companies are guarding against the misuse of data, either by adversaries or by parties they presently trust but may no longer trust in the future: “Anyone, from a privacy perspective, can become an adversary, given enough time.”

The use of data, analytics, and machine learning in security and cybersecurity.

Privacy-preserving analytics.

Secure and robust analytics, including secure machine learning and aspects of machine deception (such as machines deceiving machines, or people deceiving machines).

9:00am–12:30pm Tuesday, March 26, 2019

AI privacy and ethical compliance toolkit

Data Science, Machine Learning & AI
Location: 2001

Iman Saleh (Intel), Cory Ilo (Intel), Cindy Tseng (Intel)

Average rating: 5.00 (3 ratings)

From healthcare to smart home to autonomous vehicles, new applications of autonomous systems are raising ethical concerns about a host of issues, including bias, transparency, and privacy. Iman Saleh, Cory Ilo, and Cindy Tseng demonstrate tools and capabilities that can help data scientists address these concerns and bridge the gap between ethicists, regulators, and machine learning practitioners.

1:30pm–5:00pm Tuesday, March 26, 2019

Successfully deploy machine learning while managing its risks

Executive Briefing and best practices, Strata Business Summit
Location: 2003

Andrew Burt (bnh.ai), Steven Touw (Immuta), Richard Geering (Immuta), Joseph Regensburger (Immuta), Alfred Rossi (Immuta)

Average rating: 5.00 (2 ratings)

As ML becomes increasingly important for businesses and data science teams alike, managing its risks is quickly becoming one of the biggest challenges to the technology’s widespread adoption. Join Andrew Burt, Steven Touw, Richard Geering, Joseph Regensburger, and Alfred Rossi for a hands-on overview of how to train, validate, and audit machine learning (ML) models in practice.

9:40am–10:00am Wednesday, March 27, 2019

Cyberconflict: A new era of war, sabotage, and fear

Location: Ballroom

David Sanger (The New York Times)

Average rating: 4.32 (50 ratings)

David Sanger explains how the rise of cyberweapons has transformed geopolitics like nothing since the invention of the atomic bomb. From crippling infrastructure to sowing discord and doubt, cyber is now the weapon of choice for democracies, dictators, and terrorists.

10:00am–10:20am Wednesday, March 27, 2019

AI and cryptography: Challenges and opportunities

Location: Ballroom

Shafi Goldwasser (UC Berkeley | MIT | Weizmann Institute of Science | Duality)

Average rating: 3.41 (22 ratings)

Keynote with Shafi Goldwasser.

11:00am–11:40am Wednesday, March 27, 2019

Machine learning on encrypted data: Challenges and opportunities

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall

Alon Kaufman (Duality), Vinod Vaikuntanathan (MIT and Duality Technologies)

Average rating: 3.75 (4 ratings)

Alon Kaufman and Vinod Vaikuntanathan discuss the challenges and opportunities of machine learning on encrypted data and describe the state of the art in this space.

11:00am–11:40am Wednesday, March 27, 2019

Executive Briefing: From the edge to AI—Taking control of your data for fun and profit

Executive Briefing and best practices, Strata Business Summit
Location: 2020

Mike Olson (Cloudera)

Average rating: 3.80 (5 ratings)

It's easier than ever to collect data, but managing it securely in compliance with regulations and legal constraints is harder. Mike Olson discusses the risks and the issues that matter most and explains how an enterprise data cloud that embraces your data center and the public cloud in combination can address them, delivering real business results for your organization.

5:10pm–5:50pm Wednesday, March 27, 2019

Federated learning

Data Science, Machine Learning & AI
Location: 2010

Mike Lee Williams (Cloudera Fast Forward Labs)

Average rating: 4.00 (1 rating)

Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. Mike Lee Williams discusses the algorithmic solutions and the product opportunities.

10:10am–10:25am Thursday, March 28, 2019

Likewar: How social media is changing the world…and how the world is changing social media

Location: Ballroom

Peter Singer (New America)

Average rating: 4.80 (20 ratings)

Terrorists live-stream their attacks, “Twitter wars” sell music albums and produce real-world casualties, and viral misinformation alters not just the result of battles but the very fate of nations. The result is that war, tech, and politics have blurred into a new kind of battle space that plays out on our smartphones. P. W. Singer explains.

11:00am–11:40am Thursday, March 28, 2019

How to protect big data in a containerized environment

Data Engineering & Architecture
Location: 2024

Thomas Phelan (HPE BlueData)

Average rating: 4.50 (2 ratings)

Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE). But TDE is difficult to configure and manage, particularly when run in Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them.

11:00am–11:40am Thursday, March 28, 2019

Framework to quantitatively assess ML safety: Technical implementation and best practices

Data Science, Machine Learning & AI
Location: 2010

Ram Shankar Siva Kumar (Microsoft (Azure Security))

Average rating: 4.33 (3 ratings)

How can we guarantee that the ML system we develop is adequately protected from adversarial manipulation? Ram Shankar Siva Kumar shares a framework and corresponding best practices to quantitatively assess the safety of your ML systems.

11:00am–11:40am Thursday, March 28, 2019

The future of machine learning is decentralized

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall

Alex Ingerman (Google)

Average rating: 4.67 (12 ratings)

Federated learning is an approach for training ML models across a fleet of participating devices without collecting their data in a central location. Alex Ingerman offers an overview of federated learning, compares traditional and federated ML workflows, and explores the current and upcoming use cases for decentralized machine learning, with examples from Google's deployment of this technology.

11:00am–11:40am Thursday, March 28, 2019

Detecting coordinated fraud attacks using deep learning

Data Science, Machine Learning & AI
Location: 2016

Fang Yu (DataVisor)

Average rating: 3.75 (4 ratings)

Online fraud flourishes as online services become ubiquitous in our daily life. Fang Yu explains how DataVisor leverages cutting-edge deep learning technologies to address the challenges in large-scale fraud detection.

11:00am–11:40am Thursday, March 28, 2019

Executive Briefing: Forcing the legal and ethical hands of companies that collect, use, and analyze data

Law and Ethics, Strata Business Summit
Location: 2020

Nick Curcuru (Mastercard)

Average rating: 4.50 (2 ratings)

Data, in part harvested personal data, brings industries unprecedented insights about customer behavior. We know more about our customers and neighbors than at any other time in history, but we need to avoid crossing the "creepy" line. Nick Curcuru discusses how ethical behavior drives trust, especially in today's IoT age.

11:50am–12:30pm Thursday, March 28, 2019

Masquerading malicious DNS traffic

Data Science, Machine Learning & AI
Location: 2010

David Rodriguez (Cisco Systems)

Average rating: 4.50 (2 ratings)

Malicious DNS traffic patterns are inconsistent and typically thwart anomaly detection. David Rodriguez explains how Cisco uses Apache Spark and Stripe’s Bayesian inference software, Rainier, to fit the underlying time series distribution for millions of domains and outlines techniques to identify artificial traffic volumes related to spam, malvertising, and botnets (masquerading traffic).

11:50am–12:30pm Thursday, March 28, 2019

Decentralized governance of data

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall

Roger Chen (Computable)

Average rating: 2.00 (1 rating)

Data remains a linchpin of success for machine learning yet too often is a scarce resource. And even when data is available, trust issues arise about the quality and ethics of collection. Roger Chen explores new models for generating and governing training data for AI applications.

1:50pm–2:30pm Thursday, March 28, 2019

Use the Jupyter Notebook to integrate adversarial attacks into a model training pipeline to detect vulnerabilities

Data Science, Machine Learning & AI
Location: 2009

Animesh Singh (IBM), Tommy Li (IBM)

Average rating: 4.50 (2 ratings)

Animesh Singh and Tommy Li explain how to implement state-of-the-art methods for attacking and defending classifiers using the open source Adversarial Robustness Toolbox. The library provides AI developers with interfaces that support the composition of comprehensive defense systems using individual methods as building blocks.

1:50pm–2:30pm Thursday, March 28, 2019

Using graph metrics to detect lateral movement in enterprise cybersecurity data

Data Science, Machine Learning & AI
Location: 2010

Louis DiValentin (Accenture), Dillon Cullinan (Accenture)

Average rating: 3.00 (3 ratings)

Louis DiValentin and Dillon Cullinan explain how Accenture's Cyber Security Lab built security analytics models to detect attempted lateral movement in networks by transforming enterprise-scale security data into a graph format, generating graph analytics for individual users, and building time series detection models that visualize the changing graph metrics for security operators.

2:40pm–3:20pm Thursday, March 28, 2019

Executive Briefing: Big data in the era of heavy worldwide privacy regulations

Executive Briefing and best practices, Strata Business Summit
Location: 2020

Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)

Average rating: 4.33 (3 ratings)

The implications of new privacy regulations for data management and analytics, such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Privacy Act (CCPA), can seem complex. Mark Donsky and Nikki Rouda highlight aspects of the rules and outline the approaches that will assist with compliance.

2:40pm–3:20pm Thursday, March 28, 2019

Building and scaling a security detection platform: A Netflix Original

Data Engineering & Architecture
Location: 2024

John Bennett (Netflix), Siamac Mirzaie (Netflix)

Average rating: 3.33 (3 ratings)

Data has become a foundational pillar for security teams operating in organizations of all shapes and sizes. This new norm has created a need for platforms that enable engineers to harness data for various security purposes. John Bennett and Siamac Mirzaie offer an overview of Netflix's internal platform for quickly deploying data-based detection capabilities in the corporate environment.

3:50pm–4:30pm Thursday, March 28, 2019

Data science at Deutsche Telekom: Predicting global travel patterns and network demand

Data Engineering & Architecture
Location: 2006

Václav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)

Average rating: 4.00 (1 rating)

Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.

3:50pm–4:30pm Thursday, March 28, 2019

Real-time monitoring of Twitter's network infrastructure with Heron

Data Engineering & Architecture
Location: 2024

Julien Delange (Twitter), Neng Lu (Twitter)

Average rating: 2.67 (3 ratings)

Julien Delange and Neng Lu explain how Twitter uses the Heron stream processing engine to monitor and analyze its network infrastructure, implementing a new data pipeline that ingests multiple sources and processes about 1 billion tuples to detect network issues and generate usage statistics. Join in to learn the key technologies used, the architecture, and the challenges Twitter faced.

4:40pm–5:20pm Thursday, March 28, 2019

Machine learning and GDPR

Data Science, Machine Learning & AI, Law and Ethics
Location: 2011

Michael Gregory (Cloudera)

Average rating: 4.25 (4 ratings)

The General Data Protection Regulation (GDPR) enacted by the European Union restricts the use of machine learning practices in many cases. Michael Gregory offers an overview of the regulations, important considerations for both EU and non-EU organizations, and tools and technologies to ensure that you're appropriately using ML applications to drive continued transformation and insights.

4:40pm–5:20pm Thursday, March 28, 2019

Applying machine learning in fintech startups: Modeling with sensitive customer datasets

Data Science, Machine Learning & AI
Location: 2004

Ji Peng (Earnin)

Average rating: 4.50 (2 ratings)

As a customer-facing fintech company, Earnin has access to various types of valuable customer data, from bank transactions to GPS location. Ji Peng shares how Earnin uses unique datasets to build machine learning models and navigates the challenges of prioritizing and applying machine learning in the fintech domain.

