Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Ethics and Privacy sessions

Recent regulations in Europe (GDPR) and California (Consumer Privacy Act) have placed concepts like “user control” and “privacy-by-design” at the forefront for companies wanting to deploy ML. The good news is that there are new privacy-preserving tools and techniques – including differential privacy – that are becoming available for both business intelligence and ML applications.

Ethics and compliance are areas of interest to many in the data community. Beyond privacy, data professionals are much more engaged in topics such as fairness, transparency, and explainability in machine learning. Are data sets that are being used for model training representative of the broader population? For certain application domains and settings, transparency and interpretability are essential and regulators may require more transparent models, even at the expense of power and accuracy. More generally, how do companies mitigate risk when using ML?

9:00am–12:30pm Tuesday, 09/11/2018

Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments, step by step

Location: 1E 11 Level: Intermediate

Mark Donsky (Okera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera), Ifigeneia Derekli (Cloudera), Camila Hiskey (Cloudera)

Average rating:

(4.50, 2 ratings)

New regulations such as GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Syed Rafice, Mubashir Kazia, Ifigeneia Derekli, and Camila Hiskey share hands-on best practices for meeting these challenges, with special attention paid to GDPR.

9:00am–12:30pm Tuesday, 09/11/2018

Practical techniques for interpreting machine learning models

Location: 1A 23/24 Level: Intermediate

Patrick Hall (bnh.ai | H2O.ai), Avni Wadhwa (H2O.ai), Mark Chan (H2O.ai)

Average rating:

(4.50, 4 ratings)

Transparency, auditability, and stability are crucial for business adoption and human acceptance of complex machine learning models. Patrick Hall, Avni Wadhwa, and Mark Chan share practical and productizable approaches for explaining, testing, and visualizing machine learning models using open source, Python-friendly tools such as GraphViz, H2O, and XGBoost.

9:00am–5:00pm Tuesday, 09/11/2018

Findata Day

Location: 1A 08

Alistair Croll (Solve For Interesting), Robert Passarella (Alpha Features), Amro Alkhatib (National Health Insurance Company-Daman), Mridul Mishra (Fidelity Investments), Patrick Angeles (Cloudera), James Psota (Panjiva), Andreas Kohlmaier (Munich Re), Paul Lashmet (Arcadia Data), Nick Curcuru (Mastercard), Robin Way (Corios), Theresa Johnson (Airbnb), Jane Tran (Unqork), Swatee Singh (American Express)

From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.

9:00am–5:00pm Tuesday, 09/11/2018

Data Case Studies

Location: 1E 10

Paco Nathan (derwen.ai), Katharina Warzel (EveryMundo), Mike Berger (Mount Sinai Health System), Sam Helmich (Deere & Company), Stephanie Fischer (datanizing GmbH), Maryam Jahanshahi (TapRecruit), Greg Quist (SmartCover Systems), Ann Nguyen (Whole Whale), Steve Otto (Navistar), Jennifer Lim (Cerner), S Anand (Gramener), Ian Brooks (Cloudera)

Hear practical insights from household brands and global companies: the challenges they tackled, the approaches they took, and the benefits and drawbacks of their solutions.

1:30pm–5:00pm Tuesday, 09/11/2018

How to be fair: A tutorial for beginners

Location: 1E 11 Level: Intermediate

Aileen Nielsen (Skillman Consulting)

Average rating:

(4.00, 4 ratings)

There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government applications is reproducing or even amplifying existing prejudices and social inequalities. Aileen Nielsen demonstrates how to identify and avoid bias and other forms of unfairness in your analyses.

9:15am–9:25am Wednesday, 09/12/2018

Managing risk in machine learning

Location: 3E

Ben Lorica (O'Reilly)

Average rating:

(3.92, 13 ratings)

As companies begin adopting machine learning, they need to account for important considerations including fairness, transparency, privacy, and security. Ben Lorica offers an overview of recent tools for building privacy-preserving and secure machine learning products and services.

11:20am–12:00pm Wednesday, 09/12/2018

Executive Briefing: GDPR—Getting your data ready for heavy, new EU privacy regulations

Location: 1E 14 Level: Intermediate

Mark Donsky (Okera), Steven Ross (Cloudera)

In May 2018, the General Data Protection Regulation (GDPR) went into effect for firms doing business in the EU, but many companies still aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Mark Donsky and Steven Ross outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations.
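As a rough illustration of the fine ceiling quoted above, the following sketch computes the upper bound for a given revenue figure. It assumes the "whichever is higher" rule from GDPR Article 83(5), which the abstract does not spell out; the function name and the sample revenue figures are hypothetical, and this is not legal guidance.

```python
def gdpr_max_fine_eur(global_annual_revenue_eur: int) -> float:
    """Upper bound of the top-tier GDPR fine, in euros.

    Assumption (not stated in the abstract): the cap is the higher of
    EUR 20 million and 4% of global annual revenue.
    """
    flat_cap = 20_000_000
    revenue_cap = global_annual_revenue_eur * 4 / 100
    return max(flat_cap, revenue_cap)


# Hypothetical companies: one small enough that the flat cap applies,
# one large enough that the 4% cap applies.
for revenue in (100_000_000, 1_000_000_000):
    print(revenue, gdpr_max_fine_eur(revenue))
```

For smaller firms the flat €20 million figure dominates; the percentage only takes over above €500 million in revenue.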

11:20am–12:00pm Wednesday, 09/12/2018

Protecting sensitive data in huge datasets: Cloud tools you can use

Location: 1A 21/22 Level: Intermediate

Felipe Hoffa (Google), Damien Desfontaines (Google | ETH Zürich)

Average rating:

(4.00, 1 rating)

Before releasing a public dataset, practitioners need to thread the needle between utility and protection of individuals. Felipe Hoffa and Damien Desfontaines explore how to handle massive public datasets, taking you from theory to real life as they showcase newly available tools that help with PII detection and bring concepts like k-anonymity and l-diversity to the practical realm.
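For readers unfamiliar with the two concepts named above, here is a minimal sketch of how k-anonymity and l-diversity can be measured on a small table. It is not the speakers' tooling: the function names, column names, and sample rows are all hypothetical, and the l-diversity check uses the simplest "distinct values" definition.

```python
from collections import defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    group_sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        group_sizes[key] += 1
    return min(group_sizes.values())  # raises ValueError on an empty table


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any such group."""
    group_values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        group_values[key].add(row[sensitive])
    return min(len(values) for values in group_values.values())


# Hypothetical, already-generalized records.
rows = [
    {"zip": "100**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "100**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "100**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "100**", "age": "40-49", "diagnosis": "flu"},
]

print(k_anonymity(rows, ["zip", "age"]))               # 2
print(l_diversity(rows, ["zip", "age"], "diagnosis"))  # 1
```

The sample table is 2-anonymous, yet its second group reveals the diagnosis of everyone in it, which is the gap l-diversity is meant to expose.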

1:15pm–1:55pm Wednesday, 09/12/2018

Privacy by design: Building in data privacy and protection versus bolting it on later

Location: 1E 12/13 Level: Advanced

Les McMonagle (BlueTalon)

Average rating:

(5.00, 2 ratings)

Privacy by design is a fundamentally important approach to achieving compliance with GDPR and other data privacy or data protection regulations. Les McMonagle outlines how organizations can save time and money while improving data security and regulatory compliance and dramatically reducing the risk of a data breach or expensive penalties for noncompliance.

2:05pm–2:45pm Wednesday, 09/12/2018

An ethical foundation for the AI-driven future

Location: 1E 12/13 Level: Beginner

Harry Glaser (Periscope Data)

Average rating:

(5.00, 2 ratings)

What is the moral responsibility of a data team today? As AI and machine learning technologies become part of our everyday life and as data becomes accessible to everyone, CDOs and data teams are taking on a very important moral role as the conscience of the corporation. Harry Glaser highlights the risks companies will face if they don't empower data teams to lead the way for ethical data use.

2:55pm–3:35pm Wednesday, 09/12/2018

Solving the cold start problem: Data and model aggregation using differential privacy

Location: 1A 08 Level: Beginner

Chang Liu (Georgian Partners)

Average rating:

(5.00, 1 rating)

Chang Liu offers an overview of the cold-start problem, a common challenge for software companies, and explains how Georgian Partners has solved it by transferring knowledge from existing data through differentially private data aggregation.

2:55pm–3:35pm Wednesday, 09/12/2018

Beyond explainability: Regulating machine learning in practice

Location: 1E 12/13 Level: Non-technical

Andrew Burt (bnh.ai)

Average rating:

(5.00, 2 ratings)

Machine learning is becoming prevalent across industries, creating new types of risk. Managing this risk is quickly becoming the central challenge of major organizations, one that strains data science teams, legal personnel, and the C-suite alike. Andrew Burt shares lessons from past regulations focused on similar technology along with a proposal for new ways to manage risk in ML.

2:55pm–3:35pm Wednesday, 09/12/2018

Perverse incentives in metrics: Inequality in the like economy

Location: 1A 06/07 Level: Intermediate

Bonnie Barrilleaux (LinkedIn)

Average rating:

(4.50, 4 ratings)

As LinkedIn encouraged members to join conversations, it found itself in danger of creating a "rich get richer" economy in which a few creators got an increasing share of all feedback. Bonnie Barrilleaux explains why you must regularly reevaluate metrics to avoid perverse incentives: situations where efforts to increase the metric cause unintended negative side effects.

4:35pm–5:15pm Wednesday, 09/12/2018

Rationalizing risk in AI and ML

Location: 1E 12/13 Level: Non-technical

Kimberly Nevala (SAS)

Average rating:

(5.00, 1 rating)

Too often, the discussion of AI and ML includes an expectation, if not a requirement, of infallibility. But as we know, this expectation is not realistic. So what’s a company to do? While risk can’t be eliminated, it can be rationalized. Kimberly Nevala demonstrates how an unflinching risk assessment enables AI/ML adoption and deployment.

10:05am–10:20am Thursday, 09/13/2018

Brain-based human-machine interfaces: New developments, legal and ethical issues, and potential uses

Location: 3E

Amanda Pustilnik (University of Maryland School of Law | Center for Law, Brain & Behavior, Mass. General Hospital)

Average rating:

(4.50, 12 ratings)

Have you ever dreamed you could read minds? Do telekinesis? Maybe fly a magic carpet by thought alone? Until now, these powers have existed only in the realm of imagination or, more recently, video, AR, and VR games. Join Amanda Pustilnik to learn how brain-based human-machine interfaces are beginning to offer these powers in near-commercially-viable forms.

10:25am–10:45am Thursday, 09/13/2018

Black box: How AI will amplify the best and worst of humanity

Location: 3E

Jacob Ward (CNN | Al Jazeera | PBS)

Average rating:

(4.73, 15 ratings)

For most of us, our own mind is a black box: an all-powerful and utterly mysterious device that runs our lives for us, using rules and shortcuts of which we aren’t even aware. Jacob Ward reveals the relationship between the unconscious habits of our minds and the way that AI is poised to amplify them, alter them, maybe even reprogram them.

11:20am–12:00pm Thursday, 09/13/2018

Data and privacy at scale at Wikipedia

Location: 1E 12/13 Level: Beginner

Nuria Ruiz (Wikimedia)

The Wikipedia community feels strongly that you shouldn’t have to provide personal information to participate in the free knowledge movement. Nuria Ruiz discusses the challenges that this strong privacy stance poses for the Wikimedia Foundation, including how it affects data collection, and details some creative workarounds that allow WMF to calculate metrics in a privacy-conscious way.

1:10pm–1:50pm Thursday, 09/13/2018

Enacting Data Subject Access Rights for GDPR with data services and data management

Location: 1E 12/13 Level: Intermediate

Jean-Michel Franco (Talend)

Average rating:

(3.50, 2 ratings)

GDPR is more than another regulation to be handled by your back office. Enacting the GDPR's Data Subject Access Rights (DSAR) requires practical actions. Jean-Michel Franco outlines the practical steps to deploy governed data services.

1:10pm–1:50pm Thursday, 09/13/2018

Augmented reality: Going beyond plots in 3D

Location: 1A 12/14 Level: Beginner

Bob Levy (Virtual Cove, Inc.)

Average rating:

(3.00, 1 rating)

Augmented reality opens a completely new lens on your data. Bob Levy explains how to use simple Python scripts to leverage completely new plot types. You'll explore use cases revealing new insight into financial markets data as well as new ways of interacting with data that build trust in otherwise “black box” machine learning solutions.

3:30pm–4:10pm Thursday, 09/13/2018

Balancing stakeholder interests in personal data governance technology

Location: 1E 12/13 Level: Intermediate

LaVonne Reimer, JD (Lumenous)

GDPR asks us to rethink personal data systems, viewing UI/UX, consent management, and value-add data services through the eyes of the data subjects. LaVonne Reimer explains why the opportunity in the $150B credit and risk industry is to deploy data governance technologies that balance individuals' interest in controlling their own data with requirements for trusted data.

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com