Presented By O’Reilly and Cloudera

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Data preparation, governance and privacy sessions

Much of the ML in use within companies falls under supervised learning, which means proper training data (labeled examples) is essential. The rise of deep learning has made this even more pronounced, as many modern neural network architectures rely on large amounts of training data. Issues pertaining to data security, privacy, and governance persist and are not necessarily unique to ML applications. But the hunger for large amounts of training data, the advent of new regulations like GDPR, and the importance of managing risk mean that a stronger emphasis on reproducibility and data lineage is very much needed.

9:00am–12:30pm Tuesday, 09/11/2018

Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments, step by step

Location: 1E 11 Level: Intermediate

Mark Donsky (Okera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera), Ifigeneia Derekli (Cloudera), Camila Hiskey (Cloudera)

Average rating: 4.50 (2 ratings)

New regulations such as GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Syed Rafice, Mubashir Kazia, Ifigeneia Derekli, and Camila Hiskey share hands-on best practices for meeting these challenges, with special attention paid to GDPR.

9:00am–5:00pm Tuesday, 09/11/2018

Findata Day

Location: 1A 08

Alistair Croll (Solve For Interesting), Robert Passarella (Alpha Features), Amro Alkhatib (National Health Insurance Company-Daman), Mridul Mishra (Fidelity Investments), Patrick Angeles (Cloudera), James Psota (Panjiva), Andreas Kohlmaier (Munich Re), Paul Lashmet (Arcadia Data), Nick Curcuru (Mastercard), Robin Way (Corios), Theresa Johnson (Airbnb), Jane Tran (Unqork), Swatee Singh (American Express)

From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.

11:20am–12:00pm Wednesday, 09/12/2018

From data governance to AI governance: The CIO's new role

Location: 1E 10/11 Level: Non-technical

JF Gagne (Element AI)

Average rating: 3.50 (4 ratings)

JF Gagne explains why the CIO is going to need a broader mandate in the company to better align their AI training and outcomes with business goals and compliance. This mandate should include an AI governance team that is well staffed and deeply established in the company, in order to catch biases that can develop from faulty goals or flawed data.

11:20am–12:00pm Wednesday, 09/12/2018

Executive Briefing: GDPR—Getting your data ready for heavy, new EU privacy regulations

Location: 1E 14 Level: Intermediate

Mark Donsky (Okera), Steven Ross (Cloudera)

In May 2018, the General Data Protection Regulation (GDPR) went into effect for firms doing business in the EU, but many companies still aren't prepared for the strict regulation or its fines for noncompliance (up to €20 million or 4% of global annual revenue, whichever is higher). Mark Donsky and Steven Ross outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations.

1:15pm–1:55pm Wednesday, 09/12/2018

A data marketplace case study with the blockchain and advanced multitenant Hadoop in a smart open data platform

Location: 1A 21/22 Level: Intermediate

Minh Chau Nguyen (ETRI), Heesun Won (ETRI)

Average rating: 2.20 (5 ratings)

Minh Chau Nguyen and Heesun Won explain how to implement analytics services in data marketplace systems on a single Hadoop cluster across distributed data centers. The solution extends the overall architecture of the Hadoop ecosystem with the blockchain so that multiple tenants and authorized third parties can securely access data while still maintaining privacy, scalability, and reliability.

1:15pm–1:55pm Wednesday, 09/12/2018

Privacy by design: Building in data privacy and protection versus bolting it on later

Location: 1E 12/13 Level: Advanced

Les McMonagle (BlueTalon)

Average rating: 5.00 (2 ratings)

Privacy by design is a fundamentally important approach to achieving compliance with GDPR and other data privacy or data protection regulations. Les McMonagle outlines how organizations can save time and money while improving data security and regulatory compliance and dramatically reducing the risk of a data breach or expensive penalties for noncompliance.

1:15pm–1:55pm Wednesday, 09/12/2018

Data governance: A big job that's getting bigger

Location: 1E 09 Level: Intermediate

Andrew Brust (Blue Badge Insights | ZDNet)

Average rating: 4.50 (2 ratings)

Data governance has grown from a set of mostly data management-oriented technologies in the data warehouse era to encompass catalogs, glossaries, and more in the data lake era. Now new requirements are emerging, and new products are rising to meet the challenge. Andrew Brust tracks data governance's past and present and offers a glimpse of the future.

2:55pm–3:35pm Wednesday, 09/12/2018

Beyond explainability: Regulating machine learning in practice

Location: 1E 12/13 Level: Non-technical

Andrew Burt (bnh.ai)

Average rating: 5.00 (2 ratings)

Machine learning is becoming prevalent across industries, creating new types of risk. Managing this risk is quickly becoming the central challenge of major organizations, one that strains data science teams, legal personnel, and the C-suite alike. Andrew Burt shares lessons from past regulations focused on similar technology along with a proposal for new ways to manage risk in ML.

4:35pm–5:15pm Wednesday, 09/12/2018

Tracking data lineage at Stitch Fix

Location: 1A 23/24 Level: Intermediate

Neelesh Salian (Stitch Fix)

Average rating: 1.33 (3 ratings)

Neelesh Srinivas Salian explains how Stitch Fix built a service to better understand the movement and evolution of data within the company's data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh covers why and how Stitch Fix built the service and details some use cases.

5:25pm–6:05pm Wednesday, 09/12/2018

Executive Briefing: Enhance your data lake with comprehensive data governance to improve adoption and meet compliance needs

Location: 1E 14 Level: Intermediate

Sanjeev Mohan (Gartner)

Average rating: 5.00 (1 rating)

If the last few years were spent proving the value of data lakes, the emphasis now is on monetizing big data architecture investments. The rallying cry is to onboard new workloads efficiently. But how do you do so if you don’t know what data is in the lake, the level of its quality, or the trustworthiness of models? Sanjeev Mohan explains why data governance is the linchpin to success.

11:20am–12:00pm Thursday, 09/13/2018

Data discovery and lineage: Integrating streaming data in the public cloud with on-prem, classic data stores, and heterogeneous schema types

Location: 1E 09 Level: Advanced

Barbara Eckman (Comcast)

Average rating: 4.33 (6 ratings)

Comcast’s streaming data platform comprises ingest, transformation, and storage services in the public cloud, with Apache Atlas for data discovery and lineage. Barbara Eckman explains how Comcast recently integrated on-prem data sources, including traditional data warehouses and RDBMSs, which required its data governance strategy to include relational and JSON schemas in addition to Apache Avro.

1:10pm–1:50pm Thursday, 09/13/2018

Enacting Data Subject Access Rights for GDPR with data services and data management

Location: 1E 12/13 Level: Intermediate

Jean-Michel Franco (Talend)

Average rating: 3.50 (2 ratings)

GDPR is more than another regulation to be handled by your back office. Enacting the GDPR's Data Subject Access Rights (DSAR) requires practical actions. Jean-Michel Franco outlines the practical steps to deploy governed data services.

1:10pm–1:50pm Thursday, 09/13/2018

Scalable machine learning for data cleaning

Location: 1A 08 Level: Non-technical

Ihab Ilyas (University of Waterloo)

Average rating: 5.00 (2 ratings)

Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas explains why leveraging data semantics and domain-specific knowledge is key in delivering the optimizations necessary for truly scalable ML curation solutions.

2:00pm–2:40pm Thursday, 09/13/2018

Let the machines learn to improve data quality

Location: 1A 08 Level: Intermediate

Archana Anandakrishnan (American Express)

Average rating: 3.20 (5 ratings)

Building accurate machine learning models hinges on the quality of the data. Errors and anomalies get in the way of data scientists doing their best work. Archana Anandakrishnan explains how American Express created an automated, scalable system for measurement and management of data quality. The methods are modular and adaptable to any domain where accurate decisions from ML models are critical.

3:30pm–4:10pm Thursday, 09/13/2018

Balancing stakeholder interests in personal data governance technology

Location: 1E 12/13 Level: Intermediate

LaVonne Reimer, JD (Lumenous)

GDPR asks us to rethink personal data systems, viewing UI/UX, consent management, and value-add data services through the eyes of the data's subjects. LaVonne Reimer explains why the opportunity in the $150B credit and risk industry is to deploy data governance technologies that balance individuals' interest in controlling their own data with requirements for trusted data.
