Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Data preparation, governance and privacy sessions

Much of the ML in use within companies falls under supervised learning, which means proper training data (or labeled examples) are essential. The rise of deep learning has made this even more pronounced, as many modern neural network architectures rely on large amounts of training data. Issues pertaining to data security, privacy, and governance persist and are not necessarily unique to ML applications. But the hunger for large amounts of training data, the advent of new regulations like GDPR, and the importance of managing risk mean that a stronger emphasis on reproducibility and data lineage is very much needed.

9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 11 Level: Intermediate
Mark Donsky (Okera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera), Ifigeneia Derekli (Cloudera), Camila Hiskey (Cloudera)
Average rating: 4.50 (2 ratings)
New regulations such as GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Syed Rafice, Mubashir Kazia, Ifigeneia Derekli, and Camila Hiskey share hands-on best practices for meeting these challenges, with special attention paid to GDPR.
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 08
Alistair Croll (Solve For Interesting), Robert Passarella (Alpha Features), Amro Alkhatib (National Health Insurance Company-Daman), Mridul Mishra (Fidelity Investments), Patrick Angeles (Cloudera), James Psota (Panjiva), Andreas Kohlmaier (Munich Re), Paul Lashmet (Arcadia Data), Nick Curcuru (Mastercard), Robin Way (Corios), Theresa Johnson (Airbnb), Jane Tran (Unqork), Swatee Singh (American Express)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Non-technical
JF Gagne (Element AI)
Average rating: 3.50 (4 ratings)
JF Gagne explains why the CIO is going to need a broader mandate in the company to better align their AI training and outcomes with business goals and compliance. This mandate should include an AI governance team that is well staffed and deeply established in the company, in order to catch biases that can develop from faulty goals or flawed data.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1E 14 Level: Intermediate
Mark Donsky (Okera), Steven Ross (Cloudera)
In May 2018, the General Data Protection Regulation (GDPR) went into effect for firms doing business in the EU, but many companies still aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue, whichever is higher). Mark Donsky and Steven Ross outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 21/22 Level: Intermediate
Minh Chau Nguyen (ETRI), Heesun Won (ETRI)
Average rating: 2.20 (5 ratings)
Minh Chau Nguyen and Heesun Won explain how to implement analytics services in data marketplace systems on a single Hadoop cluster across distributed data centers. The solution extends the overall architecture of the Hadoop ecosystem with the blockchain so that multiple tenants and authorized third parties can securely access data while still maintaining privacy, scalability, and reliability.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Advanced
Les McMonagle (BlueTalon)
Average rating: 5.00 (2 ratings)
Privacy by design is a fundamentally important approach to achieving compliance with GDPR and other data privacy or data protection regulations. Les McMonagle outlines how organizations can save time and money while improving data security and regulatory compliance and dramatically reducing the risk of a data breach or expensive penalties for noncompliance.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1E 09 Level: Intermediate
Andrew Brust (Blue Badge Insights | ZDNet)
Average rating: 4.50 (2 ratings)
Data governance has grown from a set of mostly data management-oriented technologies in the data warehouse era to encompass catalogs, glossaries, and more in the data lake era. Now new requirements are emerging, and new products are rising to meet the challenge. Andrew Brust tracks data governance's past and present and offers a glimpse of the future.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1E 12/13 Level: Non-technical
Andrew Burt (Immuta)
Average rating: 5.00 (2 ratings)
Machine learning is becoming prevalent across industries, creating new types of risk. Managing this risk is quickly becoming the central challenge of major organizations, one that strains data science teams, legal personnel, and the C-suite alike. Andrew Burt shares lessons from past regulations focused on similar technology along with a proposal for new ways to manage risk in ML.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 23/24 Level: Intermediate
Neelesh Salian (Stitch Fix)
Average rating: 1.33 (3 ratings)
Neelesh Srinivas Salian explains how Stitch Fix built a service to better understand the movement and evolution of data within the company's data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh covers why and how Stitch Fix built the service and details some use cases.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 14 Level: Intermediate
Sanjeev Mohan (Gartner)
Average rating: 5.00 (1 rating)
If the last few years were spent proving the value of data lakes, the emphasis now is on monetizing the big data architecture investments. The rallying cry is to onboard new workloads efficiently. But how do you do so if you don’t know what data is in the lake, the level of its quality, or the trustworthiness of models? Sanjeev Mohan explains why data governance is the linchpin to success.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 09 Level: Advanced
Barbara Eckman (Comcast)
Average rating: 4.33 (6 ratings)
Comcast’s streaming data platform comprises ingest, transformation, and storage services in the public cloud, with Apache Atlas for data discovery and lineage. Barbara Eckman explains how Comcast recently integrated on-prem data sources, including traditional data warehouses and RDBMSs, which required its data governance strategy to include relational and JSON schemas in addition to Apache Avro.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Intermediate
Jean-Michel Franco (Talend)
Average rating: 3.50 (2 ratings)
GDPR is more than another regulation to be handled by your back office. Enacting the GDPR's Data Subject Access Rights (DSAR) requires practical actions. Jean-Michel Franco outlines the practical steps to deploy governed data services.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 08 Level: Non-technical
Ihab Ilyas (University of Waterloo)
Average rating: 5.00 (2 ratings)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas explains why leveraging data semantics and domain-specific knowledge is key in delivering the optimizations necessary for truly scalable ML curation solutions.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 08 Level: Intermediate
Archana Anandakrishnan (American Express)
Average rating: 3.20 (5 ratings)
Building accurate machine learning models hinges on the quality of the data. Errors and anomalies get in the way of data scientists doing their best work. Archana Anandakrishnan explains how American Express created an automated, scalable system for measurement and management of data quality. The methods are modular and adaptable to any domain where accurate decisions from ML models are critical.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Intermediate
LaVonne Reimer, JD (Lumenous)
GDPR asks us to rethink personal data systems, viewing UI/UX, consent management, and value-add data services through the eyes of the data's subjects. LaVonne Reimer explains why the opportunity in the $150B credit and risk industry is to deploy data governance technologies that balance individuals' interest in controlling their own data with requirements for trusted data.