Presented By O’Reilly and Cloudera

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Democratizing data within your organization

Mark Grover (Lyft), Deepak Tiwari (Lyft)

17:25–18:05 Wednesday, 23 May 2018

Data science and machine learning
Location: Capital Suite 14
Level: Intermediate


Who is this presentation for?

Big data architects, managers, and developers

Prerequisite knowledge

Familiarity with big data architectures

What you'll learn

Learn best practices for making your organization more productive in how it uses data

Description

Sure, you’ve got the best and fastest-running SQL engine, but you’ve still got some problems: users don’t know which tables exist or what they contain, and when bad things happen to your data and you need to regenerate partitions, there is no tool to do so. Mark Grover and Deepak Tiwari explain how to make your team and your larger organization more productive when it comes to consuming data, focusing on the five most common issues related to data tools and productivity and sharing best practices for solving them.

Topics include:

Auditing: When you want to see who accessed what table, when they did so, and with what query, not just for compliance purposes but for capacity planning, resource management, and debugging issues
Query replaying: When you want to replay the queries that you captured in auditing because you are testing out an upgrade or a new system
Data discovery: When your users want to see which tables are commonly used, what the table and column descriptions are, who owns them, who their heaviest users are, etc.
Data backfilling: When your users want to regenerate a subset of a table after an incident left its data incorrect
Custom file upload: When your users want to upload custom files (e.g., CSVs) they may have gotten from other sources into the data warehouse
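
As a rough illustration of how the auditing, query replaying, and data discovery topics relate, here is a minimal Python sketch. It assumes audit records are already available as simple in-memory objects; the `AuditRecord` type, the helper functions, and the sample table names are all hypothetical and are not taken from the talk or from any Lyft tooling.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditRecord:
    """One audited query: who ran it, against which table, when, and the SQL text.

    Hypothetical shape; a real audit log would likely carry more fields.
    """
    user: str
    table: str
    ran_at: datetime
    query: str


def table_usage(records):
    """Data discovery: count how many audited queries touched each table."""
    return Counter(r.table for r in records)


def top_users(records, table, n=3):
    """Data discovery: return the n heaviest users of a table as (user, count) pairs."""
    counts = Counter(r.user for r in records if r.table == table)
    return counts.most_common(n)


def queries_to_replay(records, since):
    """Query replaying: return captured SQL run at or after `since`, oldest first."""
    ordered = sorted(records, key=lambda r: r.ran_at)
    return [r.query for r in ordered if r.ran_at >= since]


# Made-up sample data, for illustration only.
SAMPLE = [
    AuditRecord("ana", "rides", datetime(2018, 5, 1, 9, 0), "SELECT COUNT(*) FROM rides"),
    AuditRecord("ben", "drivers", datetime(2018, 5, 1, 10, 0), "SELECT * FROM drivers LIMIT 10"),
    AuditRecord("ana", "rides", datetime(2018, 5, 2, 8, 30), "SELECT city FROM rides"),
]

if __name__ == "__main__":
    print(table_usage(SAMPLE))
    print(top_users(SAMPLE, "rides"))
    print(queries_to_replay(SAMPLE, datetime(2018, 5, 2)))
```

The point of the sketch is only that a single audit trail can feed several of the topics above: the same records answer "which tables are commonly used and by whom" and supply the queries to replay against a new system.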

Mark Grover

Lyft

Mark Grover is a product manager at Lyft. Mark’s a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He’s also contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He’s a coauthor of Hadoop Application Architectures and wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data. He occasionally blogs on topics related to technology.

Deepak Tiwari

Lyft

Deepak Tiwari is the head of product management for data at Lyft, where he’s responsible for the company’s data vision as well as for building its data infrastructure, data platform, and data products. This includes Lyft’s streaming infrastructure for real-time decision making, geodata store and visualization, platform for machine learning, and core infrastructure for big data analytics. Previously, he was a product management leader at Google, where he worked on search, cloud, and technical infrastructure products. Deepak is passionate about building products that are driven by data, focus on user experience, and work at web scale. He holds an MBA from Northwestern’s Kellogg School of Management and a BT in engineering from the Indian Institute of Technology, Kharagpur.

