Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Democratizing data within your organization

Mark Grover (Lyft), Deepak Tiwari (Lyft)
17:25–18:05 Wednesday, 23 May 2018
Data science and machine learning
Location: Capital Suite 14
Level: Intermediate
Average rating: 3.83 (6 ratings)

Who is this presentation for?

  • Big data architects, managers, and developers

Prerequisite knowledge

  • Familiarity with big data architectures

What you'll learn

  • Best practices for making your organization more productive when it comes to using data


Sure, you’ve got the best and fastest-running SQL engine, but you’ve still got some problems: users don’t know which tables exist or what they contain; sometimes bad things happen to your data and you need to regenerate partitions, but there is no tool to do so. Mark Grover and Deepak Tiwari explain how to make your team and your larger organization more productive when it comes to consuming data, focusing on the five most common issues related to data tools and productivity and sharing best practices for solving them.

Topics include:

  • Auditing: When you want to see who accessed what table, when they did so, and with what query, not just for compliance purposes but for capacity planning, resource management, and debugging issues
  • Query replaying: When you want to replay the queries that you captured in auditing because you are testing out an upgrade or a new system
  • Data discovery: When your users want to see which tables are commonly used, what the table and column descriptions are, who owns them, who the heaviest users are, etc.
  • Data backfilling: When your users want to regenerate a subset of a table after an incident has corrupted it
  • Custom file upload: When your users want to upload custom files (e.g., CSVs) they may have gotten from other sources into the data warehouse
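
To make the first two topics concrete, here is a minimal sketch of an audit record and a replay loop. It is an illustration only, not the speakers' tooling: the names AuditRecord, replay, and table_usage are hypothetical, and Python's built-in sqlite3 stands in for whatever SQL engine the warehouse actually uses.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One audited access: who ran which query against which table, and when.

    Hypothetical schema for illustration only.
    """
    user: str
    table: str
    query: str
    ran_at: datetime


def replay(records, conn):
    """Re-run each audited query on `conn`; return (record, succeeded) pairs."""
    outcomes = []
    for rec in records:
        try:
            conn.execute(rec.query).fetchall()
            outcomes.append((rec, True))
        except sqlite3.Error:
            # The candidate system rejected a query that was captured earlier.
            outcomes.append((rec, False))
    return outcomes


def table_usage(records):
    """Count accesses per table, e.g. for capacity planning or discovery."""
    counts = {}
    for rec in records:
        counts[rec.table] = counts.get(rec.table, 0) + 1
    return counts


# A stand-in "new system" that only has the rides table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (id INTEGER, city TEXT)")

when = datetime(2018, 5, 23, tzinfo=timezone.utc)
audit_log = [
    AuditRecord("alice", "rides", "SELECT COUNT(*) FROM rides", when),
    AuditRecord("bob", "rides", "SELECT city FROM rides", when),
    AuditRecord("bob", "drivers", "SELECT * FROM drivers", when),
]

outcomes = replay(audit_log, conn)
failed = [rec.query for rec, ok in outcomes if not ok]
usage = table_usage(audit_log)
```

In this sketch, the query against the missing drivers table is the one that fails on replay, which is the kind of regression a replay run before an upgrade is meant to surface. The same audit log also yields per-table access counts that could feed data discovery.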

Mark Grover


Mark Grover is a product manager at Lyft. Mark’s a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He’s also contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He’s a coauthor of Hadoop Application Architectures and wrote a section in Programming Hive. Mark is a sought-after speaker on topics related to big data. He occasionally blogs on topics related to technology.


Deepak Tiwari


Deepak Tiwari is the head of product management for data at Lyft, where he’s responsible for the company’s data vision as well as for building its data infrastructure, data platform, and data products. This includes Lyft’s streaming infrastructure for real-time decision making, geodata store and visualization, platform for machine learning, and core infrastructure for big data analytics. Previously, he was a product management leader at Google, where he worked on search, cloud, and technical infrastructure products. Deepak is passionate about building products that are driven by data, focus on user experience, and work at web scale. He holds an MBA from Northwestern’s Kellogg School of Management and a BT in engineering from the Indian Institute of Technology, Kharagpur.