Make Data Work
Oct 15–17, 2014 • New York, NY

Hadoop Effortlessly: A Data Inventory is Key to Data Self-service

Moderated by: Alex Gorelik (Waterline Data)
Panelists: Suresh Srinivas (Hortonworks), Mike Sutten (Kaiser Permanente), John Mount (Win-Vector LLC), Clark Farrey (Capital One), Sunil Soares (Information Asset)
1:45pm–2:25pm Thursday, 10/16/2014
Location: 1 E05
Average rating: 3.33 (3 ratings)

Companies are deploying Hadoop “data lakes” to provide unprecedented access to data for data science and analytics. However, the very features that make data lakes attractive (frictionless ingest, flexible schema-on-read, and the lack of data governance) turn into increasingly insurmountable obstacles to true data self-service and create a barrier to enterprise adoption of Hadoop.

Join the panel of industry experts who will share their experiences on how to solve these challenges. Panelists will explain why and how to build a successful data inventory that enables true data self-service; discuss best practices for managing and governing Hadoop data; and suggest how to shape the right use cases to maximize the business ROI of Hadoop.

This session is sponsored by Waterline Data Science (booth 553)

Alex Gorelik

Waterline Data

Alex Gorelik is the founder and CEO of Waterline Data, a startup focused on enhancing the value of Hadoop through data self-service and governance. Alex is a serial entrepreneur and innovator who has spent over 25 years inventing and bringing to market cutting-edge data-oriented technology.

Prior to Waterline, Alex was an entrepreneur-in-residence (EIR) at Menlo Ventures. He joined Menlo from Informatica, where he held several executive roles, including GM of Informatica’s Data Quality Business Unit and SVP of R&D for Core Technology. As GM, he drove marketing, product management, and R&D for an $80M business; as SVP, he drove innovation in big data and social media while managing a team of 400 engineers and product managers developing Informatica’s platform and data-integration technology. Alex joined Informatica from IBM, where he was an IBM Distinguished Engineer on the Information Integration team.

He was previously founder, CTO, and VP of engineering at Exeros (acquired by IBM in 2009). Earlier, Alex was cofounder, CTO, and VP of engineering at Acta Technology (acquired by Business Objects in 2002 and marketed as Business Objects Data Services). Prior to Acta, Alex managed development of the replication server at Sybase and worked on Sybase’s strategy for enterprise application integration. Before that, he developed the database kernel in Amdahl’s Design Automation group. Alex holds a BS in computer science from Columbia University’s School of Engineering and a master’s degree in computer science from Stanford University.

Suresh Srinivas

Hortonworks

Suresh is an Apache Hadoop committer and a member of the Apache Hadoop Project Management Committee (PMC). A long-term active contributor to the Apache Hadoop project, he has designed and developed many significant features for Hadoop. He also regularly contributes to related projects such as Apache Falcon and Apache Storm. At Hortonworks, he leads many Hadoop initiatives related to storage and data management. Prior to cofounding Hortonworks, he was a software architect at Yahoo! working on Apache Hadoop, where he developed features and supported some of the largest Hadoop cluster installations.

Mike Sutten

Kaiser Permanente

Mike Sutten joined Kaiser Permanente in 2013 as Chief Technology Officer (CTO) and Senior Vice President. Under his leadership, the CTO organization is focused on setting the future direction for technology across Kaiser Permanente. This includes defining technology and platform standards, and leading initiatives on analytics, cloud capabilities, data storage, and mobile technologies.

Mike has more than 20 years of CTO and CIO leadership experience with Fortune 500 organizations, including Royal Caribbean Cruises, Koch Industries, Sybase and General Electric. Most recently, he served as CTO and deputy CIO within the U.S. government, where he was recognized for exemplary leadership and service. Throughout his career, he has driven technology innovation by architecting modernized, profitable enterprise-wide systems, ensuring compliance for data integration, and reducing IT costs through the standardization and application of tools.

Mike holds an MBA in information systems from the University of San Diego and a Bachelor of Science degree in engineering from Iowa State University. He is also a certified ISO 9000 Process Quality Auditor, a merchant marine reserve captain, and a private pilot.

John Mount

Win-Vector LLC

John Mount is a principal consultant at Win-Vector LLC, a San Francisco data science consultancy. John has worked as a computational scientist in biotechnology and a stock-trading algorithm designer and has managed a research team for (now an eBay company). He is the coauthor of Practical Data Science with R (Manning Publications, 2014). John started his advanced education in mathematics at UC Berkeley and holds a PhD in computer science from Carnegie Mellon (specializing in the design and analysis of randomized algorithms). He currently blogs about technical issues at the Win-Vector blog, tweets at @WinVectorLLC, and is active in the Rotary. Please contact for projects and collaborations.

Clark Farrey

Capital One

Sunil Soares

Information Asset

Sunil Soares is the Founder and Managing Partner of Information Asset, LLC. Previously, he was Director of Information Governance at IBM. He is the author of four books: The IBM Data Governance Unified Process; Selling Information Governance to the Business; Big Data Governance; and IBM InfoSphere: A Platform for Big Data Governance and Process Data Governance. His fifth book, Data Governance Tools, will be published in the fall of 2014.

John Mount
08/18/2014 12:24pm EDT

Really looking forward to speaking on this topic (and learning from the rest of the panel and the audience). Developers use the term “technical debt” to describe the accumulated future cost of incomplete design, documentation, and work. The point is: data science projects usually start out deep in debt because of a lack of effective schema documentation. A data inventory is a good tool for working through this.