Petabyte Scale, Automated Support for Remote Devices

Ron Bodkin (Google), Kumar Palaniappan (NetApp)

NetApp is a fast-growing provider of storage technology. Its devices
“phone home” regularly, sending unstructured auto-support log and
configuration data back to centralized data centers. This data is used
to provide timely support, improve sales, and plan product
improvements. To support these uses, the data must be collected,
organized, and analyzed. The system currently ingests 5 TB of
compressed data per week, a volume growing 40% per year. NetApp
previously stored flat files on disk volumes and kept summary data in
relational databases. Today, NetApp is working with Accenture to
design, build, and implement the enterprise transformation project for
next-generation auto-support, with Think Big Analytics as a partner
and expert in Big Data solutions. The new system uses Hadoop, HBase,
and related technologies to ingest, organize, transform, and present
auto-support data. It will enable business users to make decisions and
respond quickly, and will support automated responses based on
predictive models. Key requirements include:

  • Return query results in seconds, on data that becomes queryable within 5 minutes of event occurrence.
  • Execute complex ad hoc queries to investigate issues and plan accordingly.
  • Build models to predict support issues and capacity limits to take
    action before issues arise.
  • Build models for cross-sale opportunities.
  • Expose data to applications through REST interfaces.

In this session we look at the lessons learned while designing and
implementing a system to:

  • Collect 1,000 messages per minute, each 20 MB compressed.
  • Store 2 PB of incoming support events by 2015.
  • Provide low latency access to support information and configuration
    changes in HBase at scale within 5 minutes of event arrival.
  • Support complex ad hoc queries that join diverse structured and
    unstructured data sets at large scale.
  • Operate efficiently at scale.
  • Integrate with a data warehouse in Oracle.
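Serving the newest events for a device within minutes of arrival typically hinges on HBase row-key design, since HBase scans rows in lexicographic key order. A minimal sketch in plain Java (no HBase client dependency) of one common pattern, a device-scoped key with a reversed timestamp so the most recent events sort first; the key layout and device ID are illustrative assumptions, not NetApp's actual schema:

```java
public class SupportEventKey {
    // Row key layout (assumed): <deviceId>#<Long.MAX_VALUE - epochMillis>
    // Subtracting the timestamp from Long.MAX_VALUE reverses sort order,
    // so a scan starting at "deviceId#" yields the newest events first
    // and "events from the last 5 minutes" is a short prefix scan.
    public static String rowKey(String deviceId, long epochMillis) {
        long reversed = Long.MAX_VALUE - epochMillis;
        // Zero-pad to 19 digits so string order matches numeric order.
        return String.format("%s#%019d", deviceId, reversed);
    }

    public static void main(String[] args) {
        String older = rowKey("FAS3270-001", 1_000_000L);
        String newer = rowKey("FAS3270-001", 2_000_000L);
        // The newer event's key sorts lexicographically before the older one's.
        System.out.println(newer.compareTo(older) < 0); // prints "true"
    }
}
```

Prefixing with the device ID also spreads writes across regions by device rather than by arrival time, avoiding the hot-spotting that a purely time-ordered key would cause under a sustained ingest load.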

Ron Bodkin


Ron founded Think Big Analytics to help customers leverage new data processing technologies such as Hadoop, NoSQL databases, and R for statistical analysis. He works with customers to identify opportunities and rapidly develop solutions that integrate data and extract information.

Previously, Ron was the VP of Engineering at Quantcast, which each day ingests 10 billion events and processes two petabytes of data using Hadoop. The Quantcast MapReduce stack handles production data processing, ad hoc analysis, data mining, and machine learning. Before that, Ron was a founder of the enterprise consulting companies C-bridge and New Aspects.


Kumar Palaniappan


Kumar Palaniappan is an Enterprise Architect at NetApp, where he leads the adoption of Hadoop technologies for strategic applications. Previously, Kumar was an architect at Cisco Systems, responsible for large-scale, mission-critical architectures.


