BigData / NoSQL Operations Workshop: How to Scale Dirty and Influence People

Philip (Flip) Kromer (CSC), Dennis Yang (Infochimps)
Operations Mission City
Please note: to attend, your registration must include Workshops.
Average rating: 1.96 (25 ratings)

Ignore Best Practices and Scale on a Shoestring with Flume, Chef and
the Cloud Chimpanzee Superhero Toolbelt

Chef and Flume are gaining major momentum in cloud architectures.
The reasons to adopt them are the ones you’d expect: Chef reduces ops
complexity and lets you share recipes; Flume centralizes and scales
your log aggregation. We’ve adopted each within a system architecture
that includes a 20-billion-row HBase cluster, a 4-billion-document
ElasticSearch deployment, a distributed robot army of scrapers backed
by Cassandra, and transient Hadoop clusters for offline processing.

I’ll talk about four aspects of our stack — two that draw on Chef and
two that draw on Flume. In each, I’ll explain the technical details
of our solution, and then a specific way it let us rethink and speed
up our development process.

Our open-source ClusterChef toolkit includes recipes for our core
stack, along with a wonderfully compact way of describing cloud
systems that makes the cluster (and not the program or machine) the
fundamental unit of deployment. An increasing portion of our data (not
just logs) flows among these systems via Flume. For example, our
scraper bots dead-drop their data to Flume, with zero responsibility
for where that data should go or what anyone else might do with it.
Flume ensures reliable transport across all the decorators that
interact with the data, fans it out into the data store or other
destinations, and absorbs all the hairy complexity of plumbing data
at scale.
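
As a rough illustration of the dead-drop idea, here is a minimal Python sketch. It is not Flume's API or our production setup: the `DeadDrop` class, the decorator functions and the sink names are all hypothetical, invented for this example. The producer hands over an event and knows nothing about destinations; decorators transform the event in order; the result fans out to every sink, and one failing sink does not block the others. Real Flume adds durable, retried delivery, which this sketch replaces with a simple failure log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]
Decorator = Callable[[Event], Event]
Sink = Callable[[Event], None]


@dataclass
class DeadDrop:
    """Hypothetical stand-in for a transport agent (NOT the Flume API)."""
    decorators: List[Decorator] = field(default_factory=list)
    sinks: List[Sink] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)

    def drop(self, event: Event) -> None:
        """Producer-facing call: hand over an event and walk away."""
        for decorate in self.decorators:
            # Work on a copy so the producer's own dict is never mutated.
            event = decorate(dict(event))
        for sink in self.sinks:
            try:
                # Each sink gets its own copy of the decorated event.
                sink(dict(event))
            except Exception as exc:
                # Assumption: we only record the failure; a real transport
                # layer would buffer and retry instead.
                self.failures.append(f"{sink.__name__}: {exc}")


# Hypothetical decorators: each takes an event and returns the changed event.
def add_source(event: Event) -> Event:
    event["source"] = "scraper"
    return event


def lowercase_url(event: Event) -> Event:
    event["url"] = str(event["url"]).lower()
    return event


# Hypothetical destinations, modeled as in-memory lists.
datastore: List[Event] = []
search_index: List[Event] = []


def to_datastore(event: Event) -> None:
    datastore.append(event)


def to_search_index(event: Event) -> None:
    search_index.append(event)


def broken_sink(event: Event) -> None:
    raise RuntimeError("destination offline")


pipeline = DeadDrop(
    decorators=[add_source, lowercase_url],
    sinks=[to_datastore, broken_sink, to_search_index],
)

# The scraper's only responsibility: drop the record and move on.
original = {"url": "HTTP://Example.com/Page"}
pipeline.drop(original)
```

After the call, both `datastore` and `search_index` hold the decorated event, `pipeline.failures` records the broken sink, and `original` is unchanged. New destinations are added by editing the `sinks` list, never the scraper.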

We’ve found that the real value these tools provide is to radically
decouple and insulate systems: “Powerful Black Boxes with Beautiful
Glue”. The more you federate, the simpler the boxes become. Instead of
a large team that constantly scrums, you talk to the teammate managing
the other side of a strong, well-defined interface. Repos are small,
which means no check-in collisions, no branches, and simple systems
that an intern can get hacking on immediately. Systems are decoupled
but delivery is robust, so they can live or die on their own: you can
abandon unit tests, deploy to trunk, expect failures and learn to fix
them fast. Decoupling systems enables not only scalable robust systems
but also scalable robust teams.


Philip (Flip) Kromer


I’m a Distinguished Engineer at CSC and co-founder, CTO and chief architect of Infochimps, a CSC Big Data Business and the leading big data platform in the cloud. At Infochimps, we built a scalable architecture that allows app programmers and statisticians to quickly and confidently manipulate data streams at arbitrary scale: terabytes in size, thousands of events per second, dozens of disparate data sources. We use a mixture of Hadoop, Elasticsearch, Storm/Kafka, Goliath and other industrial-strength solutions.

As part of this work, I’ve authored several successful open-source projects, including Wukong (the most-used framework for Ruby streaming in Hadoop), Ironfan (cloud orchestration capable of spinning up clusters large or small at the push of a button) and Configliere (Ruby configuration made easy). I am also a core committer to Goliath (liquid fast concurrent web framework) and Storm (an open-source streaming analytics platform emerging as a core piece of the Big Data stack).

I am the author of “Big Data for Chimps”, a book on data science in practice for O’Reilly. I have spoken at South by Southwest, Hadoop World, Strata, NIST and CloudCon, and contributed a case study chapter to “Hadoop: The Definitive Guide”.


Dennis Yang


Dennis is the Director of Product & Marketing. He joined Infochimps after five years as a co-founder of Floor64, where he produced the well-known business & technology blog, Techdirt, and created the crowdsourced think tank, Insight Community. Prior to that, he spent seven years at mySimon, a division of CNET Networks. As Associate Vice President, Dennis was responsible for the overall management, operations, and development of the mySimon comparison shopping site. Dennis first cut his teeth at Andersen Consulting, and holds a B.S. from Cornell University, where he and Flip met while building a hybrid electric vehicle. Dennis authors an eponymous personal blog, and you can follow him on Twitter at @sinned.

Comments on this page are now closed.


Jim Riddle
06/14/2011 9:36am PDT

Given that a Chef workshop was scheduled at the same time, spending time reviewing what Chef is for an audience that hasn’t heard of Chef is a complete waste of the other attendees’ time.

Chris Faehl
06/14/2011 4:54am PDT

Not well-organized, and a bit rambling. Needed to move it along a bit better, and demonstrate up-front what the point of this workshop was. Often got lost in his own presentation or displays. Disappointing for such an interesting subject.
