Python in an evolving enterprise system: Integration solutions with Hadoop

Angelica Pando (AppNexus), Steve Kannan (AppNexus), Dave Himrod (AppNexus)
Location: D136
Level: Intermediate
Average rating: 3.50 (10 ratings)

At AppNexus, our data pipeline is growing like crazy: it processes more than 30 terabytes of data every day, and its volume more than tripled in the last year alone. That growth has forced us to rapidly scale and iterate on the optimization tools we use for big data exploration and aggregation, all of which are built in Python. In 2011, we moved our data pipeline to the Hadoop stack to enable horizontal scalability for future growth. However, integrating our Python-based optimization tools with our Hadoop data pipeline has been challenging. Our continued explosive growth demands greater efficiency, whether that means simplifying our infrastructure or building more shared services. Over the past few months, we evaluated multiple solutions for integrating Python with Hadoop, including Hadoop Streaming, Pig with Jython UDFs, writing MapReduce jobs in Jython, and, of course, why not just do it in Java?
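Of the options listed above, Hadoop Streaming is the one that runs ordinary CPython code unchanged: Hadoop feeds each input split to an executable on stdin, reads tab-separated key/value lines from its stdout, and sorts by key between the map and reduce phases. The sketch below illustrates that contract only; it is not AppNexus' code, and the script name `agg.py` and the two-column input format (advertiser ID, impression count) are hypothetical assumptions.

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming sketch: sum impressions per advertiser.

Hypothetical input format (an assumption, not from the talk):
each line is "<advertiser_id>\t<impressions>".
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit "key\tvalue" lines; Hadoop sorts them by key before reducing."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed records rather than failing the task
        advertiser_id, impressions = fields
        try:
            count = int(impressions)
        except ValueError:
            continue  # skip records with a non-numeric count
        yield f"{advertiser_id}\t{count}"


def reducer(lines):
    """Sum values per key; relies on input arriving sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Run as "agg.py map" or "agg.py reduce" (hypothetical script name).
    step = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for out in step(sys.stdin):
        print(out)
```

Because Streaming only relies on stdin, stdout, and a sort in between, the same script can be checked locally with `cat sample.tsv | python agg.py map | sort | python agg.py reduce` before it is passed to the streaming jar's `-mapper` and `-reducer` options on a cluster.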

We’d like to share our best practices and lessons learned when integrating and scaling with Python and Hadoop. In our talk, we’ll explore the different Python-Hadoop integration options, share our evaluation process, and invite an interactive dialogue of lessons learned.

Angelica Pando


As a Software Engineer for Optimization and Analytics, Angelica builds and implements algorithms for AppNexus’ ad transaction optimization systems. Since joining AppNexus in 2011, Angelica has implemented, and continues to improve, original budgeting and spend pacing algorithms. Previously, Angelica was a software developer in Research at Deutsche Bank, where she developed proprietary stock market index investment instruments and provided quantitative analysts with customized equity research tools. Angelica has a Bachelor’s Degree in Electrical and Computer Engineering from Cornell University.

Steve Kannan


As Engineering Manager for Optimization and Analytics, Steve manages software development for AppNexus’ best-in-class systems for ad transaction optimization. Since joining AppNexus in 2010, Steve has led the design of distributed systems for scalable computation and data processing and has set the technical standards for a team of engineers while iterating on the optimization feature set. Previously, Steve was a software developer at Google, working on Google Places for Business and Local Search Quality. Steve has a Master of Engineering in Electrical Engineering and Computer Science and a Bachelor’s Degree in Computer Science from MIT.

Dave Himrod


As Director of Optimization and Analytics, Dave Himrod manages a team of analysts, quants, and engineers devoted to crafting world-class algorithms. When Dave joined in 2009, he managed AppNexus’ first account, eBay. While building AppNexus’ original optimization algorithm, Dave was heavily involved in building out the data pipeline and defining the data model still in use today. He has since grown his team to more than 20 people and focuses his time on building a world-class scalable optimization system. He and his team continue to improve the tools for optimized pricing and budgeting for the more than 27 billion ad impressions our platform sees per day. Dave has a Bachelor’s Degree in Computer Science from University of


Contact Us

View a complete list of OSCON contacts