July 20–24, 2015
Portland, OR

Streamlining and automating data ingestion into Hadoop

Sastry Malladi (StubHub)
10:40am–11:20am Wednesday, 07/22/2015
Sponsored E142
Average rating: 3.50 (2 ratings)

Everyone wants to leverage Hadoop for their data processing needs, whether real-time or batch, but there is no single common way to bring data from a variety of sources and types into Hadoop in an automated fashion. Tools and frameworks exist that address this for one type of data source or another, but there is no comprehensive solution. At StubHub, we have many use cases, such as real-time personalized recommendations, customer insights, fraud analysis, and business analytics, that require leveraging all our data in Hadoop. This prompted us to build a framework that automates data ingestion.

In this talk, we cover the approach we took and present the generic ingestion framework we developed, which leverages Flume, Kafka, and other tools.

This session is sponsored by StubHub.

Sastry Malladi


Sastry Malladi is chief architect at StubHub, responsible for the overall technology architecture, strategy, and direction. Sastry leads a team of architects and collaborates closely with product management, engineering, and business leads. Sastry also spearheaded, and continues to oversee, big data platform development and the transformation of the overall data strategy toward big data and a data-driven decision culture.

Sastry is a veteran technologist with nearly two-and-a-half decades of experience developing, leading, and architecting highly scalable distributed systems in areas including service-oriented architecture (SOA), application servers, Java/J2EE/web services middleware, and cloud computing. Before moving to StubHub, he led the transformation of eBay from its monolithic architecture to the distributed, scalable service-oriented architecture that it is today. Prior to joining eBay, Sastry was co-founder and CTO of OpenGridSolutions, a founding member and architect at SpikeSource, and an architect at Oracle. He frequently speaks at conferences.