Taming the Big Data Fire Hose

Data: Real-Time and Streaming
Location: C121/122
Average rating: 3.50 (4 ratings)

The term Big Data describes a new class of database applications that must process massive data volumes in two disparate states: real-time and historical. In either state, the requirements of Big Data applications vastly exceed the capabilities of traditional, one-size-fits-all database systems. Most Big Data applications require MPP scale-out architectures and share three characteristics:
1. A “fire hose” data source, such as HTTP streams, a sensor grid, or other machine-generated data
2. A real-time database capable of ingesting, organizing and managing high volume inputs
3. A persistent data storage and analysis infrastructure capable of managing petabyte+ historical databases
In this talk, we will introduce a simple formula that applies to all Big Data applications: Big Data = Fast Data + Deep Data. Working through use cases, we will discuss the specialized requirements of real-time (“fast”) and analytic (“deep”) data management. We will also explore ways in which popular business intelligence solutions can be used to implement real-time and historical analytics.
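As a rough illustration of the Fast Data + Deep Data split described above, the following Python sketch models the two tiers: a real-time tier that maintains live aggregates over incoming events, and a deep tier that retains every raw event for later historical analysis. All class and method names here are hypothetical, invented for this sketch; a real deployment would use a real-time database (such as VoltDB) for the fast tier and a persistent analytic store for the deep tier.

```python
from collections import defaultdict, deque

class FastDeepPipeline:
    """Hypothetical sketch of a Big Data = Fast Data + Deep Data pipeline.

    Fast tier: a bounded window of recent events plus live per-source counts.
    Deep tier: an append-only log standing in for durable historical storage.
    """

    def __init__(self, window=1000):
        self.window = deque(maxlen=window)   # fast tier: recent events only
        self.counts = defaultdict(int)       # fast tier: live per-source counts
        self.historical = []                 # deep tier: stand-in for a durable store

    def ingest(self, source, value):
        # Fire-hose input: update real-time state first...
        self.window.append((source, value))
        self.counts[source] += 1
        # ...then append the raw event for later deep analysis.
        self.historical.append((source, value))

    def realtime_count(self, source):
        # Fast query: answered entirely from in-memory state.
        return self.counts[source]

    def historical_total(self):
        # Deep query: scans the full event history.
        return len(self.historical)

pipeline = FastDeepPipeline(window=2)
pipeline.ingest("sensor-a", 1)
pipeline.ingest("sensor-a", 2)
pipeline.ingest("sensor-b", 3)
```

The key design point the sketch tries to capture is that the fast tier is bounded (the window evicts old events) while the deep tier grows without limit, which is why the two tiers need different storage and query engines.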


John Hugg


John Hugg has spent his entire career working with databases and information management. In 2008, John was lured away from a PhD program by Mike Stonebraker to work on what became VoltDB. As the first engineer on the product, he liaised with a team of academics at MIT, Yale, and Brown who were building H-Store, VoltDB’s research prototype. Then John helped build the world-class engineering team at VoltDB to continue development of the open source and commercial products.