Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud, a real-world case study

Jaipaul Agonus (FINRA)
2:05pm–2:45pm Wednesday, 09/30/2015
Hadoop Use Cases
Location: 1 E12/1 E13
Level: Intermediate
Average rating: 3.71 (14 ratings)

The Financial Industry Regulatory Authority (FINRA) acquires approximately 30 billion market events per day that are ingested, aggregated, and analyzed for the purpose of performing surveillance of the US markets. FINRA uses cloud computing and leverages big data technology to process an ever-increasing volume of financial data.

FINRA’s Market Regulation Surveillance Program runs hundreds of surveillance algorithms and patterns daily against hundreds of terabytes of market data to detect market manipulation, compliance breaches, and other potentially illegal activities. FINRA is leveraging a Hadoop architecture running on Amazon Web Services (AWS) cloud infrastructure. This architecture provides elastic capacity so that FINRA can easily handle dynamic workloads while benefiting from operational economies of scale.

This talk will focus on how FINRA is successfully migrating a large portfolio of SQL-oriented batch analytics jobs from an in-house proprietary MPP database appliance platform to a cloud-based SQL-over-Hadoop platform consisting of Hive, Amazon EMR, and S3. This presentation covers the following:

  • Current MPP-based architecture behind FINRA’s surveillance patterns
  • Pain points with the MPP platform that are related to data silos, cost, and non-elasticity
  • New architecture based on SQL on Hadoop and IaaS, using Hive, Amazon EMR, and S3
  • SQL over Hadoop vs. Java MapReduce: considerations for implementation
  • Hive performance tuning for large-scale analytics
  • Leveraging elasticity for real-world problems
  • Utilizing SQL combined with Hive UDFs and UDTFs – a way to get the best of both worlds!
  • Lessons learned, risks, and mitigation strategies applied for migration
  • Future evolution of SQL over Hadoop
  • Future plans and the role of computation platforms like Spark

Jaipaul Agonus

FINRA

Jaipaul Agonus is a director in the Market Regulation Technology Department at FINRA. Jaipaul is a big data engineering leader with nearly 18 years of IT industry experience, specializing in big data analytics and cloud-based solutions. He’s currently involved in building next-generation big data market analytic platforms with machine learning, advanced visualization, and contextual access across applications.


Comments

Jaipaul Agonus
09/30/2015 11:38am EDT

Folks,
I have the deck available in SlideShare in case you are interested,
http://www.slideshare.net/agonusj/hive-amazon-emr-s3-elastic-big-data-sql-analytics-processing-in-the-cloud
-Jaipaul