The Financial Industry Regulatory Authority (FINRA) is a private-sector regulator responsible for analyzing over 90% of the equities and 65% of the options activity in the US to look for fraud, market manipulation, insider trading, and abuse. John Hitchingham shares insights into the design and operation of FINRA’s data lake in the AWS cloud, which provides storage, query, and catalog capability using S3, EMR, and a FINRA-developed data catalog and management system. Users can query across petabytes of data on AWS S3 in seconds using Presto and Spark, all while maintaining security and data lineage. FINRA implemented the cloud data warehouse to consolidate a series of data silos as part of a two-and-a-half-year all-in migration of FINRA’s Market Regulation systems to the cloud. It provides increased operational resiliency in response to market events such as Brexit while giving analysts and data scientists within FINRA increased insight into data.
Using S3 provides a resilient, scalable, cost-effective storage layer for data in the cloud data warehouse. Data is stored in text format for archival queries and in ORC format for performant queries. The herd data catalog provides a platform-independent way to track data. It supports data versioning, storage of business and technical metadata, and schema information that can be used to query registered data. AWS EMR provides a scalable and secure compute platform for running ETL, batch analytics, and interactive analytics against data stored on S3. Keeping data on S3 rather than on the compute clusters provides increased durability, along with the ability to rapidly scale compute up and down to match demand.
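To make the catalog idea concrete, the following is a minimal sketch of how a catalog could track versioned data on S3 together with its schema and metadata, and then use that schema to produce a Hive-style table definition for querying. It is not herd's actual API: the `Catalog`, `DataVersion`, and `Column` names, the `example-bucket` paths, and the `trades` dataset are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    type: str  # Hive-style type name, e.g. "string" or "bigint"


@dataclass
class DataVersion:
    version: int
    s3_location: str
    file_format: str  # "ORC" or "TEXTFILE"
    columns: List[Column]
    metadata: Dict[str, str] = field(default_factory=dict)


class Catalog:
    """Hypothetical in-memory catalog; a real one would persist its entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[DataVersion]] = {}

    def register(
        self,
        name: str,
        s3_location: str,
        file_format: str,
        columns: List[Column],
        metadata: Optional[Dict[str, str]] = None,
    ) -> DataVersion:
        # Each registration under the same name becomes the next version.
        versions = self._entries.setdefault(name, [])
        entry = DataVersion(
            version=len(versions),
            s3_location=s3_location,
            file_format=file_format,
            columns=list(columns),
            metadata=dict(metadata or {}),
        )
        versions.append(entry)
        return entry

    def latest(self, name: str) -> DataVersion:
        return self._entries[name][-1]

    def ddl(self, name: str, version: Optional[int] = None) -> str:
        # Build a Hive-style external table definition from the stored schema.
        entry = self.latest(name) if version is None else self._entries[name][version]
        cols = ",\n".join(f"  {c.name} {c.type}" for c in entry.columns)
        return (
            f"CREATE EXTERNAL TABLE IF NOT EXISTS {name}_v{entry.version} (\n"
            f"{cols}\n"
            f")\n"
            f"STORED AS {entry.file_format}\n"
            f"LOCATION '{entry.s3_location}'"
        )


# Hypothetical dataset: a text copy for archival, then an ORC copy for fast queries.
schema = [Column("symbol", "string"), Column("quantity", "bigint")]
catalog = Catalog()
catalog.register("trades", "s3://example-bucket/trades/v0/", "TEXTFILE", schema)
catalog.register(
    "trades", "s3://example-bucket/trades/v1/", "ORC", schema, {"owner": "market-reg"}
)
print(catalog.ddl("trades"))
```

Because the table definition is generated from the catalog entry rather than stored in any one query engine, the same registered data could in principle be exposed to different engines reading from S3, which is one way to read the "platform-independent" claim above.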
John Hitchingham is director of performance engineering at FINRA, where he is responsible for driving technical innovation and efficiency across a cloud application portfolio that processes over 75 billion market events per day to detect fraud, market manipulation, insider trading, and abuse. Previously, John worked at both large and boutique consulting firms providing technical design and consulting services to startup, media, and telecommunications clients. John holds a BS in electrical engineering from Rutgers University.
©2017, O'Reilly Media, Inc.