Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Griffin: Fast-tracking model development in Hadoop

Steven Totman (Cloudera), Faraz Rasheed (TD Bank)
1:15pm–1:55pm Thursday, September 28, 2017
Data engineering, Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Architecture, Financial services
Average rating: *****
(5.00, 1 rating)

Who is this presentation for?

  • Architects, team leads, big data project managers, data engineers, and data scientists

Prerequisite knowledge

  • A basic understanding of Hadoop components and Spark

What you'll learn

  • Explore the Griffin framework, which enables common model development tasks within four phases: data understanding, feature extraction, model development, and serving modeling results

Description

TD Bank’s teams faced challenges in propensity modeling (e.g., a customer's propensity to pick up a new product or to become a first-time home buyer). Such problems are typically worked through a machine learning workflow of data understanding, feature extraction, and model development. Generally, data scientists select the most relevant features, address class imbalance, choose a suitable machine learning algorithm, and iteratively tune it to produce an acceptable model. For example, a data scientist might use a random forest to select the top features, oversample to handle class imbalance, pick a logistic regression model, and tune it iteratively until it reaches 80% accuracy.
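As an illustration of one step in that workflow, here is a minimal sketch of random oversampling for class imbalance, written in plain Python. It is not Griffin code: the function name `oversample_minority`, the toy rows, and the 8-to-2 class split are all assumptions made for this example.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until this class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made up): 8 non-buyers (label 0) and 2 buyers (label 1).
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = oversample_minority(rows, labels)
balanced_counts = Counter(bal_labels)
print(balanced_counts)
```

In practice, oversampling is usually applied only to the training split, so that duplicated rows do not leak into the data used to measure accuracy.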

Drawing on their experience working with analytics teams across different business units within TD Bank, Steven Totman and Faraz Rasheed offer an overview of Griffin, a high-level, easy-to-use framework built on top of Spark that encapsulates the complexities of common model development tasks within four phases: data understanding, feature extraction, model development, and serving modeling results. Griffin covers common modeling use cases and exposes a simple, high-level API that handles the implementation of complex modeling tasks under the hood.

Steven Totman

Cloudera

Steven Totman is Cloudera’s big data subject-matter expert, helping companies monetize their big data assets using Cloudera’s Enterprise Data Hub. Steve works with over 180 customers worldwide, helping across verticals with architectures for data management tools, data models, and ethical data usage. Previously, Steve ran strategy for a mainframe-to-Hadoop company and drove product strategy at IBM for DataStage and Information Server after joining with the Ascential acquisition. He architected IBM’s InfoSphere product suite and led the design and creation of governance and metadata products like Business Glossary and Metadata Workbench. Steve holds several patents in data integration and in governance- and metadata-related designs. Although he is based in NYC, Steve is happiest onsite with customers wherever they may be in the world.

Faraz Rasheed

TD Bank

Faraz Rasheed is a senior manager at TD Bank in Canada, where he leads the Enterprise Big Data Analytics team, helping different lines of business build data science solutions on the bank’s big data analytics platform. Faraz holds a PhD in computer science with a focus on machine learning from the University of Calgary. Before joining TD Bank, Faraz worked as a senior data scientist at BlackBerry Ltd. He also teaches data science at Ryerson University and WeCloud Data.
