TD Bank’s teams faced challenges modeling propensity (e.g., a customer’s propensity to take up a new product or to become a first-time home buyer). Such problems are typically addressed through a machine learning workflow of data understanding, feature extraction, and model development. Generally, data scientists select the most relevant features, address class imbalance, choose a suitable machine learning algorithm, and iteratively tune it to produce an acceptable model. For example, a data scientist might use a random forest to select the top features, oversample to handle the class imbalance, pick a logistic regression model, and tune it iteratively until it reaches a target such as 80% accuracy.
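As a rough illustration of the oversampling step mentioned above, here is a minimal sketch of random oversampling using only the Python standard library. It is not Griffin's implementation or TD Bank's method; the function name `random_oversample`, the `seed` parameter, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class. (Hypothetical helper.)"""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is padded up to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap (empty for the largest class).
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Assumed toy data: 8 customers who did not take up a product (0), 2 who did (1).
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After the call, both classes contain eight rows. In practice, oversampling is applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.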
Drawing on their experience working with analytics teams across different business units within TD Bank, Steven Totman and Faraz Rasheed offer an overview of Griffin, a high-level, easy-to-use framework built on top of Spark that organizes common model development tasks into four phases: data understanding, feature extraction, model development, and serving modeling results. Griffin encapsulates common modeling use cases and exposes a simple, high-level API that handles the implementation of complex modeling tasks under the hood.
Steven Totman is Cloudera’s big data subject-matter expert, helping companies monetize their big data assets using Cloudera’s Enterprise Data Hub. Steve works with over 180 customers worldwide across verticals, helping with architectures around data management tools, data models, and ethical data usage. Previously, Steve ran strategy for a mainframe-to-Hadoop company and drove product strategy at IBM for DataStage and Information Server, having joined through the Ascential acquisition. He architected IBM’s InfoSphere product suite and led the design and creation of governance and metadata products like Business Glossary and Metadata Workbench. Steve holds several patents in data integration and in governance- and metadata-related designs. Although he is based in NYC, Steve is happiest onsite with customers wherever they may be in the world.
Faraz Rasheed is a senior manager at TD Bank in Canada, where he leads the Enterprise Big Data Analytics team, helping different lines of business build data science solutions on the bank’s big data analytics platform. Faraz holds a PhD in computer science with a focus on machine learning from the University of Calgary. Before joining TD Bank, Faraz worked as a senior data scientist at BlackBerry Ltd. He has also taught data science at Ryerson University and WeCloud Data.
©2017, O'Reilly Media, Inc.