Harnessing graph-native algorithms to enhance machine learning: A primer

Brandy Freitas (Pitney Bowes)

5:25pm–6:05pm Wednesday, September 25, 2019

Location: 1A 01/02

Data Science, Machine Learning, & AI

Secondary topics: Transportation and Logistics


Who is this presentation for?

Data scientists, data analysts, data science managers, executives in charge of data science development, and IT professionals

Level

Beginner

Description

Graph databases have become much more popular in recent years. Brandy Freitas demystifies the mathematical principles behind graph databases, offers a primer on graph-native algorithms, and outlines the current use of graph technology in industry.

By representing highly connected data in a graph, you gain access to a host of graph-native algorithms that depend on and exploit the relationships in your data. Computing and storing graph metrics can add strong new features to nodes, creating innovative predictors for machine learning. Using algorithms designed for pathfinding, centrality, community detection, and graph pattern matching, you can rely less on inflexible, subject-driven feature engineering.
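To make the idea of graph metrics as node features concrete, here is a minimal sketch in plain Python (standard library only) on a small, hypothetical edge list. It computes degree centrality and PageRank (two centrality measures) and a connected-component label. The component label is only the simplest stand-in for community detection; dedicated algorithms such as Louvain or label propagation would normally be used. The function names are illustrative, not any particular library's API.

```python
from collections import defaultdict

# Hypothetical undirected graph, given as an edge list.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("e", "f")]

def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degree_centrality(adj):
    """Degree divided by (n - 1), the largest degree a node could have."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; every node here has at least one neighbour."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[nbr] / len(adj[nbr]) for nbr in adj[node])
            for node in adj
        }
    return rank

def connected_components(adj):
    """Label each node with a component id using an iterative depth-first search."""
    label, next_id = {}, 0
    for start in sorted(adj):
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:
            for nbr in adj[stack.pop()]:
                if nbr not in label:
                    label[nbr] = next_id
                    stack.append(nbr)
        next_id += 1
    return label

def node_features(edges):
    """Assemble one feature row per node, ready to join onto a training table."""
    adj = build_adjacency(edges)
    deg, pr, comp = degree_centrality(adj), pagerank(adj), connected_components(adj)
    return {
        node: {"degree_centrality": deg[node], "pagerank": pr[node], "component": comp[node]}
        for node in sorted(adj)
    }

features = node_features(EDGES)
for node, row in features.items():
    print(node, row)
```

Each row could then be merged with a node's ordinary attributes before training a model, so that a node's position in the network becomes an input feature rather than something engineered by hand.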

Beyond the use of derived graph metrics, finding a way to incorporate information about the structure of the graph itself is critical for furthering machine learning on connected data. The question, then, is how to give a machine learning algorithm access to the inherent structure of the graph. Just as Word2Vec in natural language processing learns word vectors that preserve information about the context in which a word appears, there's a movement in graph analysis to capture the community and adjacency of nodes. By using node embeddings to create a low-dimensional vector representation of each node and its structural surroundings, you no longer need to compromise by querying away important structural relationships.
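One widely used way to borrow the Word2Vec idea for graphs is DeepWalk-style embedding: uniform random walks over the graph are treated as "sentences" of node ids, and a skip-gram model is trained on them. The sketch below, assuming a small hypothetical graph and standard-library Python only, covers the corpus-generation half: it produces the walks and the (center, context) pairs that skip-gram learns from. The training step itself is omitted; in practice the walks would be passed to an existing Word2Vec implementation. The parameter values (walks per node, walk length, window) are illustrative assumptions, not tuned settings.

```python
import random
from collections import defaultdict

# Hypothetical undirected graph, given as an edge list.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]

def build_adjacency(edges):
    """Undirected adjacency lists, sorted so results are reproducible."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}

def random_walks(adj, walks_per_node=10, walk_length=8, seed=0):
    """Uniform random walks; each walk plays the role of a sentence of node ids."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def skipgram_pairs(walks, window=2):
    """(center, context) pairs within a sliding window, as in skip-gram training."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

adj = build_adjacency(EDGES)
walks = random_walks(adj)
pairs = skipgram_pairs(walks)
print(walks[0])
print(len(walks), len(pairs))
```

Nodes that co-occur often within a window (here, members of the same densely connected cluster) end up with similar vectors once a skip-gram model is trained on these pairs, which is how community and adjacency information reaches the downstream model.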

You’ll leave with an understanding of the uses of graph-native algorithms, the advantages of using graph-derived metrics in feature engineering, and current techniques for encoding graph structural information into low-dimensional feature vectors.

Prerequisite knowledge

A working knowledge of relational database technology
A basic understanding of machine learning algorithms

What you'll learn

Understand graph databases, graph-native algorithms, how graph metrics can provide enhanced features for machine learning, and where graph database technology is appropriate (and where it isn't) in industry use cases

Brandy Freitas

Pitney Bowes

Brandy Freitas is a principal data scientist at Pitney Bowes, where she works with clients in a wide variety of industries to develop analytical solutions for their business needs. Brandy is a research-physicist-turned-data-scientist based in Boston, Massachusetts. Her academic research focused primarily on protein structure determination, applying machine learning techniques to single-particle cryoelectron microscopy data. Brandy is a National Science Foundation Graduate Research Fellow and a James Mills Pierce Fellow. She holds an undergraduate degree in physics and chemistry from the Rochester Institute of Technology and did her graduate work in biophysics at Harvard University.