Sep 23–26, 2019

Harnessing graph-native algorithms to enhance machine learning: A primer

Brandy Freitas (Pitney Bowes)
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 01/02
Secondary topics: Transportation and Logistics

Who is this presentation for?

Data scientists, data analysts, data science managers, executives in charge of data science development, IT professionals




Graph databases have become much more widely popular in recent years. This talk aims to demystify the mathematical principles behind graph databases, offer a primer on graph-native algorithms, and outline the current use of graph technology in industry.

By representing highly connected data in a graph, we gain access to a host of graph-native algorithms that depend on and exploit the relationships within our data. Computing and storing graph metrics can add strong new features to nodes, creating novel predictors for machine learning. Using algorithms designed for pathfinding, centrality, community detection, and graph pattern matching, we can begin to rely less on inflexible, subject-driven feature engineering.
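As a rough illustration of turning graph metrics into node features, the sketch below computes degree centrality, PageRank, and a connected-component label for every node of a small, made-up graph, using only the Python standard library. The edge list `EDGES` and all function names are hypothetical. Connected components are used here only as the simplest possible grouping, not as a substitute for a real community detection algorithm such as Louvain or label propagation, and a production pipeline would more likely call a graph library or the graph database's own built-in procedures.

```python
from collections import deque

# Hypothetical toy graph: undirected edges between, say, customers and depots.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("e", "f")]


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; each undirected edge is followed in both directions."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        # Teleport term, spread evenly over all nodes.
        new_rank = {node: (1.0 - damping) / n for node in adj}
        for node, nbrs in adj.items():
            # Every node has at least one neighbor here, so no dangling-node handling.
            share = rank[node] / len(nbrs)
            for nbr in nbrs:
                new_rank[nbr] += damping * share
        rank = new_rank
    return rank


def connected_components(adj):
    """Label each node with a component id using breadth-first search."""
    labels = {}
    next_id = 0
    for start in adj:
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in labels:
                    labels[nbr] = next_id
                    queue.append(nbr)
        next_id += 1
    return labels


def node_features(edges):
    """Combine the graph metrics into one feature row per node."""
    adj = build_adjacency(edges)
    deg = degree_centrality(adj)
    pr = pagerank(adj)
    comp = connected_components(adj)
    return {
        node: {"degree_centrality": deg[node], "pagerank": pr[node], "component": comp[node]}
        for node in adj
    }


if __name__ == "__main__":
    for node, feats in sorted(node_features(EDGES).items()):
        print(node, feats)
```

Each row of the resulting dictionary could be joined onto an existing feature table keyed by node id, so a downstream model sees structural signals (how connected a node is, how central it is, which cluster it sits in) alongside the node's ordinary attributes.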

Beyond using derived graph metrics, finding a way to incorporate information about the structure of the graph is critical to furthering the use of machine learning on connected data. So the question is: how can we enable a machine learning algorithm to access the inherent structure of the graph itself? Similar to the movement in natural language processing (word2vec), where the aim is to preserve information about a word's context within a sequence, there is a movement in graph analysis to capture the community membership and adjacency of nodes. By using node embeddings to create a low-dimensional vector representation of each node and its structural context, we no longer need to compromise and query away important structural relationships.
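The word2vec analogy can be made concrete with a DeepWalk-style sketch: truncated random walks turn the graph into "sentences" of node ids, a sliding window turns those into (target, context) pairs, and skip-gram with negative sampling learns a small vector per node. This is one well-known embedding approach chosen for illustration, not necessarily the technique covered in the session. The toy graph `ADJ`, the function names, and all hyperparameters are assumptions, and negatives are drawn uniformly rather than from word2vec's smoothed unigram distribution.

```python
import math
import random

# Hypothetical toy graph: two loose clusters (a, b, c) and (e, f, g) bridged via d.
ADJ = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e"], "e": ["d", "f", "g"], "f": ["e", "g"], "g": ["e", "f"],
}


def random_walks(adj, walks_per_node=10, walk_length=8, rng=None):
    """Uniform truncated random walks: the 'sentences' of a DeepWalk-style model."""
    rng = rng if rng is not None else random.Random(0)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def skipgram_pairs(walks, window=2):
    """(target, context) pairs from a fixed-size sliding window, as in skip-gram."""
    pairs = []
    for walk in walks:
        for i, target in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((target, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


def sigmoid(x):
    """Logistic function, clamped to avoid overflow in math.exp."""
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_embeddings(adj, dim=4, epochs=5, lr=0.025, negatives=3, seed=0):
    """Skip-gram with negative sampling over random-walk pairs."""
    rng = random.Random(seed)
    nodes = list(adj)
    pairs = skipgram_pairs(random_walks(adj, rng=rng))
    # Input vectors (the embeddings we keep) and output/context vectors.
    emb = {n: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for n in nodes}
    ctx = {n: [0.0] * dim for n in nodes}
    for _ in range(epochs):
        rng.shuffle(pairs)
        for target, context in pairs:
            # One positive sample (label 1) plus uniformly drawn negatives (label 0).
            samples = [(context, 1.0)] + [(rng.choice(nodes), 0.0) for _ in range(negatives)]
            grad_in = [0.0] * dim
            for node, label in samples:
                score = sigmoid(sum(e * c for e, c in zip(emb[target], ctx[node])))
                g = lr * (label - score)
                for k in range(dim):
                    grad_in[k] += g * ctx[node][k]  # uses ctx before its update
                    ctx[node][k] += g * emb[target][k]
            # Apply the accumulated gradient to the target's embedding once.
            for k in range(dim):
                emb[target][k] += grad_in[k]
    return emb


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    vectors = train_embeddings(ADJ)
    print("sim(a, b) =", round(cosine(vectors["a"], vectors["b"]), 3))
    print("sim(a, g) =", round(cosine(vectors["a"], vectors["g"]), 3))
```

In practice the walk corpus would usually be handed to an optimized word2vec implementation instead of this hand-written loop; either way, the resulting vectors can be used directly as feature columns, in the same way as the derived graph metrics.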

In this session, we will discuss the uses of native graph algorithms, the advantages of using graph-derived metrics in feature engineering, and current techniques for encoding graph structural information into low-dimensional feature vectors.

Prerequisite knowledge

This presentation should be approachable to all attendees interested in understanding graph technology. Ideally, attendees will have a background in relational database technology and some understanding of machine learning algorithms.

What you'll learn

An understanding of graph databases and graph-native algorithms, how graph metrics can provide enhanced features for machine learning, and where graph database technology is (and is not) appropriate in industry use cases.

Brandy Freitas

Pitney Bowes

Brandy Freitas is a principal data scientist at Pitney Bowes, where she works with clients in a wide variety of industries to develop analytical solutions for their business needs. Brandy is a research physicist-turned-data scientist based in Boston, MA. Her academic research focused primarily on protein structure determination, applying machine learning techniques to single-particle cryoelectron microscopy data. Brandy is a National Science Foundation Graduate Research Fellow and a James Mills Pierce Fellow. She holds an undergraduate degree in physics and chemistry from the Rochester Institute of Technology and did her graduate work in biophysics at Harvard University.

Contact list

View a complete list of Strata Data Conference contacts