Harnessing graph-native algorithms to enhance machine learning: A primer
Who is this presentation for?
Data scientists, data analysts, data science managers, executives in charge of data science development, and IT professionals
Graph databases have grown considerably in popularity over the past year. This talk aims to demystify the mathematical principles behind graph databases, offer a primer on graph-native algorithms, and outline how graph technology is currently used in industry.
By representing highly connected data in a graph, we gain access to a host of graph-native algorithms that depend on and exploit the relationships within our data. Computing and storing graph metrics can add strong new features to nodes, creating innovative predictors for machine learning. Using algorithms designed for pathfinding, centrality, community detection, and graph pattern matching, we can begin to rely less on inflexible, subject-driven feature engineering.
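To make the idea of graph metrics as node features concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the toy edge list, the node names, and the helper functions `build_adjacency`, `degree_centrality`, and `pagerank` are all assumptions made for illustration. A production pipeline would more likely call a graph database's built-in algorithms or a graph library.

```python
from collections import defaultdict

# Hypothetical toy graph: undirected "knows" relationships between five people.
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("bob", "cat"), ("dan", "eve")]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph with no isolated nodes."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new_rank = {}
        for node in adj:
            # Each neighbor shares its current rank equally among its own neighbors.
            incoming = sum(rank[nbr] / len(adj[nbr]) for nbr in adj[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


adjacency = build_adjacency(EDGES)
degree = degree_centrality(adjacency)
pr = pagerank(adjacency)

# One feature row per node, ready to join onto an existing feature table.
features = {node: {"degree": degree[node], "pagerank": pr[node]}
            for node in adjacency}

for node in sorted(features):
    print(node, features[node])
```

Each row in `features` can be appended to whatever tabular features already exist for that entity, so a downstream model sees relationship-derived signals alongside the usual attributes.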
Beyond the use of derived graph metrics, finding a way to incorporate information about the structure of the graph is a critical issue for furthering the use of machine learning on connected data. So the question is: how can we enable the machine learning algorithm to access the inherent structure of the graph itself? Similar to the movement in natural language processing (word2vec), where the aim is to preserve information about where a word is in a sequence, there is a movement in graph analysis to capture the community and adjacency of nodes. Using node embedding to create a low-dimensional vector representation of the node and its structural components, we no longer need to compromise and query away important structural relationships.
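The word2vec analogy can be sketched in a few lines of plain Python. The example below is an illustration only, not the speaker's method: it generates uniform random walks over a hypothetical toy graph (the "sentences" in random-walk embedding approaches) and then counts which nodes co-occur within a small window. The graph and the functions `random_walks` and `cooccurrence_vectors` are assumptions for this sketch. The co-occurrence counts are a crude stand-in: a real pipeline would feed the walks to a skip-gram model to learn dense, low-dimensional vectors.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
ADJ = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}


def random_walks(adj, walk_length=6, walks_per_node=10, seed=0):
    """Generate fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)  # seeded so the walks are reproducible
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def cooccurrence_vectors(walks, window=2):
    """Count, for each node, how often other nodes appear within `window` steps."""
    counts = defaultdict(Counter)
    for walk in walks:
        for i, node in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[node][walk[j]] += 1
    return dict(counts)


walks = random_walks(ADJ)
vectors = cooccurrence_vectors(walks)

# Each node's context counts form a sparse vector describing its neighborhood.
for node in sorted(vectors):
    print(node, dict(sorted(vectors[node].items())))
```

Nodes in the same triangle tend to share more context than nodes on opposite sides of the bridge, which is the kind of community and adjacency signal an embedding model would compress into a short vector.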
In this session, we will discuss the uses of graph-native algorithms, the advantages of using graph-derived metrics in feature engineering, and current techniques for encoding graph structural information into low-dimensional feature vectors.
Prerequisite knowledge
This presentation should be approachable to all attendees interested in understanding graph technology. Ideally, attendees will have a background in relational database technology and some understanding of machine learning algorithms.
What you'll learn
Brandy Freitas is a principal data scientist at Pitney Bowes, where she works with clients in a wide variety of industries to develop analytical solutions for their business needs. Brandy is a research physicist-turned-data scientist based in Boston, MA. Her academic research focused primarily on protein structure determination, applying machine learning techniques to single-particle cryoelectron microscopy data. Brandy is a National Science Foundation Graduate Research Fellow and a James Mills Pierce Fellow. She holds an undergraduate degree in physics and chemistry from the Rochester Institute of Technology and did her graduate work in biophysics at Harvard University.
View a complete list of Strata Data Conference contacts