Collaboration insights through data access graphs
Who is this presentation for?
- Data scientists or analysts
A deep understanding of how users interact with your products, and with each other through your products, is of immense value. So is the realization that the (anonymized) fingerprint of the data your users access and generate can be a trail that, when followed, leads to these insights.
Ravi Krishnaswamy outlines a working implementation that leverages existing analytics frameworks and tools like Spark, Mixpanel, and Neo4j to reconstruct communities and obtain workflow insights based on accessed and modified design file data. The key insight applied here: each time a data element or document changes, recording the before and after fingerprints in the same record lets you apply postprocessing logic at scale to recreate lineages of data. A hashed fingerprint of data not only connects two users who access the same data; the before and after hashes of a modified data item also establish its lineage.
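As a rough illustration of the before-and-after fingerprint idea, the following minimal Python sketch rebuilds a lineage from change records. The record layout (`user`, `before`, `after`), the sample hashes, and the helper names are all hypothetical; the talk's actual pipeline runs on Spark and Neo4j and may differ considerably.

```python
from collections import defaultdict

# Hypothetical change records: each one stores the fingerprint of a data
# item before and after a single edit (before is None for a new item).
records = [
    {"user": "alice", "before": None, "after": "h1"},
    {"user": "bob", "before": "h1", "after": "h2"},
    {"user": "carol", "before": "h2", "after": "h3"},
    {"user": "dave", "before": "h1", "after": "h4"},  # a branch from h1
]


def build_lineage(records):
    """Map each fingerprint to the fingerprints derived directly from it."""
    children = defaultdict(list)
    for rec in records:
        if rec["before"] is not None:
            children[rec["before"]].append(rec["after"])
    return children


def descendants(children, root):
    """Return every fingerprint reachable from root, breadth first."""
    seen, queue = [], [root]
    while queue:
        node = queue.pop(0)
        for child in children.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def users_touching(records, hashes):
    """Users who started from or produced any of the given fingerprints."""
    return {
        rec["user"]
        for rec in records
        if rec["before"] in hashes or rec["after"] in hashes
    }


lineage = build_lineage(records)
family = ["h1"] + descendants(lineage, "h1")
collaborators = users_touching(records, set(family))
```

Because both hashes live in the same record, the lineage falls out of a single pass over the data, which is what makes this kind of postprocessing practical at scale.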
The flexibility of a graph database schema allowed Autodesk to represent data from different products, users, and applications. The resulting graphs yielded many interesting insights, including high-level views such as communities that emerged based on industry type, and web and mobile users interacting with industry types and companies.
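One simple way to see how communities can emerge from shared data access is to treat users and fingerprints as nodes of one graph and group users by connected component. The sketch below does this with a small union-find structure. The abstract does not say which community algorithm Autodesk used, so connected components is only a stand-in here, and the access events are made up.

```python
from collections import defaultdict

# Hypothetical (user, fingerprint) access events.
accesses = [
    ("alice", "h1"), ("bob", "h1"), ("bob", "h2"),
    ("carol", "h9"), ("dave", "h9"),
]


def communities(accesses):
    """Group users into connected components of the user-fingerprint graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Tag nodes by kind so a user and a fingerprint can never collide.
    for user, fingerprint in accesses:
        union(("user", user), ("data", fingerprint))

    groups = defaultdict(set)
    for kind, name in list(parent):
        if kind == "user":
            groups[find((kind, name))].add(name)
    return sorted(sorted(group) for group in groups.values())


result = communities(accesses)
```

In a graph database the same grouping would typically be expressed as a query or a built-in graph algorithm rather than hand-written code.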
Once the data was in a graph representation, Autodesk was able to extract information (e.g., time series based on lineage access) and process it further using tools like pandas. Having a graph as an intermediate representation let Autodesk leverage downstream tools that it would not have been able to use otherwise. It also trained models on the graph for link prediction. Using several machine learning (ML) models, including GraphSage, trained on three out of four weeks of data, Autodesk estimated how well the links in the remaining week could be predicted.
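The three-weeks-train, one-week-test evaluation can be sketched in a few lines. This example deliberately swaps GraphSage for a common-neighbors heuristic, a much simpler baseline, just to show the shape of the temporal split and the scoring. The edge list, function names, and the precision-at-k metric are assumptions for illustration, not details from the talk.

```python
from itertools import combinations

# Hypothetical (week, user_a, user_b) collaboration edges.
edges = [
    (1, "alice", "bob"), (1, "bob", "carol"),
    (2, "alice", "dave"), (3, "dave", "carol"),
    (4, "alice", "carol"), (4, "bob", "dave"),
]


def split_by_week(edges, holdout_week):
    """Train on edges before holdout_week; test on new edges in that week."""
    train = {frozenset((a, b)) for w, a, b in edges if w < holdout_week}
    test = {frozenset((a, b)) for w, a, b in edges if w == holdout_week}
    return train, test - train


def common_neighbor_scores(train):
    """Score each unlinked pair by the number of neighbors it shares."""
    neighbors = {}
    for edge in train:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        pair = frozenset((a, b))
        if pair not in train:
            scores[pair] = len(neighbors[a] & neighbors[b])
    return scores


def precision_at_k(scores, test, k):
    """Fraction of the k highest-scoring pairs that appear in the test week."""
    ranked = sorted(scores, key=lambda p: (-scores[p], sorted(p)))[:k]
    if not ranked:
        return 0.0
    return sum(1 for pair in ranked if pair in test) / len(ranked)


train, test = split_by_week(edges, holdout_week=4)
scores = common_neighbor_scores(train)
precision = precision_at_k(scores, test, k=2)
```

Splitting by time rather than at random keeps the evaluation honest: the model only ever sees links that existed before the week it is asked to predict.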
You’ll move through the motivation, implementation challenges, an analytics pipeline integrating Spark with Neo4j, and a list of queries that generated some of the results.
Prerequisite knowledge
- Familiarity with Spark, Spark SQL, and analytics techniques (e.g., pandas)
- A basic understanding of graphs and graph algorithms (useful but not required)
What you'll learn
- Learn how to infer user communities from data access patterns using graph database techniques
- Understand graph database queries and their power
- See a real-world application of these techniques, along with some recent ML techniques using GraphSage
Ravi Krishnaswamy is the director of software architecture in the AutoCAD Group at Autodesk. He has a passion for technology and has implemented a wide range of solutions for products at Autodesk from analytics and database applications to mobile graphics. His current projects involve analytics solutions on product usage data that leverage graph databases and machine learning techniques on graphs.