Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Graph mining for log data

David Andrzejewski (Sumo Logic)
3:30pm–4:00pm Wednesday, 02/18/2015
Hardcore Data Science
Location: LL20 BC.
Average rating: 3.60 (5 ratings)

In a typical web or mobile application, even simple user interactions like logging in can trigger an intricate cascade of software behaviors across many services spread over multiple machines. As these requests flow through the system, the log messages emitted may be uninteresting on their own but noteworthy when considered collectively in terms of their graphical or temporal structure. If a new user signup on our site typically results in the sequence of log events (A,B,C), we may be concerned if we only observe (A,B) or (A,C), or if our logs show (A,B,B,B,C) or (A,C,B).
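As a rough illustration of the signup example above, the following minimal sketch labels an observed event sequence by how it deviates from the expected one. The function name `classify`, the labels it returns, and the constant `EXPECTED` are hypothetical choices made for illustration; they are not taken from the talk.

```python
from collections import Counter

# Hypothetical expected sequence for a new user signup (assumption).
EXPECTED = ("A", "B", "C")


def classify(observed, expected=EXPECTED):
    """Return a short label describing how `observed` deviates from `expected`."""
    observed = tuple(observed)
    if observed == expected:
        return "ok"
    obs_counts, exp_counts = Counter(observed), Counter(expected)
    # An expected event that occurs fewer times than expected is missing.
    if any(obs_counts[event] < exp_counts[event] for event in exp_counts):
        return "missing"
    # Same or more of every expected event, but the counts differ:
    # something was repeated or an unexpected event appeared.
    if obs_counts != exp_counts:
        return "repeated_or_extra"
    # Identical event counts in a different order.
    return "reordered"


for seq in [("A", "B"), ("A", "C"), ("A", "B", "B", "B", "C"), ("A", "C", "B")]:
    print(seq, classify(seq))
```

A real system would first have to group raw log lines into per-request sequences and decide what counts as the expected pattern; both steps are assumed away here.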

This kind of analysis is often awkward or impractical to express using standard relational queries. A simple question like "what is the most common sequence of remote calls triggered by this user action?" can be difficult to formulate in terms of a relational database query but quite natural to consider from a graph mining perspective.
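One way to picture the "most common sequence" question outside a relational query is to group log events by request and count the resulting ordered call paths. The sketch below assumes each event is a `(request_id, timestamp, service)` tuple; that record shape and the function name `most_common_call_sequence` are illustrative assumptions, not a description of any particular log format or of the speaker's method.

```python
from collections import Counter, defaultdict


def most_common_call_sequence(events):
    """Return (sequence, count) for the most frequent per-request call path.

    `events` is an iterable of (request_id, timestamp, service) tuples
    (a hypothetical record shape chosen for this sketch).
    """
    by_request = defaultdict(list)
    for request_id, timestamp, service in events:
        by_request[request_id].append((timestamp, service))

    # Order each request's calls by timestamp, then count identical paths.
    sequences = Counter(
        tuple(service for _, service in sorted(calls))
        for calls in by_request.values()
    )
    return sequences.most_common(1)[0] if sequences else None


events = [
    ("r1", 1, "auth"), ("r1", 2, "db"),
    ("r2", 1, "auth"), ("r2", 2, "db"),
    ("r3", 1, "auth"), ("r3", 2, "cache"),
]
print(most_common_call_sequence(events))
```

Treating each path as a tuple keeps the counting simple; a fuller graph mining treatment would also need to handle branching and concurrent calls, which a flat sequence cannot represent.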

This talk will provide an introduction to graph mining concepts, the fundamental operations which can serve as building blocks for these analyses, and demonstrations of practical applications to machine log data, including real-world examples of:

  • user behavior modeling
  • distributed systems debugging
  • security forensics
  • monitoring critical workflows
  • exploring connections between system components

David Andrzejewski

Sumo Logic

Lead Data Sciences Engineer at Sumo Logic and co-organizer of the SF Bay Area Machine Learning meetup group.




David Andrzejewski
03/02/2015 10:11pm PST

Slides (pdf):