Presented By O’Reilly and Intel Nervana
Put AI to work
September 17-18, 2017: Training
September 18-20, 2017: Tutorials & Conference
San Francisco, CA

AI within O'Reilly Media

Paco Nathan (O'Reilly Media)
11:55am–12:35pm Tuesday, September 19, 2017
Verticals and applications
Location: Yosemite A
Level: Non-technical
Secondary topics: Media
Average rating: 4.00 (4 ratings)

What you'll learn

  • Explore examples of how a media company leverages AI, particularly for enterprise training and learning use cases
  • Understand common issues in media, training, and search


Paco Nathan explains how O’Reilly Media employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors for gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, TDA for content gap analysis, and speech-to-text to index video.

AI efforts within O’Reilly Media began in late 2016, starting with improved search analytics and indexing, which were then combined with full-text NLP analytics of books and video transcripts, followed by topic modeling for accepted conference proposals. These techniques augment the capabilities of editors (e.g., by offering inferred themes and mappings for the content they curated), but O’Reilly also keeps a human in the loop to review what the ML-driven text analytics produce. The foundation of this work produces an ontology that describes the semantics of most audience interactions with O’Reilly Media, as well as vendor-sponsor relations. One of the important lessons O’Reilly learned was the value and priority of maintaining integrity between the human-scale ontology graph and the large semantic similarity graph produced by ML automation. (Crucial use cases operate on the former.)

Some of these experiences at O’Reilly are relatively unusual, since its content comes from a number of different publishers (all on Safari) across a broad range of disciplines and content types, served to thousands of enterprise organizations and B2C customers. Overall, this work reflects recent major changes in the industry, away from “reference” content and toward training (less about topics and keywords, more about job roles and skills). Looking ahead, there are opportunities for L&D buyers to leverage ontologies when evaluating training vendors, which would require an independent, cross-vendor body to manage a constrained vocabulary.

Paco Nathan

O'Reilly Media

Paco Nathan leads the Learning Group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was an evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the top 30 people in big data and analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.

Paco Nathan
09/20/2017 4:19am PDT

Many thanks for the opportunity to present. I really enjoyed the Q&A and discussions afterwards! Here are my slides: