Recommendation engines are some of the first commercial examples of cognitive computing applications. They were also among the first big data products: think Amazon product recommendations, Google search results, or LinkedIn's "People You May Know" feature. Recommendations narrow what could be a complex decision down to a few good options. Their underlying algorithms "learn" from past experience, reducing big data observations to small data actions. What if recommendations could help analysts sort through data in Hadoop?
Building a query recommendation engine is slightly more complex than building a product, data, or people recommendation engine. Writing a query is a complex, multi-step task, and those steps do not always proceed in a predictable order. In this session we'll share some of the technical challenges and lessons from building a cognitive application in daily use today by analyst teams from eBay to Square. We'll cover:
Understanding user context at the point of interaction
What recommendations will a user benefit from? For query recommendations, should we suggest attributes, tables, filters, or joins, or surface data quality or other warnings about the data being queried? The right recommendation depends on where the user is within the query, which requires us to deeply understand the user and their context.
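To make this concrete, here is a minimal, hypothetical sketch of context detection: a real engine would parse the query incrementally, but even classifying the last SQL clause keyword before the cursor tells us which kind of suggestion fits. The function name and mappings below are illustrative, not Alation's actual implementation.

```python
import re

def suggestion_context(partial_query: str) -> str:
    """Classify which kind of suggestion fits the user's cursor position.

    Hypothetical sketch: we look only at the last clause keyword typed,
    whereas a production engine parses the partial query properly.
    """
    keywords = re.findall(
        r"\b(select|from|where|join|on|group by|order by)\b",
        partial_query.lower(),
    )
    last = keywords[-1] if keywords else None
    return {
        "select": "attributes",       # suggest column names
        "from": "tables",             # suggest table names
        "join": "tables",
        "on": "join_conditions",      # suggest join predicates
        "where": "filters",           # suggest filter predicates
        "group by": "attributes",
        "order by": "attributes",
    }.get(last, "keywords")

print(suggestion_context("SELECT "))                  # attributes
print(suggestion_context("SELECT id FROM "))          # tables
print(suggestion_context("SELECT id FROM t WHERE "))  # filters
```

Even this toy classifier shows why context matters: the same keystroke ("o") should complete to a column in one clause and a table in another.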
Ensuring quality & relevance
Recommendations have to be highly relevant to the user. Achieving this relevance requires understanding the correlations among data objects and leveraging them for the current user. For example, among the myriad choices available, which three filters and joins are the most useful and accurate to suggest for the specific query an analyst is writing? Which efficiency or quality warnings need to be shown? All of this requires us to accurately model how the data being queried is actually used.
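One simple way to model usage, sketched below under assumed data structures (a hypothetical query log where each past query records the tables and joins it used), is to score candidate joins by how often they co-occurred with the tables already in the analyst's query:

```python
from collections import Counter

def top_join_suggestions(current_tables, query_log, k=3):
    """Rank join predicates by co-occurrence with the tables already
    referenced in the query being written.

    Hypothetical log format: each past query is a dict with 'tables'
    (set of table names) and 'joins' (set of join predicates).
    """
    scores = Counter()
    for past in query_log:
        # Learn only from past queries that share context with this one.
        if current_tables & past["tables"]:
            scores.update(past["joins"])
    return [join for join, _ in scores.most_common(k)]

log = [
    {"tables": {"orders", "customers"},
     "joins": {"orders.cust_id = customers.id"}},
    {"tables": {"orders", "products"},
     "joins": {"orders.prod_id = products.id"}},
    {"tables": {"orders", "customers"},
     "joins": {"orders.cust_id = customers.id"}},
]
print(top_join_suggestions({"orders"}, log))
```

A production system would weight by recency, user, and query success, but the core idea is the same: usage history turns a myriad of syntactically valid joins into a short, relevant list.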
Delivering performance and responsiveness
Recommendations have to appear at the speed of typing, so that the user never notices lag. The databases being queried may contain millions of objects, including tables, attributes, and predicates. Sizes like these pose a challenge for the recommendation engine and lead to several interesting trade-offs around what information to push to the user's browser for very fast responses and what to retain on the server.
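One common shape for that trade-off, sketched here as an assumption rather than a description of Alation's architecture, is to push a small "hot set" of the most-used object names to the client for instant prefix matching, falling back to the server for the long tail:

```python
import bisect

class PrefixIndex:
    """Sorted-list prefix matcher over the small hot set of object names
    pushed to the browser; the millions of long-tail objects would stay
    server-side (illustrative sketch)."""

    def __init__(self, names):
        self.names = sorted(names)

    def complete(self, prefix, limit=5):
        # Binary search to the first name >= prefix, then scan forward
        # while names still start with the prefix.
        i = bisect.bisect_left(self.names, prefix)
        out = []
        while i < len(self.names) and self.names[i].startswith(prefix):
            out.append(self.names[i])
            if len(out) == limit:
                break
            i += 1
        return out

hot = PrefixIndex(["orders", "order_items", "customers", "campaigns"])
print(hot.complete("ord"))  # ['order_items', 'orders']
```

Because lookups in the hot set are pure in-browser computation, they return in microseconds; only misses (or requests for richer metadata) need a round trip to the server.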
We'll use the Alation SmartSuggest feature as a working example of how to navigate these technical trade-offs to build a cognitive application in daily use by one of the most discerning audiences for recommendations: analysts and data scientists themselves.
Venky Ganti has been a data enthusiast since graduate school and has enjoyed working at every level of the data analysis stack. At Google, he was an avid data consumer who helped engineer innovative data products that now generate over one billion dollars in yearly revenue. At Microsoft, he worked on advanced data quality infrastructure in ETL platforms. Venky started out working on advanced data analysis and mining technology during his PhD at the University of Wisconsin-Madison. He thoroughly enjoys spending time with his family, going on walks, and, when he feels adventurous, roller-blading.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.