14–17 Oct 2019

Executive Briefing: How the growth of voice-based AI stands to blur the lines of big data

16:50–17:30 Thursday, 17 October 2019
Location: Windsor Suite
Secondary topics:  Text, Language, and Speech



As voice-based AI continues to gain popularity as a way for customers, businesses, and brands to operate more efficiently, it’s important to understand that, while it puts a slew of new data at our disposal, the technology is still in its infancy. Big data that stems from user interaction is usually personal and accurate, comes from a single device, and most of the time has a clear context.

As with any newborn, the inability to communicate correctly and the lack of experience in understanding its surroundings mean that mistakes are inevitable. Andreas Kaltenbrunner examines three significant ways voice-based virtual assistants will make big data analytics more complex and explores the steps you can take to manage the added complexity in your company.

First, usage data is noisier because understanding voice depends on understanding each person. This introduces new factors, such as accents, variations in tone of voice, slang, and utterances, which lower the quality of the speech-to-text conversion.

Second, virtual assistants are present in different social contexts including family, work, social gatherings, etc., where there are multiple people speaking and multiple virtual assistants present, all processing the interactions at the same time. Virtual assistants must be able to recognize different users, making usage data more complex and leading to increased redundancy as the same conversations are captured several times.

Third, multiple people having several different conversations at the same time also increases complexity. The technology has to understand overlapping conversations and split them in a meaningful way, which makes conversation data more difficult to interpret and the correct context harder to determine. Adding to the challenge, big companies haven’t agreed on a standard for direct communication between voice assistants, meaning many logged conversations will be between assistants (e.g., Google talking to Alexa) rather than humans, diminishing the power to help people learn.

Andreas offers steps you can take to manage the added complexity:

  • Don’t cut corners to save money on the quality of speech-to-text conversion, particularly in the languages your best customers use.
  • Devise a protocol through which the assistant learns to recognize each user, at least those who will use it regularly (this may require the user to repeat some key phrases after a few interactions). The protocol should also prevent unauthorized users from requesting unwanted actions (e.g., your child buying something to be delivered to a best friend) and help distinguish human voices from nonhuman ones, such as other assistants.
  • Devise a strategy in which assistants of the same brand agree directly on who logs what, or at least agree to mark which conversations might be redundant.
  • Work within your industry to agree on standards for direct communication between assistants and on standard ways to share anonymous usage data that may alleviate the complexity (e.g., the IoT Alliance).
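The idea of marking conversations that might be redundant can be illustrated with a minimal Python sketch. It is not taken from the session: the `Utterance` record, the `mark_redundant` function, the two-second window, and the 0.8 similarity threshold are all hypothetical choices made for illustration. The sketch assumes each assistant logs a device ID, a timestamp, and a transcript, and it flags a transcript when a different device captured nearly the same text at nearly the same time.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Utterance:
    """One logged transcript (hypothetical log schema)."""
    device_id: str
    timestamp: float  # seconds since some shared reference time
    transcript: str
    possibly_redundant: bool = False


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def mark_redundant(utterances, window_s=2.0, min_similarity=0.8):
    """Flag utterances that another device captured shortly before.

    window_s and min_similarity are illustrative defaults, not tuned values.
    Returns the utterances sorted by timestamp, with flags set in place.
    """
    ordered = sorted(utterances, key=lambda u: u.timestamp)
    for i, current in enumerate(ordered):
        # Walk backwards through earlier utterances inside the time window.
        for earlier in reversed(ordered[:i]):
            if current.timestamp - earlier.timestamp > window_s:
                break
            if earlier.device_id == current.device_id:
                continue
            similarity = SequenceMatcher(
                None, normalize(earlier.transcript), normalize(current.transcript)
            ).ratio()
            if similarity >= min_similarity:
                current.possibly_redundant = True
                break
    return ordered


if __name__ == "__main__":
    logs = [
        Utterance("kitchen", 100.0, "Turn on the lights"),
        Utterance("living-room", 100.4, "turn on the light"),
        Utterance("kitchen", 130.0, "What is the weather today"),
    ]
    for u in mark_redundant(logs):
        print(u.device_id, u.possibly_redundant, u.transcript)
```

The sketch only marks the later copy as possibly redundant and never deletes anything, which leaves the decision about what to do with flagged conversations to downstream analysis.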

As with any good assistant, the purpose of voice-based AI is to help businesses operate more efficiently and generate more revenue. But we must also remember that innovation demands education; it must go through an inevitable learning process. While its initial effects stand to complicate big data somewhat, when the complexity is managed correctly, the reward will surely outweigh the effort.

Prerequisite knowledge

  • A basic understanding of data science, data analytics, machine learning, and business intelligence

What you'll learn

  • Discover three significant ways voice-based virtual assistants will make big data analytics more complex and various steps you can take to manage the added complexity

Andreas Kaltenbrunner


Andreas Kaltenbrunner is senior director of data analytics at NTENT, where he leads a team focused on user behavior analysis and improvements for ranking in mobile search. Andreas also teaches a master’s course on data-driven social analytics at Universitat Pompeu Fabra and is involved in research centered on computational social science, social media, and social network analysis, areas in which he has coauthored more than 70 publications. Previously, he led the Social Media Research Line at the Barcelona Media technology center and the Digital Humanities Research Unit at the Eurecat technology center. He earned his PhD in computer science and digital communication from the Universitat Pompeu Fabra, with a thesis about stochastic effects in human and neural communication patterns.

