As we’ve moved from simple statistical analyses of big data to decision-making based on big data and data-science models, we face an ironic “dirty secret”: it is becoming increasingly difficult to understand why particular decisions have been made. In many applications, data-driven models now take as input massive numbers of “signals,” including words in text, locations frequented, merchants transacted with, web pages visited, things Liked on Facebook, and more. Using two or three predictive-analytics examples, I will illustrate why various stakeholders (managers, engineers, data scientists, business customers, and consumers) need to understand the reasons for decisions. I will then show a general technique for understanding decisions made from these massive, sparse signals. I will close by suggesting how this might create an opportunity to make certain predictions about consumers more privacy-friendly.
Foster Provost is coauthor of the O’Reilly best-selling book Data Science for Business (http://data-science-for-biz.com). He has designed data science solutions for businesses for over two decades and has cofounded several successful companies focusing on data science for advertising, including Dstillery and Integral Ad Science. He is currently Professor and NEC Faculty Fellow at the NYU Stern School of Business, where he teaches in the MS in Data Science, MS in Business Analytics, MBA, and PhD programs. His data science research has won many awards and is broadly cited. He served as Program Chair for the ACM SIGKDD Conference and, for many years, as Editor-in-Chief of the journal Machine Learning.