Real-world data is incomplete and imperfect. The right way to handle it is with Bayesian inference. Michael Williams demonstrates how probabilistic programming languages hide the gory details of this elegant but potentially tricky approach, making a powerful statistical method easy and enabling rapid iteration and new kinds of data-driven products.
Michael begins by introducing Bayesian inference and using the approach to solve a famous problem (the German Tank Problem) in three lines of code. (The code we’ll write is so simple you won’t need to be a programmer or a mathematician to understand it.) Michael then offers an overview of two working Fast Forward Labs product prototypes that crucially depend on Bayesian inference—one that supports decisions about consumer loans and one that models the future of the NYC real estate market—to highlight the advantages and use cases of the Bayesian approach, which include domains where data is scarce, where prior institutional knowledge is important, and where quantifying risk is crucial.
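To give a feel for how little code a brute-force Bayesian solution to the German Tank Problem needs, here is a minimal sketch. It is not the code from the talk. The observed serial numbers, the upper bound `n_max`, and the uniform prior are all illustrative assumptions. For each candidate total N, the probability of drawing these k distinct serials is 1 / C(N, k) when N is at least the largest serial seen, and zero otherwise.

```python
from math import comb

def tank_posterior(serials, n_max):
    """Posterior over the total number of tanks N, assuming a uniform prior on 1..n_max."""
    k, m = len(serials), max(serials)
    # Likelihood of k distinct serials drawn without replacement from 1..N is 1 / C(N, k);
    # any N below the largest observed serial is impossible and is simply left out.
    weights = {n: 1 / comb(n, k) for n in range(m, n_max + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

serials = [10, 256, 202, 97]   # hypothetical captured serial numbers
posterior = tank_posterior(serials, n_max=2000)
mean = sum(n * p for n, p in posterior.items())
print(f"most probable N: {max(posterior, key=posterior.get)}, posterior mean: {mean:.0f}")
```

The result is a full probability distribution rather than a single estimate, which is what makes it possible to quantify risk. Enumerating every candidate value only works because this problem has one small discrete unknown.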
But as you’ll see, this naive approach to implementing Bayesian inference has serious limitations and is only useful for tiny problems. Michael explores the challenges involved in speeding it up and shares solutions ranging from classics like Metropolis-Hastings and Markov chain Monte Carlo (MCMC) to modern industrial-strength algorithms like NUTS and ADVI. These algorithms are complicated, and implementing them so they give the right answer quickly is difficult.
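As a rough illustration of the sampling idea, the following is a minimal random-walk Metropolis sketch applied to the German Tank Problem. It is not the speaker's implementation. The serial numbers, `n_max`, the step size, and the burn-in length are illustrative assumptions, and the prior is assumed uniform on the allowed range. Instead of scoring every candidate N, the sampler wanders through candidate values and visits each one in proportion to its posterior probability.

```python
import random
from math import comb
from statistics import median

def metropolis_tanks(serials, n_max, steps, step_size, seed=0):
    """Random-walk Metropolis sampler for the number of tanks N (uniform prior up to n_max)."""
    rng = random.Random(seed)
    k, m = len(serials), max(serials)
    current = m                      # start at the smallest value consistent with the data
    samples = []
    for _ in range(steps):
        # Symmetric proposal: move up or down by at most step_size.
        proposal = current + rng.randint(-step_size, step_size)
        if m <= proposal <= n_max:   # outside this range the posterior is zero, so reject
            # Posterior ratio = [1 / C(proposal, k)] / [1 / C(current, k)].
            accept_prob = min(1.0, comb(current, k) / comb(proposal, k))
            if rng.random() < accept_prob:
                current = proposal
        samples.append(current)
    return samples

serials = [10, 256, 202, 97]         # hypothetical captured serial numbers
samples = metropolis_tanks(serials, n_max=2000, steps=20000, step_size=50)
burned = samples[2000:]              # discard early samples taken before the chain settles
print(f"posterior median of N is roughly {median(burned):.0f}")
```

Even in this toy case the step size and burn-in have to be chosen by hand, which hints at why tuned implementations of algorithms like NUTS are hard to write well.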
Which brings us to the real subject of this talk: probabilistic programming—a family of languages that define fundamental probabilistic ideas such as random variables and probability distributions as primitive objects, which makes code short, simple, and declarative. And they have expert-written, blazing-fast implementations of the latest and greatest inference algorithms built right in. Michael examines a handful of probabilistic programming languages, taking a particularly close look at Stan and PyMC3—comparing their performance and deployment trade-offs and showing how the German Tank Problem and our consumer loan and NYC real estate problems could be solved using them.
Mike Lee Williams is a research engineer at Cloudera Fast Forward Labs, where he builds prototypes that bring the latest ideas in machine learning and AI to life and helps Cloudera’s customers understand how to make use of these new technologies. Mike holds a PhD in astrophysics from Oxford.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.