Machine learning for managers
Who is this presentation for?
Non-technical or business audiences
Due to the tremendous recent success and popularity of ML, these technologies now affect a wide array of software products and businesses, notably in healthcare. Bob Horton, Mario Inchiosa, and John-Mark Agosta provide an overview of the fundamental concepts of ML for decision makers and software product managers so you’ll be able to make more effective use of ML results and better recognize opportunities to incorporate ML technologies in a variety of products and processes. This isn’t just another dumbed-down intro to ML; Bob, Mario, and John-Mark focus on how to use ML to make better decisions, including whether or not to use ML in a given application.
Part I: Software 2.0
Bob, Mario, and John-Mark walk you through building a language classifier using traditional hard-coded logic: a program that looks at textual data and decides which language it’s written in. To build such a classifier, you need to decide which words of the text to look at, how many different words you need to identify each language, which statistics to gather about how often various words appear in each language, how to measure how well your classifier works, and how to tell whether a new rule you added made it better. ML essentially automates this process. You’ll build some simple ML classifiers, evaluate their performance, and examine the phenomenon that’s kryptonite to ML: overfitting. Along the way, you’ll learn important vocabulary terms like feature, label, and test set; become familiar with diagnostic plots that chart the learning process (learning curves), as well as commonly used plots for visualizing data distributions; and start to understand why your data scientists are always begging for more data.
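As a rough illustration of the hand-coded approach described above, here is a minimal sketch; the marker words and the scoring rule are invented for illustration and are not the presenters’ actual code:

```python
# A hand-coded language classifier: score each language by counting how
# many of its characteristic "marker" words appear in the text.
MARKER_WORDS = {
    "english": {"the", "and", "of", "to", "is"},
    "french": {"le", "la", "et", "les", "est"},
    "spanish": {"el", "la", "y", "los", "es"},
}

def classify_language(text):
    """Return the language whose marker words best match the text."""
    words = set(text.lower().split())
    scores = {lang: len(words & markers)
              for lang, markers in MARKER_WORDS.items()}
    return max(scores, key=scores.get)

print(classify_language("the cat is on the mat"))
print(classify_language("le chat est sur le tapis"))
```

Every design question in the paragraph above shows up here as a hard-coded choice (which words, how many, how to score); an ML classifier learns those choices from labeled examples instead.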
Part II: Decision support
Most ML classifiers give fuzzy results; rather than telling you whether a picture shows a dog or a cat, they give you probabilities. People accustomed to black-and-white answers may need to learn new approaches to deal with these shades of gray. Bob, Mario, and John-Mark examine the process of characterizing the performance of a classifier by relating its sensitivity (its ability to detect positive cases) to its specificity (its ability to reject negative cases). In general, classifiers let you trade quality for quantity by adjusting a threshold: if you insist on taking only the purest subset, you have to settle for finding fewer positives.
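The sensitivity/specificity calculation can be sketched in a few lines; the labels, scores, and threshold below are made up for illustration:

```python
def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity at a threshold.
    labels: 1 = positive, 0 = negative; scores: predicted probabilities."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(sens_spec(labels, scores, 0.5))
```

Sliding the threshold up or down moves these two numbers in opposite directions, which is exactly the trade-off an ROC curve visualizes.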
In a business context, you can often assign dollar values to each of the two types of mistakes a binary classifier can make: it can think a bad widget is good or it can think a good widget is bad. In medical testing, there’s usually a different weighting for screening tests (where it’s important to not miss anybody, so sensitivity is emphasized over specificity), as opposed to confirmatory tests (where you want to be sure the patient really has the disease). Since ML makes it possible to test for huge numbers of possible errors (e.g., in electronic health record systems), you may need to consider the risk of overwhelming users with false alarms (leading to alert fatigue). The trade-offs between sensitivity and specificity need to be evaluated in the context of the system in which the classifier is deployed. You’ll use an economic utility model to weight these types of errors and help decide on the best classifier threshold to use to maximize reward in scenarios. As part of that process, you’ll take an in-depth look at some of the most important types of diagnostic plots for visualizing classifier performance (including receiver operating characteristic (ROC) curves, precision-recall curves, and lift plots), and frame ML as a way to automate (some) decisions.
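One way such a utility model might look in code, assuming hypothetical dollar costs for each error type (the costs, labels, and scores are invented for this sketch):

```python
# Pick the classifier threshold that minimizes total expected cost,
# given dollar costs for the two kinds of mistakes.
COST_FP = 10.0   # thinking a good widget is bad (scrap a good part)
COST_FN = 100.0  # thinking a bad widget is good (ship a defect)

def total_cost(labels, scores, threshold):
    """Total dollar cost of the mistakes made at a given threshold."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return fp * COST_FP + fn * COST_FN

labels = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = bad widget (the "positive" class)
scores = [0.9, 0.8, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1]
candidates = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
best = min(candidates, key=lambda t: total_cost(labels, scores, t))
print(best, total_cost(labels, scores, best))
```

Because a missed defect costs ten times a false alarm here, the cost-minimizing threshold sits low enough to catch every bad widget even at the price of some false alarms; change the cost ratio and the best threshold moves.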
Part III: Causality and other cautionary tales
The dirty secret of ML is that it’s built on correlation, not causation. Just because we find that red-headed people are more likely to get melanoma doesn’t mean we can protect them from cancer by coloring their hair brown. Bob, Mario, and John-Mark outline the problem of confounding and how it can affect your interpretation of how various features might affect outcomes (this also shows why we still need statisticians to keep us honest). To really sort out cause-and-effect relationships, you need more than just ML; you need to do experiments. Both A/B tests in software development and randomized controlled trials of medical interventions are designed to detect causal relationships, and Bob, Mario, and John-Mark briefly explore the statistical considerations involved in that kind of testing. You’ll discover a new approach to automating the experimentation process using a web-based cognitive service based on reinforcement learning.
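The confounding problem above can be illustrated with a toy simulation; the probabilities are invented, and the point is only the structure: a hidden factor drives both the feature and the outcome, so the feature "predicts" the outcome without causing it.

```python
# Toy confounding simulation: fair skin raises the odds of both red hair
# and melanoma, while hair color itself has no causal effect on melanoma.
import random

random.seed(1)
people = []
for _ in range(20000):
    fair_skin = random.random() < 0.3                       # the confounder
    red_hair = random.random() < (0.4 if fair_skin else 0.01)
    melanoma = random.random() < (0.2 if fair_skin else 0.02)
    people.append((fair_skin, red_hair, melanoma))

def melanoma_rate(rows):
    return sum(m for _, _, m in rows) / len(rows)

red = [p for p in people if p[1]]
not_red = [p for p in people if not p[1]]
rate_red, rate_not_red = melanoma_rate(red), melanoma_rate(not_red)
print(rate_red, rate_not_red)            # red-heads look far riskier overall...

fair = [p for p in people if p[0]]
rate_fair_red = melanoma_rate([p for p in fair if p[1]])
rate_fair_not_red = melanoma_rate([p for p in fair if not p[1]])
print(rate_fair_red, rate_fair_not_red)  # ...but not once skin type is held fixed
```

Stratifying on the confounder makes the spurious association vanish, but in practice you rarely know all the confounders, which is why randomized experiments remain the gold standard.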
Prerequisites
- Familiarity with Microsoft Excel
Materials or downloads needed in advance
- A laptop with a web browser and Microsoft Excel installed
What you'll learn
- Gain a general overview of how ML differs from traditional software engineering
- Learn how to apply probabilistic results, including estimating the costs and benefits of applying ML classifiers in various contexts
- Understand how ML and advanced analytics can help guide, but not replace, the process of experimentally testing the effects of incremental changes to products and processes
Bob Horton is a senior data scientist on the user understanding team at Bing. Bob holds an adjunct faculty appointment in health informatics at the University of San Francisco, where he gives occasional lectures and advises students on data analysis and simulation projects. Previously, he was on the professional services team at Revolution Analytics. Long before becoming a data scientist, he was a regular scientist (with a PhD in biomedical science and molecular biology from the Mayo Clinic). Some time after that, he got an MS in computer science from California State University, Sacramento.
Mario Inchiosa’s passion for data science and high-performance computing drives his work as principal software engineer in Microsoft Cloud + AI, where he focuses on delivering advances in scalable advanced analytics, machine learning, and AI. Previously, Mario served as chief scientist of Revolution Analytics; analytics architect in the big data organization at IBM, where he worked on advanced analytics in Hadoop, Teradata, and R; US chief scientist at Netezza, bringing advanced analytics and R integration to Netezza’s SQL-based data warehouse appliances; US chief science officer at NuTech Solutions, a computer science consultancy specializing in simulation, optimization, and data mining; and senior scientist at BiosGroup, a complexity science spin-off of the Santa Fe Institute. Mario holds bachelor’s, master’s, and PhD degrees in physics from Harvard University. He has been awarded four patents and has published over 30 research papers, earning Publication of the Year and Open Literature Publication Excellence awards.
John-Mark Agosta is a principal data scientist in IMML at Microsoft. Previously, he worked with startups and labs in the Bay Area, including the original Knowledge Industries, and was a researcher at Intel Labs, where he was awarded a Santa Fe Institute Business Fellowship in 2007, and at SRI International after receiving his PhD from Stanford. He has participated in the annual Uncertainty in AI conference since its inception in 1985, proving his dedication to probability and its applications. When feeling low, he recharges his spirits by singing Russian music with Slavyanka, the Bay Area’s Slavic music chorus.