Programming by examples (PBE) is a new frontier in AI. It enables users to create scripts from input-output examples and can provide a 10-100x productivity increase for programmers in some task domains. PBE can also enable the 99% of computer users who are nonprogrammers to create small scripts to automate repetitive tasks. Sumit Gulwani leads a deep dive into this new programming paradigm and explores the science behind it.
PBE is revolutionizing data wrangling. Data scientists spend up to 80% of their time transforming data into a form suitable for machine learning (ML). PBE can automate many data manipulation tasks: string, number, and date transformations (e.g., converting “FirstName LastName” to “LastName, FirstName”); column splitting; table extraction from log files, web pages, and PDFs; normalization of semistructured spreadsheets into structured tables; and transformation of JSON from one format to another. These capabilities have been released inside multiple Microsoft products, including Excel, PowerShell, and Azure ML Workbench. The synthesized scripts are performant enough to process large datasets efficiently. Another killer application of PBE is automating repetitive code transformations such as formatting or refactoring, given that developers spend up to 40% of their time refactoring code in an application migration scenario.
A key technical challenge in PBE is to search an underlying domain-specific language for programs that are consistent with the user-provided examples. Microsoft’s real-time search methodology leverages logical reasoning techniques and neural-guided heuristics. Another challenge is to resolve the ambiguity in examples, since many different programs can satisfy a small set of examples. Microsoft’s ML-based ranking techniques often select an intended program from among the many that satisfy the examples. Microsoft also leverages active-learning-based user interaction models that facilitate a bot-like conversation with the user. The Microsoft PROSE SDK exposes these generic search and ranking algorithms (for noncommercial use), allowing advanced developers to construct PBE capabilities for new task domains.
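To make the search and ranking ideas concrete, here is a minimal sketch in Python. It is not the PROSE SDK and does not reflect Microsoft's algorithms: the toy DSL (token lookups and string constants joined by concatenation), the function names `run`, `synthesize`, `score`, and `best`, and the hand-written scoring rule are all assumptions made for illustration. Brute-force enumeration stands in for the logical reasoning and neural-guided search, and a simple "prefer fewer constant characters" heuristic stands in for ML-based ranking.

```python
from itertools import product

# Hypothetical toy DSL (an assumption for illustration, not PROSE's DSL):
#   ("tok", i)   -> the i-th whitespace-separated token of the input
#   ("const", s) -> the literal string s
# A program is a tuple of atoms whose outputs are concatenated.

def run(program, text):
    """Evaluate a program on an input string; None if it is undefined."""
    tokens = text.split()
    parts = []
    for kind, arg in program:
        if kind == "tok":
            if arg >= len(tokens):
                return None  # token index out of range for this input
            parts.append(tokens[arg])
        else:
            parts.append(arg)
    return "".join(parts)

def atoms_for(examples, max_tokens=3):
    """Build the atoms the search may use for a given set of examples."""
    atoms = [("tok", i) for i in range(max_tokens)]
    atoms += [("const", s) for s in (", ", " ", "-")]
    # Whole outputs as constants: consistent with one example, rarely general.
    atoms += [("const", out) for _, out in examples]
    return atoms

def synthesize(examples, max_len=3):
    """Enumerate every program of up to max_len atoms that fits all examples."""
    atoms = atoms_for(examples)
    found = []
    for n in range(1, max_len + 1):
        for program in product(atoms, repeat=n):
            if all(run(program, inp) == out for inp, out in examples):
                found.append(program)
    return found

def score(program):
    """Lower is better: fewer constant characters first, then fewer atoms."""
    const_chars = sum(len(arg) for kind, arg in program if kind == "const")
    return (const_chars, len(program))

def best(examples):
    """Return the top-ranked consistent program, or None if none exists."""
    candidates = synthesize(examples)
    return min(candidates, key=score) if candidates else None
```

Given the single example `("Sumit Gulwani", "Gulwani, Sumit")`, two programs in this toy DSL are consistent: one that returns the output as a constant, and one that concatenates the second token, `", "`, and the first token. The scoring rule prefers the second, which also generalizes to unseen inputs such as `"Ada Lovelace"`. A real system would need a far richer DSL and a search strategy that avoids exhaustive enumeration.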
Sumit Gulwani is a partner research manager at Microsoft, where he leads the PROSE research and engineering team that develops APIs for program synthesis (programming by examples and natural language) and incorporates them into real products. He is the inventor of the popular Flash Fill feature in Microsoft Excel, used by hundreds of millions of people. He has published 120+ peer-reviewed papers in top-tier conferences and journals across multiple computer science areas, delivered 40+ keynotes and invited talks at various forums, and authored 50+ patent applications (granted and pending). Sumit is a recipient of the prestigious ACM SIGPLAN Robin Milner Young Researcher Award, the ACM SIGPLAN Outstanding Doctoral Dissertation Award, and the President’s Gold Medal from IIT Kanpur.