Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Programming by input-output examples

Sumit Gulwani (Microsoft)
4:35pm–5:15pm Wednesday, 09/12/2018
Data science and machine learning
Location: 1A 08
Level: Intermediate

Who is this presentation for?

  • Data scientists, advanced developers, and anyone curious about this new programming paradigm

Prerequisite knowledge

  • Familiarity with domain-specific languages and grammars (useful but not required)

What you'll learn

  • Explore the programming-by-examples paradigm through input-output examples and discover the science behind it
  • Understand cool and useful features in current products that are built on cutting-edge programming-by-examples technology

Description

Programming by examples (PBE) is a new frontier in AI. It enables users to create scripts from input-output examples and can provide a 10–100x productivity increase for programmers in some task domains. PBE can also enable the 99% of computer users who are nonprogrammers to create small scripts to automate repetitive tasks. Sumit Gulwani leads a deep dive into this new programming paradigm and explores the science behind it.

PBE is revolutionizing data wrangling. Data scientists spend up to 80% of their time transforming data into a form suitable for machine learning (ML). PBE enables automation of many data manipulation tasks, such as string, number, and date transformations (e.g., converting “FirstName LastName” to “LastName, FirstName”); column splitting; table extraction from log files, web pages, and PDFs; normalization of semistructured spreadsheets into structured tables; and transformation of JSON from one format to another. These capabilities have been released inside multiple Microsoft products, including Excel, PowerShell, and Azure ML Workbench. The synthesized scripts are quite performant and enable efficient processing on large datasets. Another killer application of PBE is automating repetitive code transformations such as formatting or refactoring, given that developers spend up to 40% of their time refactoring code in an application migration scenario.
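
To make the name example concrete, here is a hand-written Python sketch of the kind of small script a PBE system might produce for the “FirstName LastName” to “LastName, FirstName” task. It is an illustration only: the function name `swap_name` and the sample names are hypothetical, and this is not output from any Microsoft product.

```python
def swap_name(full_name: str) -> str:
    """Convert 'FirstName LastName' to 'LastName, FirstName'.

    Hand-written stand-in for a synthesized script. Assumes the input
    has a first name, one space, and then the last name.
    """
    first, last = full_name.strip().split(" ", 1)
    return f"{last}, {first}"


# Input-output examples a user might supply (hypothetical data).
examples = [
    ("Ada Lovelace", "Lovelace, Ada"),
    ("Alan Turing", "Turing, Alan"),
]

# A candidate script is acceptable only if it reproduces every example.
consistent = all(swap_name(inp) == out for inp, out in examples)
print(consistent)
```

In a PBE workflow the user would supply only the `examples` list; the body of `swap_name` is what the system is expected to find.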

A key technical challenge in PBE is to search for programs in an underlying domain-specific language that are consistent with the user-provided examples. Microsoft’s real-time search methodology leverages logical reasoning techniques and neural-guided heuristics. Another challenge is to resolve the ambiguity in examples, since many programs can satisfy a small number of examples. Microsoft’s ML-based ranking techniques often select an intended program from among the many that satisfy the examples. Microsoft also leverages active-learning-based user interaction models that facilitate a bot-like conversation with the user. The Microsoft PROSE SDK exposes these generic search and ranking algorithms (for noncommercial use), allowing advanced developers to construct PBE capabilities for new task domains.
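
The search and ambiguity problems can be seen in a toy setting. The Python sketch below assumes a tiny, made-up domain-specific language and finds consistent programs by brute-force enumeration, then picks one with a hand-written score. It does not use the PROSE SDK, and it swaps in plain enumeration and a fixed heuristic for the logical reasoning, neural guidance, and ML-based ranking described above. All names (`ATOMS`, `run`, `synthesize`, `score`) and the sample data are hypothetical.

```python
from itertools import product

# Toy DSL (hypothetical): a program is a tuple of atoms whose outputs are
# concatenated. ("tok", i) selects the i-th whitespace-separated token of
# the input; ("const", s) emits the literal string s.
ATOMS = [
    ("tok", 0), ("tok", 1), ("tok", -1),
    ("const", ", "), ("const", ","), ("const", " "),
]


def run(program, text):
    """Evaluate a program on one input; return None if a token is missing."""
    tokens = text.split()
    parts = []
    for kind, arg in program:
        if kind == "tok":
            if not -len(tokens) <= arg < len(tokens):
                return None
            parts.append(tokens[arg])
        else:
            parts.append(arg)
    return "".join(parts)


def synthesize(examples, max_len=4):
    """Enumerate all programs of up to max_len atoms that fit every example."""
    found = []
    for n in range(1, max_len + 1):
        for program in product(ATOMS, repeat=n):
            if all(run(program, inp) == out for inp, out in examples):
                found.append(program)
    return found


def score(program):
    """Hand-written ranking heuristic (lower is better): fewer atoms first,
    then fewer negative token indices. A stand-in for a learned ranker."""
    negatives = sum(1 for kind, arg in program if kind == "tok" and arg < 0)
    return (len(program), negatives)


# One example is ambiguous: several programs satisfy it.
one = synthesize([("Ada Lovelace", "Lovelace, Ada")])
best_one = min(one, key=score)
print(run(best_one, "Grace Brewster Hopper"))  # prints "Brewster, Grace"

# A second example rules out the programs that pick the second token.
two = synthesize([
    ("Ada Lovelace", "Lovelace, Ada"),
    ("Grace Brewster Hopper", "Hopper, Grace"),
])
best_two = min(two, key=score)
print(run(best_two, "Grace Brewster Hopper"))  # prints "Hopper, Grace"
```

With a single example, the top-ranked program generalizes in a way the user may not have intended; adding one more example removes that candidate. This is the kind of ambiguity that ranking and interactive clarification are meant to reduce.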

Sumit Gulwani

Microsoft

Sumit Gulwani is a partner research manager at Microsoft, where he leads the PROSE research and engineering team that develops APIs for program synthesis (programming by examples and natural language) and incorporates them into real products. He is the inventor of the popular Flash Fill feature in Microsoft Excel, used by hundreds of millions of people. He has published 120+ peer-reviewed papers in top-tier conferences and journals across multiple computer science areas, delivered 40+ keynotes and invited talks at various forums, and authored 50+ patent applications (granted and pending). Sumit is a recipient of the prestigious ACM SIGPLAN Robin Milner Young Researcher Award, ACM SIGPLAN Outstanding Doctoral Dissertation Award, and the President’s Gold Medal from IIT Kanpur.