Jupyter notebooks are transforming the way we look at computing, coding, and science. But is this the only “data scientist experience” that this technology can provide? Natalino Busa explains how you can create interactive web applications for data exploration and analysis that are still powered, behind the scenes, by the well-understood and well-documented Jupyter Notebook.
Natalino shares an architecture composed of three parts: a server-only Jupyter gateway, a Python Jupyter kernel, and an Angular/Bootstrap web application. In particular, the Jupyter gateway lets data scientists expose notebook code as RESTful API endpoints, so the web app can run notebooks programmatically simply by calling a REST API. In the background, the Python Jupyter kernel executes the notebook's data science and machine learning code fragments and returns the results as JSON data. By chaining these components, you can create rich, polished apps that go beyond the limits of the “notebook experience,” providing engaging data analytics journeys in which the code is hidden and the UI is tuned for data exploration and for more intuitive, guided “data science tours.”
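To make the “notebook code as REST endpoints” idea concrete, here is a minimal, plain-Python sketch of what the gateway layer does: it maps an HTTP method and path to a code fragment and returns that fragment's result as JSON. It does not use any Jupyter API. (The Jupyter Kernel Gateway's notebook-http mode, for instance, maps notebook cells that begin with comments such as `# GET /stats/:column` to endpoints, but the talk does not specify which gateway configuration is used.) The dataset, the `endpoint` decorator, `dispatch`, and `serve` are all hypothetical names chosen for illustration.

```python
import json
import re
from statistics import mean
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory dataset standing in for whatever the notebook loads.
DATA = {"temperature": [20.0, 22.0, 19.0, 23.0], "humidity": [40.0, 42.0, 38.0, 44.0]}

# Route table of (HTTP method, compiled path regex, handler function).
ROUTES = []

def endpoint(method, template):
    """Register a 'code fragment' for a path template such as '/stats/:column'."""
    # Turn ':name' segments into named regex groups, e.g. (?P<column>[^/]+).
    pattern = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", template)
    regex = re.compile("^" + pattern + "$")

    def register(func):
        ROUTES.append((method, regex, func))
        return func

    return register

@endpoint("GET", "/columns")
def list_columns():
    return {"columns": sorted(DATA)}

@endpoint("GET", "/stats/:column")
def column_stats(column):
    values = DATA[column]  # raises KeyError for an unknown column
    return {"column": column, "count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}

def dispatch(method, path):
    """Return (HTTP status, JSON body) for a request, as the gateway layer would."""
    for route_method, regex, func in ROUTES:
        match = regex.match(path)
        if route_method == method and match:
            try:
                return 200, json.dumps(func(**match.groupdict()))
            except KeyError as exc:
                return 404, json.dumps({"error": f"not found: {exc}"})
    return 404, json.dumps({"error": "no such endpoint"})

class GatewayHandler(BaseHTTPRequestHandler):
    """Wire dispatch() to a real HTTP server (GET requests only)."""

    def do_GET(self):
        status, body = dispatch("GET", self.path.split("?")[0])
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port=8888):
    """Block forever, serving the registered endpoints on localhost."""
    HTTPServer(("127.0.0.1", port), GatewayHandler).serve_forever()
```

Calling `serve()` would let a browser-side app fetch `http://127.0.0.1:8888/stats/temperature` and render the returned JSON; `dispatch("GET", "/stats/temperature")` exercises the same path without a network. In the architecture described above, the handler bodies would run inside the Python Jupyter kernel rather than in the same process as the HTTP server.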
Natalino then explores two examples of this new breed of notebook-powered web apps: O’Reilly’s Oriole Online Tutorials and Autoscience, a project of his own design. Oriole Online Tutorials combine embedded runnable code, video, and text into a rich training experience in which the video is synchronized with the text. In an Oriole Online Tutorial, the embedded code actually runs in the cloud, and the results are pushed back to the browser. The Autoscience project is an example of a meta-notebook: its UI is more intuitive for non-data scientists and offers a selection of precanned dataset analyses, such as anomaly detection, dimensionality reduction, classification, and clustering. Behind the scenes, it runs a custom open source Python library on a Python Jupyter kernel.
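The abstract does not describe Autoscience's library or API, so the following is only a hypothetical sketch of the meta-notebook pattern: the UI sends an analysis name plus parameters, a dispatcher on the kernel picks the matching precanned routine, and the result goes back as JSON. The names `ANALYSES`, `run_analysis`, `detect_anomalies`, and `describe` are invented, and the z-score rule is a simple stand-in for whatever anomaly detection Autoscience actually performs. Only two analyses are shown.

```python
import json
from statistics import mean, pstdev

def detect_anomalies(values, threshold=2.0):
    """Hypothetical precanned analysis: flag points whose |z-score| exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values identical: nothing can stand out.
        return {"mean": mu, "std": 0.0, "threshold": threshold, "anomalies": []}
    flagged = [
        {"index": i, "value": v, "z": (v - mu) / sigma}
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]
    return {"mean": mu, "std": sigma, "threshold": threshold, "anomalies": flagged}

def describe(values):
    """Hypothetical precanned analysis: basic summary statistics."""
    return {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}

# Menu of analyses the UI can offer; a real meta-notebook would add more entries
# (e.g. dimensionality reduction, classification, clustering).
ANALYSES = {"anomaly_detection": detect_anomalies, "describe": describe}

def run_analysis(name, values, **params):
    """Run the analysis selected in the UI and return its result as a JSON string."""
    if name not in ANALYSES:
        return json.dumps({"error": f"unknown analysis {name!r}", "available": sorted(ANALYSES)})
    return json.dumps(ANALYSES[name](values, **params))
```

For example, `run_analysis("anomaly_detection", readings, threshold=3.0)` lets a non-data scientist tune one slider in the UI without ever seeing the code; keeping each analysis behind a single name-to-function table is what allows the UI to present them as a menu.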
Natalino Busa is the chief data architect at DBS, where he leads the definition, design, and implementation of big, fast data solutions for data-driven applications such as predictive analytics, personalized marketing, and security event monitoring. Natalino is an all-around technology manager, product developer, and innovator with a track record of more than 15 years in the research, development, and management of distributed architectures and scalable services and applications. Previously, he was head of data science at Teradata, an enterprise data architect at ING, and a senior researcher at Philips Research Laboratories, working on system-on-a-chip architectures, distributed computing, and parallelizing compilers.
©2017, O'Reilly Media, Inc.