Your easy move to serverless computing and radically simplified data processing
Suppose you’ve written Python code for Monte Carlo simulations to analyze financial data. The general process involves writing the code and running a simulation over a small set of data to test it. Assuming this all goes smoothly, you must now run the same code at massive scale, in parallel, on terabytes of data, performing millions of Monte Carlo simulations. Clearly, you’d prefer not to learn the intricacies of setting up virtual machines, suffer their long setup times, or become an expert in scaling up Python code. This is exactly where serverless computing can come to the rescue. With serverless computing, you don’t need to set up the computing environment, and you pay only for the resources your application actually consumes rather than for prepurchased units of capacity. Here you’ll learn how to easily gain these benefits.
Gil Vernik takes a deep dive into how serverless computing can be easily used for a broad range of scenarios, such as high-performance computing (HPC), Monte Carlo simulations, and data preprocessing for AI. You’ll focus on how to connect existing code and frameworks to serverless without the painful process of starting from scratch or learning new skills. The approach is based on the open source PyWren framework, which introduces serverless computing with minimal effort; its fusion with serverless computing brings automated scalability and the use of existing frameworks into the picture. You simply write a Python function and provide an input pointing to the dataset in a storage bucket. PyWren then automatically scales and executes the user function as a serverless action at massive scale.
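The "write a function, point it at the data, let the framework fan it out" pattern described above can be sketched locally. The example below is a minimal illustration only: it uses Python's standard `concurrent.futures` as a stand-in for a serverless executor and does not use or reproduce PyWren's actual API. The names `count_words`, `map_over_chunks`, and the in-memory `chunks` list are hypothetical; in the setting of the talk, the chunks would be objects in a storage bucket.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """User function: processes one piece of the dataset independently."""
    return len(chunk.split())


def map_over_chunks(func, chunks, max_workers=4):
    """Local stand-in for a serverless map: one call of func per chunk.

    Assumption: a thread pool plays the role of the serverless platform
    here; this is not how PyWren itself is invoked.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the inputs in its results
        return list(pool.map(func, chunks))


# Hypothetical in-memory "objects" standing in for data in a storage bucket
chunks = ["alpha beta", "gamma", "delta epsilon zeta"]
results = map_over_chunks(count_words, chunks)
total = sum(results)
```

The user function knows nothing about how it is scheduled, which is the property that lets a framework swap the local pool for many concurrent serverless invocations.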
Gil demonstrates how this capability allowed IBM to run a broad range of scenarios over serverless, including Monte Carlo simulations to predict future stock prices and hyperparameter optimizations for ML models. IBM managed to complete the entire Monte Carlo simulation for stock price prediction in about 90 seconds with 1,000 concurrent invocations, compared to 247 minutes with almost 100% CPU utilization running the same flow on a laptop with 4 CPU cores. He’ll also show you how to combine TensorFlow and serverless for the data-preparation phases. Existing TensorFlow code can be easily adapted to benefit from serverless with only minimal code modifications and without users having to learn serverless architectures and deployments.
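To make the Monte Carlo scenario concrete, here is a small sketch of why such a workload parallelizes so well: every simulated price path is independent, so each can run as its own invocation. The abstract does not say which price model IBM used; geometric Brownian motion is assumed here purely for illustration, and the parameters (`s0`, `mu`, `sigma`, `days`) and function names are hypothetical. A local thread pool again stands in for concurrent serverless invocations; it shows the structure of the computation, not the speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_final_price(seed, s0=100.0, mu=0.05, sigma=0.2, days=252):
    """Simulate one price path and return the price after `days` steps.

    Assumption: geometric Brownian motion with annual drift `mu` and
    volatility `sigma`; the source does not specify the model.
    """
    rng = random.Random(seed)  # per-path generator keeps runs reproducible
    dt = 1.0 / days
    price = s0
    for _ in range(days):
        z = rng.gauss(0.0, 1.0)
        price *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
    return price


def run_simulations(n_paths, max_workers=4):
    """Run independent paths concurrently, one seed per path."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_final_price, range(n_paths)))


prices = run_simulations(200)
mean_price = sum(prices) / len(prices)
```

Because each path depends only on its seed, the paths need no coordination while running; only the final aggregation (here, the mean) brings the results together.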
Prerequisite knowledge
- A basic understanding of Python, big data storage solutions like cloud object storage, and serverless computing
What you'll learn
- Learn how to connect existing code and frameworks to serverless without the painful process of starting from scratch or learning new skills, and how serverless computing can benefit HPC flows, Monte Carlo simulations, big data, and AI processing frameworks
Gil Vernik is a researcher in the Storage Clouds, Security, and Analytics Group at IBM, where he works with Apache Spark, Hadoop, object stores, and NoSQL databases. Gil has more than 25 years of experience as a code developer on both the server side and client side and is fluent in Java, Python, Scala, C/C++, and Erlang. He holds a PhD in mathematics from the University of Haifa and held a postdoctoral position in Germany.