Mar 15–18, 2020

Using serverless Spark on Kubernetes for data streaming and analytics

Jay Smith (Google), Remy Welch (Google Cloud)
11:50am–12:30pm Wednesday, March 18, 2020
Location: LL20C

Who is this presentation for?

Data scientists or analysts

Level

Intermediate

Description

Knative is an open source framework, introduced in 2018, that helps developers and operators run serverless applications natively on Kubernetes. It provides an abstraction layer that can scale your pods (workers) down to zero when they aren't being used and scale them back up to match demand when there are jobs to run. Using this framework together with Kafka and Spark, you can ingest data from non-HTTP sources and run scalable Spark jobs in a cluster, reducing costs and simplifying the workload for data engineers.

This framework simplifies infrastructure maintenance for operators while also empowering users to manage their own Spark jobs. Jay Smith and Remy Welch explain how combining Kafka with Knative Eventing enables data engineers to collect streaming data from non-HTTP-based sources. They also show how Tekton can support the ETL process by creating a declarative pipeline for the jobs.
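The scale-to-zero behavior mentioned in the description can be illustrated with a small sketch. This is not Knative's actual autoscaler and not code from the talk; the function name `desired_workers` and its parameters are hypothetical, and the rule (one worker per fixed number of pending jobs, capped at a maximum) is a simplifying assumption.

```python
import math


def desired_workers(pending_jobs: int, jobs_per_worker: int, max_workers: int) -> int:
    """Return how many workers to run for the current backlog.

    Hypothetical rule: no pending jobs means zero workers (scale to zero);
    otherwise run enough workers to cover the backlog, up to max_workers.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    if pending_jobs <= 0:
        # Nothing to process, so release all resources.
        return 0
    needed = math.ceil(pending_jobs / jobs_per_worker)
    return min(max_workers, needed)


if __name__ == "__main__":
    for backlog in (0, 1, 9, 100):
        print(backlog, "pending ->", desired_workers(backlog, 4, 10), "workers")
```

The point of the sketch is only that an idle workload costs nothing, while a busy one grows with demand up to a limit that the operator sets.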

Prerequisite knowledge

  • Familiarity with containers (specifically Dockerfiles), event-driven applications, ETL, and Apache Spark

What you'll learn

  • Learn how to create scalable serverless Spark jobs, allowing for cost savings in resources without worrying about resource availability
  • Discover how to create streaming jobs using the event-driven capabilities of serverless platforms
  • Learn how Kubernetes, Knative, Kafka, Spark, and Tekton can be used together in a serverless architecture

Jay Smith

Google

Jason “Jay” Smith is a Cloud customer engineer at Google. He spends his day helping enterprises find ways to expand their workload capabilities on Google Cloud. He’s on the Kubeflow go-to-market team and provides code contributions to help people build an ecosystem for their machine learning operations. His passions include big data, ML, and helping organizations find a way to collect, store, and analyze information.

Remy Welch

Google Cloud

Remy Welch is a data analytics specialist at Google Cloud. She works with enterprises in San Francisco to understand best practices for collecting and analyzing data. Remy has expertise in the gaming industry, where she helps companies better handle data ingestion, storage, and analytics.
