Using serverless Spark on Kubernetes for data streaming and analytics
Who is this presentation for?
- Data scientists or analysts
Knative is an open source framework, introduced in 2018, that helps developers and operators run serverless applications natively on Kubernetes. It provides an abstraction layer that scales your pods (workers) down to zero when they are idle and back up to match demand when there are jobs to run. Combining Knative with Kafka and Spark lets you ingest data from non-HTTP sources and run scalable Spark jobs in a cluster, reducing costs and simplifying the workload for data engineers.
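The scale-to-zero behavior described above can be pictured with a small calculation. The following is a simplified sketch of the arithmetic a concurrency-based autoscaler performs, not Knative's actual implementation; the function name, the per-pod target, and the replica cap are all hypothetical values chosen for illustration.

```python
import math


def desired_replicas(in_flight_jobs: int,
                     target_per_pod: int = 10,
                     max_replicas: int = 20) -> int:
    """Return how many worker pods to run for the current load.

    Hypothetical helper: target_per_pod and max_replicas are
    illustrative defaults, not values taken from Knative.
    """
    if in_flight_jobs <= 0:
        # Nothing to process, so scale all the way down to zero.
        return 0
    # Otherwise run enough pods to keep each one at or below its target,
    # without exceeding the configured cap.
    return min(max_replicas, math.ceil(in_flight_jobs / target_per_pod))


if __name__ == "__main__":
    for load in (0, 1, 25, 1000):
        print(load, "jobs ->", desired_replicas(load), "pods")
```

With no jobs in flight the function returns zero pods, which is where the cost saving comes from; under load the pod count grows with demand until it reaches the cap.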
This framework simplifies infrastructure maintenance for operators while also empowering users to manage their own Spark jobs. Jay Smith and Remy Welch explain how combining Kafka with Knative Eventing enables data engineers to collect streaming data from non-HTTP-based sources. Tekton can support the ETL process by defining the jobs as a declarative pipeline.
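As a rough illustration of how Kafka can be wired into Knative Eventing, the sketch below builds a KafkaSource manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The resource name, broker address, topic, consumer group, and sink service are hypothetical placeholders, and the `sources.knative.dev/v1beta1` API version is an assumption that may differ depending on the Knative release installed in your cluster.

```python
import json


def kafka_source_manifest(name, bootstrap_servers, topics, sink_service,
                          consumer_group="spark-ingest"):
    """Build a KafkaSource manifest that forwards Kafka records to a
    Knative Service. All argument values are supplied by the caller;
    the default consumer group is a hypothetical name."""
    return {
        # Assumed API version; check the version served by your cluster.
        "apiVersion": "sources.knative.dev/v1beta1",
        "kind": "KafkaSource",
        "metadata": {"name": name},
        "spec": {
            "consumerGroup": consumer_group,
            "bootstrapServers": list(bootstrap_servers),
            "topics": list(topics),
            # Events are delivered to this Knative Service, which can
            # scale from zero when records arrive.
            "sink": {
                "ref": {
                    "apiVersion": "serving.knative.dev/v1",
                    "kind": "Service",
                    "name": sink_service,
                }
            },
        },
    }


# Hypothetical names, used only to show the shape of the manifest.
manifest = kafka_source_manifest(
    name="events-source",
    bootstrap_servers=["my-kafka-bootstrap.kafka:9092"],
    topics=["raw-events"],
    sink_service="spark-job-launcher",
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` would be one way to register the source, assuming the Knative Kafka source is installed in the cluster.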
Prerequisite knowledge
- Familiarity with containers (specifically Dockerfiles), event-driven applications, ETL, and Apache Spark
What you'll learn
- Learn how to create scalable serverless Spark jobs, allowing for cost savings in resources without worrying about resource availability
- Discover how to create streaming jobs using event-driven functionality from serverless
- Learn how Kubernetes, Knative, Kafka, Spark, and Tekton can be used together in a serverless architecture
Jason “Jay” Smith is a Cloud customer engineer at Google. He spends his day helping enterprises find ways to expand their workload capabilities on Google Cloud. He’s on the Kubeflow go-to-market team and provides code contributions to help people build an ecosystem for their machine learning operations. His passions include big data, ML, and helping organizations find a way to collect, store, and analyze information.
Remy Welch is a data analytics specialist at Google Cloud. She works with enterprises in San Francisco to understand best practices for collecting and analyzing data. Remy has expertise working with the gaming industry, helping companies better handle data ingestion, storage, and analytics.