Sep 23–26, 2019

Spark on Kubernetes for data science

Jordan Volz (Dataiku)
4:35pm–5:15pm Thursday, September 26, 2019
Location: 1E 07/08
Average rating: 3.67 (3 ratings)

Who is this presentation for?

  • Data scientists, data engineers, and analytics managers




Data science has benefited greatly from advances in big data and containerization technologies. Spark is the leading platform for data engineering and data science at scale, and Kubernetes is the leading container orchestration service. Together, Spark on Kubernetes is a winning combination for data science: a flexible platform that harnesses the best of both worlds. Although the project is still young and highly experimental, it shows tremendous promise, and every data science organization should be aware of it.

Jordan Volz gives a brief overview of Spark and Kubernetes, explaining the history of each and why both are so crucial to the modern data scientist. He explores the Spark on Kubernetes project and why it’s an ideal fit for data scientists who may have been dissatisfied with earlier iterations of Spark. He also makes the case for Spark on Kubernetes as the go-to platform for cloud native architectures, as organizations begin to modernize their older on-premises architectures and prepare them for cloud deployment. Along the way, he shows concrete examples to whet your appetite and get you excited to go home and start experimenting with Spark on Kubernetes yourself.
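As a starting point for that kind of experimentation, here is a minimal sketch (not taken from the talk) of how a Spark job is typically submitted to a Kubernetes cluster in cluster mode. It assumes the `spark-submit` options described in the Apache Spark "Running on Kubernetes" documentation (`--master k8s://...`, `spark.kubernetes.container.image`, `spark.executor.instances`); the helper name `build_k8s_submit`, the API server address, the image name, and the application path are all hypothetical placeholders.

```python
import shlex
from typing import Dict, List, Optional


def build_k8s_submit(
    api_server: str,
    image: str,
    app_path: str,
    name: str = "spark-app",
    executors: int = 2,
    extra_conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build a spark-submit argument list targeting Kubernetes.

    Hypothetical helper: api_server is the Kubernetes API server URL,
    image is a container image with Spark installed, and app_path is
    the application file as seen from inside that image (local://...).
    """
    if executors < 1:
        raise ValueError("executors must be at least 1")
    cmd = [
        "spark-submit",
        # The k8s:// prefix tells Spark to use the Kubernetes scheduler.
        "--master", f"k8s://{api_server}",
        # Cluster mode: the driver itself runs in a pod on the cluster.
        "--deploy-mode", "cluster",
        "--name", name,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
    ]
    # Append any additional Spark settings in a stable (sorted) order.
    for key, value in sorted((extra_conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    return cmd


if __name__ == "__main__":
    args = build_k8s_submit(
        api_server="https://k8s-apiserver.example:6443",  # placeholder
        image="example/spark-py:latest",                   # placeholder
        app_path="local:///opt/app/job.py",                # placeholder
        executors=3,
    )
    # Print a shell-ready command line rather than executing it.
    print(shlex.join(args))
```

Building the command as a list (and only printing it) keeps the sketch safe to run without a cluster; in practice you would also configure a service account and credentials for your own environment.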

Prerequisite knowledge

  • Familiarity with big data and containerization ideas (useful but not required)

What you'll learn

  • Learn how Spark and Kubernetes combine forces to create the next go-to platform for data science on cloud native architectures

Jordan Volz


Jordan Volz is a senior data scientist at Dataiku, where he helps customers design and implement ML applications. Previously, Jordan specialized in big data technologies as a systems engineer at Cloudera and enterprise search technology as a technical consultant at Autonomy, frequently working with large financial organizations in the US and Canada. He holds degrees from Bard College and the University of Amherst, and he’s academically trained in pure mathematics.
