Sep 23–26, 2019

Orchestrating data workflows using a fully serverless architecture

Tomer Levi (Fundbox)
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1E 07/08
Average rating: 4.80 (5 ratings)

Level

Intermediate

Fundbox is a growing fintech company that provides an automatic underwriting platform based on data and AI. Scheduling a limited number of data workflows is generally manageable, but scaling to hundreds of workflows with dependencies and diverse job types requires substantial custom engineering, adds complexity, and consumes expensive resources. Serverless architectures offer an alternative to traditional resource management.

Tomer Levi explains how the data engineering team at Fundbox uses AWS Step Functions, Docker containers, and Spark to build a live, serverless data orchestration platform, focusing on the company’s decision to build a user-friendly yet powerful and scalable solution. Tomer details AWS Step Functions state machines, their limitations, and how to overcome them by building custom job-scheduling and dependency features. He illustrates how the team overcame resource bottlenecks using Docker containers and AWS Fargate. Fundbox’s architecture is scalable and already serves dozens of engineers, BI developers, and data scientists in the company.
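To make the idea of dependency-aware workflows on Step Functions more concrete, here is a minimal sketch. It is not Fundbox's implementation, which this abstract does not describe. It assumes job dependencies are given as a plain dictionary, orders the jobs so that each runs after the jobs it depends on, and emits an Amazon States Language definition in which every job is a Task state that runs a container on ECS Fargate. The job names, cluster name, task definition names, and the helper functions `order_jobs` and `build_state_machine` are all hypothetical, and settings a real Fargate task needs (such as `NetworkConfiguration`) are omitted for brevity.

```python
import json
from collections import deque


def order_jobs(dependencies):
    """Return job names so that every job appears after its upstream jobs.

    `dependencies` maps a job name to the list of jobs it depends on
    (a hypothetical input format chosen for this sketch).
    """
    jobs = set(dependencies)
    for upstream in dependencies.values():
        jobs.update(upstream)
    # Track the unmet dependencies of each job.
    remaining = {job: set(dependencies.get(job, ())) for job in jobs}
    ready = deque(sorted(job for job, deps in remaining.items() if not deps))
    ordered = []
    while ready:
        job = ready.popleft()
        ordered.append(job)
        for other in sorted(remaining):
            if job in remaining[other]:
                remaining[other].remove(job)
                if not remaining[other]:
                    ready.append(other)
    if len(ordered) != len(jobs):
        raise ValueError("dependency cycle detected")
    return ordered


def build_state_machine(dependencies, cluster, task_definitions):
    """Build a sequential Amazon States Language definition as a dict."""
    ordered = order_jobs(dependencies)
    states = {}
    for index, job in enumerate(ordered):
        state = {
            "Type": "Task",
            # Service integration that waits for the ECS task to finish.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": cluster,  # assumed cluster name
                "TaskDefinition": task_definitions[job],  # assumed names
            },
        }
        if index + 1 < len(ordered):
            state["Next"] = ordered[index + 1]
        else:
            state["End"] = True
        states[job] = state
    return {"StartAt": ordered[0], "States": states}


# Hypothetical three-job workflow.
example_dependencies = {
    "load_raw": [],
    "clean": ["load_raw"],
    "report": ["clean"],
}
example_tasks = {job: job + "-task" for job in example_dependencies}
definition = build_state_machine(example_dependencies, "data-cluster", example_tasks)
print(json.dumps(definition, indent=2))
```

This sketch runs the jobs one after another, which is the simplest way to respect dependencies. Independent jobs could instead be grouped into a `Parallel` state, at the cost of a more involved definition builder.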

Prerequisite knowledge

  • A basic understanding of serverless solutions
  • Familiarity with the challenges introduced by enterprise architectures

What you'll learn

  • Learn how Fundbox used AWS Step Functions, Docker containers, and Elastic Container Service (ECS) Fargate to build a serverless data workflow platform
  • Understand key considerations from a data engineering perspective for deploying data workflow jobs

Tomer Levi

Fundbox

Tomer Levi is a senior data engineer on the DataOps team at Fundbox, where he helps shape the data platform architecture to drive business goals. Previously, he was a data engineer at Intel’s advanced analytics group, helping to build out the data platform supporting the data storage and analysis needs of Intel Pharma Analytics Platform, an edge-to-cloud artificial intelligence solution for remote monitoring of patients during clinical trials. He’s incredibly passionate about the power of data. Tomer holds a BSc in software engineering.


Comments

Anushka Jadhav | sr software engineer
10/09/2019 4:39pm EDT

Hi, can you please post the slides for this talk?

