4–7 Nov 2019

Building autoscaling systems: A case study using Step Functions autoscaler

16:45–17:25 Thursday, 7 November 2019
Location: Hall A3

Who is this presentation for?

  • Developers exploring autoscaling systems

Level

Beginner

Description

The central challenge in scaling resources dynamically is keeping the system healthy while limiting spend on unused resources. Every cloud service provider includes autoscaling as a first-class feature (AWS Lambda, SQS, Azure Functions, etc.), and these autoscaling systems involve many interesting implementation details and challenges.

An autoscaler derives a system's resource requirements by adapting to dynamic throughput or traffic patterns while maintaining your desired service-level objective. Devesh Chourasiya walks you through the primary components of any autoscaler system: signals and metrics that indicate a need to scale, a service that validates those signals and computes the scaling request, and a target deployment platform that updates the instance count accordingly.
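The three components described above can be sketched roughly as follows. This is a minimal illustration, not the design presented in the talk: `ScalingSignal`, `ScalingPolicy`, `compute_desired_count`, and `handle_signal` are hypothetical names, and the fixed-step policy is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ScalingSignal:
    """Component 1: a notification that a metric crossed a threshold (hypothetical shape)."""
    target: str      # name of the thing being scaled
    direction: str   # "up" or "down"


@dataclass(frozen=True)
class ScalingPolicy:
    """Per-target limits the decision service validates against (hypothetical)."""
    min_instances: int
    max_instances: int
    step: int = 1


def compute_desired_count(
    signal: ScalingSignal, current: int, policy: ScalingPolicy
) -> Optional[int]:
    """Component 2: validate the signal and compute the scaling request.

    Returns the new instance count, or None if no change is needed.
    """
    if signal.direction == "up":
        desired = current + policy.step
    elif signal.direction == "down":
        desired = current - policy.step
    else:
        return None  # unknown signal: ignore rather than guess
    # Clamp to the configured bounds.
    desired = max(policy.min_instances, min(policy.max_instances, desired))
    return desired if desired != current else None


def handle_signal(
    signal: ScalingSignal,
    current: int,
    policy: ScalingPolicy,
    set_instance_count: Callable[[str, int], None],
) -> Optional[int]:
    """Component 3 is injected as a callable that updates the deployment platform."""
    desired = compute_desired_count(signal, current, policy)
    if desired is not None:
        set_instance_count(signal.target, desired)
    return desired
```

Passing the platform update in as a callable keeps the decision logic testable without a real deployment target.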

He dives into each component and its design considerations using the case study of the AWS Step Functions (SFN) autoscaler recently built at Yelp. He highlights considerations such as when autoscaling is required or helpful (based on the system's request traffic or throughput), how the latency between detecting the need for a scaling action and updating the instance count affects the system, and which metrics to optimize for (compute cost versus acceptable degradation of the system).

Yelp’s SFN autoscaler uses a combination of AWS services (CloudWatch metrics and alarms, SNS, and SQS) to generate scaling signals. A Python service acts as the brain of the autoscaler system: it fetches autoscaling configs, decides on scaling actions, and invokes PaaS APIs to update the instance count of any activity. The SFN autoscaler is rolled out in production for 85% of Yelp’s transactional traffic (food orders, spa reservations, etc.) and saves 34% on compute resources per week.
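To make the signal path concrete, here is a hedged sketch of how a Python service might turn a CloudWatch alarm, delivered through SNS into an SQS queue, into a scaling direction. It assumes the standard SNS envelope (raw message delivery disabled), where the SQS body is JSON whose `Message` field holds the alarm as a JSON string. The function name and the `<activity>-scale-up` / `<activity>-scale-down` alarm naming convention are hypothetical and are not taken from Yelp's implementation.

```python
import json
from typing import Optional, Tuple


def direction_from_sqs_body(body: str) -> Optional[Tuple[str, str]]:
    """Return (activity, direction) for an alarm notification, else None.

    Assumes the alarm name encodes the target and direction, which is an
    illustrative convention only.
    """
    envelope = json.loads(body)              # SNS envelope stored in the SQS message body
    alarm = json.loads(envelope["Message"])  # CloudWatch alarm payload, itself a JSON string
    if alarm.get("NewStateValue") != "ALARM":
        return None                          # ignore OK / INSUFFICIENT_DATA transitions
    name = alarm["AlarmName"]
    for suffix, direction in (("-scale-up", "up"), ("-scale-down", "down")):
        if name.endswith(suffix):
            return name[: -len(suffix)], direction
    return None                              # alarm does not follow the assumed convention
```

Keeping the parsing separate from the decision logic means the service can be exercised with hand-written message bodies, without any AWS access.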

You’ll hear lessons Yelp learned, such as how to avoid continuous churn of scale-up and scale-down actions, along with suggestions for rolling out an autoscaler system.
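One common way to damp scale-up/scale-down churn is a per-target cooldown: after acting on a target, ignore further signals for it until a fixed interval has passed. The sketch below illustrates that general technique only; the abstract does not say which approach Yelp took, and `CooldownGate` is a hypothetical name.

```python
import time
from typing import Callable, Dict


class CooldownGate:
    """Allow at most one scaling action per target per cooldown window."""

    def __init__(
        self,
        cooldown_seconds: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._last_action: Dict[str, float] = {}

    def allow(self, target: str) -> bool:
        """Return True and record the action if the target is outside its cooldown."""
        now = self._clock()
        last = self._last_action.get(target)
        if last is not None and now - last < self._cooldown:
            return False  # still cooling down: drop the signal
        self._last_action[target] = now
        return True
```

A service would call `gate.allow(target)` just before invoking the platform API. Choosing the window is a trade-off: a longer cooldown reduces churn but increases the latency between detecting a need to scale and acting on it.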

Prerequisite knowledge

  • Familiarity with AWS Step Functions (useful but not required)

What you'll learn

  • The architecture and major components of an autoscaling system, as well as important design and rollout considerations

Devesh Chourasiya

Yelp

Devesh Chourasiya is a technical lead on the transaction core team at Yelp, which provides the commerce platform that enables consumer transactions at Yelp. His passions include building scalable, performant, and well-monitored systems. He holds a master’s degree in computer science from the University of Arizona.

