Instagram is a feature-rich app with over 1 billion monthly active users. The Instagram infrastructure comprises hundreds of servers running in different geographic locations and hosts a multitude of services, such as a Python-based frontend, Cassandra key-value stores, and ML ranking services.
Guilin Chen and Shobhit Kanaujia pull back the curtain on how Facebook operates Instagram efficiently at scale. Guilin and Shobhit discuss infrastructure practices that support initiatives for enriching the user experience, such as the launch of new products and features. The objective of the various efficiency initiatives in play is to minimize the total number of servers used by all of the services that support Instagram.
Guilin and Shobhit walk you through their experience meeting the efficiency goals at Instagram. In particular, they discuss: how they arrived at their server-capacity needs (two inputs to this process are the organic user-growth projection and a regression allowance for each of the services); how to define monitoring metrics (weights) to ensure services are operating within their regression allowance (these metrics help determine whether the efficiency goal has been met); how they approached regression detection, along with a snapshot of some of the techniques they have employed to address a subset of these regressions; and how to simulate disaster recovery (DR) scenarios to expose issues arising from interdependencies among services.
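As a rough illustration of how the two capacity inputs named above could combine, here is a minimal Python sketch. It is not the speakers' method: the multiplicative model, the function names `server_budget` and `within_allowance`, and all numbers are assumptions made for this example only.

```python
import math


def server_budget(current_servers: int,
                  growth_rate: float,
                  regression_allowance: float) -> int:
    """Hypothetical capacity model (an assumption, not from the talk).

    Scales today's server count by projected organic user growth, then by
    the fraction of extra cost the service is allowed to regress.
    """
    if current_servers < 0 or growth_rate < -1 or regression_allowance < 0:
        raise ValueError("inputs out of range")
    projected = current_servers * (1 + growth_rate) * (1 + regression_allowance)
    # Round up: a fractional server still has to be provisioned.
    return math.ceil(projected)


def within_allowance(baseline_cost: float,
                     observed_cost: float,
                     regression_allowance: float) -> bool:
    """Return True if the observed per-request cost stays inside the allowance.

    The cost unit is arbitrary (for example, CPU time per request); it only
    has to be the same for the baseline and the observation.
    """
    return observed_cost <= baseline_cost * (1 + regression_allowance)


if __name__ == "__main__":
    # Illustrative numbers only: 1000 servers, 25% growth, 12.5% allowance.
    print(server_budget(1000, 0.25, 0.125))
    print(within_allowance(100.0, 110.0, 0.125))
    print(within_allowance(100.0, 115.0, 0.125))
```

Under this assumed model, a service that exceeds its allowance would need either an optimization or a larger budget, which is the kind of pass/fail signal the monitoring metrics in the abstract are meant to provide.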
Guilin Chen is a software engineer at Facebook, where he works on mobile performance and leads the Instagram efficiency team.
Shobhit Kanaujia is an engineer at Facebook who specializes in full-stack performance and efficiency at scale.
©2019, O'Reilly Media, Inc.