Training: June 20–21, 2016
Tutorials: June 21, 2016
Keynotes & Sessions: June 22–23, 2016
Santa Clara, CA

ELK: Moose-ively scaling your log system

Avleen Vig (Etsy)
9:00am–10:30am Tuesday, 06/21/2016
DevOps, First time at Velocity Santa Clara, Infrastructure reimagined
Location: Ballroom CD
Level: Intermediate
Average rating: 4.88 (17 ratings)

This tutorial is for you because:
You are working with ELK and need to understand and scale your system.

Etsy migrated to ELK for log ingestion and searching in 2014 after a year of planning and testing. Since then, the system has grown from tens of millions of log lines per day to over five billion. Along the way, Etsy has learned many lessons about choosing hardware, measuring the impact of change, and working around the limitations of Elasticsearch and Logstash, and has developed tooling to help monitor the system and keep scaling it up.

Deploying and scaling ELK is often a trial by fire. Avleen Vig covers sizing your cluster, monitoring and extracting metrics from it, and patterns for effectively scaling it up as your logs grow, and teaches you to design and grow your own ELK cluster for large log volumes. You’ll leave knowing how to correctly design and deploy your Elasticsearch cluster, how to monitor and observe changes to ELK, and how to scale your system.

The talk will be delivered in three parts:

  1. How to correctly design and deploy your Elasticsearch cluster: sizing a cluster (CPU, memory, disk, network) is a difficult task, and the advice from Elastic has usually been to try to figure it out yourself. Unfortunately, this is a slow and expensive process, but there are a lot of good initial rules you can lean on to start out.
  2. How to monitor and observe changes to ELK: ELK is opaque by default. Elastic sells a monitoring plugin, but it isn’t cheap. Etsy has developed some strategies to incorporate the required monitoring and metrics collection into its existing tools (Nagios, Graphite, Ganglia) with great success. The performance impacts from different types of log lines and the need to benchmark Logstash plugins will also be covered.
  3. How to scale your system: cramming almost three years of learning into this section, Avleen covers the major issues Etsy has found. In addition to discussing how to actually scale the system up, Avleen also demonstrates which of the decisions you make today will most affect your ability to scale tomorrow.
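
As a rough illustration of the approach in item 2 (feeding Elasticsearch metrics into an existing Graphite setup), the sketch below converts a cluster health document into Graphite plaintext lines. It is not Etsy's tooling. The function names, the `elk.cluster` metric prefix, and the numeric encoding of the status color are all assumptions made for this example; the field names follow the Elasticsearch `_cluster/health` response, and the `path value timestamp` line format follows Graphite's plaintext protocol.

```python
import socket
import time

# Numeric fields found in an Elasticsearch `_cluster/health` response.
NUMERIC_FIELDS = (
    "number_of_nodes",
    "number_of_data_nodes",
    "active_primary_shards",
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
)

# Assumed encoding so the status color can be graphed and alerted on.
STATUS_CODES = {"green": 0, "yellow": 1, "red": 2}


def health_to_graphite(health, prefix="elk.cluster", now=None):
    """Turn a cluster health dict into Graphite plaintext lines.

    `prefix` is a hypothetical metric namespace; pick one that fits
    your own Graphite hierarchy.
    """
    ts = int(time.time() if now is None else now)
    lines = []
    status = health.get("status")
    if status in STATUS_CODES:
        lines.append(f"{prefix}.status {STATUS_CODES[status]} {ts}")
    for field in NUMERIC_FIELDS:
        if field in health:
            lines.append(f"{prefix}.{field} {health[field]} {ts}")
    return lines


def send_lines(lines, host, port=2003):
    """Write lines to a Carbon plaintext listener (2003 is its default port)."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


if __name__ == "__main__":
    # Sample document; in practice this would be fetched from the cluster.
    sample = {"status": "yellow", "number_of_nodes": 3, "unassigned_shards": 5}
    for line in health_to_graphite(sample):
        print(line)
```

Keeping the conversion separate from the network send means the mapping can be checked without a running cluster or Carbon daemon, and the same lines could feed a Nagios check on `unassigned_shards` or `status`.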

Avleen Vig


Avleen Vig is a staff operations engineer at Etsy, where he spends much of his time growing the infrastructure for selling knitted gloves and cross-stitch periodic tables. Before joining Etsy, Avleen worked at several large tech companies, including EarthLink and Google, as well as a number of small successful startups.