Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

How we built a global search engine for genetic data

Miro Cupak (DNAstack)
2:10pm–2:50pm Wednesday, June 13, 2018
Distributed Data, Hardware, Storage, and Datacenters
Location: 230 B Level: Beginner
Secondary topics: Systems Architecture & Infrastructure

Prerequisite knowledge

  • A basic understanding of backend web development concepts and API design practices (useful but not required)

What you'll learn

  • Explore the Beacon Network, the largest search and discovery engine of human genomic data in the world


The extremely sensitive nature of genetic data is a major concern in genetics. As a result, a lot of life-saving information, despite having already been collected, remains inaccessible. Data discovery and sharing have long been believed to be the key to unlocking new findings.

The Beacon Network is the largest search and discovery engine of human genomic data in the world. The system is the result of years of global collaboration between developers, researchers, and scientists and is the flagship project of the Global Alliance for Genomics and Health. Miro Cupak details the architecture and technologies behind the system, with a focus on the technical decisions that allow it to scale and disrupt the perception of genetic data.

Miro Cupak


Miro is a cofounder and VP of Engineering at DNAstack, where he builds a leading genomics cloud platform. He is a Java enthusiast with expertise in distributed systems and middleware and is passionate about genetics and making meaningful software. Miro is the creator of the largest search and discovery engine of human genetic data and the author of a book on parallelization of genomic queries. In his spare time, he blogs and contributes to several open-source projects.