Sep 23–26, 2019

Using Spark to speed up the diagnosis performance for big data applications

Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)
4:35pm–5:15pm Thursday, September 26, 2019
Location: 1E 09

Who is this presentation for?

  • Engineers, product managers, and DevOps engineers




Cosmos is Microsoft’s internal big data analysis platform. Every day, it processes huge volumes of data from Microsoft services such as Bing, Office, Windows, Xbox, and Dynamics. The DevOps team is responsible for keeping the service reliable, and for each live site issue, the on-call engineer has a hard deadline to mitigate the problem. The Microsoft big data team has been working on bringing an integrated development environment (IDE)-style diagnosis experience to large-scale applications. However, on-call engineers faced several challenges when using the team’s IDE diagnosis tools: complex jobs with large profiles were slow to process, and the IDE could crash for jobs with a profile larger than 10 GB; the team provided an auto-diagnosis wizard for common issues, but on-call engineers still needed to dig deeper into various logging systems case by case; and documenting troubleshooting steps required extra effort from on-call engineers.

Ruixin Xu, Long Tian, and Yu Zhou outline an experiment that addresses these challenges by replacing the diagnosis engine with Spark and using Jupyter notebooks as the frontend. The results indicate that the Spark-based solution significantly improves diagnosis performance, especially for complex jobs with large profiles. The Jupyter notebook also enables fast iteration and easy knowledge sharing. They also share the insights they’ve gained along the way.

Prerequisite knowledge

  • Familiarity with distributed computing, debugging, Spark, and Jupyter notebooks

What you'll learn

  • Discover how to use Spark to build troubleshooting tools for large-scale applications

Ruixin Xu


Ruixin Xu is a senior program manager on the Azure big data team at Microsoft. Her focus areas include product design and project management, development experience in big data platforms, the software development tool chain, and Software as a Service (SaaS) offerings.


Long Tian


Long Tian is a software engineering manager on the Microsoft Big Data Analytics team. Long focuses on building developer experiences (authoring, debugging, continuous integration, and monitoring) for cloud big data services, including Spark, Hive, and Azure Data Lake.


Yu Zhou


Yu Zhou is a software development engineer on the Azure big data team at Microsoft, where he develops innovative big data solutions, including distributed computing systems and streaming computing. He earned his master of science degree in EE from Beijing University of Posts and Telecommunications and his bachelor of science degree in EE from Hunan University.

