Apps Behaving Badly

Operations and Culture
Location: Hall 1 B/C
Presentation: Apps Behaving Badly Presentation [PDF]
Average rating: 3.89 (35 ratings)

What do you do when something goes horribly wrong in production? Of course we hope that it never happens, but sometimes mistakes occur or something unexpected comes up: your servers start chewing through memory and failing to complete connections, and everything is going to hell.
At the Guardian, our CMS embodies a number of architectural decisions that allow us to recover from almost all forms of failure. We’ll detail how some of these work, and why we made them work the way we did.
Once you’ve managed to patch the system into a state from which it can recover, the next vital task is to work out why the failure happened and how to fix it. There is a method that we use when addressing serious site failures, and a number of tools and approaches that you can use after the fact to reconstruct what happened and trace back in time.

Michael Brunton-Spall

Guardian News and Media

Michael Brunton-Spall is the Developer Advocate for the Guardian. He has worked at the Guardian for three years now, helping to build and scale the website. He has spent a lot of time helping to set up and run the platform team that manages internal, behind-the-scenes performance and scalability issues.
As a Developer Advocate, Michael speaks at conferences, organises conferences, supports users of the APIs, and runs training.

Lisa van Gelder

Guardian News and Media

Lisa van Gelder is one of the Guardian’s senior web developers. Lisa has been developing software for 12 years and has been involved in building and scaling the Guardian’s main website as well as the comments system. Lisa has worked closely with Operations to diagnose and debug apps in production and is experienced in supporting the cleanup and diagnosis of major performance issues.

Comments on this page are now closed.

Comments

Ian McDowall
10/11/2011 17:46 CET

Very interesting, particularly the ideas of killing misbehaving apps and having them restart in 60 seconds.

Bulletin

  • ip-label
  • Compuware Corporation
  • dynaTrace
  • Keynote Systems
  • New Relic
  • Citrix Systems
  • Google
  • Apica
  • AppDynamics
  • CDNetworks
  • Cotendo
  • Dyn Inc.
  • ImmobilienScout24
  • Spil Games

For information on exhibition and sponsorship opportunities at the conference, contact Gloria Lombardo at glombardo@oreilly.com
