Post-incident reviews: learning from failure for improved incident response

Bibliographic Details
Other Authors: Hand, Jason, author
Format: E-book
Language: English
Published: Sebastopol, CA: O'Reilly Media, [2017]
Edition: First edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630388406719
Description
Summary: Anyone who works with technology knows that eventually something will go wrong, even in today’s complex, distributed, and highly available IT systems. In the battle to maintain uninterrupted service, your DevOps teams need updated methods for detecting and solving problems fast. In this report, author Jason Hand explains that effective post-incident reviews today encourage team members to play a key role in continuously improving the system. Traditional techniques for conducting post-incident analyses don’t work well in modern IT organizations, mainly because the command-and-control approach gives team members no incentive to explore the system and detect flaws when they occur. This report presents an up-to-date approach to post-incident reviews that embraces the human element and adds more eyes for discovering system flaws and potential improvements.
- Understand why sustained success depends on a core value of continuous improvement
- Examine why traditional post-incident approaches, such as Root Cause Analysis, do little to improve the availability and reliability of IT services
- Understand the role that team members can play in discovering system flaws
- Learn why it’s often difficult to determine the cause and effect of outages in complex systems
- Get a case study that examines the unique phases of an outage incident
- Explore post-incident analysis in depth by moving away from causes and going deeper into the phases of the incident lifecycle
Physical Description: 1 online resource (1 volume): illustrations
Bibliography: Includes bibliographical references.
ISBN: 9781491986998
9781491986950