Summary: | Site Reliability Engineering (SRE), as the name suggests, is the practice of maintaining the stability, quality, and thus reliability of a service or feature that an application offers its end users. The reliability of production systems directly affects your company's revenue and is one of the most important factors in growing your business over time. Many organizations use SRE to ensure that the critical applications they build remain available throughout their lifespan, even amid seasonal peaks, infrastructure maintenance, and planned or unplanned software updates. Site Reliability Engineers are responsible for maintaining the uptime of these systems and need a range of skills to maximize availability and minimize downtime. In this course, you will learn what it takes to be a modern SRE, with a deep dive into principles and core concepts. We will cover different aspects of observability (metrics, logging), monitoring, change management, SLOs, disaster recovery, scaling up, tooling, troubleshooting, and timeline-based blameless postmortems. By the end of this course, you will have a detailed understanding of who an SRE is, what exactly they do, and what it takes to be a successful one.
|