©DevOps Institute SREP v1.2 Course Description
Site Reliability Engineering (SRE) Practitioner℠
Course Description
DURATION - 24 Hours
Introduces a range of practices for advancing service reliability engineering through a
mixture of automation, organizational ways of working and business alignment. Tailored
for those focused on large-scale service scalability and reliability.
OVERVIEW
The SRE (Site Reliability Engineering) Practitioner course introduces ways to scale
services economically and reliably in an organization. It explores strategies to improve
agility, cross-functional collaboration, and transparency into the health of services, with
the goal of building resiliency by design, automation, and closed-loop remediation.
The course aims to equip participants with the practices, methods, and tools to engage
people across the organization involved in reliability using real-life scenarios and case
stories. Upon completion of the course, participants will have tangible takeaways to
leverage when back in the office, such as implementing SRE models that fit their
organizational context, building advanced observability in distributed systems, building
resiliency by design, and responding effectively to incidents using SRE practices.
The course was developed by drawing on key SRE sources, engaging with thought leaders
in the SRE space, and working with organizations that are embracing SRE to extract
real-life best practices. It is designed to teach the key principles and practices
necessary to start SRE adoption.
This course positions learners to successfully complete the SRE Practitioner certification
exam.
COURSE OBJECTIVES
At the end of the course, the following learning objectives are expected to be
achieved:
1. A practical view of how to successfully implement a flourishing SRE culture in your
organization.
2. The underlying principles of SRE, an understanding of what SRE is not (its
anti-patterns), and how to recognize those anti-patterns in order to avoid them.
3. The organizational impact of introducing SRE.
4. Mastering the art of SLIs and SLOs in a distributed ecosystem, and extending the use of
Error Budgets beyond their usual scope to innovate and avoid risk.