site-reliability-engineering
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
Here are 114 public repositories matching this topic...
A curated list of Site Reliability and Production Engineering resources.
- Updated Aug 28, 2025
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
- Updated Nov 17, 2025 - JavaScript
A Chaos Engineering Platform for Kubernetes.
- Updated Nov 1, 2025 - Go
A curated list of Chaos Engineering resources.
- Updated Dec 28, 2023
An easy-to-use and powerful chaos engineering experiment toolkit. (A simple, easy-to-use, and powerful chaos experiment injection tool open-sourced by Alibaba.)
- Updated Nov 24, 2025 - Go
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
- Updated Nov 28, 2025 - Go
Chaos testing, network emulation, and stress testing tool for containers
- Updated May 21, 2025 - Go
Your 24/7 On-Call AI Agent - Solve Alerts Faster with Automatic Correlations, Investigations, and More
- Updated Nov 28, 2025 - Python
A collection of postmortem templates
- Updated Jul 12, 2023
A curated list of Site Reliability and Production Engineering Tools
- Updated Sep 1, 2025
Web UI for Jaeger
- Updated Nov 29, 2025 - JavaScript
This repository includes resources that are more than sufficient to prepare for a Google interview if you are applying for a software engineer or site reliability engineer position.
- Updated Aug 18, 2022
What to Read to Learn More About DevOps
- Updated Sep 14, 2022
Curated list of good SRE interview questions.
- Updated Aug 16, 2022
Open-source AI copilot that lets you chat with your observability data and code 🧙‍♂️
- Updated Apr 24, 2025 - TypeScript
DevOps Happiness: for AI Agents & Humans. Deploy apps and infra to any cloud, in minutes. Fast, simple, cloud-native 🚀
- Updated Nov 28, 2025 - Python
A chaos engineering platform for supporting the complete fault drill lifecycle.
- Updated May 27, 2024 - Go
A role-playing game for incident management training
- Updated Feb 27, 2024 - HTML
Google Site Reliability Engineering book converted to audio
- Updated Mar 22, 2017
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, Code Ready Containers, Podman, Buildah, and Kubernetes.
- Updated Jan 4, 2024 - Python
- Followers: 145
- Website: github.com/topics/sre