SRE
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
Here are 1,002 public repositories matching this topic, sorted by most stars:
DevOps interview questions covering Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, networking, and virtualization.
- Updated Oct 7, 2025 - Python
A curated list of amazingly awesome open-source sysadmin resources.
- Updated Oct 26, 2025
DevOps roadmap for 2025, with learning resources.
- Updated Oct 2, 2025
A curated list of Site Reliability and Production Engineering resources.
- Updated Aug 28, 2025
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
- Updated Nov 17, 2025 - JavaScript
Terraform Pull Request Automation
- Updated Nov 26, 2025 - Go
Site Reliability Engineer Interview Preparation Guide
- Updated Nov 2, 2025
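⭐ [Published book] JD.com purchase link: https://item.jd.com/14531549.html. An in-depth look at kernel networking, Kubernetes, ServiceMesh, containers, and other cloud-native technologies. A practice-tested guide to DevOps and SRE.
- Updated Nov 27, 2025 - JavaScript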
At LinkedIn, we use this curriculum to onboard entry-level talent into the SRE role.
- Updated Aug 13, 2024 - HTML
Coroot is an open-source observability and APM tool with AI-powered Root Cause Analysis. It combines metrics, logs, traces, continuous profiling, and SLO-based alerting with predefined dashboards and inspections.
- Updated Nov 18, 2025 - Go
StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident response, troubleshooting, deployments, and more for DevOps and SREs. It includes a rules engine, workflows, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org), and ChatOps. Installer at https://docs.stackstorm.com/install/index.html.
- Updated Oct 22, 2025 - Python
Compilation of public failure/horror stories related to Kubernetes
- Updated Aug 23, 2020 - HTML
Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts
- Updated Nov 26, 2025 - Groovy
Kubernetes prompt info for bash and zsh
- Updated May 25, 2025 - Shell
A curated list of awesome DevOps platforms, tools, practices and resources
- Updated Nov 23, 2025 - Python
CDN Up and Running - Building a CDN from Scratch to Learn about CDN, Nginx, Lua, Prometheus, Grafana, Load balancing, and Containers.
- Updated May 4, 2024 - Lua
A checklist for anyone practicing Site Reliability Engineering.
- Updated Mar 29, 2024