A reading list for services engineering, with a focus on cloud infrastructure services
mmcgrana/services-engineering
A reading list for services engineering, with a focus on cloud infrastructure services.
We welcome suggestions.
- Fault Injection in Production (Allspaw)
- Making Reliable Distributed Systems in the Presence of Software Errors (Armstrong)
- Highly Available Transactions: Virtues and Limitations (Bailis et al.)
- The Incident Command System (Bigley and Roberts)
- The Chubby Lock Service for Loosely Coupled Distributed Systems (Burrows)
- Bigtable: a Distributed Storage System for Structured Data (Chang et al.)
- Spanner: Google’s Globally-Distributed Database (Corbett et al.)
- Dynamo: Amazon’s Highly Available Key-Value Store (DeCandia et al.)
- MapReduce: Simplified Data Processing on Large Clusters (Dean and Ghemawat)
- The Google File System (Ghemawat et al.)
- On Designing and Deploying Internet Scale Services (Hamilton)
- Kafka: A Distributed Messaging System for Log Processing (Kreps et al.)
- Weathering the Unexpected (Krishnan)
- The Unified Logging Infrastructure for Data Analytics at Twitter (Lee et al.)
- Automatic Management of Partitioned, Replicated Search Services (Leibert et al.)
- Learning to Embrace Failure (Limoncelli et al.)
- Scaling Big Data Mining Infrastructure: The Twitter Experience (Lin and Ryaboy)
- Dremel: Interactive Analysis of Web-Scale Datasets (Melnik et al.)
- Out of the Tar Pit (Moseley and Marks)
- The Log-Structured Merge-Tree (O'Neil et al.)
- In Search of an Understandable Consensus Algorithm (Ongaro and Ousterhout)
- Failure Trends in a Large Disk Drive Population (Pinheiro et al.)
- Fallacies of Distributed Computing Explained (Rotem-Gal-Oz)
- F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business (Shute et al.)
- Dapper, A Large Scale Distributed Systems Tracing Infrastructure (Sigelman et al.)
- Resilient Distributed Datasets: a Fault-Tolerant Abstraction for In-Memory Cluster Computing (Zaharia et al.)
- The Human Side of Postmortems (Zwieback)
- Crew Resource Management: a Positive Change for the Fire Service
- Resilience Engineering: Part I, Part II (Allspaw)
- Systems Engineering: a Great Definition (Allspaw)
- Chaos Monkey Released Into The Wild (Bennett and Tseitlin)
- Some Rules for Engineering and Operations (Black)
- Service Level Disagreements Part I, Part II (Black)
- Incuriosity Will Kill Your Infrastructure (Crayford)
- My Philosophy on Alerting (Ewaschuk)
- You Can’t Sacrifice Partition Tolerance (Hale)
- Customer Trust (Hamilton)
- Observations on Errors, Corrections, & Trust of Dependent Systems (Hamilton)
- Game Day Exercises at Stripe: Learning from `kill -9` (Hedlund)
- Life Beyond Distributed Transactions: An Apostate’s Opinion (Helland)
- Notes on Distributed Systems for Young Bloods (Hodges)
- The Network is Reliable (Kingsbury)
- The Trouble with Clocks (Kingsbury)
- Call Me Maybe: Final Thoughts (Kingsbury)
- Getting Real About Distributed Systems Reliability (Kreps)
- The Log: What every software engineer should know about real-time data's unifying abstraction (Kreps)
- Incident Response at Heroku (McGranaghan)
- On HTTP Load Testing (Nottingham)
- Observability at Twitter (Watson)
- Stevey’s Google Platforms Rant (Yegge)
- Design, Lessons, and Advice from Building Distributed Systems at Google (Dean)
- Service Design Best Practices (Hamilton)
- The Field Guide To Understanding Human Error (Dekker)
- Agile Retrospectives: Making Good Teams Great (Derby et al.)
- Better: A Surgeon’s Notes on Performance (Gawande)
- The Checklist Manifesto: How to Get Things Right (Gawande)
- High Performance Browser Networking (Grigorik)
- Resilience Engineering in Practice (Hollnagel et al.)
- Effective Monitoring and Alerting (Ligus)
- Release It!: Design and Deploy Production-Ready Software (Nygard)
- The Challenger Launch Decision (Vaughan)
- Managing the Unexpected (Weick and Sutcliffe)