413 Site Reliability and Production Engineering Resources & Tools

Source

What is Site Reliability Engineering (SRE)? Fundamentally, it’s what happens when you ask a software engineer to design an operations function. SRE is a discipline focused on the reliability, availability, and performance of software systems, whether web applications or systems software. The acronym is used both for the practice, a methodology for designing, building, and operating large distributed systems reliably, and for the specialized team role of the engineers who carry it out.

Site Reliability Engineering is a management philosophy that originated at Google in 2003 as the company’s internal operations model. The goal of the site reliability engineering team is to create and maintain a platform that can be deployed and updated easily and frequently with minimal disruption to services or users. To achieve this goal, the SRE team usually works closely with other teams, such as developers and designers. On large sites, the SRE team also maintains an organizational structure that allows it to move quickly and coordinate projects.

This post is a curated list of awesome Site Reliability and Production Engineering resources. They include books, articles, blogs, and newsletters covering topics such as culture, reliability, monitoring, capacity planning, SLAs, and more.

Books

  1. 3 Free SRE Ebooks by Google
  2. Post-Incident Reviews: Learning from Failure for Improved Incident Responses
  3. How to Monitor the SRE Golden Signals (E-Book)
  4. Engineering Reliable Mobile Applications: Strategies for Developing Resilient Native Mobile Applications

Culture

  1. What is Site Reliability Engineering?
  2. Keys To SRE by Ben Treynor
  3. Google SRE Resources
  4. Notes from Production Engineering by Pedro Canahuati
  5. PostOps: Recovery from Operations
  6. Love DevOps? Wait ’till you meet SRE
  7. How Google Does Planet-Scale Engineering for Planet-Scale Infra
  8. Site Reliability Engineering at Facebook
  9. A History of Site Reliability Engineering at Uber
  10. Case Study: Adopting SRE Principles at StackOverflow
  11. Site Reliability Engineering at Dropbox
  12. Site Reliability Engineers – Keeping Google up and running 24/7
  13. From Sys Admin to Netflix SRE
  14. SRE@Google: Thousands of DevOps Since 2004
  15. Transactional System Administration Is Killing Us and Must be Stopped
  16. A hierarchy of SRE needs
  17. PostOps: A Non-Surgical Tale of Software, Fragility, and Reliability
  18. SRE: An incomplete guide to cultural Narnia
  19. Putting Together Great SRE Teams
  20. Toil: A Word Every Engineer Should Know
  21. Engineering Reliability into Web Sites: Google SRE
  22. DEVOPS & SRE AMA – Building High Performance Organizations
  23. John Allspaw’s AMA on Incident Analysis and Postmortems
  24. Site Reliability Engineering with Paul Newson
  25. How SysAdmins Devalue Themselves
  26. The Softer Side of DevOps
  27. SRE, noun. See also: confidence, trust.
  28. Site Reliability Engineering with Stephen Weinberg
  29. We are the Google Site Reliability team. We make Google’s websites work. Ask us Anything!
  30. We are the Google Site Reliability Engineering team. Ask us Anything!
  31. The Ops Identity Crisis
  32. The Irreproducibility Of Bugs In Large-Scale Production Systems
  33. SE-Radio Episode 276: Björn Rabenstein on Site Reliability Engineering
  34. Microservices, DevOps and Production Complexity
  35. Introducing Google Customer Reliability Engineering
  36. Evolution or Rebellion? The rise of Site Reliability Engineers (SRE)
  37. The difference between Site Reliability Engineering, System Administration, and DevOps
  38. SRE in the Small and in the Large
  39. SBSRE Meetup: Different SRE roles and challenges (Netflix)
  40. Panel: Who/What Is SRE?
  41. Hope Is Not a Strategy
  42. Tenets of SRE
  43. Site Reliability Engineering Demystified
  44. Is Site Reliability Engineering the True ‘Ops’ in DevOps?
  45. SRE vs. DevOps vs. Cloud Native: The Server Cage Match
  46. SRE: What’s The Big Idea?
  47. Building the SRE Culture at LinkedIn
  48. Podcast #111 – SRE: Occasionally Maintaining Infrastructure That You Hate
  49. Splicing SRE DNA Sequences in the Biggest Software Company on the Planet
  50. Why should your app get SRE support? – CRE life lessons
  51. How SREs find the landmines in a service – CRE life lessons
  52. Making the most of an SRE service takeover – CRE life lessons
  53. The Cloudcast #301: SRE and Infrastructure Operations (Podcast)
  54. The SRE model
  55. Onboarding New Site Reliability Engineers
  56. Building Blocks for Site Reliability At Google
  57. Beyond Google SRE: What is Site Reliability Engineering like at Medium?
  58. Intelligent Site Reliability Engineering – A Machine Learning Perspective
  59. A crash course in LinkedIn’s global site operations
  60. Google’s Site Reliability Engineering with Todd Underwood
  61. What is Site Reliability Engineering? (VMware)
  62. A Gentle Introduction to SRE
  63. Understanding Site Reliability Engineering through Movies and Books
  64. GOTO 2017 • Site Reliability Engineering at Google • Christof Leng
  65. The Makeup of Successful Geographically-Distributed SRE Teams
  66. Tech Leadership in SRE
  67. The Azure Podcast: Episode 227 – Azure SRE
  68. The human scalability of “DevOps”
  69. Podcast: Site Reliability Management with Mike Hiraga
  70. How a cat inspired system reliability at Knowlarity
  71. Getting Started with Site Reliability Engineering
  72. Practical Applications of the Dickerson Pyramid by Nat Welch
  73. LinkedIn’s Kurt Andersen Uncovers Blindspots in SRE Implementations
  74. Interview with Betsy Beyer, Stephen Thorne of Google
  75. Less Risk Through Greater Humanity – Dave Rensin
  76. Getting Started with SRE – Stephen Thorne, Google
  77. Building Successful SRE in Large Enterprises
  78. Solving Reliability Fears with Site Reliability Engineering
  79. SRE vs. DevOps: competing standards or close friends?
  80. How to Avoid the 5 SRE Implementation Traps that Catch Even the Best Teams
  81. Reliability Engineering – The Essential Discipline for Complex Systems
  82. The Modern Site Reliability Workbench on Top of OCI
  83. SRE in the Third Age
  84. About SRE and how (not) to apply it
  85. Transitioning a typical engineering ops team into an SRE powerhouse
  86. Making a Lion Bulletproof: SRE in Banking
  87. Identifying and tracking toil using SRE principles
  88. From Ops to SRE: Evolution of the OpenShift Dedicated

Team

  1. Meeting reliability challenges with SRE principles
  2. A quick introduction to SRE principles
  3. The SRE I Aspire to Be
  4. Taming Operational Load with VMware CRE
  5. SRE Cultural Values
  6. Are we there yet? Thoughts on assessing an SRE team’s maturity

Education

  1. Panel: Educating SRE
  2. From Zero to Hero: Recommended Practices for Training your Ever-Evolving SRE Teams
  3. New to an SRE team?
  4. The Systems Engineering Side of Site Reliability Engineering
  5. Graduating from Bootcamp and interested in becoming a Site Reliability Engineer?
  6. So you want to be a Site Reliability Engineer?
  7. Spiraling Ops Debt & the SRE Coding Imperative
  8. So you want to be an SRE?
  9. Career Profiles/Site Reliability Engineer
  10. What is the role of a Site Reliability Engineer?
  11. Lynda.com: DevOps Foundations: Site Reliability Engineering
  12. Incident Management Training: Wheel of Misfortune
  13. The Ultimate Guide to Structuring a 90-Day Onboarding Plan
  14. SRE fundamentals: SLIs, SLAs and SLOs
  15. How to Get Into SRE
  16. Do you have an SRE team yet? How to start and assess your journey
  17. How SRE teams are organized, and how to get started
  18. Why SRE Documents Matter
  19. How to get started with site reliability engineering (SRE)
  20. Duties of a Site Reliability Engineering Manager
  21. Designing distributed systems using NALSD flashcards
  22. Training Site Reliability Engineers: What Your Organization Needs to Create a Learning Program
  23. SRE Classroom: Distributed PubSub workshop
  24. School of SRE: Curriculum for onboarding non-traditional hires and new grads

Hiring

  1. SRE Hiring
  2. Hiring SREs at LinkedIn
  3. Hiring Site Reliability Engineers
  4. Hiring your first SRE
  5. Growing the Site Reliability Team at LinkedIn: Hiring is Hard
  6. Engineering Manager – Site Reliability Engineering Interview Preparation

Reliability

  1. The Realities of the Job of Delivering Reliability
  2. Fail at Scale by Ben Maurer
  3. Embracing Failure: Fault-Injection and Service Reliability
  4. 10 Years of Crashing Google
  5. How we break things at Twitter: failure testing
  6. Reliable Cron across the Planet
  7. Push our limits – reliability testing at Twitter
  8. Weathering the Unexpected
  9. SRE Hour: Tech Talks by Box & Yelp
  10. Simplicity: A Prerequisite for Reliability
  11. The Two Sides to Google Infrastructure for Everyone Else
  12. How Embracing Continuous Release Reduced Change Complexity
  13. Making “Push On Green” a Reality
  14. BeyondCorp: A New Approach to Enterprise Security
  15. Brainstorming Failure by Jeff Smith
  16. The Ripple Effect Of Outages And Downtime Cannot Be Underestimated
  17. The infrastructure behind Twitter: efficiency and optimization
  18. Dickerson’s Hierarchy of Reliability
  19. The Morning Paper on Operability
  20. Production is all that matters
  21. Using load shedding to survive a success disaster – CRE life lessons
  22. How to avoid a self-inflicted DDoS Attack – CRE life lessons
  23. Don’t gamble when it comes to reliability
  24. Resilience Engineering: Learning to Embrace Failure
  25. The Infrastructure Behind Twitter: Scale
  26. Scaling Reliability at Twitter: So You Want to Add a 9
  27. Principles Of Chaos Engineering
  28. Chaos Engineering
  29. Available…or not? That is the question – CRE life lessons
  30. How Google Backs Up The Internet Along With Exabytes Of Other Data
  31. Performance, Scalability, And High Availability: 3 Key Infrastructure Adaptability Requirements
  32. The Production Environment at Google
  33. Reliable releases and rollbacks – CRE life lessons
  34. How release canaries can save your bacon – CRE life lessons
  35. Things I Learned Managing Site Reliability for Some of the World’s Busiest Gambling Sites
  36. Every Day Is Monday in Operations
  37. Under the Hood: Ensuring Site Reliability
  38. Designing reliable systems with cloud infrastructure (Google Cloud Next ’17)
  39. A Google SRE explores GitHub reliability with BigQuery
  40. Know thy enemy: how to prioritize and communicate risks – CRE life lessons
  41. Chaos Engineering resources
  42. CRE life lessons: What is a dark launch, and what does it do for me?
  43. Why you should pick strong consistency, whenever possible
  44. The Network is Reliable
  45. Are You Load Balancing Wrong?
  46. How production engineers support global events on Facebook
  47. Google: A Collection Of Best Practices For Production Services
  48. Canary Analysis Service
  49. Tips for High Availability
  50. Progressive Service Architecture At Auth0
  51. Google Cloud Production Guideline
  52. production readiness
  53. Trust By Design: The Fusion of Operational Maturity and Risk Modeling
  54. Top Seven Myths of Robust Systems
  55. Taming chaos: Preparing for your next incident
  56. PID Loops and the Art of Keeping Systems Stable
  57. Are you ready for production?
  58. Production Checklist for Web Apps on Kubernetes
  59. Finding a problem at the bottom of the Google stack
  60. Rethinking Task Size in SRE
  61. How maintenance windows affect your error budget
  62. The Production Readiness Spectrum
  63. Generic mitigations

Monitoring & Observability & Alerting

  1. A Working Theory-of-Monitoring
  2. The Evolution of Monitoring Systems at Google – Tony Rippy
  3. Monitoring without Infrastructure @ Airbnb
  4. Monitoring distributed systems
  5. Observability at Uber Engineering: Past, Present, Future
  6. The 4 Golden Signals of API Health and Performance in Cloud-Native Applications
  7. My Philosophy on Alerting by Rob Ewaschuk
  8. Time To Detect – Netflix
  9. Why Percentiles Don’t Work the Way you Think
  10. Building Twitter’s Next-Gen Alerting System
  11. Instrumentation: Worst case performance matters
  12. Instrumentation: What does ‘uptime’ mean?
  13. Incidents + Outages at CircleCI: Our Playbook and What We’ve Learned
  14. An introduction to monitoring and alerting with timeseries at scale, with Prometheus
  15. Detecting outliers and anomalies in realtime at Datadog
  16. How to Monitor the SRE Golden Signals
  17. Monitoring in a DevOps World
  18. Monitoring Your Monitoring’s Monitoring
  19. Observability: the new wave or buzzword?
  20. Monitoring Isn’t Observability
  21. Monitoring in the time of Cloud Native
  22. Principles of Monitoring Microservices
  23. The Many Ways Your Monitoring Is Lying to You
  24. GitOps Part 3 – Observability
  25. Want to Debug Latency?
  26. Debugging Latency in Go 1.11
  27. Alerting on SLOs like Pros
  28. Applied Alerting Philosophy
  29. Observations on Observability
  30. Deploys: It’s Not Actually About Fridays
  31. Site Reliability Engineering Best Practices for Data Pipelines
  32. Elastic Observability in SRE and Incident Response
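
Several entries above, such as "Alerting on SLOs like Pros", discuss alerting on how fast an SLO’s error budget is being consumed rather than on raw error counts. The Python sketch below shows the burn-rate idea. `WindowCounts`, `burn_rate`, and `should_page` are hypothetical names. The request counts are assumed to come from whatever monitoring system you use. The default threshold of 14.4 is one commonly cited starting point, equal to spending 2% of a 30-day budget in one hour. It is not a universal value.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Requests observed over one look-back window (hypothetical input shape)."""
    total: int
    errors: int


def burn_rate(counts: WindowCounts, slo: float) -> float:
    """Error-budget burn rate: observed error ratio divided by the allowed ratio.

    1.0 means the budget would be used up exactly at the end of the SLO period.
    """
    if not 0.0 <= slo < 1.0:
        raise ValueError("slo must be in [0, 1) to leave a non-zero error budget")
    if counts.total == 0:
        return 0.0  # no traffic, nothing burned
    error_ratio = counts.errors / counts.total
    return error_ratio / (1.0 - slo)


def should_page(long_window: WindowCounts, short_window: WindowCounts,
                slo: float, threshold: float = 14.4) -> bool:
    """Page only when both windows burn faster than the threshold.

    The long window (e.g. 1 hour) shows the problem is significant; the short
    window (e.g. 5 minutes) shows it is still happening, so the alert clears
    soon after recovery.
    """
    return (burn_rate(long_window, slo) > threshold
            and burn_rate(short_window, slo) > threshold)


if __name__ == "__main__":
    slo = 0.999
    last_hour = WindowCounts(total=100_000, errors=2_000)
    last_5_min = WindowCounts(total=8_000, errors=240)
    print(f"1h burn rate: {burn_rate(last_hour, slo):.1f}")
    print(f"5m burn rate: {burn_rate(last_5_min, slo):.1f}")
    print("page" if should_page(last_hour, last_5_min, slo) else "no page")
```

With a 30-day SLO period (720 hours), burning 2% of the budget in one hour means a burn rate of 0.02 × 720 = 14.4. Requiring both windows to exceed the threshold is a design choice. It reduces pages caused by brief blips and lets the alert resolve soon after the service recovers.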

On-Call

  1. Being an On-Call Engineer: A Google SRE Perspective
  2. Inside Atlassian: how our site reliability engineers do incident management
  3. Inside Atlassian: how IT & SRE use ChatOps to run incident management
  4. Incident Response at Heroku
  5. Who’s On Call?
  6. SysAdvent – Day 6 – No More On-Call Martyrs
  7. On Being On Call
  8. The On-Call Handbook
  9. Incident management at Google – adventures in SRE-land
  10. How Spotify and GOV.UK handle on call, and more
  11. Run Book / Operations Manual template
  12. Automating Your Oncall: Open Sourcing Fossor and Ascii Etch
  13. Project STAR*: Streamlining Our On-Call Process
  14. [email protected]: Managing Incidents Part I
  15. [email protected]: Managing Incidents Part II
  16. How To Establish a High Severity Incident Management Program
  17. How Your Systems Keep Running Day After Day – John Allspaw
  18. On-call doesn’t have to suck
  19. Why, as a Netflix infrastructure manager, am I on call?
  20. Oncall and Sustainable Software Development
  21. On Call Rotations: How Best to Wake Devs Up in the Middle of the Night
  22. Understanding The Role Of The Incident Manager On-Call (IMOC)
  23. 3 Ways to Minimize the Impact of High Severity Incidents
  24. Advice to Management Teams While Enrolling Changes to On-Call Systems
  25. Moving Past Shallow Incident Data
  26. Sustainable On-Call
  27. dotScale 2017 – Aish Raj Dahal – Chaos management during a major incident
  28. Incident Management at Netflix Velocity
  29. Incidents, fixes, and the day after
  30. 10 Steps to Develop an Incident Response Plan You’ll ACTUALLY Use
  31. Checklists: a stupidly simple but valuable operational gift
  32. How to write a status page update
  33. Atlassian Incident Handbook
  34. PagerDuty Incident Response Handbook
  35. Avoiding Burnout for SREs
  36. Better On-Call the SRE way
  37. Managing Incidents at Monzo
  38. Making On-Call Not Suck
  39. How we (Monzo) respond to incidents
  40. How we’ve evolved on-call at Monzo
  41. Code Yellow: When Operations Isn’t Perfect
  42. MTTR is dead, long live CIRT
  43. Extended Dreyfus Model for Incident Lifecycles
  44. Inhumanity of Root Cause Analysis
  45. Incident insights from NASA, NTSB, and the CDC
  46. My week shadowing a GitLab Site Reliability Engineer
  47. How our production team runs the weekly on-call handover
  48. Writing Runbook Documentation When You’re An SRE
  49. Incident response, programs and you(r startup)
  50. An Incident Command Training Handbook
  51. Shrinking the time to mitigate production incidents
  52. Incident writeup as sociological storytelling

Post-Mortem

  1. A collection of post-mortems
  2. Collection of Kubernetes Failure Stories
  3. Blameless PostMortems and a Just Culture
  4. A Tale of Postmortems
  5. Building a Blameless Post-Mortem Culture with Jason Hand
  6. The infinite hows
  7. Failure is Always An Option: How a Blameless Culture Leads to Better Results
  8. SysAdvent – Day 1 – Why You Need a Postmortem Process
  9. Etsy’s Debriefing Facilitation Guide for Blameless Postmortems
  10. Writing Your First Postmortem
  11. How to Write Great Outage Post-Mortems
  12. A collection of postmortem templates
  13. Embracing Feedback
  14. Postmortem Action Items: Plan the Work and Work the Plan
  15. Social Issues In Postmortems
  16. Google Has an Official Process in Place for Learning From Failure–and It’s Absolutely Brilliant
  17. Postmortem culture: how you can learn from failure
  18. re:Work – Postmortem discussion template
  19. Post-mortems to the rescue
  20. Why Every Company Can Benefit from a Blameless Culture
  21. It’s dead, Jim: How we write an incident postmortem
  22. Our incident postmortem template
  23. Learn out of mistakes. Postmortems to the rescue.
  24. Improving Postmortem Practices with Veteran Google SRE, Steve McGhee

Capacity Planning

  1. Capacity Planning
  2. SouthBay SRE: Cloud Capacity Planning
  3. How do you do Capacity Planning
  4. How Back Market SREs prepared for Black Friday

Service Level Agreement

  1. If It’s in the Cloud, Get It on Paper: Cloud Computing Contract Issues
  2. Service Level Agreements in the Cloud: Who cares?
  3. Making a point with SLAs
  4. SysAdvent – Day 20 – How to set and monitor SLAs
  5. SLOs, SLIs, SLAs, oh my – CRE life lessons
  6. Service Levels and Error Budgets
  7. (Un)Reliability Budgets – Finding Balance between Innovation and Reliability
  8. The Calculus of Service Availability
  9. Availability Calculator: Calculate how much downtime should be permitted in your SLA
  10. Best practices to develop SLAs for cloud computing
  11. A Practical Guide to SLAs
  12. Building good SLOs – CRE life lessons
  13. No Grumpy Humans and Other Site Reliability Engineering Lessons from Google
  14. Consequences of SLO violations – CRE life lessons
  15. Service Level Objectives in Practice
  16. SRE Consensus Building
  17. An example escalation policy – CRE life lessons
  18. Error Budget Calculator
  19. Understanding error budget overspend – part one – CRE life lessons
  20. Good housekeeping for error budgets – part two – CRE life lessons
  21. SRE fundamentals: SLIs, SLAs and SLOs
  22. SLOs & You: A Guide To Service Level Objectives
  23. Earning Our Wings: Stories and Findings From Operating a Large-scale Concourse Deployment
  24. Nines are Not Enough: Meaningful Metrics for Clouds
  25. How many nines is my storage system?
  26. Don’t follow the sun.
  27. The Tyranny of the SLA
  28. Backblaze Durability is 99.999999999% – And Why It Doesn’t Matter
  29. DevOpsDays Chicago 2019 – The Art of SLOs
  30. The Art of SLOs Workshop Materials
  31. How to Include Latency in SLO-Based Alerting
  32. Succeeding With Service Level Objectives
  33. Putting customers first with SLIs and SLOs
  34. SRE Leadership: Have Tiered SLAs
  35. How SLOs Enable Fast, Reliable Application Delivery
  36. The Tail at Scale
  37. The Tail at Scale Revisited
  38. Defining SLOs for services with dependencies
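
Entries such as the availability and error budget calculators above come down to simple arithmetic. An availability SLO leaves an error budget of (1 - SLO). That budget can be expressed as permitted downtime over a window or as a number of failed requests. A minimal Python sketch follows. The function names are hypothetical. `composite_availability` assumes hard serial dependencies that fail independently, which is an idealization.

```python
import math
from datetime import timedelta


def allowed_downtime(slo: float, window: timedelta) -> timedelta:
    """Downtime permitted by an availability SLO (e.g. 0.999) over a time window."""
    if not 0.0 <= slo <= 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    error_budget = 1.0 - slo  # fraction of the window that may be unavailable
    return window * error_budget


def remaining_budget(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget left (negative means overspent)."""
    if total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        # A 100% SLO leaves no budget: any failure is an overspend.
        return 0.0 if failed_requests == 0 else -math.inf
    return 1.0 - failed_requests / allowed_failures


def composite_availability(*availabilities: float) -> float:
    """Availability of a request path that needs every listed component to be up.

    Assumes independent failures, so the availabilities multiply.
    """
    return math.prod(availabilities)


if __name__ == "__main__":
    thirty_days = timedelta(days=30)
    for slo in (0.99, 0.999, 0.9999):
        print(f"{slo:.2%} over 30 days allows {allowed_downtime(slo, thirty_days)} of downtime")
    print(f"Two 99.9% hard dependencies: {composite_availability(0.999, 0.999):.4%}")
```

For example, a 99.9% SLO over 30 days allows about 43 minutes and 12 seconds of downtime. Chaining two 99.9% hard dependencies caps the path at roughly 99.8%. This is why dependencies matter when choosing your own SLO.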

Performance

  1. Performance Checklists for SREs
  2. South Bay SRE Meetup – Netflix Cloud Performance Team
  3. Software Performance Analysis Guided By SLOs
  4. A framework for pragmatic performance engineering

Programming

  1. Go Language for Ops and Site Reliability Engineering
  2. Go for SREs using Python
  3. Operability in Go
  4. Go Reliability and Durability at Dropbox

Misc Articles

  1. What is SRE (Site Reliability Engineering)?
  2. Here’s How Google Makes Sure It (Almost) Never Goes Down
  3. Site Reliability Engineers: “solving the most interesting problems”
  4. Site Reliability Engineers: the “world’s most intense pit crew”
  5. Site reliability engineering kicks rote tasks out of IT ops
  6. Notes on Site Reliability Engineering
  7. Adventures in SRE-land: Welcome to Google Mission Control
  8. Book Review: Site Reliability Engineering – How Google Runs Production Systems
  9. Site Reliability Engineers: “We solve cooler problems”
  10. SREcon17: Brave new world of site reliability engineering
  11. Open AWS guide
  12. 20 SRE / Devops / System Engineer Tricks
  13. Commentary on Site Reliability Engineering
  14. Site Reliability Engineering: 4 Things to Know
  15. Looking for SRE Success? Then Find the Intrapreneurs!
  16. What Team Structure is Right for DevOps to Flourish?
  17. Injured on Vacation? Applying Principles from Site Reliability Engineering to a Travel Emergency
  18. Building blameless working environment
  19. SRE Adoption Report
  20. SREs: The Happiest – and Highest Paid – in the Industry
  21. The Role of Site Reliability Engineering, Today and Tomorrow
  22. SRE as a Lifestyle Choice
  23. SRECon EMEA 2019 Recap
  24. Life of an SRE at Google – JC van Winkel
  25. Site Reliability Engineering for Native Mobile Apps – Abhijith Krishnappa
    Case study: Halodoc adaptation of SRE principles for Native Mobile Apps
  26. SRE Best Practices by InfraCloud

Blogs

  1. Brendan Gregg’s Blog
    Highly Technical Blog Posts About Systems Internals, Performance and SRE.
  2. Everything Sysadmin
    Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
  3. High Scalability
    Technical Blog Posts About Systems Architecture.
  4. rachelbythebay
    Technical Blog Posts.
  5. Susan J. Fowler
    Various blog posts about SRE, Software Engineering and Microservices.
  6. SysAdvent
    One article for each day of December, from the 1st through the 25th.
  7. Stephen Thorne’s Blog
    Blog Posts About SRE
  8. Increment
    A digital magazine about how teams build and operate software systems at scale.
  9. GopherSRE
    Blog Posts about Go and SRE.
  10. Cindy Sridharan
    Blog posts about distributed systems and their management.
  11. Blameless Blog
    Blog posts about SRE culture and practices.
  12. Resilience Roundup
    Weekly analysis of Resilience Engineering and Human Factors research designed for software systems
  13. Squadcast Blog
    Blog posts about SRE best practices, reliability, on-call and incident management.
  14. FireHydrant Blog
    Posts about complex systems, incident response, and SRE best practices.
  15. Rootly Blog
    Incident management best practices and guides.

Newsletters

  1. DevOpsLinks
    A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
  2. KubeWeekly
    The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
  3. SRE Weekly
    Weekly Site Reliability Newsletter.
  4. O’Reilly Systems Engineering and Operations Newsletter
    Weekly systems engineering and operations news and insights from industry insiders.
  5. ChaosEngineering.news
    Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

  1. SRECon Conferences
    The Official SRE Conference.
  2. LISA Conferences
    Prominent Conference About SysAdmin/DevOps/SRE.
  3. SRE Tech Talks
    SRE Talks Hosted by Google.
  4. South Bay Site Reliability Engineering (Sunnyvale, CA) Meetup
    A Group For Individuals Who Tackle Reliability Challenges For Web-Scale Systems.
  5. San Francisco Reliability Engineering
    A Group Of People Who Are Passionate About Reliable, Performant Software Systems.
  6. Site Reliability Engineering Munich, Germany
    SRE Meetup in the greater Munich area, home of Oktoberfest.
  7. ADDO – All Day DevOps
    A 24-hour conference that is completely online and free.
  8. Site Reliability Engineering Paris, France
    SRE Meetup in the City of Light.
  9. Site Reliability Engineering India
    SRE Meetup India

Twitter

  1. Google SRE Twitter Account
    Google’s SRE Twitter Account.
  2. SREBook
    The Official Twitter Account of the Site Reliability Engineering Book.
  3. SREcon
    SRECon’s Official Twitter Account.
  4. SREWorkbook
    The Official Twitter Account of the Site Reliability Workbook.
  5. The SRE Dev
    SRE-related Posts from dev.to
  6. Twitter SRE
    The Official Twitter Account of Twitter’s SRE team.
  7. Twitter SRE Weekly
    The Official Twitter Account of the SRE Weekly Newsletter.
  8. USENIX Association
    The Official USENIX Twitter Account.