
The Art of Site Reliability Engineering (SRE) with Azure: Building and Deploying Applications That Endure
- Length: 297 pages
- Edition: 1
- Language: English
- Publisher: Apress
- Publication Date: 2022-10-10
- ISBN-10: 1484287037
- ISBN-13: 9781484287033
Gain a foundational understanding of SRE and learn its basic concepts and architectural best practices for deploying Azure IaaS, PaaS, and microservices-based resilient architectures.
The book starts with the base concepts of SRE operations and developer needs, followed by definitions and acronyms of Service Level Agreements in real-world scenarios. Moving forward, you will learn how to build resilient IaaS solutions, PaaS solutions, and microservices architecture in Azure. Here you will go through Azure reference architecture for highly available storage, networking, and virtual machine computing, with Availability Sets, Availability Zones, and Scale Sets as the main scenarios. You will explore similar reference architectures for Platform Services such as App Services with Web Apps, and work with data solutions like Azure SQL and Azure Cosmos DB.
Next, you will learn automation to enable SRE with Azure DevOps Pipelines and GitHub Actions. You’ll also gain an understanding of how an open culture around post-mortems dramatically helps in optimizing SRE and the overall company culture around managing and running IT systems and application workloads. You’ll be exposed to incident management and monitoring practices using Azure Monitor, Log Analytics, and Grafana, which form the foundation of monitoring Azure and hybrid-running workloads.
As an extra, the book covers two new testing solutions: Azure Chaos Studio and Azure Load Testing. These solutions will make it easier to test the resilience of your services.
After reading this book, you will understand the underlying concepts of SRE and its implementation using the Azure public cloud.
What Will You Learn:
- Learn SRE definitions and metrics like SLI/SLO/SLA, Error Budget, toil, MTTR, MTTF, and MTBF
- Understand Azure Well-Architected Framework (WAF) and Disaster Recovery scenarios on Azure
- Understand resiliency and how to design resilient solutions in Azure for different architecture types and services
- Master core DevOps concepts and the difference between SRE and tools like Azure DevOps and GitHub
- Utilize Azure observability tools like Azure Monitor, Application Insights, KQL, and Grafana
- Understand Incident Response and Blameless Post-Mortems and how to improve collaboration using ChatOps practices with Microsoft tools
Who Is This Book For:
IT operations administrators, engineers, security team members, as well as developers or DevOps engineers.
Table of Contents
- Front matter: About the Author; About the Technical Reviewer; Acknowledgments; Foreword; Introduction
- Chapter 1: The Foundation of Site Reliability Engineering: The History of Site Reliability Engineering; Why SRE Is Not DevOps 2.0; Identify Best Practices Around SRE; Automate Everything; Identify Acceptable Service Levels; Be Focused on Engineering; Understand the Challenges of SRE; Clarify Prerequisites to the Role of SRE; Summary
- Chapter 3: Azure Well-Architected Framework (WAF): Understanding Well-Architected Framework (WAF) Concepts; WAF – Reliability Building Block; On-Premises Is (Way) Different Than Cloud Architecture; Cloud Is Not 100% Highly Available; Observability Is Key; DevOps and Automation; Self-Remediation; Reliability Checklists; Testing Applications for Resiliency; Well-Architected Framework Assessment; Summary
- Chapter 4: Architecting Resilient Solutions in Azure: What Is Resiliency?; Azure Platform Resiliency; Availability Sets; Availability Zones; Region Pairs and Azure Site Recovery; Resiliency Based in Numbers; Resiliency on Application Design; Mainly Used Components/Platform Features for Resilient Solutions; Autoscaling; Load Balancer; Replication/Redundancy; Resilient Architecture Examples; IaaS Resilient Architecture; PaaS Resilient Architecture; Microservices Architecture; Testing Resiliency on Azure; Summary
- Chapter 5: Automation to Enable SRE with GitHub Actions/Azure DevOps/Azure Automation: Automation for SRE; CI/CD Automation with DevOps; What Is DevOps; Continuous Integration (CI); Continuous Delivery/Deployment (CD); Shift-Left Testing in DevOps; Secure DevOps; Infrastructure as Code (IaC); Configuration as Code with DSC/Azure Automation/Guest Configuration; Azure Policy Guest Configuration; Azure Pipelines; [DEMO] CI/CD Multistage YAML Pipeline; GitHub Actions; Modern Deployment Strategies; Rolling Deployment; Blue-Green Deployment; Feature Flags; Canary Deployments/Ring-Based Deployment; Dark Launching; A/B Testing; [DEMO] Modern Deployments with GitHub Actions and Azure App Configuration; Summary
- Chapter 6: Monitoring As the Key to Knowledge: Operational Awareness; SLI/SLO/SLA; Error Budget/Burn Rate; Observability vs. Monitoring; Azure Service Health; Azure Monitor; Data Sources; Visualize; Azure Dashboards; Metrics Explorer (Metrics); Azure Workbooks; Azure Monitor Insights; Grafana; Power BI; Analyze; Azure Monitor Logs; Log Analytics/Azure Monitor Logs; Kusto Query Language (KQL); Azure Resource Graph; Application Insights; Instrumentation Options/Setup; Features; Application Map; Smart Detection; Live Metrics Stream; Transaction Search; Availability; Failures/Performance; Troubleshooting Guides; Logs; Usage (User Behavior); Customized Application Insights Using SDK; Application Insights API; Advanced Configuration for Application Insights; Azure Monitor Alerts; [DEMO] Tracking SLI/SLO/SLA Using Application Insights and Log Analytics; Azure DevOps; GitHub; Summary
- Chapter 7: Efficiently Handle Incident Response and Blameless Postmortems: Incident Response (IR); Incident Response Pillars; Roles; On-Call/Rotations; Incident Tracking/Detection; Communication and ChatOps; ChatOps; Eradication/Remediation; Measuring Performance; [DEMO] Incident Response; Blameless Postmortems; Best Practices/Tips; Summary
- Chapter 8: Azure Chaos Studio (Preview) and Azure Load Testing (Preview): Intro to Chaos Engineering; Chaos Monkey; Principles of Chaos (Engineering); Azure Chaos Studio; Azure Chaos Studio Architecture; Onboarding an Azure VM to Chaos Studio; Onboarding an AKS Cluster to Chaos Studio; Load/Performance Testing; Azure Load Testing; Azure Load Testing for Azure Container App; Summary
- Index
How to download source code?
1. Go to: https://github.com/Apress
2. In the Find a repository… box, search for the book title: The Art of Site Reliability Engineering (SRE) with Azure: Building and Deploying Applications That Endure. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.