Leading Through Reliability: Coaching, Mentoring, and Decision-Making Under Pressure

Author: Jonathan Lalou | September 22, 2025

SRE leadership isn’t only about systems; it’s about people, processes, and resilience under fire.

1) Coaching Team Members Through Debugging

When junior engineers struggle with incidents, I walk them through the scientific method of debugging:

Reproduce the problem.
Collect evidence (logs, metrics, traces).
Form a hypothesis.
Test, measure, refine.

For example, in a memory-leak case, I let a junior engineer take the heap dump and explain the findings, stepping in only to validate the conclusions.

2) Introducing SRE Practices to New Teams

In teams without SRE culture, I start small:

Define a single SLO for a critical endpoint.
Introduce a burn-rate alert tied to that SLO.
Run a blameless postmortem after the first incident.

This creates buy-in without overwhelming the team with jargon.
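To make the first two steps concrete: a burn rate is the observed error rate divided by the error budget (1 minus the SLO target). Below is a minimal sketch of a two-window burn-rate check. The `Window` shape, the function names, and the default threshold of 14.4 are assumptions for illustration; 14.4 is a commonly used paging threshold that corresponds to spending 2% of a 30-day budget in one hour, and a team would tune it to its own SLO period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one alerting window (hypothetical shape)."""
    total: int
    errors: int


def burn_rate(window: Window, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    1.0 means the budget lasts exactly the SLO period; higher values mean
    it runs out sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_rate = window.errors / window.total
    error_budget = 1.0 - slo_target
    return error_rate / error_budget


def should_page(long_window: Window, short_window: Window,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, which avoids paging on an already-resolved spike.
    """
    return (burn_rate(long_window, slo_target) >= threshold
            and burn_rate(short_window, slo_target) >= threshold)
```

For a 99.9% SLO, a window with 10 errors out of 1,000 requests burns at roughly 10 times the sustainable rate. Starting with one endpoint and one such rule keeps the vocabulary the team has to learn small.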

3) Prioritizing and Delegating in High-Pressure Situations

During outages, prioritization is key:

Delegate evidence gathering (thread dumps, logs) to one engineer.
Keep communication flowing with stakeholders (status every 15 minutes).
Focus leadership on mitigation and rollback decisions.

After stabilization, I lead the postmortem, ensuring learnings feed back into automation, monitoring, and runbooks.

Posted in en-US | Tags: leadership, SRE
