Peter Verheijen

My personal blog

10 May 26

Post-Mortems and Building a Learning Culture

When incidents happen, organizations often default to finding out who made the mistake. While understandable, this usually leads to the wrong conversations. In complex engineering environments, failures are rarely caused by a single person. More often, incidents emerge from a combination of unclear ownership, missing safeguards, communication gaps, operational pressure, and technical complexity.

A good post-mortem should focus on learning, not blame.

If people are afraid of being punished for mistakes, they naturally become more defensive. Information starts moving more slowly, uncertainty gets hidden, and teams become less likely to escalate issues early. Over time, this creates fragile organizations where small problems can quietly grow into larger incidents.

At the core, most people want to do good work and contribute positively.

With that in mind, building a healthy engineering culture often comes down to the following:

  1. Psychological Safety
  2. Accountability
  3. Continuous Improvement

These concepts are not in tension with each other. In fact, strong accountability usually requires psychological safety first. People are far more willing to take ownership when they know the goal is improvement rather than punishment.

Focus on Systems, Not Individuals

One of the most valuable mindset shifts during a post-mortem is changing the question from:

“Who caused this?”

to:

“Why did this make sense at the time?”

That question tends to uncover the real problems underneath the surface.

Maybe documentation was unclear. Maybe ownership was ambiguous. Maybe monitoring failed to detect the issue early enough. Maybe deployment processes relied too heavily on manual actions. Maybe people were operating under time pressure with incomplete information.

People operate within systems. If mistakes are easy to make repeatedly, the system itself probably needs improvement.

One approach to running effective post-mortems is to work through three stages, in this order:

  1. Understanding
  2. Alignment
  3. Action

Start by building a shared understanding of what happened. From there, align on contributing factors and lessons learned. Finally, translate those lessons into concrete actions that improve the system moving forward.

Psychological Safety Enables Faster Learning

Teams that feel safe communicating openly tend to resolve incidents faster and learn more effectively from them.

People should feel comfortable saying:

“I think something is wrong.”

before an issue escalates into something larger.

This is especially important in fast-moving engineering environments where complexity grows quickly across infrastructure, software, and organizational boundaries. Early escalation and transparent communication are often the difference between a minor issue and a major outage.

At the same time, blameless does not mean accountability-free. Teams should absolutely discuss decisions, assumptions, trade-offs, and execution. The difference is that the discussion stays constructive and focused on improving systems rather than attacking individuals.

Follow-Through Matters More Than the Document

One of the biggest pitfalls with post-mortems is treating the document itself as the final outcome.

The real value comes from the improvements that happen afterwards.

Strong engineering organizations use incidents as opportunities to improve:

  • Automation
  • Monitoring
  • Documentation
  • Ownership clarity
  • Deployment processes
  • Communication paths
  • Operational playbooks

Over time, these small improvements compound into more resilient systems and healthier teams.
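One way to keep follow-through visible is to treat post-mortem action items as data rather than as lines in a document. The sketch below is only an illustration: the `ActionItem` fields, the `overdue` helper, the team names, and the dates are all hypothetical, and in practice this would more likely live in whatever issue tracker the team already uses.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    """One follow-up from a post-mortem (hypothetical structure)."""
    description: str
    owner: str   # a team or role, so ownership is explicit
    area: str    # e.g. "Monitoring", "Automation", "Documentation"
    due: date
    done: bool = False


def overdue(items, today):
    """Return the open action items whose due date has passed."""
    return [item for item in items if not item.done and item.due < today]


# Made-up example data for illustration only.
items = [
    ActionItem("Alert on queue depth", "team-platform", "Monitoring", date(2026, 5, 1)),
    ActionItem("Automate rollback step", "team-deploy", "Automation", date(2026, 6, 15)),
    ActionItem("Update on-call runbook", "team-platform", "Operational playbooks",
               date(2026, 4, 20), done=True),
]

for item in overdue(items, today=date(2026, 5, 10)):
    print(f"OVERDUE [{item.area}] {item.description} (owner: {item.owner}, due {item.due})")
```

Reviewing a list like this on a regular cadence keeps the focus on whether the system actually improved, not on whether the document was written.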

Final Thoughts

Incidents are inevitable in complex systems. What matters most is how organizations respond to them.

Good post-mortems build trust, improve collaboration, and help teams continuously learn from failure. When done well, they create an environment where people can move faster, communicate openly, and improve together over time.