[DevoxxBE2024] Mayday Mark 2! More Software Lessons From Aviation Disasters by Adele Carpenter
At Devoxx Belgium 2024, Adele Carpenter delivered a gripping follow-up to her earlier talk, diving deeper into the technical and human lessons from aviation disasters and their relevance to software engineering. With a focus on case studies like Air France 447, Copa Airlines 201, and British Midland 92, Adele explored how system complexity, redundancy, and human factors such as cognitive load and habituation can lead to catastrophic failures. Her session, packed with historical context and practical takeaways, highlighted how aviation’s century-long safety evolution offers critical insights for building robust, human-centric software systems.
The Evolution of Aviation Safety
Adele began by tracing the rapid rise of aviation from the Wright Brothers’ 1903 flight to the jet age, catalyzed by two world wars and followed by roughly 20% annual growth in commercial air traffic by the late 1940s. This rapid adoption led to a peak in crashes during the 1970s, with 230 fatal incidents, primarily due to pilot error, as shown in data from planecrashinfo.com. However, safety has since improved dramatically, with fatalities dropping to one per 10 million passengers by 2019. Key advancements, like Crew Resource Management (CRM) introduced after the 1978 United Airlines 173 crash, reduced pilot-error incidents by enhancing cockpit communication. The 1990s and 2000s saw further gains through fly-by-wire technology, automation, and wind shear detection systems, making aviation a remarkable engineering success story.
The Perils of Redundancy and Complexity
Using Air France 447 (2009) as a case study, Adele illustrated how excessive redundancy can overwhelm users. The Airbus A330’s three pitot tubes, feeding airspeed data to multiple Air Data Inertial Reference Units (ADIRUs), failed due to icing, causing the autopilot to disconnect and bombard the pilots with alerts. In alternate law, without anti-stall protection, the less experienced pilot’s nose-up input led to a stall, exacerbated by conflicting control inputs in the dark cockpit. This cascade of failures, compounded by sensory overload and inadequate training, resulted in 228 deaths. Adele drew parallels to software, recounting a downtime incident at Trifork caused by a RabbitMQ cluster sync issue, highlighting how poorly understood redundancy can paralyze systems under pressure.
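To make the redundancy lesson concrete in code, here is a minimal, hypothetical Java sketch (the class and method names are invented for illustration, not from the talk): three redundant airspeed sources combined by median voting, with disagreement surfaced as a single explicit degraded-mode signal rather than a cascade of low-level alerts.

```java
import java.util.Arrays;

// Hypothetical illustration of redundant inputs, loosely modeled on the A330's
// pitot/ADIRU setup described above. The point is not the aviation logic itself,
// but that redundancy needs an explicit, understandable strategy for disagreement
// instead of silently handing a confusing state to the operator.
public class RedundantAirspeed {

    /** Result of combining redundant readings: a value plus an explicit health flag. */
    record Reading(double knots, boolean degraded, String note) {}

    /**
     * Median voting over three sources. If any source strays too far from the
     * median, keep operating on the median but raise ONE clear degraded-mode
     * signal rather than a flood of low-level alerts.
     */
    static Reading combine(double a, double b, double c, double toleranceKnots) {
        double[] sorted = {a, b, c};
        Arrays.sort(sorted);
        double median = sorted[1];
        boolean disagreement =
                Math.abs(a - median) > toleranceKnots
                || Math.abs(b - median) > toleranceKnots
                || Math.abs(c - median) > toleranceKnots;
        return disagreement
                ? new Reading(median, true, "airspeed sources disagree, using median")
                : new Reading(median, false, "all sources agree");
    }

    public static void main(String[] args) {
        // Two healthy probes and one iced-over probe reading far too low.
        Reading r = combine(272.0, 268.0, 61.0, 20.0);
        System.out.printf("airspeed=%.0f kt, degraded=%b (%s)%n", r.knots(), r.degraded(), r.note());
    }
}
```

The design choice the sketch tries to capture: the system stays usable on partial data, and the human gets one comprehensible signal about what has been lost.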
Deadly UX and Consistency Over Correctness
Copa Airlines 201 (1992) underscored the dangers of inconsistent user interfaces. A faulty vertical gyro on the captain’s side fed bad attitude data, disconnecting the autopilot. The pilots had trained on a simulator where flipping a selector switch to the left chose the auxiliary data source; on the actual Boeing 737 the switch logic was reversed, so they inadvertently fed both displays from the faulty gyro. This “deadly UX” caused the plane to roll out of the sky, killing all aboard. Adele emphasized that consistency in design, over mere correctness, is critical in high-stakes systems, because it aligns with human cognitive limitations and reduces errors under stress.
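A hedged, hypothetical code analogue of that simulator-versus-aircraft mismatch (names and behaviour invented for illustration, not taken from the talk): two implementations of the same interface that interpret a flag in opposite ways. Each is internally “correct”, but the inconsistency is what trains users to do the wrong thing under pressure.

```java
// Hypothetical sketch: a production implementation and a test double whose flag
// semantics are silently reversed. Both pass their own tests; the danger lies
// in the inconsistency between them.
public class DisplaySourceSwitch {

    interface GyroSelector {
        /** Selects which vertical gyro feeds the captain's display. */
        void select(boolean useAuxiliary);
    }

    /** Production behaviour: true means "use the auxiliary gyro". */
    static class AircraftSelector implements GyroSelector {
        public void select(boolean useAuxiliary) {
            System.out.println("aircraft: display fed from " + (useAuxiliary ? "AUX gyro" : "main gyro"));
        }
    }

    /** Simulator stub written later, with the flag meaning reversed. */
    static class SimulatorSelector implements GyroSelector {
        public void select(boolean useAuxiliary) {
            // Inconsistency: here `true` selects the MAIN gyro instead.
            System.out.println("simulator: display fed from " + (useAuxiliary ? "main gyro" : "AUX gyro"));
        }
    }

    public static void main(String[] args) {
        // The same call, learned in training, does opposite things in the two systems.
        new SimulatorSelector().select(true);
        new AircraftSelector().select(true);
    }
}
```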
Human Factors: Assumptions and Irrationality
British Midland 92 (1989) highlighted how assumptions can derail decision-making. Experienced pilots, new to the 737-400, mistook smoke from the failing left engine for a right-engine problem because a design change had altered which engine supplied the air conditioning system. Shutting down the wrong engine led to a crash beside a motorway, though 79 of the 126 aboard survived. Adele also discussed irrational behavior under stress, citing the Manchester Airport disaster (1985), where 55 people died, mostly from smoke inhalation, during the evacuation. Post-crash recommendations, like strip lighting and wider exits, addressed irrational human behavior in emergencies, offering lessons for software in designing for stressed users.
Habituation and Complacency
Delta Air Lines 1141 (1988) illustrated the risks of habituation, where routine dulls vigilance. The pilots, running through a pre-flight checklist they had performed countless times, failed to deploy the flaps, and a modified takeoff alert system failed to warn them. The crash shortly after takeoff killed 14. Adele likened this to software engineers ignoring frequent alerts, like her colleague Pete with muted notifications. She urged designing systems that account for human tendencies like habituation, ensuring alerts are meaningful and workflows prevent complacency. Her takeaways emphasized understanding users’ cognitive limits, balancing redundancy with simplicity, and prioritizing human-centric design to avoid software disasters.
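As a rough illustration of that last takeaway, and assuming nothing about Trifork’s actual tooling, here is a small hypothetical Java sketch of an alert gate that suppresses repeats of the same low-severity alert within a cooldown window, so the alerts people do see remain worth reacting to.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one way to fight alert habituation: repeated low-severity
// alerts for the same condition are suppressed during a cooldown, while critical
// alerts always reach a human.
public class AlertGate {

    enum Severity { INFO, WARNING, CRITICAL }

    private final Duration cooldown;
    private final Map<String, Instant> lastSent = new HashMap<>();

    AlertGate(Duration cooldown) {
        this.cooldown = cooldown;
    }

    /** Returns true if the alert should actually be delivered to a human. */
    boolean shouldNotify(String alertKey, Severity severity, Instant now) {
        // Critical alerts always go through; they stay meaningful by staying rare.
        if (severity == Severity.CRITICAL) {
            lastSent.put(alertKey, now);
            return true;
        }
        Instant previous = lastSent.get(alertKey);
        boolean withinCooldown =
                previous != null && Duration.between(previous, now).compareTo(cooldown) < 0;
        if (withinCooldown) {
            return false; // same noisy alert was sent recently; stay quiet
        }
        lastSent.put(alertKey, now);
        return true;
    }

    public static void main(String[] args) {
        AlertGate gate = new AlertGate(Duration.ofMinutes(10));
        Instant t0 = Instant.now();
        System.out.println(gate.shouldNotify("disk-usage-85", Severity.WARNING, t0));                  // true
        System.out.println(gate.shouldNotify("disk-usage-85", Severity.WARNING, t0.plusSeconds(60)));  // false, repeated noise
        System.out.println(gate.shouldNotify("disk-usage-85", Severity.CRITICAL, t0.plusSeconds(90))); // true
    }
}
```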