An engineer's analysis of Boeing's woes
Mr. Uyless Black’s peroration on the 737MAX has some opportunities for improvement. To put this in perspective, an actual airline pilot who writes an aviation column called Ask the Pilot has said on any number of occasions that an aircraft accident is the nexus of a number of things going wrong, and that if you changed even one of them you probably could have avoided the accident. Contrary to Mr. Black’s assertion, it really wasn’t faulty software that brought those planes down.
A nexus of hardware failure, system design failure, and hardware design implementation failure brought the aircraft down. The software was doing exactly what it had been told to do. So what happened?
First of all, Boeing, moving in the direction taken by Airbus, increased the automation in the cockpit. Specifically, they added a piece of avionics called a Maneuvering Characteristics Augmentation System (MCAS). It’s a device similar to an Automatic Flight Control Set (AFCS), a kind of power steering for your airplane.
AvWeek says that Boeing implemented this in the 737MAX so the aircraft would fly much more like the pre-MAX 737s. Lots of aircraft have AFCSs, but AFCSs are not granted control authority. When you start approaching a stall regime with an AFCS, it’ll shake the stick, shake the rudder pedals, sound an audible alarm, or do something else depending on how it was implemented, but it won’t take control from the pilot. The MCAS does.
When the MCAS thinks the aircraft is approaching a stall, it moves the horizontal tail surfaces (the small horizontal wings at the rear of the aircraft) and commands nose down. The pilot’s in a tight spot here. I don’t know the 737, but an F-18’s hydraulic system runs at 3,000 psi. That’s what the pilot has to pull against to raise the nose back up, since the hydraulic system has been commanded to make a certain tail movement. This, as far as I am concerned, is a system design failure. No flight control system should be designed so that the pilot may have to pull against the hydraulic system for anything more than a very short period of time.
Second, the 737 has two angle of attack (AOA) sensors. An AOA sensor measures the angle between the aircraft’s longitudinal axis and its direction of motion through the air. This is part of the input used to determine whether you are in a stall flight regime.
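For readers who want the geometry, here is the textbook body-axis definition (the column does not specify a convention, so this is an assumption). If u is the aircraft’s velocity component along its longitudinal (nose) axis and w is the component along its vertical (belly) axis, the angle of attack is:

```latex
\alpha = \arctan\left(\frac{w}{u}\right)
```

An aircraft moving exactly along its nose has w = 0 and therefore α = 0; as the nose pitches up relative to the flight path, w grows and α rises toward the stall angle.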
According to AvWeek, the AOA sensors are only used one at a time. That surprises the hell out of me. The only real purpose of multiple sensors is redundancy, so you continue to get the flight data even if one sensor fails. The problem with multiple inputs is that you must resolve them to a single answer, and that resolution usually happens in the AFCS or MCAS. There are a variety of ways to effectuate that resolution, and I don’t think we need to go into them, but a corollary is that if the system thinks its sensors are bad, it usually shuts itself off. I consider this a combination of a system design failure (their failure modes and effects analysis would have told them that this was a hazardous single-point failure) and a hardware design failure, because the published descriptions of the two accidents indicate that the crews couldn’t shut the MCAS down. They shouldn’t have had that problem.
So let’s see what actually happened. Both accidents were caused by a failure of the AOA sensor that was in use. When the sensor quit providing good data, the MCAS thought the aircraft was about to stall, and it drove the nose down. This is exactly what it was supposed to do. The fact that the pilots couldn’t pull the nose back up is merely an unfortunate byproduct of the MCAS design function. The inability of two experienced crews to disengage the MCAS, however, is NOT an unfortunate byproduct of the system working the way it’s supposed to.
So, sorry, Mr. Black. No faulty software. I’ll grant you, they will probably fix it via software.
If they did it right, they’d take the MCAS’s control surface authority away from it (via software) and replace it with an audible alert (via software). They’ll have to write code that lets the MCAS use both AOA sensors and resolve disagreements between them (that is software), because it clearly doesn’t have that now. Finally, they should automatically kill the MCAS whenever they can’t resolve a valid AOA value. That would address everything that brought the two 737s down.
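The fix proposed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Boeing’s code: the names mcas_step and resolve_aoa, the averaging rule, and both thresholds (MAX_DISAGREEMENT_DEG and STALL_WARNING_AOA_DEG) are hypothetical placeholders. It averages the two AOA readings when they agree, kills the MCAS when they don’t, and replaces any control-surface command with an audible alert.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds for illustration only; these are NOT Boeing's values.
MAX_DISAGREEMENT_DEG = 5.0    # largest allowed difference between the two AOA sensors
STALL_WARNING_AOA_DEG = 14.0  # resolved AOA at which the audible alert sounds


@dataclass
class McasDecision:
    mcas_enabled: bool                 # False means the MCAS has killed itself
    stall_alert: bool                  # sound the audible alarm; no control-surface command
    aoa_disagree: bool                 # the two sensors could not be reconciled
    resolved_aoa_deg: Optional[float]  # single AOA value, or None if invalid


def resolve_aoa(left_deg: float, right_deg: float) -> Optional[float]:
    """Return the average of the two AOA readings, or None if they disagree."""
    if abs(left_deg - right_deg) > MAX_DISAGREEMENT_DEG:
        return None
    return (left_deg + right_deg) / 2.0


def mcas_step(left_deg: float, right_deg: float) -> McasDecision:
    """One update cycle of the hypothetical revised MCAS logic."""
    aoa = resolve_aoa(left_deg, right_deg)
    if aoa is None:
        # No valid AOA value: automatically kill the MCAS and flag the disagreement.
        return McasDecision(False, False, True, None)
    # No control-surface authority here: approaching stall only raises
    # an audible alert and leaves the airplane to the pilot.
    return McasDecision(True, aoa >= STALL_WARNING_AOA_DEG, False, aoa)


if __name__ == "__main__":
    print(mcas_step(4.0, 4.5))    # sensors agree, not near stall
    print(mcas_step(15.0, 16.0))  # sensors agree, alert sounds
    print(mcas_step(4.0, 40.0))   # one sensor has failed: MCAS killed
```

Averaging is only one of the many ways to resolve two inputs. With just two sensors, a disagreement tells you something is wrong but not which sensor is lying, which is why the sketch shuts the MCAS off rather than guessing.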
Update: According to AvWeek, Boeing’s fix for the MCAS is largely as I suggested. It retains control surface authority but can no longer overpower pilot inputs. It now uses both AOA sensors instead of just one, with a software filter and a warning if the sensors don’t agree, and if they don’t agree, the MCAS is inhibited.
•••
Jeff Harrison of Worley retired from McDonnell Douglas/Boeing Engineering in 2006. He worked in the sports car division, not the trash hauler division.