Despite modern diagnostics, "no fault found" is frequently the mechanics' verdict after avionics fault alerts.

Terence Hardeman/SINGAPORE

FAULT INDICATIONS on avionics line-replaceable units (LRUs) are costing airlines millions of dollars, and engineering executives around the world have called for action by component manufacturers to combat the problem.

At the Airline International Electronic meeting in Kuala Lumpur, Malaysia, in September 1995, British Airways chief development engineer (avionics) Clive Baxter said that a 38% "no fault found" (NFF) rate during workshop checks on LRUs removed from aircraft because of apparent defects is costing his airline £34 million ($52 million) a year.

At the same meeting, Thai Airways International quoted an NFF rate of more than 50% on electronic-flight-instrument-system interface units for its Boeing 747-400s. The problem is more acute for small airlines, which do not have sophisticated test and overhaul equipment and cannot afford large spares holdings.

The difficulties lie in the increased complexity and high unit cost of modern avionics equipment. Built-in test equipment (BITE) has been incorporated in avionics equipment since the early 1960s, but it can be misleading. "BITE has been a dismal failure," Baxter claims. "In general, it indicates faults that you don't need BITE to show. It should show where the fault is and needs to show technicians which LRU is at fault, including cable faults. We would be prepared to pay more if we could get more efficient BITE [information] - for example, from the present 90% reliability to 99.5%."

The problem is not new and is not confined to avionics units, but with the increasing reliance on computers linked to a multitude of sensors (integrated systems), it has intensified. Part of the trouble is that, as KLM's director of engineering, Paul van den Boom, points out, there is no universal standard on BITE read-outs. Some faults require technicians to operate meters on the LRUs, some are indicated by mechanically latched flags or balls and some have code read-outs on screens.

Modern avionics units usually have a great deal of redundancy built in and it is quite possible to have a defect indicated on the BITE when a system is functioning satisfactorily. The BITE can also indicate unit failure through the over-sensitivity of its watchdog function to temporary external causes, such as the voltage transients caused when switching from auxiliary-power-unit to engine-driven generator power.

There is a need to differentiate between "soft" faults, which can be cleared by reset buttons or by removing the electrical supply, and "hard" faults, which reappear when the electrical supply is reconnected. Soft faults can be elusive and may even be the result of software defects. Hard faults are less equivocal and are usually indicative of component failure.

The number and variety of separate BITE-indicating LRUs probably reached its peak on the Lockheed TriStar. It was, if anything, an over-engineered aircraft, but, by using the BITE, an experienced technician could identify complex faults rapidly. More importantly, he could decide what maintenance was required before the aircraft could be released for service.

In line operation, however, pressures to get the aircraft away on schedule often drive technicians to take the easiest route, which is to change an accessible LRU and sign the aircraft off "for further flight report".

SHOTGUN TROUBLESHOOTING

The problem is particularly acute with intermittent defects, which disappear when the aircraft is on the ground. One time-honoured and expensive remedy has been to change all the items which could have caused the defect - so-called "shotgun troubleshooting". As a result, large numbers of serviceable LRUs are returned to the shop for test and are tagged with the NFF label.

Larger airlines have invested heavily in automated test equipment (ATE) to help pinpoint component defects and to avoid the expense of sending them to specialist third-party service centres, only to have them returned with another NFF label. The ATE also enables the airlines to specify the type of repair required on recalcitrant LRUs, but van den Boom dismisses this as a mere palliative.

Some airlines, such as Cathay Pacific Airways, take advantage of the fact that their simulators and aircraft share many LRUs, so they use the simulators to check the functions of suspect units with soft faults - a procedure which could add a little spice to training details.

Integrated avionics systems are used to perform the functions previously accomplished by several LRUs, thus concentrating the problem, but increasing its complexity. The answer is, supposedly, provided by the central maintenance computer (CMC). The CMC collects, stores and processes BITE codes and system information and translates this into a format which can easily be interpreted by maintenance personnel.

Further complexity is introduced by the fact that the LRUs communicate data to each other. Thus the presence of an insect in a pitot tube, for example, could show up as a fault in the central air-data computer (CADC), and in a host of other units which rely on the CADC for information - the so-called "cascading fault".
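
The cascade can be pictured as a dependency graph in which a downstream unit's fault is discounted whenever a unit it draws data from is also reporting a fault. The Python sketch below is purely illustrative: the unit names, the `DEPENDS_ON` map and the `root_causes` function are assumptions made for this example and do not describe any real CMC.

```python
# Hypothetical data-flow map: each unit lists the units it takes data from.
DEPENDS_ON = {
    "pitot_probe": [],
    "CADC": ["pitot_probe"],
    "autopilot": ["CADC"],
    "flight_management": ["CADC"],
}


def root_causes(faulted, depends_on):
    """Return the faulted units that have no faulted unit anywhere upstream."""

    def has_faulted_upstream(unit, seen):
        # Walk the upstream sources, guarding against loops in the map.
        for source in depends_on.get(unit, []):
            if source in seen:
                continue
            seen.add(source)
            if source in faulted or has_faulted_upstream(source, seen):
                return True
        return False

    return {unit for unit in faulted if not has_faulted_upstream(unit, set())}
```

With all four hypothetical units flagged, only the pitot probe survives as a candidate; the faults on the CADC and the units fed by it are treated as consequences rather than causes.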

Do the airlines expect too much from their BITE and CMCs? Van den Boom says: "The CMC is an immature trouble-shooting tool," adding: "We would like to have BITE read-outs in plain English instead of codes." KLM is a member of a five-airline group, which includes British Airways, Scandinavian Airlines System, Swissair, Japan Airlines and several vendors, studying the NFF problem in co-ordination with the International Air Transport Association. Van den Boom concedes that, with the increased reliability of components, the problem is receding. He adds, however, that increased reliability can create its own problem, because line mechanics then get less practice dealing with the BITE.

Daniel Malka, director of support services at Sextant Avionique, concedes that the ability to detect and diagnose avionics faults has become a major issue, saying: "BITE has not established a favourable reputation for trustworthiness with technicians," adding: "Most recent transport-aircraft development programmes have included extensive efforts to design more cost-effective BITE. Most of the expected benefits of BITE have not been achieved, however." Malka attributes this partly to the undue sensitivity of BITE to power interruptions and voltage transients. Obscure software defects can also contribute to elusive intermittent faults. "In the case of a [software or hardware] design defect," he says, "it can be said that the unit was never functioning properly, but that design verification and testing throughout the development and manufacturing process were not comprehensive enough to identify the defect."

In a presentation at Airtech '95 at Birmingham, UK, Cathay Pacific technical-services superintendent David Hope told of his airline's experience with the introduction of the Airbus A330/A340. "The Airbus A330 and 340 are the first aircraft in which a truly integrated maintenance approach has been taken - centring upon the process of noting stored faults in the CMS post-flight report and following related guidance in a comprehensive trouble-shooting manual [TSM]. It is undesirable, from both the safety and efficiency point of view, for line engineers to make their own interpretation of system failures by studying system schematics."

Hope is pleased with the new aircraft's CMS, explaining: "More than with previous attempts at centralised BITE, the CMS on the A330/A340 actually works and can be relied upon to accurately point to component faults. Despite this, we still see the wrong components removed because the line-maintenance technician, from his experience with other aircraft, has a historical distrust of BITE. Often, he feels he must do something positive in response to pilot reports, such as change a component. Another temptation under operational pressure is to ignore the recommendations of the CMS and remove the system control computer simply because, being a rack-mounted avionics box, it is the easiest component to change. The reported problem or message is apparently cleared by this action only to reappear two sectors later when the sensor that has caused the warning once more goes out of limits."

ON THE LINE AT CATHAY

In the early stages of the A330/340's entry to line operation, the NFF problem was still present. As Hope says: "We have found that, on average, less than one-fifth of avionics computers removed from the A330/340 actually have confirmed hardware failure. The presence of software anomalies in some computers still under development means that their mean time between removal may be as little as one-tenth of what is expected from their advanced hardware design."

The indifference of some airlines to BITE readouts has been bounced back at them by many component vendors, who have offered to refund service charges to customers provided that the BITE readout has been included when the unit is sent for overhaul. Sextant Avionique made this offer at the Kuala Lumpur meeting and it has been repeated by other vendors.

The NFF problem cannot be laid entirely at the door of the line mechanics, however. David Hope remarks: "To be fair to the line technician, troubleshooting problems are complicated by misleading, spurious messages for which there may be an explanation, but it is not published in the TSM."

For this reason, Cathay Pacific has developed the Airbus computer-aided troubleshooting system. It runs on a personal computer and provides instant relationships between message codes, action and related reports.

"Spurious failure-warnings are usually caused by the monitoring software having an incomplete model of the way a sensor or system behaves in the real aircraft," says Hope. He says, for example, that a warning time-delay may be set too short for the actual time it takes for a valve to change position when commanded. Airbus is addressing all such problems as they arise through its technical follow-up system, Hope says. The onus is on the airline, however, to evaluate each event for its impact on maintenance practice, he adds.

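Hope's valve example can be reduced to a few lines. The sketch below, in Python, assumes a hypothetical monitor that raises a warning once the commanded and actual valve positions have disagreed for `warning_delay_s` seconds; the function name, the sample format and every number are invented for illustration.

```python
# Hypothetical samples: (time in seconds, commanded open?, actually open?).
# A healthy valve is commanded open at 0s and takes 3s to get there.
HEALTHY_VALVE = [
    (0.0, True, False),
    (1.0, True, False),
    (2.0, True, False),
    (3.0, True, True),
]


def monitor_valve(samples, warning_delay_s):
    """Return the time at which a warning is raised, or None if none is raised."""
    disagree_since = None
    for time_s, commanded, actual in samples:
        if commanded != actual:
            if disagree_since is None:
                disagree_since = time_s  # disagreement has just started
            if time_s - disagree_since >= warning_delay_s:
                return time_s  # disagreement has outlasted the delay
        else:
            disagree_since = None  # positions agree again, so restart the timer
    return None
```

With the delay set to 2s, `monitor_valve(HEALTHY_VALVE, 2.0)` flags the healthy valve at 2s, before it has finished moving; with a 4s delay no warning is raised. The monitoring logic is sound in both cases, and only its model of the valve's real transit time differs.
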
Airbus has provided a filter for the A330/340 CMC, which airlines can activate to mask off well-known nuisance messages - those which are still being examined for resolution - which need not appear in the post-flight maintenance report. In modern digital avionics, aircraft designers often inadvertently introduce new software bugs when implementing programming changes designed to attack several known problems simultaneously.

The changes can also invalidate previously tested interfaces, introducing anomalies which appear only when the computer functions are fully exercised in the aircraft. As Hope says: "The inevitable result is that airline service becomes the final testing ground for ongoing design changes."

Teething problems are almost inevitable with the introduction of a radically new design into service. Airbus representatives told the Kuala Lumpur conference: "During its initial service, the A340 CMS would display more than 50 messages after each flight. The amount of data generated was overwhelming and engineers were spending time suppressing messages created by other engineers. Now, we are down to about five messages a flight. Air-framers have had to learn that it is hard to programme pilot experience into a computer. The computer has difficulty in deciding what is important, but we have the data to convince the authorities that some messages can be ignored."

Similar problems dogged the Boeing 747-400 when it entered service. Spurious random messages from power transients during start-up were common. Line mechanics were faced with a plethora of misleading and irrelevant information. Component designers trying to meet everyone's needs - their own design requirements and those of hangar maintenance, bench mechanics, maintenance planners and statisticians - were compiling BITE tests on the basis of hypothetical failures rather than the experience of line operation.

The deluge of trivia confronting line mechanics after each flight from the early 747-400s prompted Frank Jauregui, staff vice-president for maintenance and engineering, line operations for Northwest Airlines, to remark in 1990 that "...too much information can be a curse. Do we have to deal with all of it each time the aircraft is on the ground? The manufacturer must understand the intent and impact of the new technology on maintainability and regulations."

The lesson was learned, however, and, for the Boeing 777, maintenance men were kept in the design loop from the beginning. Jack Hessburg, Boeing's chief mechanic, says: "One problem with BITE was that we set it too comprehensive a task and led mechanics to believe that it would be a magic pill to cure all faults. If a module had four possible modes of failure and we told the mechanics it was one thing when in fact the fault was among the other three, then he would say: 'BITE tells me lies'."

NO OIL STAINS

Hessburg continues: "On previous-generation aircraft, we could track most defects by visual inspection - an oil-stain, for instance - but you can't trouble-shoot a computer chip by looking for ones and zeros falling off the end of a connector pin. Good BITE gives mechanics 'eyes' to look inside a unit. Now, we are more selective on what we monitor. The 777 CMC is more tolerant of 'dirty' power and transients. If there is some ambiguity in the possible solution to a problem we say to the CMC: 'show me the ambiguity'. We also told the component vendors, 'you are responsible for all messages to the CMC. There should not be a message unless it means something'. On some previous aircraft the CMC generated component tests, not the LRUs. That is nonsense. On the 777, the CMC tells the LRU to run a logic test and report back. Good troubleshooting is logical, deductive reasoning and the best maintenance computer is still the one between the mechanic's ears."

Ian Gilbert, technical services manager (Europe) for AlliedSignal, says that there are areas which are difficult to check with BITE, but that the company's experience with airlines worldwide is that BITE is not used effectively. For example, if the radio altimeter circuit breaker is pulled, a traffic-alert and collision-avoidance system (TCAS) BITE will show "TCAS fail". Technicians then change the TCAS unit. Gilbert adds: "We did not make systems idiot-proof. Perhaps we should have. If an antenna is covered with ice or water it can cause an imbalance for BITE, but, if you do a BITE check before you pull the box, you should get good results."

Hessburg is acerbic: "If a mechanic needs a computer to tell him to check a circuit-breaker, I don't want him near my aeroplane," he says. He also has a short reply for Clive Baxter's plea for 99.5% BITE efficiency, saying that, technically, it could be done, but the price would be high and it would not solve the NFF problem. He points out the potential danger of having a BITE system so complex that it is more likely to fail than the LRU itself, generating BITE to check the BITE and so on ad infinitum.

The first loyalty of both Hessburg and Malka is to the line mechanics: "They are the people with the most influence on on-time departure and they contribute the most to achieving airline dispatch reliability," says Malka. Both men share similar views on what constitutes good CMC/BITE design:

information which does not contribute to effective maintenance should not be displayed;

fault messages should identify the root cause;

the syntax in messages should be consistent and use simple terminology, bearing in mind that English is not the first language of most of the end users;

BITE information should correspond to pilot reports/alerts. Alerting flightcrew to a fault without telling ground mechanics what to do about it is unacceptable;

the logic used to generate maintenance messages should include efficient consolidation logic to eliminate cascaded faults;

the CMC should be simple and intuitive to operate. It should not require programming knowledge or codes for use;

if fault consolidation results in a large ambiguity group, do not waste time listing all possible causes. The limits of the BITE will have been reached in this instance, so admit it and offer a better strategy, such as directing the mechanic to the fault-isolation manual;

write fault-isolation manuals so that they are integrated with and complementary to the CMC strategies;

include hands-on CMC use in engineer training.
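
Several of these points, notably consolidated messages, plain wording and an honest hand-off to the manual when the ambiguity group is too large, can be sketched in code. The Python below is an assumption-laden illustration: `maintenance_message`, the `MAX_LISTED_CAUSES` threshold and the message wording are hypothetical and are not taken from Boeing or Sextant.

```python
MAX_LISTED_CAUSES = 3  # assumed limit on how many candidates are worth listing


def maintenance_message(candidates, manual_task):
    """Build one plain-language message from a consolidated list of suspects."""
    if not candidates:
        return ""  # nothing that helps maintenance, so display nothing
    if len(candidates) == 1:
        return f"REPLACE {candidates[0]}"
    if len(candidates) <= MAX_LISTED_CAUSES:
        # Small ambiguity group: show the ambiguity to the mechanic.
        return "CHECK " + " OR ".join(candidates)
    # Large ambiguity group: admit the limit of the BITE and point to the manual.
    return f"FAULT NOT ISOLATED - SEE FAULT-ISOLATION MANUAL TASK {manual_task}"
```

A single suspect yields a direct instruction, two or three are listed as alternatives, and anything beyond the assumed threshold sends the mechanic to the fault-isolation manual instead of a long list of possibilities.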

Hessburg admits that, with only a dozen 777s in airline service, it is too soon to judge whether Boeing has cracked the BITE ambiguity and NFF problems, but he claims that, to date, the system has run without a hitch. Of the CMC, he says: "The best testimony is that, when we started our 1,000-cycle test programme, mechanics were saying 'don't believe what the CMC tells you'. With the programme completed, they are telling each other, 'believe it'."

As Hessburg and Malka insist, there is still a need for the experienced mechanic on the line. Sophisticated BITE and the CMC are just tools to help him in his task.

Source: Flight International