Safety-critical systems

santhosh · May 4, 2018, 10:26am

Several reliability regimes for safety-critical systems exist:

Fail-operational systems continue to operate when their control systems fail. Examples of these include elevators, the gas thermostats in most home furnaces, and passively safe nuclear reactors. Fail-operational mode is sometimes unsafe. Nuclear weapons launch-on-loss-of-communications was rejected as a control system for the U.S. nuclear forces because it is fail-operational: a loss of communications would cause launch, so this mode of operation was considered too risky. This is contrasted with the Fail-deadly behavior of the Perimeter system built during the Soviet era.

Fail-soft systems are able to continue operating on an interim basis with reduced efficiency in case of failure.Most spare tires are an example of this: They usually come with certain restrictions (e.g. a speed restriction) and lead to lower fuel economy. Another example is the “Safe Mode” found in most Windows operating systems.

Fail-safe systems become safe when they cannot operate. Many medical systems fall into this category. For example, an infusion pump can fail, and as long as it alerts the nurse and ceases pumping, it will not threaten the loss of life because its safety interval is long enough to permit a human response. In a similar vein, an industrial or domestic burner controller can fail, but must fail in a safe mode (i.e. turn combustion off when they detect faults).

Famously, nuclear weapon systems that launch-on-command are fail-safe, because if the communications systems fail, launch cannot be commanded. Railway signaling is designed to be fail-safe.

Fail-secure systems maintain maximum security when they cannot operate. For example, while fail-safe electronic doors unlock during power failures, fail-secure ones will lock, keeping an area secure.

Fail-Passive systems continue to operate in the event of a system failure. An example includes an aircraft autopilot. In the event of a failure, the aircraft would remain in a controllable state and allow the pilot to take over and complete the journey and perform a safe landing.

Fault-tolerant systems avoid service failure when faults are introduced to the system. An example may include control systems for ordinary nuclear reactors. The normal method to tolerate faults is to have several computers continually test the parts of a system, and switch on hot spares for failing subsystems.

As long as faulty subsystems are replaced or repaired at normal maintenance intervals, these systems are considered safe. Interestingly, the computers, power supplies and control terminals used by human beings must all be duplicated in these systems in some fashion.