Independence and Protection from Common Cause Failures
Protective systems should be sufficiently independent of the control system or other protective systems (electrical/electronic or programmable).
Where there is an interface between systems (e.g. for indication, monitoring or shared components) or shared utilities (e.g. power), environment (e.g. accommodation, wiring routes) or management systems (maintenance procedures, personnel), then the method of achieving independence should be defined, and common cause failures adequately considered.
Measures to defend against common mode failures due to environmental interactions may include physical separation or segregation of system elements (sensors, wiring, logic, actuators or utilities) of different protective systems.
Independence will also be required for protection against systematic and common mode faults. Measures may include use of diverse technology for different protective systems. Where more than one E/E/PES protective system is used to provide the required risk reduction for a safety function, then adequate independence should be achieved by diverse technology, construction, manufacturer or software as necessary to achieve the required safety integrity level.
Another good practice is to monitor dissimilar but related alarm conditions in the same equipment. The interlock systems for each section should then be independent of one another so that failure of one interlock system will not affect the other. For example, monitor both low cooling water flow and high water outlet temperature, rather than having two flow alarms, to preclude common mode failure of similar “low-flow” switches and potential loss of both interlocks.
Dependence on Utilities
Utilities which are required for the protective system to perform its safety function may include power supplies such as electricity, air, inhibitor materials and their propellants, inert gas such as nitrogen, cooling water, steam, and pilot flames and their gases, all of which should be adequately reliable. The adequacy of the designed capacity of reserves should be demonstrated by test.
The actions required from the protective system depend upon the nature of the process. The actions may be passive in nature, such as simple isolation of plant or removal of power, or they may be active in that continued or positive action is required to maintain or restore a safe state, for example by injection of inhibitor into the process, or provision of emergency cooling.
Active protective measures have a high dependence upon utilities, and may be particularly vulnerable to common mode failures. The scope of the protective system therefore includes all utilities upon which it depends, and they should have an integrity consistent with, and contributing to, that of the remainder of the system.
Measures taken to defend against common mode failure of utilities will be commensurate with the level of safety integrity required, but may include redundancy (standby) or uninterruptible/reservoir supplies for electricity, air, cooling water, or other utilities essential for performance of the safety function. Such measures should themselves be of sufficient integrity.
Survivability and External Influences
The protective system should be adequately protected against environmental influences, the effects of the hazard against which it is protecting, and other hazards that may be present. Environmental influences include power system failure or characteristics, lightning, electromagnetic radiation (EMR), flammable atmospheres, corrosive or humid atmospheres, ingress of water or dust, temperature, rodent attack, chemical attack, vibration, physical impact, and other plant hazards.
Degradation of protection against environmental influences during maintenance and testing should have been considered and appropriate measures taken. For example, use of radios by maintenance personnel may be prohibited during testing of a protective system with the cabinet door open where the cabinet provides protection against EMR.
Utilities may also introduce external influences into the protective systems (e.g. from electrical supplies).
Measures to protect against external influences may include:
- Under/over-voltage protection
- Over-current and short circuit protection
- Use of an uninterruptible power supply or voltage conditioning or filtering
- Careful attention to lightning protection and equipotential bonding
Protection Against Random Hardware Faults
The architecture of the protective system should be designed to protect against random hardware failure. It should be demonstrated that the reliability achieved is commensurate with the required integrity level. Defensive measures may include high reliability elements, automatic diagnostic features to reveal faults, and redundancy of elements (e.g. 2 out of 3 voting for sensors) to provide fault tolerance.
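The 2 out of 3 voting arrangement mentioned above can be illustrated with a minimal sketch. The function names, the high-trip convention and the numerical values are assumptions made for illustration only; a real logic solver would implement this in its own certified function blocks.

```python
def channel_trips(reading: float, trip_point: float) -> bool:
    """Assumed high trip: a channel demands a trip at or above the trip point."""
    return reading >= trip_point


def vote_2oo3(trip_a: bool, trip_b: bool, trip_c: bool) -> bool:
    """Return True (trip) when at least two of the three channels demand a trip."""
    return (trip_a + trip_b + trip_c) >= 2


# Hypothetical readings from three redundant temperature sensors.
TRIP_POINT = 100.0
readings = [101.2, 87.5, 103.9]
demands = [channel_trips(r, TRIP_POINT) for r in readings]
print(vote_2oo3(*demands))  # prints True: two of the three channels are above 100.0
```

With this arrangement a single sensor failing low does not prevent a trip, and a single sensor failing high does not cause a spurious trip.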
Sensors include their connection to the process, both of which should be adequately reliable. A measure of their reliability is used in confirming the integrity level of the protective system. This measure should take into account the proportion of failures of the sensor and its process connection which are failures to danger.
Dangerous failures can be minimized by a number of measures such as:
- Use of measurement which is as direct as possible (e.g. pneumercators provide an inferred level measurement but actually measure back pressure against a head, and are sensitive to changes in density due to temperature variations within the process and to balance gas flow, upon which they are dependent)
- Control of isolation or bleed valves to prevent uncoupling from the process between proof tests, or monitoring such that their operation causes a trip
- Use of good engineering practice and well proven techniques for process connections and sample lines to prevent blockage, hydraulic locking, sensing delays etc
- Use of analogue devices (transmitters) rather than digital (switches)
- Use of positively actuated switches operating in a positive mode together with idle current (de-energize to trip)
- Appropriate measures to protect against the effects of the process on the process connection or sensor, such as vibration, corrosion, and erosion
- Monitoring of protective system process variable measurement (PV) and comparison against the equivalent control system PV either by the operator or the control system
Proof testing procedures should clearly set out how sensors are reinstated and how such reinstatement is verified after proof testing.
Maintenance procedures should define how sensors/transmitters are calibrated with traceability back to national reference standards by use of calibrated test equipment.
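The comparison of the protective system PV against the equivalent control system PV, listed among the measures above, might be automated along the following lines. This is a sketch only: the function name, the use of instrument span as the reference and the 2% tolerance are assumptions, and a suitable tolerance depends on the accuracy of the two measurements.

```python
def pv_discrepancy(protective_pv: float, control_pv: float,
                   span: float, tolerance_pct: float = 2.0) -> bool:
    """Return True when the two measurements of the same process variable
    differ by more than tolerance_pct of the instrument span (assumed criterion)."""
    if span <= 0:
        raise ValueError("span must be positive")
    deviation_pct = abs(protective_pv - control_pv) / span * 100.0
    return deviation_pct > tolerance_pct


# Hypothetical level measurements on a 0 to 100 % span.
if pv_discrepancy(protective_pv=50.0, control_pv=55.0, span=100.0):
    print("Discrepancy alarm: investigate both transmitters")
```

A discrepancy does not show which measurement is wrong, so the response would normally be an alarm for investigation rather than an automatic action.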
Other matters which will need to have been considered are:
- Cross sensitivities of analyzers to other fluids which might be present in the process
- Reliability of sampling systems
- Protection against systematic failures on programmable sensors/analyzers. The measures taken will depend on the level of variability and track record of the software
- Signal conditioning (e.g. filtering), which may affect the sensor response times
- Degradation of measurement signals (distance between sensor and transmitter may be important)
- Accuracy, repeatability, hysteresis and common mode effects (e.g. effects of gauge pressure or temperature on differential pressure measurement)
- Integrity of process connections and sensors for containment (sample or impulse lines and instrument pockets are often a weak link in process containment measures)
Use of “SMART” instruments requires adequate diagnostic coverage and fault tolerance, and measures to protect against systematic failures (software design/integration, inadvertent re-ranging during maintenance).
Measures may include use of equipment in non-smart mode (analogue signal output, no remote setting) and equipment of stable design for which there is an extensive record of reliability under similar circumstances.
Actuators and Signal Conversion
Actuators are the final control elements or systems and include contactors and the electrical apparatus under control, valves (control and isolation), including pilot valves, valve actuators and positioners, and the power supplies and utilities which are required for the actuator to perform its safety function, all of which should be adequately reliable.
A measure of their reliability is used in confirming the integrity level of the protective system. This measure should take into account the proportion of failures of the actuator under the relevant process conditions which are failures to danger.
Actuators are frequently the most unreliable part of the tripping process.
Dangerous failures can be minimized by a number of measures such as:
- Use of “fail-safe” principles so that the actuator takes up the tripped state on loss of signal or power (electricity, air etc.), e.g. a held open, spring return actuator
- Provision of uninterruptible or reservoir supplies of sufficient capacity for essential power
- Failure detection and performance monitoring (end of travel switches, time to operate, brake performance, shaft speed, torque etc.) during operation
- Actuator exercising or partial stroke shutoff simulation during normal operation to reveal failures or degradation in performance. Note that this is not proof testing, but it may reduce the probability of failure by improving diagnostic coverage
- Overrating of equipment
Other matters which should have been considered are:
- Valves are the most common final control element in safety systems. Valves are problematic in that they are the largest contributor to failures while being the most difficult to test and repair. Partial stroke testing of SIS valves is now being employed to provide limited on-line testing
- Valves should be properly selected for their duty, and it should not be assumed that a control valve can satisfactorily perform isolation functions
- Actuators may also include programmable control elements (e.g. SMART instruments), particularly within positioners, variable speed drives and motor control centers. Modern motor control centers may use programmable digital addressing. This creates a significant risk of introducing systematic failures and failure modes which cannot be readily predicted. Such an arrangement should be treated with caution. It is normally reasonably practicable for the trip signal to act directly upon the final contactor
- Potential for failure due to hydraulic locking between valves (e.g. trace heated lines between redundant shutoff valves)
Modern logic systems are based on programmable logic controllers (PLCs), now commonly called logic solvers or programmable electronic systems (PESs). Logic systems for protective systems are therefore mostly electronic, but other technologies (magnetic or fluidic/pneumatic) have been used.
The architecture of the logic system will be determined by the hardware fault tolerance requirements, for example dual redundant channels. Where a high level of integrity for the system is required then diverse hardware between channels may be employed. This should not be confused with diversity of independent protective systems.
Logic systems are likely to incorporate provisions for fault alarms and overrides, for which there should be suitable management control arrangements. They may also provide monitoring of input and output signal lines for detection of wiring faults (open circuit, short circuit) and sensor/actuator faults (stuck-at, out of range).
Such monitoring may initiate an alarm, a trip action or, in a voting arrangement, disable the faulty element.
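One possible way of disabling a faulty element in a voting arrangement is sketched below. The degradation policy shown (three healthy channels vote 2 out of 3, two healthy channels vote 1 out of 2, fewer than two healthy channels trips) is an assumption chosen for illustration and is not prescribed by the text; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    demand: bool  # True when the channel is calling for a trip
    faulty: bool  # True when line monitoring has flagged the channel


def evaluate(channels: list[Channel]) -> tuple[bool, bool]:
    """Return (trip, fault_alarm) for a nominally 2oo3 group.

    Assumed policy: faulty channels are excluded from the vote;
    3 healthy -> 2oo3, 2 healthy -> 1oo2, fewer than 2 healthy -> trip.
    """
    if len(channels) != 3:
        raise ValueError("this sketch expects exactly three channels")
    healthy = [c for c in channels if not c.faulty]
    fault_alarm = len(healthy) < len(channels)
    if len(healthy) < 2:
        return True, fault_alarm
    required = 2 if len(healthy) == 3 else 1
    trip = sum(c.demand for c in healthy) >= required
    return trip, fault_alarm
```

Under this policy a detected fault always raises an alarm, and the remaining channels become more willing to trip, so the loss of fault tolerance is biased towards the safe state until the repair is made.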
Software based systems should be adequately protected against systematic failures, for example by appropriate hardware and software safety lifecycles, and suitable techniques and quality systems.
Wiring and Communications (Signal Transmission)
Transmitters, communications devices and wiring systems should be arranged to meet the requirements for survivability, protection against external influences and independence.
Independent systems or redundant channels should not share multi-core cables with each other or with power circuits, and may require diverse routes depending upon the safety integrity level to be achieved.
Measures to protect against failures include:
- Use of fail-safe principles such as DC model (e.g. 4-20 mA loop) for analogue signal transmission, with diagnosis and alarm of out of range, abnormal, or fault states (such as stuck-at) and defined control system responses for both the sensor and transmitter
- Cable selection (screening etc)
- Protection of cables against fire, chemical attack, physical damage etc
- Physical separation or segregation of cables and cable routes
- Routing in benign environments
- Use of optical fibers to protect against electrical interference
- Careful attention to lightning protection of data links between buildings
Use of Fieldbus or other digital communication protocols in protective systems should be considered a novel approach requiring a thorough evaluation and demonstration of the safety integrity.
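The diagnosis of out of range and fault states on a 4-20 mA loop, listed among the measures above, can be sketched as a simple classification of the loop current. The threshold values below are illustrative assumptions of the kind commonly adopted following NAMUR NE 43; the values actually used should be confirmed against the transmitter and logic solver documentation. The function names are hypothetical.

```python
# Assumed thresholds in mA (illustrative; confirm against device documentation).
FAULT_LOW_MA = 3.6     # at or below: treated as a fault (e.g. open circuit)
VALID_LOW_MA = 3.8     # lower limit of the valid measurement band
VALID_HIGH_MA = 20.5   # upper limit of the valid measurement band
FAULT_HIGH_MA = 21.0   # at or above: treated as a fault (e.g. short circuit)


def classify_loop_current(current_ma: float) -> str:
    """Classify a loop current as valid, out of range, or a fault state."""
    if current_ma <= FAULT_LOW_MA:
        return "fault_low"
    if current_ma >= FAULT_HIGH_MA:
        return "fault_high"
    if current_ma < VALID_LOW_MA:
        return "out_of_range_low"
    if current_ma > VALID_HIGH_MA:
        return "out_of_range_high"
    return "valid"


def to_engineering_units(current_ma: float, range_low: float, range_high: float) -> float:
    """Scale a 4-20 mA signal linearly onto the calibrated range."""
    return range_low + (current_ma - 4.0) / 16.0 * (range_high - range_low)
```

Because 0 mA is never a valid measurement on such a loop, a broken wire is distinguishable from a genuine low reading; each returned state would need a defined response (alarm, trip or exclusion from a vote).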
The probability of failure on demand, or the failure rate, of a protective system is critically dependent upon the frequency of proof testing and on the ability of the test to detect previously unrevealed failures of the system. The proof test interval should therefore be established accordingly and, as a rule of thumb for low demand systems, should be an order of magnitude shorter than both the mean time between failures of the system and the mean time between demands.
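The dependence on the proof test interval can be illustrated with the widely used simplified approximation for a single channel, in which the average probability of failure on demand is about half the product of the dangerous undetected failure rate and the proof test interval. This approximation is an assumption introduced here, not taken from the text; it neglects repair time, imperfect testing and common cause failures. The function names and the numerical values are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def pfd_avg_single_channel(lambda_du_per_hour: float, proof_test_interval_h: float) -> float:
    """Simplified average PFD for one channel: lambda_DU * T / 2."""
    return lambda_du_per_hour * proof_test_interval_h / 2.0


def max_interval_rule_of_thumb(mtbf_h: float, mean_time_between_demands_h: float) -> float:
    """Rule of thumb: interval an order of magnitude shorter than the smaller
    of the mean time between failures and the mean time between demands."""
    return min(mtbf_h, mean_time_between_demands_h) / 10.0


# Hypothetical figures: 2e-6 dangerous undetected failures per hour, annual proof test.
pfd = pfd_avg_single_channel(2e-6, HOURS_PER_YEAR)          # about 8.76e-3
limit_h = max_interval_rule_of_thumb(500_000.0, 10 * HOURS_PER_YEAR)
print(f"PFDavg = {pfd:.2e}, rule-of-thumb maximum interval = {limit_h:.0f} h")
```

Doubling the proof test interval doubles the estimated PFD in this model, which is why the interval has to be fixed as part of the integrity claim rather than chosen for convenience.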
Proof test procedures should be available which specify the success/failure criteria and detail how the test will be performed safely, including any management arrangements, operating restrictions and competence of personnel.
The tests should be arranged to reveal all dangerous failures which have been unrevealed in normal operation including the following measures:
- Tests performed at the conditions which would be expected at trip. (Where test under trip conditions cannot be performed, for example for safety reasons, then measures to ensure that potential failures at trip conditions will be revealed should be clarified)
- End to end tests at appropriate intervals, including proving sample/impulse lines. (Different elements of the protective system may require proof testing at different intervals)
Procedures should be available which detail the operation of the protective system including:
- Override management (authorization, security, recording, monitoring and review of overrides, reset requirements)
- Operating instructions for trips
- Instructions for response to equipment faults including fault alarms. (There should be procedural arrangements in place to ensure timely repair so that mean time to repair criteria can be met)
Procedures should be available for maintenance activities including:
- Maintenance instructions
- Control of spares (segregation of faulty or non-conforming parts, identification to prevent interchange of similar parts etc)
- Competence of maintenance personnel
- Operating restrictions during maintenance
- Control of software back-ups and memory media (E/EPROMs, floppy disks, files on hard disks on portable PCs etc)
- Post maintenance reinstatement and proof testing
For systems where a high diagnostic coverage is claimed, for example high integrity systems, the probability of failure (expressed as failure rate) is critically dependent upon the mean time to repair the faults revealed. For such systems, the repair performance should be monitored and reviewed against the design criteria.
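A minimal sketch of reviewing repair performance against a design criterion is given below. The record format, the function names and the use of a simple arithmetic mean are assumptions for illustration; the second function uses the common approximation that the unavailability caused by detected dangerous faults is roughly the detected dangerous failure rate multiplied by the mean time to repair, which is likewise an assumption and not taken from the text.

```python
from statistics import mean


def review_repair_performance(repair_times_h: list[float], design_mttr_h: float) -> dict:
    """Compare the achieved mean time to repair with the design assumption."""
    if not repair_times_h:
        raise ValueError("no repair records to review")
    achieved = mean(repair_times_h)
    return {
        "achieved_mttr_h": achieved,
        "design_mttr_h": design_mttr_h,
        "within_design": achieved <= design_mttr_h,
    }


def unavailability_from_detected_faults(lambda_dd_per_hour: float, mttr_h: float) -> float:
    """Approximate unavailability while revealed faults await repair."""
    return lambda_dd_per_hour * mttr_h


# Hypothetical repair records (hours from fault alarm to reinstatement).
result = review_repair_performance([4.0, 6.0, 14.0], design_mttr_h=8.0)
print(result["within_design"])
```

If the achieved figure drifts above the design value, the failure probability claimed in the design is no longer supported and the repair arrangements need review.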
A management system for control of modifications should be available to ensure that:
- Unauthorized modifications are prevented
- Authorized modifications are not ill conceived
- Safety verification confirms that the required safety function and integrity have been maintained
- Design and implementation are carried out by competent persons
Remote Diagnostic Systems
Remote diagnostic systems have the potential to cause danger by initiating unexpected operations or by affecting safety functions by software/parameter modification or by diverting the control system processor from time critical functions.
The need for remote diagnosis should be justified, a risk assessment completed, and measures taken to ensure that safety is not affected by normal operation or malfunction of the diagnostic system, including the remote diagnostic terminal and software, communication link, and the control system diagnostic interface and software.
Consideration should be given to:
- Security and control of access
- Communication between diagnostician and plant personnel
- Restricted mode of operation: passive (monitoring only), active (control/operator functions) or interactive (software change possible)
- Potential for operation outside restricted mode under fault conditions
- Protection of safety functions from unauthorized modification
- Change control
- Competence of personnel