Lesson Learned Statement:Safety hazards associated with the failure of safety barriers under the control of microprocessor-based computer systems need to be identified, analyzed, and mitigated before work is performed.Discussion:Microprocessor-based control systems have become increasingly popular in the work place because of their versatility and functionality. As a result, microprocessors are being used to monitor and control safety barriers associated with fire protection systems, radiation interlock systems, and manufacturing equipment. These systems tend to be complex and subject to single-point and common-cause failures. These failures occur when the control system can not perform its intended function for any of a number of reasons such as microprocessor malfunction or failure, loss of electrical power, power surge, software logic error, design error, or installation error. Failures of microprocessor-based control systems have the potential to adversely impact the safety of workers, the public and the environment when the microprocessor is exclusively responsible for maintaining a safety barrier or safety system. Some examples of failures of microprocessor-based control systems responsible for safety barriers:
In each of these cases, the microprocessor-based control system behaved in an unexpected or undesirable manner under circumstances that were not clearly foreseen at the time of design and installation. As a result, safety barriers under the control of the microprocessors were compromised, potentially placing workers and members of the public at serious risk. The potential safety hazards associated with failure of microprocessor-based control systems need to be recognized by management and safety professionals so that they can be identified, analyzed and mitigated. When warranted, additional design features or administrative controls should be considered to provide redundancy and defense in depth. Analysis:The lesson for the DOE complex to learn from these incidents and accidents is that potential failure modes of microprocessor-based systems responsible for controlling safety barriers and safety systems need to be identified, analyzed, and mitigated as part of the work control process. When single-point and common cause failures are possible, additional design features or administrative actions may be warranted to control the hazard. These actions are consistent with the core functions of DOE Integrated Safety Management Systems to identify hazards and implement hazard controls (see DOE Policy 450.4). DOE issued a Safety Notice (Issue No. 99-01) in July 1999 to alert personnel at DOE facilities to vulnerabilities associated with microprocessor-based fire protection systems. The notice discusses potential problems and recommended that several actions be implemented as part of site fire protection programs. However, the potential hazards and lessons to be learned were not generalized in their applicability to other safety disciplines where microprocessors are used to control safety barriers. Design features that may be used in conjunction with microprocessor-based control systems to control or mitigate potential failures include mechanical interlocks and hardware alarms. Redundant and independent microprocessors may be used in some high-consequence situations. Interlocks and alarms need to be designed to operate independently of the microprocessor-based control system, so that they provide redundancy and defense in depth. Interlocks and alarms help ensure that safety barriers remain in place or that immediate notification of barrier failure occurs, regardless of the functional condition of the microprocessor-based control system. Hardware alarms should be based on positive indication of actuation or barrier failure (e.g., switch actuation, solenoid position, gas pressure, detection of radiation, or fluid flow), rather than perceived status based on a software command. Interlocks and alarm functions should be designed in a fail-safe manner. Interlocks and alarms should be under a configuration control program. A comprehensive acceptance test program should be implemented for new installations or modification of existing systems. Additionally, interlocks and alarm functions should be routinely and systematically tested in accordance with detailed procedures as part of a preventative maintenance program. With respect to the use of microprocessors in safety systems at nuclear facilities, the Nuclear Regulatory Commission has published Regulatory Guide 1.152, Criteria for Digital Computers in Safety Systems of Nuclear Power Plants. The regulatory guide describes a method acceptable to the NRC staff for complying with the Commission's regulations for promoting high functional reliability and design quality for the use of digital computers in safety systems of nuclear power plants. The term "computer" refers to a system that includes computer hardware, software, firmware, and interfaces. Significant emphasis is placed on defense-in-depth against the propagation of common-cause failures within and between functions. Recommended Actions:Managers and safety professionals should be aware of the potential hazards associated with the failure of microprocessor-based systems responsible for the control of safety barriers. Microprocessor-based systems responsible for the control of safety barriers may be found in numerous applications, including fire protection systems, radiation facility interlocks, x-ray machines, and automated machining and manufacturing equipment. Corporate hazard identification processes and procedures need to be able to identify microprocessor-based systems responsible for the control of safety barriers so that the hazards can be analyzed and mitigated in accordance with core functions of integrated safety management. In cases where the microprocessor-based system represents a single point or common cause failure mode, additional measures should be considered to mitigate, control, or lockout the hazard to protect the safety of workers, the public and environment during routine operations, maintenance, and upset conditions.Originator:Department of Energy, Idaho Operations OfficeValidator:Walter N. Sato, Assistant Manager for Technology Programs and OperationsContact:Richard L. Dickson, (208) 526-0107, dicksorl@id.doe.govName Of Authorized Derivative Classifier:Joel M. TrentName Of Reviewing Official:Joel M. TrentPriority Descriptor:Red / UrgentKeywords:Microprocessor, computer, interlock, safety barrier, alarm, safety design, safety system, radiation protection, industrial safety, industrial hygiene, occupational safetyReferences:Type A Accident Report of the July 28, 1998 Fatality and Multiple Injuries Resulting from Release of Carbon Dioxide at Building 648, Test Reactor Area Idaho National Engineering and Environmental Laboratory, September 1998 DOE Safety Notice 99-01, Microprocessor-Based Fire Protection System Testing, DOE/EH-0560, Issue No. 99-01, July 1999. Inadequate Interlock Fail Safe Capability of Radiation Source Control System at the RSB/RESL Calibration Facility, CF-638; Occurrence Report No. DOE-ID 90-1 Abnormality Identified in X-Ray Facility Fail Safe System; Occurrence Report No. ALO-AO-MHSM-PANTEX-1994-0003 Milling Incident Involving High Explosive; Occurrence Report No. ALO-AO-MHSM-PANTEX-1993-0064 NRC Regulatory Guide 1.152, Criteria for Digital Computers in Safety Systems of Nuclear Power Plants, Nuclear Regulatory Commission, January 1996 DOE Policy 450.4, Safety Management System Policy, October 15, 1996. NRC Information Notice 97-82, Inadvertent Control Room Halon Actuation Due to Camera Flash, November 28, 1997 NRC Information Notice 99-05, Inadvertent Discharge of Carbon Dioxide Fire Protection System and Gas Migration, March 8, 1999 Information in this report is accurate to the best of our knowledge. As means of measuring the effectiveness of this report please use the "Comment" link at the bottom of this page notify the Lessons Learned Web Site Administrator of any action taken as a result of this report or of any technical inaccuracies you find. Your feedback is important and appreciated. DOE Function / Work Categories:Conduct of Operations - Configuration ManagementConduct of Operations - Lock and Tag Conduct of Operations - Work Planning Engineering Design and Construction - Non-Nuclear Engineering Design and Construction - Nuclear Environmental Protection - General Fire Protection Inspection & Testing Laboratory Experimentation Maintenance - Instrumentation and Controls Maintenance - Safety Systems Machining & Fabrication Nuclear Safety Operations - Facility Occupational Safety & Health - General Quality Research & Development Radiation Protection Safety Design ISM Category:Analyze HazardsDevelop / Implement Controls Hazard:Fire / Smoke / NFPALasers Personal Injury / Exposure - Radiation / Contamination
|