Quality of PSA Data
Gennady Tokmachev
WORKSHOP on “PSA Quality for Decision
16-20 February 2004, IAEA, Vienna, AUSTRIA
PSA practice in Russia
Experience from developing rules and procedures in Russia
Results of regulatory reviews
Results of multinational projects and their reviews
Findings of IAEA IPSART missions
Harmonization activity
Documents being developed by IAEA, latest US NUREGs
Data processing task includes analysis of:
• Initiating event (IE) frequencies
• Equipment reliability data (i.e. operating failure rate, failure
probability on demand and/or stand-by failure rate)
• Maintenance and test unavailability
• Common cause failure (CCF) data
• Special (conditioning) event data
• Comment. Data on human errors are usually derived separately within the HRA task and are not discussed in this presentation.
Data Analysis Process in PSA (extracted from IAEA TECDOC)
[Flowchart reconstructed as text]
Inputs: plant operational records/databases, technical specifications, etc.; reliability models; list of components; component boundaries; component failure modes; list of IEs and IE groups
Collected records:
• Set of records on PSA component failures and repair/maintenance events
• Set of records on PSA component standby/operating times
• PSA component technical specification data
• PSA component test specification data
• Set of records on abnormal operational events
• Operational history data (terms of refuelling outages, power operation, decreased power operation, etc.)
Derived counts:
• Number of failures of a certain failure mode and associated stand-by/operating times or corresponding number of demands for PSA component groups
• Average durations and frequencies of component test/maintenance events and test periods
• Number of events associated with IEs and IE groups and corresponding plant exposure times
Estimation (using component boundaries, generic data and estimation techniques) yields mean values and uncertainty parameters:
• Component failure rates/probabilities for different failure modes
• Component unavailabilities due to test/maintenance
• IE frequencies
• CCF model parameters
• Special event probabilities
Terminology issues
When PSA methodology came to Russia, some important terms had already been given different meanings in regulatory documents and state standards.
1) The definitions of the terms «dependent failure» and «common cause failure» in the Russian «General provisions of nuclear power stations safety …» OPB-88/97 GAN RF and the definitions of the same terms used in PSA differ dramatically!
Functional dependencies and internal hazards are covered by the OPB definition of CCF. This contradicts PSA practice.
2) The definition of the term “frequency” used in PSAs is not supported by state standards on reliability, probability and statistics.
Multi-language issues also appear when PSA producers and PSA users work in different languages (e.g. Russian and English)
Cost/benefit analysis of data processing
• Historically, much more attention was paid to the collection of reliability data than to data on IEs or CCFs
Paradox: more effort was spent on the less important contributors, e.g.:
– Reliability data - IAEA-TECDOC-478, 508, 636, 756 (mainly), 930
– IE frequency - IAEA-TECDOC-719, 749, 756 (a little)
– CCF data - IAEA-TECDOC-648
• CCFs usually contribute much more to CDF than independent failures
• IE frequency directly affects CDF
Example. Increasing the frequency of one IE by a factor of 10 and simultaneously reducing the frequency of another IE by a factor of 10 can dramatically change the risk profile given the same model
Current tendency. In recent years, comparison of PSAs for VVERs supported by the IAEA and US DOE has focused heavily on IE frequencies
Initiating event frequencies
Collection of specific data
Problems if a data collection system is used:
• Event descriptions are not formalized for PSA purposes
• Potential underreporting. This is more relevant to shutdown PSA
• Not all IEs that occur at power result in a plant trip
• Several IEs can occur during a single plant disturbance:
– One IE can completely cover another event. Double counting should be discouraged (Example: a reactor scram event followed by LOOP. Note. Dependent LOOPs are seldom evaluated)
– One event is the direct consequence of another IE that occurred earlier (Example: inadvertent opening of BRU-A => closure of all MSIVs). This needs coordination with the AS modeling task
• Consequences of fires, floods, etc. may already have been counted as an IE in the internal event PSA, e.g. FW rupture
Use of generic data in PSA
Requirements of Russian Regulatory Rules
• It is not recommended to use only generic data when carrying out a Level 1 PSA for an operating NPP
• If generic data are used, the source of information needs to be identified
• The selection of generic data should be justified in terms of its applicability to the specific plant and equipment, component boundaries and failure modes defined in the Level 1 PSA
• If several sources of generic data are used, then the approach to the selection of information should be presented and justified.
Comment. Several sources can have the same origin
Lessons learnt from IPSART missions:
• In using generic data, both directly and for Bayesian updating, a weak point is consistency between the generic data and the plant model (definition of initiating events, component boundaries, failure modes)
Initiating event frequencies
Issues related to frequency estimation
• Uncertain information from sources of generic data:
– Incompleteness of IE data in international DBs (e.g. IAEA IRS)
Example from IAEA-TECDOC-749: 54 scram events at Paks NPP and only one event at the Novovoronezh and Kola NPPs
– Definitions of IEs in different PSAs or DBs differ considerably
o Size of LOCAs
o Loss of service water – a single train vs. the whole system, including suction water or not, loss of essential SW vs. non-essential SW, etc.
Initiating event frequencies
Issues related to frequency estimation (continued)
• Differences in design of NPP analyzed and other NPPs, e.g.:
– Design of safety/relief valves (PSV, SGSV, BRU-A)
– Configuration of support systems (e.g. 2/3/4 train SWS), etc.
• Impact of modifications
– The accelerated unit unloading algorithm applied to VVER-1000 dramatically reduced the number of scram events in the nineties
– Modification of the BRU-A relief valves at VVER-1000 dramatically reduced the number of stuck-open events after 1989
• Time trend
– NUREG/CR-5750: a decreasing time trend was observed for many
categories of events.
– Results of fracture mechanics studies show an increasing time
trend in LOCA frequencies over plant life time
Initiating event frequencies
Comparison of data for medium LOCA
[Chart of IE frequency comparison across plants - values not recoverable]
Initiating event frequencies
Comparison of data for medium LOCA (continued)
1) Does the design make a difference?
• No. The primary pipes are the same, and leaks via connections are restricted by throttles of D = 32 mm
2) Does the pipe steel make a difference?
• No - in these particular estimates. Novovoronezh differs (stainless steel), but this was not taken into account in the estimation
3) Does the leak size make a difference?
Leak size:
– Novovoronezh - 32-100 mm
– Kozloduy – 32-125 mm
– Bohunice – 32-200 mm
• No, because the estimation method used ignores the leak size, although there is the surge pipe of the pressuriser with D = 200 mm (Bohunice)
Conclusion 1. The same definition can cover IEs which are actually different
Initiating event frequencies
Comparison of data for medium LOCA (continued)
4) Does the specific data differ?
• No, zero-event statistics and comparable plant-specific exposures
5) Does the estimation method differ?
• No, the Bayesian method assuming a non-informative prior
• The difference in values was caused by the different generic exposures assigned to the zero-event prior data:
– Bohunice: f_MLOCA = 0.5/115 = 4.3E-3/a
– Novovoronezh: f_MLOCA = 0.5/600 = 8.3E-4/a
Conclusion 2. The Bayesian method is very sensitive to the prior data
Conclusion 3. There is a very high uncertainty associated with the frequencies of sufficiently large breaks
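The sensitivity noted in Conclusion 2 can be reproduced directly. A minimal sketch, assuming the non-informative Jeffreys prior (so the posterior mean with zero events is 0.5 divided by the exposure); the exposures are those quoted in the comparison:

```python
def jeffreys_rate(n_events: int, exposure_years: float) -> float:
    """Posterior mean IE frequency [1/a] under the Jeffreys prior.

    With a Gamma(0.5, ~0) prior and Poisson data, the posterior mean
    is (n + 0.5) / T, so with zero observed events everything hinges
    on the exposure T assigned to the prior data.
    """
    return (n_events + 0.5) / exposure_years

# Same zero-event record, different generic exposures (from the slide):
f_bohunice = jeffreys_rate(0, 115.0)       # ~4.3E-3 /a
f_novovoronezh = jeffreys_rate(0, 600.0)   # ~8.3E-4 /a
ratio = f_bohunice / f_novovoronezh        # ~5.2: a factor-of-5 spread from priors alone
```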
Initiating event frequencies
Comparison of generic data for LOCA (continued)
Conclusion 4. The IE frequencies for rare events need to be rationalized. There is a need for an international consensus to make PSA results comparable
• An attempt was made at an international workshop in Piestany, 2003:
Worldwide experience for all PWRs/VVERs shows that no such events have occurred to date in approximately 10,000 reactor-years of operation
• This suggests that a frequency of the order of 1.0E-4 per year could be justified as the sum of medium and large LOCAs
• Suggested values for VVER-1000s are:
– Large LOCA (200 mm-800 mm) - 3.2E-5/a (EF 10)
– Medium LOCA (50 mm-200 mm) - 6.8E-5/a (EF 10)
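The quoted error factors can be unpacked into full lognormal distributions using the standard relation EF = exp(1.645·σ). A sketch, taking the suggested large-LOCA mean as input:

```python
import math

def lognormal_from_mean_ef(mean: float, ef: float):
    """Return (median, p05, p95) of a lognormal given its mean and error factor.

    For a lognormal: EF = p95/median = median/p05 = exp(1.645 * sigma),
    and mean = median * exp(sigma**2 / 2).
    """
    sigma = math.log(ef) / 1.645
    median = mean / math.exp(sigma ** 2 / 2.0)
    return median, median / ef, median * ef

# Large LOCA: mean 3.2E-5/a, EF 10 (values suggested above)
median, p05, p95 = lognormal_from_mean_ef(3.2e-5, 10.0)
# With EF 10 the 90% band spans two orders of magnitude (p95/p05 = EF**2)
```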
First year of plant operation
• It is usual to eliminate data from the first year of plant operation, since it represents failures that occurred during the plant break-in period.
• That is right for the useful-lifetime period of already operating plants.
• What about new plants?
• Their number will increase
• 1st year – highest risk
• Is it necessary to perform a PSA for trial operation of the unit?
• What regulatory criteria should be applied?
Component reliability data
Component boundaries and failure modes
• Coordination with system analysis is critical in many PSAs, and surprisingly with the AS and HRA analyses as well
• It is critical to define what the component boundary is in order to get the right data. Examples of typical errors in defining component boundaries:
Example 1. A circuit breaker was included in the DG boundary considered in the PSA model. When deriving the DG failure probability from the data collected, failures of the DG circuit breaker were not counted.
Example 2. Double counting of failures – e.g., a relay counted both in the boundary of a pump and separately.
Example 3. Failures resulting from human errors after testing, maintenance, and instrument miscalibrations were double counted, in the HEP (type 1) and in component reliability
Component reliability data
Causes of inconsistency between data and model
• Modification of NPP
– Pre-modification data does not reflect the actual plant design
– Typically, data prior to the design change is deleted from the data
analysis because only post-modification operating history fully
represents current conditions
• Event data logged by different workshops and separately identified in
the plant maintenance records
– For example, circuit breaker is typically included in the boundary
of a pump, however its failures are often recorded by the electrical
workshop rather than the reactor/turbine departments
– Data collection procedure needs to be consistent with the model to
avoid a potential for underreporting
Component reliability data
Causes of inconsistency between data and model (continued)
• Use of manufacturer’s data
– Differences in component boundaries, failure mode definitions, and conditions of manufacturer's tests, e.g.:
Example of inconsistency:
– The manufacturer's data for "pump" failures typically include only the mechanical parts + the pump motor, because other vendors are responsible for the other subcomponents
– The PSA data for a "pump" include, in addition, the motor circuit breaker and the parts of the control circuits that operate the circuit breaker
– Finding: in a PSA for VVER, the failure probability of the circuit breaker (a subcomponent of the “pump”) > the total failure probability of the “pump”
Component reliability data
Causes of inconsistency between data and model (continued)
• The testing mode differs from accident demands in terms of the component boundary:
– Partial testing, e.g.:
• idling of the diesel generator, i.e. some subcomponents included in the boundary of the component are not tested
• opening of a relief valve without the reverse (closing) stroke
– Short-term testing – e.g., the fuel transfer pump can be included in the boundary of the diesel generator but not started during a test of the diesel generator
Collection of plant specific reliability data.
Issue -«improving» test results
• The first start fails and is followed by a successful start => only the second start is recorded in the test report
• An operator stops the test if the component is having trouble and does not consider the event as a test.
• Starting components one by one before a complex test
• Careful inspection of equipment before a test
• Warming up the DG before the test
• Manual actuation of an MOV before opening it by the motor
NUREG/CR-6823: “For PRA purposes, one is normally most interested in
the failure probability on an actual unplanned demand. To estimate
this, one should use only data from unplanned demands and from tests
that closely mimic unplanned demands”
Treatment of plant specific reliability data.
Allocation of events
• Event allocation may affect dramatically the estimated values
• Requirement of Russian Regulatory Rules:
Criteria for the allocation of events to failure modes and severity levels should be defined and justified
• Problems:
– The requirements for recording the data may use definitions of severity and failure modes different from the PSA definitions. These need translation
– Sometimes maintenance records are so unclear that a definite classification of the severity of a component malfunction is impossible.
• First step - consultation with the maintenance staff
• Conservative categorization of uncertain events => significantly higher failure probabilities
• Extrapolation of categorized events over unclear ones
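The gap between conservative categorization and extrapolation can be made concrete. A sketch with hypothetical counts (6 clearly classified failures, 14 clear non-failures, 5 unclear events); the numbers are illustrative only:

```python
def allocate_uncertain(clear_failures: int, clear_successes: int, unclear: int):
    """Compare two treatments of events whose severity could not be classified."""
    # Conservative: every unclear event is counted as a failure
    conservative = clear_failures + unclear
    # Extrapolation: unclear events follow the proportion seen in clear events
    p_fail = clear_failures / (clear_failures + clear_successes)
    extrapolated = clear_failures + p_fail * unclear
    return conservative, extrapolated

conservative, extrapolated = allocate_uncertain(6, 14, 5)
# conservative = 11, extrapolated = 7.5: the conservative count is ~50% higher,
# which carries straight through to the estimated failure probability
```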
Treatment of plant specific reliability data.
Allocation of events (continued)
• Event categorization in terms of severity can be complicated if
equipment is stopped by an operator:
Example 1. Trip of a diesel generator due to unwanted hammering. Cause – a break of two bearings
Example 2. Serial trip of two diesel generators due to surge
Example 3. Pump trip after start due to a rapid temperature excursion in the bearings
• If a degraded failure is revealed within a short test duration, an analyst cannot be sure the component would have succeeded over its mission time
Treatment of plant specific reliability data.
Allocation of events (continued)
Failure to start or failure to run?
• Usually in Russian PSAs, failures of stand-by pumps, DGs, and fans discovered within 30 minutes after start are allocated to failure to start
• Actually such failures arise during the stand-by period and are discovered with some delay, but far earlier than 24 hours
Example. Pump trip 10 minutes after start due to a temperature excursion in the bearings caused by degradation of the lubricating oil
• This allocation is easy, because periodic tests usually last about 30 minutes
• Otherwise, the failure rate to run becomes unrealistically high: there are some failure-to-run rates higher than 1E-2/hr
Comment. The T-book and some PSAs used another approach.
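The inflation of the failure-to-run rate can be checked with simple arithmetic. A sketch with hypothetical test statistics (2 failures in 400 half-hour tests), showing why allocating start-up-window failures to failure to run is misleading:

```python
import math

n_fail, n_tests = 2, 400
test_hours, mission_hours = 0.5, 24.0

# Allocated to failure to start: a per-demand probability
p_fts = n_fail / n_tests                    # 5.0E-3 per demand

# Allocated to failure to run: the same failures divided by the
# tiny accumulated run time of short tests
lam_ftr = n_fail / (n_tests * test_hours)   # 1.0E-2 per hour

# That rate, extrapolated over a 24 h mission, predicts an absurd
# unavailability for equipment that mostly fails only near start-up
q_24h = 1.0 - math.exp(-lam_ftr * mission_hours)   # ~0.21
```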
Treatment of plant specific reliability data.
Unavailability due to unscheduled maintenance
• Unavailability due to corrective maintenance performed to eliminate degradation was erroneously omitted from Procedures for conducting a Level 1 PSA, №50-P-4, IAEA (1992), refer to Table XI. Only repairs caused by catastrophic failures are considered
• Coincident outage times for redundant equipment, where permitted:
– Should reflect actual plant experience => search for them while collecting raw data
– If missing in the model => not collected
– Operational feedback should be used for updating the model
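Unavailability due to unscheduled corrective maintenance is commonly estimated as maintenance frequency times mean outage duration. A sketch with hypothetical plant figures (2 corrective outages per year, 12 h mean duration); the numbers are illustrative only:

```python
HOURS_PER_YEAR = 8760.0

def maintenance_unavailability(outages_per_year: float, mean_outage_hours: float) -> float:
    """Fraction of time a component is unavailable due to corrective maintenance."""
    return outages_per_year * mean_outage_hours / HOURS_PER_YEAR

q_cm = maintenance_unavailability(2.0, 12.0)   # ~2.7E-3
```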
Treatment of plant specific reliability data.
Grouping of similar components/events
• Reason – the operating history of a single component does not provide a number of events sufficient for estimation
• Goals of component/event grouping:
– Increased statistical significance
– Reduced effort for data collection and treatment
• Drawbacks:
– Masking of trends and peculiarities
– Increased uncertainty due to differences in design
– Potential for meaningless averages (e.g., circuit breaker + switch)
• Issue. No generally accepted approach to component grouping
Treatment of plant specific reliability data.
Grouping of similar components/events
• Ideally many aspects should be the same, e.g.:
– Component type/design/size/manufacturer
– Operating mode (standby/operating) and environment
– Frequency of demands/maintenance, etc.
• In practice some compromise has to be reached between statistical insignificance and inhomogeneity of the data by ignoring the less significant criteria
• The approaches used in PSAs are very different. It would be beneficial to develop a list of priorities for component grouping
Software reliability
• Computer-based systems are of increasing importance to safety in NPPs
• In practice, expert judgment is used to establish a subjective belief in the failure probability of specific software
• Russian regulatory requirement (OPB-88/97): software reliability (SR) must be quantified
• IAEA Safety Guide NS-G-1.1 (2000):
– Para. 2.13 “Quantification of SR is an unresolved issue. Software
testing has its limits and the quantification of SR for computer based
systems may be difficult or impossible to demonstrate.”
– Para. 7.11 “Currently available methods do not provide credible
results” => SG “does not provide guidance on the use of SR models”
– Para. 2.10 “Since software faults are systematic rather than random in
nature, CMF of computer based safety systems employing redundant
subsystems using identical copies of the software is a critical issue”
Statistical methods
• What statistical methods are usually used in PSA?
– Frequentist (classical) method
– Bayesian method
• There is no perfect method. When pooling generic data or constructing a prior distribution, expert judgment is the main source of uncertainty
• Sometimes there seems to be a gap, and mutual misunderstanding, between the mathematicians and the engineers responsible for data processing
NUREG/CR-6813 (April 2003):
“PRA training and associated software tools to perform Bayes’
updating of generic distributions with plant specific data is surprisingly
lacking. There is only one PSA software tool that has a built in
capability to perform Bayes’ updating of distribution”
Statistical methods. Bayesian approach
• The attitude toward the Bayesian approach sometimes makes a religion of the method. As a result, no attention is paid to the applicability of the generic data/Bayesian priors to the plant to be analyzed.
• Harmonization experience shows that differences in defining IEs are a major cause of variation in IE frequencies from one source to another
• Example from a PSA for an NPP with VVER-1000 – specific data: 2 failures of HP ECCS pumps over 158 hours (2/158 = 1.3E-2/hr)
– A Bayesian procedure was used with a huge bulk of US prior data.
– Posterior failure rate – 2.13E-5/hr with an EF of 3.
– Only statistical uncertainty was considered; the variability of the parameter due to differences in design was not taken into account
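The HP ECCS pump example can be reproduced with a conjugate gamma-Poisson update. The prior parameters below are hypothetical, chosen only to mimic a "huge bulk" of generic evidence with a mean near 2E-5/h; the plant data (2 failures in 158 h) are from the slide:

```python
# Hypothetical strong generic prior: Gamma(alpha0, beta0), mean alpha0/beta0 = 2E-5 /h
alpha0, beta0 = 10.0, 5.0e5      # beta0 plays the role of "generic" exposure hours

n_plant, t_plant = 2, 158.0      # plant-specific evidence (from the PSA)
mle_plant = n_plant / t_plant    # ~1.27E-2 /h

# Conjugate update for Poisson data: posterior is Gamma(alpha0 + n, beta0 + T)
posterior_mean = (alpha0 + n_plant) / (beta0 + t_plant)   # ~2.4E-5 /h

# The posterior barely moves from the prior mean (2E-5 /h) even though the plant
# evidence is three orders of magnitude higher: 158 plant hours are swamped by
# the generic exposure, which is exactly the applicability concern above.
```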
Plant specific data vs. combination
• When events are recorded, the statistical significance can be measured by the range from the 5th to the 95th percentile confidence bounds
• For a Poisson distribution, this range is on the order of 10 with 2-3 events
Russian practice:
• For 3 events or more, the sample is very loosely comparable to a lognormal with an EF of 3 => use only plant-specific data.
• This “three-event rule” is used to decide whether generic data should be involved in parameter estimation
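The "order of 10" range for a Poisson sample can be checked numerically. A stdlib-only sketch that inverts the gamma CDF by bisection to obtain classical two-sided 90% bounds on a Poisson rate:

```python
import math

def _gamma_cdf(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its series expansion."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-12 and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _gamma_ppf(q: float, a: float) -> float:
    """Invert the gamma CDF (shape a, scale 1) by bisection."""
    lo, hi = 0.0, 8.0 * a + 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _gamma_cdf(a, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_rate_bounds(n_events: int, exposure: float, conf: float = 0.90):
    """Classical two-sided confidence bounds on a Poisson rate (gamma-quantile form)."""
    alpha = (1.0 - conf) / 2.0
    lower = _gamma_ppf(alpha, n_events) / exposure if n_events > 0 else 0.0
    upper = _gamma_ppf(1.0 - alpha, n_events + 1) / exposure
    return lower, upper

lo_b, up_b = poisson_rate_bounds(3, 1.0)   # 3 events in unit exposure
# up_b / lo_b is ~9.5, i.e. "on the order of 10", consistent with an EF of ~3
```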
Common cause failures
Is it necessary to improve CCF data in PSA?
• CCFs can contribute significantly to the overall risk from an NPP, usually much more than independent failures
• Paradox 1 – reviews usually provide far more comments on reliability data
• Paradox 2 - CCF parameters are usually estimated statistically based on operational experience associated with one-off type events
• Ignoring CCFs => Example from the reliability calculation of a scram system designed for VVER-1000: the probability of system failure < 1E-300
• Operational experience - Surry NPP in the USA: multiple hang-up of control rods due to lubricant solidification in scram breakers.
• CCFs – a measure of our ignorance
• Parameters of CCF models – the result of consensus
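The scram-system example shows why ignoring CCFs produces absurd numbers. A sketch using the simple beta-factor CCF model with hypothetical parameter values (total per-train failure probability 1E-3, beta = 0.05, 3 redundant trains); the values are illustrative only:

```python
def system_unavailability_terms(q_total: float, beta: float, n_trains: int):
    """Beta-factor model: a fraction beta of failures disables all trains at once."""
    q_independent = (1.0 - beta) * q_total
    independent_term = q_independent ** n_trains   # all trains fail independently
    ccf_term = beta * q_total                      # one common cause fails them all
    return independent_term, ccf_term

ind, ccf = system_unavailability_terms(1.0e-3, 0.05, 3)
# ind ~ 8.6E-10 vs ccf = 5.0E-5: the CCF term dominates by ~5 orders of magnitude.
# With more redundancy the independent term shrinks toward values like 1E-300
# while the CCF term does not move, which is the point of the example above.
```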
Data analysis
Special event probabilities
• As a rule - not considered in rules and procedures
• Often - not documented well in PSAs
Special events:
• Typically distinct from IEs, failures, unavailabilities, and HEs
• Included in the probabilistic model
• Characterized by probabilities
• Define the possibility of some special conditions or system configurations
• Component failures which cannot be quantified using statistical data
Data analysis
Special event probabilities
Examples of some special events:
• Clogging of the containment sump (safety injection pumps) by primary
thermal insulation following a break of a reactor coolant pipe
• Reactor pressure vessel rupture given an overcooling
System configuration:
• Grid breakdown due to unit disconnection
• Break in a non-isolable part of the primary circuit (use of MIVs at
some plants)
• Specific cause of a scram actuation (signal structure for different IEs)
• Influence of the adjacent unit
• Impact of steam floods, etc.
Do we need guidance to treat them?
• Guidance on data processing needs to be more specific: NUREG/CR-6823 is a good example
• Special attention should be paid to coordination with other PSA tasks
and plant staff involved in data collection/treatment
• Frequencies of large/medium LOCAs need to be harmonized
• Is it necessary to perform a PSA for trial operation of a unit?
• Event allocation is a weak point of many PSAs
• Is it necessary to develop a PSA software tool that has a built-in
capability to perform Bayes’ updating or something like that?
• A criterion for using only plant-specific data to avoid expert judgment?
• There is a need for data on software reliability
• CCF data. Have we reached perfection?
• Special events. Do we need guidance to treat them?

Approaches to ensure PSA quality in processing PSA data