title: Dependent and Common Cause Failures authors: Signoret, Jean-Pierre; Leroy, Alain date: 2020-11-09 journal: Reliability Assessment of Safety and Production Systems DOI: 10.1007/978-3-030-64708-7_5

Definition and description of dependent failures and common cause failures are provided. Several classifications (extrinsic/intrinsic; logic/lineage/dynamic; tangible/non-tangible) are proposed and the link between dependent and common cause failures is explained. Several real-life examples are given to illustrate these definitions. The need for sharing common cause failure data collection between several operators from several countries is highlighted. The beta-factor model and the shock model, which can be implemented within systemic approaches (Boolean, Markov or Petri nets) and which belong to the most frequently used parametric models for common cause failure modelling, are then described.

Unfortunately, the first solution is often not tractable because such items do not exist or are highly expensive. Therefore, the second solution is generally implemented in industry. For example, the probability of failure, Pr(S̄), of a system S comprising 2 independent redundant items I1 and I2 can be calculated from the probabilities of failure, Pr(Ī1) and Pr(Ī2), by the following formula:

Pr(S̄) = Pr(Ī1 ∩ Ī2) = Pr(Ī1) × Pr(Ī2)   (5.1)

If it is required that Pr(S̄, T) ≤ 10⁻⁶ over a given time interval [0, T], applying formula 5.1 shows that the use of two identical redundant items having a failure probability Pr(Ī1, T) = Pr(Ī2, T) ≤ 10⁻³ would make it possible to reach the target. Formula 5.1 can be extended to any number of redundant items, and it then seems that reaching a high level of success (i.e. a low probability of failure) is just a matter of adding a sufficient number of redundant items.
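The arithmetic of formula 5.1 can be checked with a short sketch (the probability values are the illustrative ones from the text; the function name is ours):

```python
# Probability of failure of a system of n independent redundant items
# (formula 5.1 extended): Pr(S) is the product of the item failure
# probabilities, since all items must fail for the system to fail.
def system_failure_probability(item_probs):
    p = 1.0
    for q in item_probs:
        p *= q
    return p

# Two identical items with Pr = 1e-3 reach the 1e-6 target.
print(system_failure_probability([1e-3, 1e-3]))  # close to 1e-06
```

This only holds, as the text goes on to explain, when the items really are independent.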
Unfortunately, this is an illusion, as the items are not necessarily fully independent, and in this case formula 5.1 is no longer valid. For example, when Ī1 and Ī2 are not independent, the following formula 5.2 has to be used instead:

Pr(S̄) = Pr(Ī1 ∩ Ī2) = Pr(Ī1|Ī2) × Pr(Ī2)   (5.2)

In this formula, Pr(Ī1|Ī2) denotes the conditional probability of Ī1 given that Ī2 has occurred. This implies:

Pr(Ī1|Ī2) = Pr(Ī1 ∩ Ī2) / Pr(Ī2)   (5.3)

When Ī1 and Ī2 are independent, formula 5.3 leads to Pr(Ī1|Ī2) = Pr(Ī1). This result constitutes a criterion of independence between Ī1 and Ī2. In the same way, Pr(Ī2|Ī1) can be calculated by the following formula:

Pr(Ī2|Ī1) = Pr(Ī1 ∩ Ī2) / Pr(Ī1)   (5.4)

Then, when Ī1 and Ī2 are dependent, the dependency exists on both sides: the occurrence of Ī1 depends on the occurrence of Ī2 and vice versa. Therefore, dependencies described by conditional probabilities such that Pr(Ī1|Ī2) ≠ Pr(Ī1) or Pr(Ī2|Ī1) ≠ Pr(Ī2) do not imply any link of causality between the events Ī1 and Ī2, but rather imply a positive or a negative mutual influence between them. These dependencies are the symptom of underlying (or root) causes affecting both items. When the impact is detrimental, these underlying causes are called common causes of failure. However, an underlying cause is often considered to be a "true" common cause failure (CCF) only if the failures of the two or more individual items impacted by this cause occur within a limited interval of time. Let us consider that I1 and I2 are two bulbs which fail to light. This can be due to a loss of the electrical power supply, Ē, which obviously is a common cause leading to Ī1 and Ī2. Therefore, Ē is a cause of Ī1 and a cause of Ī2, and it is because Ī1 and Ī2 share this common cause that they are not independent. Pr(Ī1|Ī2) being greater than Pr(Ī1) most of the time (positive dependency), using formula 5.1 instead of formula 5.2 leads to non-conservative results. Of course, the more dependent the events, the more optimistic formula 5.1 becomes.
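The two-bulb example can be made concrete with a small Monte Carlo sketch (the numerical values are arbitrary illustrative assumptions, not data from the text): each bulb fails either independently or because of a loss of the power supply Ē, and the estimated Pr(Ī1 ∩ Ī2) then sits well above the product Pr(Ī1) × Pr(Ī2) that formula 5.1 would give.

```python
import random

random.seed(1)

P_IND = 0.01   # independent failure probability of each bulb (assumed)
P_E = 0.005    # probability of loss of the power supply E (assumed)
N = 1_000_000  # number of Monte Carlo trials

both = i1 = i2 = 0
for _ in range(N):
    e = random.random() < P_E          # shared common cause: power loss
    f1 = e or random.random() < P_IND  # bulb 1 fails
    f2 = e or random.random() < P_IND  # bulb 2 fails
    i1 += f1
    i2 += f2
    both += f1 and f2

print("Pr(I1) x Pr(I2):", (i1 / N) * (i2 / N))  # formula 5.1: ~2e-4
print("Pr(I1 and I2):  ", both / N)             # formula 5.2: ~5e-3
```

The common cause makes the true joint failure probability roughly twenty times larger than the naive product, illustrating how non-conservative formula 5.1 can be.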
This is the core of the dependent/common cause failure difficulty: as soon as the items are linked by any dependency, the system failure probability cannot be assessed just by multiplying the item failure probabilities; conditional probabilities have to be considered instead. As already described in Chap. 4 (see Fig. 4.6), item failures are never completely independent, so decreasing the probability of failure of a system just by increasing redundancy is limited by the existing dependencies due to common cause failures (CCFs). This means that the relative impact of CCFs increases as the probability of system failure decreases. In the limit, a CCF negligible for an individual item can become the topmost contributor when redundancy is high. The dependent failures and their underlying common causes therefore often constitute the weak points of redundant systems. It is thus important to take CCFs into account but, except in explicit (tangible) cases (e.g. the loss of electrical power supply analysed above), assessing conditional probabilities is usually difficult. This is why approximate models have been developed to take them into account to some extent and to provide safeguards against non-conservative results. As pointed out by Modarres et al. (2017), "there is no unique and universal definition for CCFs". However, a fairly general definition of CCF is given by Mosley et al. (1988) as "a subset of dependent events in which two or more component fault states exist at the same time or in a short time interval, and are direct results of a shared cause". In the absence of a generally accepted definition of CCF, as already mentioned, many information media such as standards, guidelines and textbooks have proposed their own more or less equivalent definitions, often depending on their application domain and context (Rausand 2014). Among them are those cited in Chap.
4 in accordance with the IEV 192 standard IEC 60050-192 (2015), as follows: Common cause failures (CCFs): failures of multiple items, which would otherwise be considered independent of one another, resulting from a single cause. Some other publications define CCFs by the mechanism leading to them. For example, in Mosley et al. (1988):

• A susceptibility for items to fail because of a particular root cause.
• A coupling mechanism (or factor).

The root cause is the most basic cause of the failure which, if corrected, would prevent its occurrence (e.g. abnormal environmental stress, design inadequacy, human actions, procedure inadequacy). The coupling factor characterizes components susceptible to the same causal mechanism, which creates the condition for multiple items to fail (e.g. shared hardware design, shared maintenance/test schedule). In addition, it is expected that the failures of two or more individual items due to CCFs occur within a limited interval of time. It has to be noted that common cause failures can lead to common mode failures, which are defined as follows in the IEV 192: Common mode failures (CMFs): failures of different items characterized by the same failure mode. Moreover, the distinction between common cause and common mode failures may be difficult and some sources do not hesitate to consider CMFs as a sub-category of CCFs. Dependencies can be classified in many ways with regard to the type of failure. This is analysed hereafter, where useful classifications are identified. The first classical classification consists in splitting the dependencies as follows:

• Dependencies intrinsic to the system. These dependencies find their roots in the design of the system. There are two classes:
  - Functional dependencies: the functional status of an item depends on the functional status of another one. Failures due to functional dependencies are often due to design errors.
  - Cascade failures or propagating failures: the failure of one item is the cause of the failure of another item which, in turn, can be the cause of the failure of yet another item, etc. The cascading effect is often identified and taken into account by designers and operators.

• Dependencies extrinsic to the system. Typical extrinsic dependencies are:
  - Loss of support systems (e.g. loss of hydraulic, pneumatic or electric power).
  - Abnormal working environmental conditions (e.g. operating conditions exceeding the design ones).
  - Aggression from the environment (e.g. excessive heat, corrosive atmosphere, lightning) impacting items located in the same area.
  - Item inadequacy (e.g. off-specification items due to poor manufacturing).
  - Human interactions (e.g. items maintained by the same maintenance team).

This classification is useful to identify the origin of the dependencies and hence of the common cause failures. A second classification splits the dependencies between:

• Logic dependencies: they belong to the Boolean framework and can be described by conditional probabilities, as was done in 5.1.1. The impacted items fail immediately.
• Dynamic dependencies: the effect of the underlying common cause is softer. It does not immediately trigger the failures of the impacted items but increases their probabilities of occurrence (e.g. the failure of the air conditioning increases the failure rate of electronic devices which, in turn, decreases the time to failure of the system). Dynamic dependencies can be intrinsic (e.g. functional dependency, soft cascade failure) or extrinsic (e.g. abnormal environment). When the effects begin to be perceptible within a short interval of time, they can be assimilated to "true" CCFs. Due to their non-immediate effect, this type of dependency is sometimes called semi-catastrophic.
• "Lineage" dependencies: they are linked to common causes impacting in the same way the probabilistic parameters of all the related components (e.g.
when they come from a bad batch or a good batch of components).

This classification is useful with regard to the available probabilistic models allowing them to be taken into consideration: the logic dependencies can be handled by Boolean family models (e.g. reliability block diagrams, fault trees, see Part 3 of the book), the dynamic dependencies by dynamic models (e.g. Markovian or Petri net approaches, see Part 4 of the book) and the lineage dependencies by using uncertainty propagation techniques (see e.g. Chaps. 25, 32, 26 or 38). It has to be noted that the item failures coming from a CCF are expected to occur within a limited time interval (see 5.1.1 and 5.1.2). Therefore, the qualification of CCFs strongly depends on what is meant by "limited time interval". This is in particular the case for dynamic and lineage dependencies. For example, if an external event multiplies the item failure rates by 100, it is likely to appear as a common cause of failure but, if it multiplies the item failure rates only by 2, it is likely to remain unnoticed. In fact, there is a continuum between failures occurring at the same time and failures occurring over a time interval, and the qualification of "true" CCF may be somewhat subjective. A third classification splits the dependencies between:

• Tangible or explicit dependencies: they are the result of causes which can be clearly identified by performing in-depth system analyses (e.g. loss of power supply, pipe plugging, fire, flooding, etc.).
• Non-tangible or non-explicit dependencies: they are the result of causes difficult to apprehend due to the absence of field feedback or ignorance of the phenomena (epistemic problems).
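A lineage dependency can be sketched numerically as follows (purely illustrative numbers of our own): two items share an uncertain, per-batch failure probability. Inside each batch the failures are independent, but once the batch quality is averaged out, the two failures become positively correlated — exactly the effect that uncertainty propagation techniques capture.

```python
import random

random.seed(2)

# Hypothetical lineage dependency: a batch is "bad" with probability 0.1,
# and items from a bad batch have a 10x higher failure probability.
P_GOOD, P_BAD, P_BAD_BATCH = 0.01, 0.1, 0.1
N = 200_000  # number of simulated batches

i1 = i2 = both = 0
for _ in range(N):
    p = P_BAD if random.random() < P_BAD_BATCH else P_GOOD
    f1 = random.random() < p  # independent given the batch...
    f2 = random.random() < p
    i1 += f1
    i2 += f2
    both += f1 and f2

# ...but dependent once the batch quality is averaged out:
print("Pr(I1) x Pr(I2):", (i1 / N) * (i2 / N))  # ~3.6e-4
print("Pr(I1 and I2):  ", both / N)             # ~1.1e-3
```

The joint failure probability is about three times the naive product, even though no shock or explicit shared cause appears anywhere in the model.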
This classification is useful with regard to the choice of the techniques used to take dependencies into account in probabilistic modelling and calculations:

• The tangible common cause failures are easy to identify and should be identified, analysed and processed like any other event in the safety and dependability models developed in this book.
• The non-tangible common cause failures are difficult to identify or to qualify and generally constitute the residual failures whose causes cannot be explicitly modelled. However, they cannot be ignored without risking over-optimistic evaluations. Broad approaches have therefore been developed (e.g. beta-factor model, shock model) aiming to provide safeguards against underestimating their impact.

This chapter would not be complete without mentioning the human factor, which is often considered to be the main source of common cause failures. The human interactions with systems can be roughly classified as:

• Errors in design, construction and installation: they generally belong to the non-tangible dependencies mentioned above.
• Errors in operation: they generally belong to the tangible dependencies (e.g. shutting off a manual valve on the tapping of a sensor) and, as such, can be identified and quantified on their own.

Nowadays, components and systems are becoming smarter and smarter. That means that more and more software is introduced, and software errors now tend to supersede the human factor as a source of common cause failures. Unfortunately, no really effective approaches have yet been developed to properly model the interactions between hardware, the human factor and software. The software "reliability" is therefore generally evaluated separately by implementing specific techniques (see IEC 62628 (2012)) and often in a purely qualitative way (see IEC 61508 (2010)). This is why, within the reliability model of a system, the software is generally considered as a single isolated item constituting an extrinsic dependency.
Without further analysis, it can even be considered as a non-tangible common cause of failure in many cases. Therefore, the software should at least be considered through the broad models (e.g. beta-factor model, shock model) by adjusting the parameters of these models (e.g. by engineering judgment). The impact of CCFs is not a purely theoretical matter: they are involved in many situations which occur frequently. The latest at the time of writing this book is the emergence of COVID-19, which within a few days locked down almost all activities in countries all over the planet and has killed several hundred thousand people. This pandemic is the latest of the calamities which, over the centuries, have unfortunately periodically impacted parts of the planet: plague, leprosy, Ebola, influenza (Spanish, Asian, avian), HIV, SARS, etc., or starvation due to bad climatic conditions. Hereafter are briefly described some typical accidents in which CCFs are involved and some CCFs detected through reliability data collection. Opening a newspaper is all you have to do to hear about CCFs: newspapers frequently report incidents/accidents in which CCFs are involved. Some illustrative examples are described hereafter. The Apollo 13 service module was provided with 2 oxygen tanks (Jones 2016). On April 13th 1970, one oxygen tank exploded and the explosion destroyed the second oxygen tank, causing the failure of the mission. The 3-man crew managed to return safely to Earth. The failure of the second tank is a cascade failure. On March 3rd 1974, a McDonnell Douglas DC-10 airplane crashed into the Ermenonville forest outside Paris (France) (JO 1976). The cargo doors of the airplane were designed to open outwards under pressure, but a specific latching system was provided to lock shut each of the cargo doors under pressure. After take-off, one of the latching systems failed and the rear left cargo door opened. A section of the cabin floor was ejected.
However, even with such damage, the pilots could have kept the airplane under control. Unfortunately, all the redundant control cables ran beneath the damaged floor and the pilots were then unable to keep the airplane in the air. 346 people were killed. A proper zone analysis would have identified that the redundant control cables were located in the same area, and a design change would then have prevented the loss of control of the airplane. On March 28th 1979, an accident occurred in unit 2 of the Three Mile Island nuclear power plant and the core partially melted down (Rogovin and Frampton 1979). This was the result of a combination of equipment malfunctions, design-related problems and human errors. Among the causes, several pressuriser valves had been mispositioned after a periodic proof test, and this constitutes a typical example of a common cause failure due to human error. Although no injuries were reported, this accident is often mentioned as a textbook case of common cause failures due to the human factor. Better training of the operators might decrease the probability of such an event. The Viking Sky is a cruise ship launched in 2016 (Wikipedia Viking Sky 2019). Despite the storm warnings which had been issued, she was sailing from Tromsø to Stavanger (Norway) on March 23rd 2019 with 1,373 people on board when she suffered the failure of her four engines. A loss of lubricating oil pressure was the common cause of the shutdown of the four engines. Due to the rough conditions, the rescue was difficult (another common cause!). 470 passengers were evacuated by helicopter before three engines were restarted in the night of March 24th and the Viking Sky was able to sail again. Sixteen people were injured, three of them seriously. Avoiding common utilities (here the lubrication system) is a good way to prevent such common causes of failure.
Electric blackouts are a typical common cause of loss of electrical power supply for many people at the same time, at the level of towns, regions or even whole countries. The blackout is often itself the result of cascading failures from a single cause (e.g. high-voltage line break, electrical power plant shutdown, over-consumption leading to protective circuit breaker opening). This happens rather often; see, for example, Wikipedia Blackouts (2020) or Wikipedia Outages (2020). Beyond the technical failures, the above list highlights the impact of meteorological or environmental (e.g. earthquake or solar wind) conditions as a source of common cause failures. Reliability data collection is also an opportunity to identify common cause failures, as highlighted by the two examples presented hereafter. A collection and analysis of CCF data was performed in 2014-2015 on offshore platforms in the North Sea for several safety items (Hauge et al. 2015). Some of the CCF events recorded for Pressure Safety Valves (PSVs) involved:

• Pilot exhaust lines plugged.
• Rust within the valves.

That means that, when analysing the collected data related to similar items having failed within a short interval of time, it was discovered that these failures were due to a true CCF rather than to independent failures. The Organisation for Economic Co-operation and Development (OECD) Nuclear Energy Agency (NEA) has set up the International Common Cause Data Exchange (ICDE) Project to collect and analyse CCF events in nuclear power plants. The main causes of CCF for batteries are (NEA/CSNI/R(2003)19 2003):

• Battery design or manufacture inadequacy.
• Maintenance-induced failures.
• Internal malfunction.

Once more, when analysing the collected data related to similar items having failed within a short interval of time, it was discovered that these failures were due to a true CCF rather than to independent failures.
When analysing a system, it is important to identify the potential CCFs as early as possible in order to prevent them as far as possible. This is especially important with regard to redundancy, which is likely to be seriously impeded when CCFs are present. A basic principle of dependent failure identification is therefore to consider that dependent failures can occur as soon as items are redundant. Any of the analysis techniques described in this book offers, to some extent, opportunities to identify common cause failure candidates, provided they are oriented toward this purpose. Specific approaches have also been designed for CCF identification. Three simple approaches which can be used for this purpose are described hereafter. All the inductive approaches described in Chaps. 7-12 can be used. For example, hazard identification methods such as checklists (Chap. 11), Preliminary Hazard Analysis (Chap. 8) or even FMEA (Chap. 10) prove to be effective for identifying potential common cause failures (e.g. external events or possible cascade failures). Reliability block diagram (Chap. 15) or fault tree (Chap. 16) approaches can also be used through the thorough analysis of each minimal cut set (see Chap. 16) of order two or greater. This provides an effective technique to help identify which causes are good candidates to produce CCFs (see Chap. 17). Zone analysis (Desroches et al. 2015) is a specific technique focused on the identification of failure modes and accident scenarios by considering the geographical arrangement of the system throughout its mission. It is the only method for determining what could happen to redundant items located close to each other, for example in the same room or the same cabinet (e.g. fire, flooding, overheating, corrosion). As for any other probabilistic calculation, no relevant result involving common cause failures can be obtained without sound estimations of the related probabilistic parameters.
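The minimal-cut-set screening described above can be sketched as follows. The fault tree, the component attributes and the helper name are all hypothetical; the idea is simply that a cut set of order two or greater whose components share a zone or a support system is a CCF candidate, in the spirit of zone analysis.

```python
# Hypothetical component attributes used for screening (zone-analysis style).
components = {
    "pump_A":  {"zone": "room_1", "support": "power_bus_1"},
    "pump_B":  {"zone": "room_1", "support": "power_bus_2"},
    "valve_C": {"zone": "room_2", "support": "power_bus_1"},
}

# Minimal cut sets of an assumed fault tree; only order >= 2 matters here.
minimal_cut_sets = [
    {"pump_A", "pump_B"},
    {"pump_A", "valve_C"},
    {"valve_C"},
]

def ccf_candidates(cut_sets, attrs):
    """Flag cut sets of order >= 2 whose items share an attribute value."""
    flagged = []
    for cs in cut_sets:
        if len(cs) < 2:
            continue  # order-1 cut sets cannot hide a CCF between items
        for key in ("zone", "support"):
            if len({attrs[c][key] for c in cs}) == 1:  # all items share it
                flagged.append((sorted(cs), key))
                break
    return flagged

print(ccf_candidates(minimal_cut_sets, components))
# [(['pump_A', 'pump_B'], 'zone'), (['pump_A', 'valve_C'], 'support')]
```

In practice the shared attributes would come from the system description (location, utilities, maintenance team, manufacturer batch), not from a hand-written dictionary.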
Like other probabilistic parameters, it is expected that they are obtained from the statistical analysis of field feedback collected through reliability data collection systems (see Chap. 38). However, CCF events are typically rare events (far scarcer than the events involved in ordinary reliability data collection). It is then beyond the capability of a single operator to collect CCF data alone: it is therefore essential to collect and combine the CCF field feedback from many operators in different countries to perform meaningful estimations. This implies implementing rigorous data collection frameworks to gather, exchange and combine CCF data. As presented hereafter, this is done, for example, in the nuclear industry and the oil and gas industry. Experience on CCF is scarce and most of it comes from the nuclear industry:

• The US Nuclear Regulatory Commission issues many documents:
  - explaining how to code, collect and elaborate CCF data (NUREG/CR-4780 1988; NUREG/CR-5485 1998; NUREG/CR-6268 Rev. 1 2007);
  - providing CCF parameters for valves, pumps, heat exchangers, etc. (NUREG/CR-5497 2016).
• The Nuclear Energy Agency (NEA) of the OECD manages an international joint project focused on CCF data exchange and also issues documents such as NEA/CSNI/R(2003)19 (2003).

In the oil and gas industry, work is also carried out to collect and elaborate CCF data. For example, Hauge et al. (2013) provides CCF data elaborated mainly from the OREDA database (OREDA 2020). This database, shared between several operators, implements the ISO 14224 (2016) standard, which provides some guidance about CCF data and has been developed to collect and exchange reliability data in the oil and gas industry. The latest CCF data collection exercises related to safety instrumented systems (SINTEF A26922 2015) show that:

• The limited time interval to consider in the definition of CCF is 1 year (the most common interval between proof tests, see Chap. 36).
• Due to the limited information on CCF frequency, only approaches with a low number of parameters are reasonably usable: the higher the number of parameters of the method, the higher the number of assumptions needed to implement it.
• There are few CCF events on items belonging to the same system, so the past data collection exercises have been extended to identical items belonging to different systems. As a consequence, additional assumptions have been introduced for calculating the parameters of the various methods.

It has to be noted that, when dealing with safety systems, the unavailability is caused by dangerous failures and mainly by the dangerous undetected failures. A proper identification and quantification of CCFs is therefore crucial for dangerous undetected failures. This is why the data collection exercises mentioned in SINTEF A26922 (2015) and NEA/CSNI/R(2003)19 (2003) are focused on such failures. This leads to modelling methods related to dangerous undetected failures. However, they are often considered to be applicable to dangerous detected failures or spurious failures as well, but this is questionable. Let us consider a component I1 belonging to a set of two similar components {I1, I2}. Its failures can be split into independent failures Ī1^Ind and failures due to logic (see 5.1.3) common causes Ī1^2 between Ī1 and Ī2. This leads to:

Ī1 = Ī1^Ind ∪ Ī1^2   (5.5)

If the same component belongs to a set of three similar components {I1, I2, I3}, its failures can be split in addition into the common causes Ī1^3 between Ī1 and Ī3 and also into the common causes Ī1^{2,3} between Ī1, Ī2 and Ī3.
This leads to:

Ī1 = Ī1^Ind ∪ Ī1^2 ∪ Ī1^3 ∪ Ī1^{2,3}   (5.6)

It has to be noted that in the above formulae Ī1^2 ≡ Ī2^1, Ī1^3 ≡ Ī3^1 and Ī1^{2,3} ≡ Ī2^{1,3} ≡ Ī3^{1,2}. For a set of four components, this leads to:

Ī1 = Ī1^Ind ∪ Ī1^2 ∪ Ī1^3 ∪ Ī1^4 ∪ Ī1^{2,3} ∪ Ī1^{2,4} ∪ Ī1^{3,4} ∪ Ī1^{2,3,4}   (5.7)

And so on: even if some terms do not exist, their number increases very quickly with the size of the set of components and, except when tangible common causes are clearly identified, it is generally not really possible to take into account all the events involved in the above formulae. This is why simplified broad parametric models involving a limited number of terms have been developed to model the impacts of logic CCFs on system failures. They are the main models used for CCF analyses and they aim to:

• provide a realistic prediction (i.e. conservative, but not too much) of the probability of failure of systems comprising redundant components or, more generally, involving minimal cut sets of an order greater than one;
• prevent unrealistic non-conservative predictions in the case of non-tangible CCFs;
• help to determine the weak points involving CCFs and identify which defences against them are the most efficient.

The broad CCF modelling methods can be classified according to the number of required parameters (single or multiple) and according to the impact on item failures (i.e. whether they occur with a probability equal to one or not). The main broad models described hereafter comprise the beta-factor model (single-parameter model) and the shock model (three parameters). Both models are devoted to logic CCFs and are recommended by IEC 61508-6 (2010). The beta-factor model is the simplest parametric model. It is described in IEC 61508-6 (2010), ISO/TR 12489 (2013), Humphreys (1987) and SINTEF A26922 (2015). This approach considers that the failure rate λ of an item is the sum of an independent failure rate λ_Ind and a CCF failure rate λ_ccf:

λ = λ_Ind + λ_ccf

Then the beta factor, β, is defined as the ratio of the CCF failure rate to the (total) failure rate (i.e.
the failure rate generally found in reliability data handbooks, e.g. OREDA (2015)):

β = λ_ccf / λ

with 1 > β > 0. Then, from this definition:

λ_ccf = β · λ
λ_Ind = (1 − β) · λ

Values up to 10% can be considered for the beta factor (IEC 61508-6 2010, Table D1). This model is interesting because it can be easily modelled by Markov graphs (Chap. 31), Petri nets (Chap. 33) and Boolean models (Chap. 23). It has to be noted that the same definition is also used to split the probability of failure due to a demand, γ:

γ_ccf = β · γ

with γ_Ind = (1 − β) · γ. Being very straightforward, this method is well known and widely used. It can be easily implemented in the various models (reliability block diagrams, fault trees, Markovian approach or Petri nets) described in the other parts of this book. Its main drawbacks are:

• without further additional assumptions, it can be used only with items exhibiting the same failure rate;
• it only models the failure of all the impacted items.

Compared to the formulae established in 5.3.1, this approach consists in keeping only the independent term and the last (complete CCF) term, e.g. for three components:

Ī1 = Ī1^Ind ∪ Ī1^{2,3}

Of course, the beta-factor model can be extended to more than one parameter in order to take other terms into account, as is done in various ways by the other modelling methods mentioned in 5.5.4. The shock model, also named the binomial failure rate (BFR) model (Atwood 1986), is a three-parameter CCF model:

• ω: occurrence rate of lethal shocks.
• ρ: occurrence rate of non-lethal shocks.
• γ: conditional probability of failure of each item, given a non-lethal shock.

According to the definitions of the above parameters, the common cause failure rate, λ_ccf, of a given item impacted by a shock (lethal or non-lethal) is given by:

λ_ccf = ω + γ · ρ

And the total failure rate of this item can be written as:

λ = λ_Ind + ω + γ · ρ

As for the beta-factor model, λ_ccf and λ_Ind can be expressed as percentages of the item failure rate, λ:

• λ_ccf = β · λ
• λ_Ind = (1 − β) · λ

For the shock model, λ_ccf can in turn be split with regard to the lethal and non-lethal shocks:

• Lethal failure rate: ω = β_LC · λ.
• Non-lethal failure rate: λ_nLc = γ · ρ = β_nLC · λ.

This implies:

β = β_LC + β_nLC

And finally:

λ = (1 − β) · λ + β_LC · λ + β_nLC · λ

Methods for assessing the parameters of the shock model are given in IEC 61508-6 (2010) (annex D), ISO/TR 12489 (2013) and Leroy (2018). When a collection of N similar items is affected by a lethal shock, all of the N items fail, which has the same impact as in the beta-factor model described above. This is the same for the non-lethal shock and, when γ = 0, γ = 1 or ρ = 0, this approach is equivalent to the beta-factor model, which takes only lethal shocks into account. As explained above for the beta factor, this can be easily modelled by Markov graphs (Chap. 31), Petri nets (Chap. 33) or Boolean models (Chap. 23). Concerning the non-lethal shock, the probability of failure between t and t + dt of one impacted item is given by γ · ρ · e^(−ρt) dt: while this can be easily modelled by Markov graphs (Chap. 31) or Petri nets (Chap. 33), it is not directly possible with Boolean models (Chap. 23). Fortunately, integrating this formula over an interval [0, t] gives the probability of failure of an impacted item as γ · (1 − e^(−ρt)). This formula, which combines the probability that the non-lethal shock has occurred over [0, t] and the probability of failure of the impacted item, can be handled by Boolean models (see Chap. 23). Nevertheless, this is an approximate approach which should be used only when the CCFs cannot be clearly identified or when the related reliability data are not easily available (e.g. non-tangible CCFs). When a collection of N similar items is affected by a non-lethal shock, ρ, the probability that k among them fail is equal to C_N^k · γ^k · (1 − γ)^(N−k), where C_N^k is the number of combinations of k items among N. This binomial formula explains the name binomial failure rate given to the approach. With regard to a given item, this implies:

γ = γ · Σ_{k=0}^{N−1} C_{N−1}^k · γ^k · (1 − γ)^(N−1−k)

because the sum encompasses all the possible cases of failure (from 0 to N − 1) of the N − 1 remaining items.
This gives:

λ_nLc = γ · ρ

And then:

λ = λ_Ind + ω + γ · ρ

The above formulae can be simplified because, when dealing with a collection of numerous components affected by the same non-lethal shock, the probability of, e.g., four component failures is negligible: this has never been observed for non-tangible CCFs and this is generally realistic for industrial systems. A conservative approach may be to consider that the quadruple failures due to a non-lethal shock are certainly at least 10 times less frequent than the double failures due to the same non-lethal shock. When γ < 0.1, then γ/(1 − γ) ≈ γ, and choosing such a small value of γ is certainly a conservative assumption. This allows the value of γ to be "tuned" so that the impact of the CCF vanishes as the number of impacted items actually failing on a non-lethal shock increases. Finally, for a set of N similar items with the same global failure rate, λ, impacted by a non-lethal shock, the parameters can be estimated as follows:

• Estimation of the global beta factor: β.
• Independent failure rate: λ_Ind = (1 − β) · λ.
• Estimation of the lethal part of the beta factor: β_LC.
• Lethal failure rate: ω = λ_Lc = β_LC · λ.
• Non-lethal part of the beta factor: β_nLC = β − β_LC.
• Non-lethal shock failure rate: λ_nLc = β_nLC · λ.
• Conditional probability of failure on a non-lethal shock: γ.
• Occurrence rate of non-lethal shocks: ρ = λ_nLc / γ.

The above approach is effective when a great number of items is impacted by a non-lethal shock (e.g. a water hammer in a hydraulic system): it allows the repartition between lethal and non-lethal shocks to be introduced so that the beta factor is kept constant and that, beyond the double or triple failures, the contribution of unrealistic multiple failures is neglected. The shock model is thus closer to the physical reality than the simple beta-factor model. The price to pay is the need for three parameters, β_LC, ρ and γ, instead of one.
But this number is independent of the number of items affected in the system. From the seventies until now, several methods have been developed to extend the beta-factor model, for example:

• Basic parameter model (NUREG/CR-5485 1998): it involves all the conditional probabilities identified in formula 5.7. As these are normally not readily available, other models with less stringent requirements on data were developed.
• Multiple Greek letter model (Fleming 1989; NUREG/CR-5497 2016): one of the models used by the US Nuclear Regulatory Commission for the assessment of the parameters of CCFs.
• Alpha-factor model (NUREG/CR-5485 1998; NUREG/CR-6268 2007; NUREG/CR-5497 2016): also one of the models used by the US Nuclear Regulatory Commission for the assessment of the parameters of CCFs.
• PDS method: developed for safety instrumented systems within the framework of the offshore petroleum industry (STF50 A0603 2006).

The above approaches are designed to be implemented at the level of a whole group of impacted items and they model how, among m impacted items, 2, 3, 4, … items fail. They are interesting, on a case-by-case basis, to model the CCFs of such groups. However, they cannot be easily implemented in systemic models (e.g. reliability block diagrams, fault trees or Petri nets) for modelling large systems involving many other items. This is why they are not described in more detail in this book.
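The shock-model parameter derivation described above can be sketched numerically; the input values for λ, β, β_LC and γ are illustrative assumptions of ours, not recommended values.

```python
# Shock model (binomial failure rate) parameter derivation, following the
# splitting described in the text. All input values are illustrative.
lam = 5e-6      # total item failure rate, per hour (assumed)
beta = 0.05     # global beta factor (assumed)
beta_lc = 0.02  # lethal part of the beta factor (assumed)
gamma = 0.1     # conditional failure probability on a non-lethal shock (assumed)

lam_ind = (1 - beta) * lam  # independent failure rate
omega = beta_lc * lam       # lethal shock rate (all items fail)
beta_nlc = beta - beta_lc   # non-lethal part of the beta factor
lam_nlc = beta_nlc * lam    # non-lethal CCF failure rate of one item
rho = lam_nlc / gamma       # occurrence rate of non-lethal shocks

# Consistency check against lambda = lam_ind + omega + gamma * rho:
print(lam_ind + omega + gamma * rho)  # ~5e-06, i.e. lam is recovered
```

Note that only one extra estimate (β_LC, plus the choice of γ) is needed beyond the beta factor itself; ρ then follows from λ_nLc = γ · ρ.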
References

Atwood (1986) The binomial failure rate common cause model.
Desroches et al. (2015) La gestion des risques, principes et pratiques.
Fleming (1989) Parametric models for common cause failure analysis. In: Advanced seminar on common cause failure analysis in probabilistic safety assessment.
Hauge et al. (2015) / SINTEF A26922 (2015) Common cause failures in safety instrumented systems — beta-factor and equipment-specific check-lists based on operational experience.
Humphreys (1987) Assigning a numerical value to the beta factor common-cause evaluation.
IEC 60050-192 (2015) International electrotechnical vocabulary — Part 192: Dependability.
IEC 61508 (2010) Functional safety of electrical/electronic/programmable electronic safety-related systems.
ISO 14224 (2016) Petroleum, petrochemical and natural gas industries — Collection and exchange of reliability and maintenance data for equipment.
JO (1976) Rapport final de la Commission d'Enquête sur l'accident de l'avion DC-10 TC-JAV des Turkish Airlines survenu à Ermenonville le 3 mars 1974. Journal Officiel de la République Française, Année 1976, N°27, 12 mai 1976.
Jones H (2016) Common cause failures and ultra reliability.
Leroy (2018) Production availability and reliability: use in the oil and gas industry, 1st edn.
NEA/CSNI/R(2003)19 (2003) ICDE Project report: collection and analysis of common-cause failures of batteries.
NUREG/CR-5485 (1998) Guidelines on modelling common-cause failures in probabilistic risk assessment. US Nuclear Regulatory Commission.
NUREG/CR-6268 (2007) Common-cause failure database and analysis system: event data collection, classification, and coding. US Nuclear Regulatory Commission, Washington.
OREDA Handbook (2015) Ed. 6.0, Offshore and onshore reliability data. Prepared by SINTEF and NTNU.
Rausand (2014) Reliability of safety-critical systems: theory and applications.
Rogovin and Frampton (1979) Three Mile Island: a report to the commissioners and to the public.
STF50 A0603 (2006) Reliability prediction method for safety instrumented systems — PDS method handbook.