key: cord-0007094-77u3q43r
authors: Bakir, Niyazi O.; Savachkin, Alex; Uribe-Sanchez, Andrés
title: Two countermeasure strategies to mitigate random disruptions in capacitated systems
date: 2010-03-06
journal: J Syst Sci Syst Eng
DOI: 10.1007/s11518-010-5125-y
sha: 93755aa3e2418836e504cf17e8dff50a019a3372
doc_id: 7094
cord_uid: 77u3q43r

We examine a capacitated system exposed to random stepwise capacity disruptions with exponentially distributed interarrival times and uniformly distributed magnitudes. We explore two countermeasure policies for a risk-neutral decision maker who seeks to maximize the long-run average reward. A one-phase policy considers implementation of countermeasures throughout the entirety of a disruption cycle. The results of this analysis form a basis for a two-phase model which implements countermeasures during only a fraction of a disruption cycle. We present an extensive numerical analysis as well as a sensitivity study on the fluctuations of some system parameter values.

Lean manufacturing philosophy and associated business practices have been widely embraced and deployed by global enterprises. Some estimates assert that the shift to JIT scheduling in the US automotive industry has saved companies more than $1 billion a year in inventory costs alone. While lean manufacturing has substantially boosted operational efficiency, it has also left enterprises operating in an increasingly risk-encumbered environment. Capacity disruptions triggered by forces of nature, property- and process-related hazards, and man-made interventions have proven to be the most profound influence on enterprise risk. In 1995, for example, an earthquake struck the port city of Kobe, Japan, razing 100,000 buildings and shutting down Japan's largest port for over two years. In 1999, an earthquake in Taiwan damaged power lines to the semiconductor fabrication facilities responsible for more than 50 percent of the worldwide supply of certain computer components, and shaved 5 percent off earnings for major hardware manufacturers including Dell, Apple, Hewlett-Packard, IBM, and Compaq (Wilcox 1999). In September 2002, longshoremen on the US West Coast were locked out in a labor dispute for 11 days, forcing the shutdown of 29 ports. With more than $300 billion in goods shipped annually through these ports, the dispute caused between $11 and $22 billion in lost sales, spoiled perishables, and underutilized capacity (Isidore 2002). Man-made disasters are on the rise, from terrorist attacks to computer viruses (Berniker 2003). As a result of such events, according to a recent A.M. Best Company, Inc. survey of 600 executives, 69 percent of chief financial officers, treasurers, and risk managers at Global 1,000 companies in North America and Europe view property-related hazards, such as fires and explosions, and supply chain disruptions as the leading threats to top revenue sources (A.M. Best Company 2006).

Historically, enterprises have lacked appropriate decision support methodologies and computational tools for addressing the risk incurred through capacity disruptions. In academia, traditional research efforts on minimizing the cost of supply chain operations, together with the focus on leveraging economies of scale, often yield results that overconcentrate resources. Such optimal solutions can be very sensitive to … (Lee et al. 1997), whereas only a small fraction of research effort has been dedicated to modeling the impact of various disruptions, such as those affecting demand patterns, supplier and production lead times, prices, imperfect process quality, process yield, and other factors.

One of the most common types of disruption appearing in the literature is that of supply rate changes. An excellent work by Arreola-Risa & DeCroix (1998) explores inventory management of stochastic demand systems in which the product supply is disrupted for periods of random duration. The classic economic order quantity (EOQ) problem with supply disruptions is studied by Parlar & Berkin (1991), and Parlar & Perry (1996) consider order-quantity/reorder-point inventory models with two suppliers subject to independent disruptions, computing the exact form of the average cost expression. Mohebbi (2003) presents an analytical model for computing the stationary distribution of the on-hand inventory in a continuous-review inventory system with compound Poisson demand, Erlang-distributed lead time, and lost sales, where the supplier can be in one of two states, "available" or "unavailable", at any point in time according to a continuous-time Markov chain. Papers addressing both supply disruptions and random demand include Chao (1987), Parlar (1997), and Song & Zipkin (1996). Chao (1987) proposes a dynamic model concerning optimal inventory policies in the presence of market disruptions, which are often characterized by events with uncertain arrival time, severity, and duration. Parlar (1997) considers a continuous-review stochastic inventory problem with random demand and random lead time, where supply may be disrupted due to machine breakdowns, strikes, or other randomly occurring events. Song & Zipkin (1996) explore an inventory-control model that includes a detailed Markovian model of the resupply system. A number of papers addressing supply and demand changes have been developed in the field of oil stockpiling, given grave concern over the oil supply from the Middle East (Teisberg 1981, Chao & Manne 1982, Murphy et al. 1987).
Modeling production rate disruptions (machine failures) has been largely addressed by extending classical economic manufacturing quantity (EMQ) models. Rosenblatt & Lee (1986) derive an EMQ model for a production process subject to random deterioration from an in-control state to an out-of-control state. Lee (1992) models the defect-generating process in the semiconductor wafer probe process to determine an optimal lot size that reduces the average processing time on a critical resource. Abboud (1997) presents a simple approximation of the EMQ model with Poisson machine breakdowns and a low failure rate. … study an unreliable production system with constant demand and random breakdowns, focusing on the effects of machine failure and repair on optimal lot-sizing decisions. Assuming exponentially distributed time between failures and instantaneous repair of the machine, the authors derive some unique properties of their model compared to the classical EMQ model. … extend their earlier work in … to the case where repair times are randomly distributed and excess demand is lost. Kim & Hong (1997) propose an extension of the model in …, which determines an optimal lot size when a machine is subject to random failures and the time to repair is constant. They formulate average cost functions for the lot size and derive conditions for determining the optimal lot size. Hopp et al. (1989) present a model that assumes an (s, S) control policy and derive a cost function under Poisson failures and exponential repair times. Rahim (1994) presents an integrated model for determining an economic manufacturing quantity, inspection schedule, and control chart design for an imperfect production process, assuming that the process is subject to a non-Markovian shock with an increasing failure rate. Other notable examples of such work include Henig & Gerchak (1990), Bielecki & Kumar (1988), and Buzacott & Shanthikumar (1993).
Finally, Abboud (2001) examines a single-machine production and inventory system with deterministic production and demand rates, where the machine is subject to random failures. The author models the production/inventory system as a Markov chain and develops an algorithm to compute the potentials that are used to formulate the cost function.

In summary, research efforts addressing the disruption of supply are still comparatively new and scant.

Most of the open literature considering various types of disruptions focuses on issues of inventory, ordering, production lot sizing, production scheduling, and the management of inventory, setup, and backorder costs. To the best of our knowledge, there have been no attempts to introduce countermeasure policies for mitigating unpredicted capacity disruptions in a capacitated system and to analyze the benefits of such policies for the system manager. Our paper presents an initial attempt to fill this gap.

The paper is organized as follows. In Section 2, we introduce the notation and define the problem. Section 3 presents an analysis of a one-phase countermeasure policy, where a risk-neutral decision maker implements countermeasures during the entirety of a disruption cycle, striving to maximize the long-run average reward. These results are used in Section 4 to examine a richer class of policies, where countermeasures are activated during only a fraction of a disruption cycle. In Section 5, we present a numerical analysis for determining the optimal phase threshold and examine the sensitivity of the optimal policy to fluctuations in system parameter values. Finally, Section 6 offers concluding remarks.

For the rest of this paper, we define throughput as the long-run average number of item units per unit time processed by a capacitated system, and the available system capacity at time $t$, $C_t$, as the maximum throughput that system resources are capable of sustaining at $t$. Consider a lean (i.e., zero-inventory) system with a target (demand-adjusted) capacity $C^*$ experiencing periodic random disruptions, each of which may cause a full or partial loss of system capacity. We assume that disruptions occur one at a time and that the $i$th occurrence results in an instantaneous loss of magnitude $\Delta C_i$ in the remaining system capacity. Following the $i$th disruption at time $t$, the system capacity remains at level $C_t - \Delta C_i$ until the next disruption, unless the remaining capacity falls below a critical level $c$, upon which the system regains all lost capacity back to $C^*$. For simplicity, we assume instantaneous recovery in this paper. The system is assumed to stochastically regenerate at points of recovery (Figure 1). Such capacity dynamics can be observed in a number of industrial scenarios, including but not limited to (i) shortage of repair personnel and performance degradation caused by failing equipment, with a full repair upon complete failure, (ii) non-self-announcing stepwise system failures, and (iii) gradual equipment phaseout and modernization.

are assumed to form a sequence of i.i.d. random variables. The time of the first disruption is denoted by $X_1$, and $X_i$, $i = 2, 3, \ldots$, denotes the time between the $(i-1)$th and $i$th disruptions (Figure 1).

We assume that , 2,3,

variables. The time of the $n$th capacity loss is expressed as

It then follows that $N_c$ is the number of capacity disruptions between two successive recovery epochs. As such,

is the time between two successive recovery events, which mark the beginning and the end of a regenerative cycle. A proactive decision maker has a number of mitigation options to reduce the rate of disruptions. When no countermeasures are implemented, he earns

where $\pi$ is a time-independent unit price minus the item unit cost. Therefore, the revenue in

We assume that a cost of $m(\lambda)$ per unit time is incurred to activate and operate a set of countermeasures that maintain a rate of $\lambda$ capacity disruptions per unit time. In this paper, we are not concerned with describing specific countermeasure options; rather, we focus on the analytics of their disruption-rate-reducing impact on system performance. We assume that the decision maker has a risk-neutral utility function (Keeney & Raiffa 1993), and thus our analysis is based on the limiting long-run average reward as the criterion for policy assessment.
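To make the long-run average reward criterion concrete, the sketch below simulates regenerative cycles and estimates the reward rate as total cycle reward divided by total cycle length (the renewal-reward ratio). It rests on assumptions of ours that the text above does not fix: each loss removes a uniform fraction of the target capacity, $\Delta C_i = C^* U_i$ with $U_i \sim U(0,1)$; revenue accrues at rate $\pi C_t$; and the countermeasure cost `cost(rate)` is charged throughout the cycle. The function and parameter names are hypothetical.

```python
import random


def simulate_cycle(c_star, alpha, rate, rng):
    """Simulate one regenerative cycle of the one-phase model (sketch).

    Assumption (ours): each loss removes c_star * U with U ~ U(0, 1), so the
    capacity after n losses is c_star * (1 - S_n). Recovery is instantaneous
    once capacity drops below the critical level c = alpha * c_star.
    Returns (cycle length, time integral of capacity over the cycle).
    """
    capacity = c_star
    length = 0.0
    area = 0.0
    while capacity >= alpha * c_star:
        dwell = rng.expovariate(rate)      # exponential time to the next disruption
        length += dwell
        area += capacity * dwell           # capacity is flat between disruptions
        capacity -= c_star * rng.random()  # stepwise uniform capacity loss
    return length, area                    # below c: full, instant recovery ends the cycle


def long_run_average_reward(c_star, alpha, rate, price, cost,
                            n_cycles=100_000, seed=1):
    """Renewal-reward estimate: total cycle reward / total cycle length.

    Revenue accrues at rate price * capacity; cost(rate) is a hypothetical
    countermeasure cost per unit time charged over the whole cycle.
    """
    rng = random.Random(seed)
    total_length = 0.0
    total_reward = 0.0
    for _ in range(n_cycles):
        length, area = simulate_cycle(c_star, alpha, rate, rng)
        total_length += length
        total_reward += price * area - cost(rate) * length
    return total_reward / total_length
```

For example, with $C^* = 1$, $c = 0.3$, $\lambda = 2$, $\pi = 1000$ and no countermeasure cost, `long_run_average_reward(1.0, 0.3, 2.0, 1000.0, lambda lam: 0.0)` should land close to 803. Using the ratio of totals rather than an average of per-cycle ratios is what makes this a consistent estimator of the limiting reward rate.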

In this paper, we first consider a one-phase mitigation policy in which countermeasures are activated throughout a regenerative cycle. Later, we will expand the analysis to examine a two-phase model. 

whereas the cycle length is

We will need the following result to compute $E(N_c)$, the expected number of capacity loss events per cycle.

We prove by induction. For $n = 1$, the result is trivial. Assuming that the result holds for $n - 1$, note that

and 1 1 ( 

Then, the limiting value of long-run average reward is given by the following expression.

In this section, we have considered a one-phase mitigation policy where countermeasures are implemented during the entire disruption cycle. The expression for the limiting long-run average reward (Eq. 4) will serve as a basis for analyzing a two-phase policy in the next section.
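Under an additional assumption of ours, that each loss removes a $U(0,1)$ fraction of $C^*$, the one-phase criterion reduces to a short closed form. Writing $S_n = U_1 + \dots + U_n$, $\alpha = c/C^*$ (our normalization) and $a = 1 - \alpha$, an induction of the kind used above gives $P(S_n \le x) = x^n/n!$ for $0 \le x \le 1$. Hence $E(N_c) = \sum_{n \ge 0} P(S_n \le a) = e^{a}$, and the expected capacity-time per cycle is $C^*[(2-a)e^{a} - 1]/\lambda$. This is a sketch of one possible specialization, not a transcription of Eq. (4); `one_phase_reward` and its arguments are hypothetical names.

```python
import math


def one_phase_reward(alpha, rate, price, c_star, cost):
    """Closed-form long-run average reward of a one-phase policy (sketch).

    Assumptions (ours): losses are i.i.d. U(0, 1) fractions of c_star,
    interarrival times are exponential with the given rate, recovery is
    instantaneous below c = alpha * c_star, revenue accrues at
    price * capacity, and cost(rate) is charged per unit time.
    """
    a = 1.0 - alpha                        # cumulative loss allowed before recovery
    expected_losses = math.exp(a)          # E(N_c) = sum_{n>=0} a**n / n!
    expected_length = expected_losses / rate
    # E[integral of C_t over a cycle] = c_star * [1 + int_0^a (1 - s) e^s ds] / rate
    expected_area = c_star * ((2.0 - a) * math.exp(a) - 1.0) / rate
    return price * expected_area / expected_length - cost(rate)
```

For instance, `one_phase_reward(0.3, 2.0, 1000.0, 1.0, lambda lam: 0.0)` evaluates to $1000(1.3 - e^{-0.7}) \approx 803.4$. In this sketch every dwell time scales by $1/\lambda$, so the capacity term does not depend on $\lambda$ and the rate enters only through $m(\lambda)$.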

Consider the set of policies under which countermeasures are activated at the beginning of each system cycle and remain in effect as long as the system capacity exceeds a certain higher level $c_l > c$, where … Countermeasures remain deactivated for levels below $c_l$, where the system becomes exposed to the "normal" disruption rate. This model is driven by the idea that, from the system manager's viewpoint, it is desirable to stay longer in the "on" zone, closer to the target level $C^*$, rather than prolong the "off" portion of the cycle. As in Sections 2 and 3, $c$ is the critical lower level that triggers instantaneous capacity recovery (Figure 2). The system is said to be "on" when countermeasures are in effect and "off" otherwise. The long-run average reward of this altered process exhibits the same convergence property. Therefore, it is our interest in this

A realization of the system capacity dynamics for a two-phase policy. The disruption rate during the "on" phase ($\lambda_l$) is smaller than the disruption rate during the "off" phase ($\lambda$).

We first derive the distribution of the initial system capacity for the "off" period in a cycle. The following proposition summarizes the result.

Proposition 1. Consider a capacitated system in which capacity disruption interarrival times are exponentially distributed with parameter $\lambda$, fractional stepwise capacity losses follow a uniform distribution on [0, 1], and capacity is restored fully and instantaneously upon falling below level $c$. Suppose that the system is "on" when …

Likewise, the expected total capacity during the "off" period, $C_s$, can be readily obtained after considering the initial capacity level in the "off" period, $C_0$. Note that … (6)

Proof. Using Theorem 3.6.1 in Ross (1996), we know that the long-run average reward converges to

The proof follows by substituting the expressions for … into (7). ■
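The same renewal-reward argument extends to the two-phase policy. The sketch below assumes, as our own simplification, that losses are $U(0,1)$ fractions of $C^*$, that countermeasures cost $m(\lambda_l)$ per unit time only while the system is "on", and that the "off" phase carries no countermeasure cost; thresholds are normalized as $\alpha = c/C^*$ and $\alpha_l = c_l/C^*$. It does not reproduce Proposition 1 or Eqs. (6) and (7). Instead it weights each capacity level $C^*(1-s)$ by the renewal density $e^{s}$ of uniform partial sums on $[0,1]$, splitting cumulative loss $s$ into an "on" range $[0, 1-\alpha_l]$ and an "off" range $(1-\alpha_l, 1-\alpha]$. The name `two_phase_reward` is hypothetical.

```python
import math


def two_phase_reward(alpha, alpha_l, rate, rate_l, price, c_star, cost):
    """Long-run average reward of a two-phase policy (sketch).

    Countermeasures run (disruption rate rate_l, cost cost(rate_l) per unit
    time) while capacity >= alpha_l * c_star; below that the normal rate
    applies at no countermeasure cost until capacity drops under
    alpha * c_star and is restored instantly. Losses are assumed to be
    i.i.d. U(0, 1) fractions of c_star.
    """
    if not 0.0 <= alpha <= alpha_l <= 1.0:
        raise ValueError("need 0 <= alpha <= alpha_l <= 1")
    a, a_l = 1.0 - alpha, 1.0 - alpha_l
    # "On" phase: capacity levels c_star * (1 - S_n) with S_n <= a_l.
    t_on = math.exp(a_l) / rate_l
    area_on = c_star * ((2.0 - a_l) * math.exp(a_l) - 1.0) / rate_l
    # "Off" phase: capacity levels with a_l < S_n <= a.
    t_off = (math.exp(a) - math.exp(a_l)) / rate
    area_off = c_star * ((2.0 - a) * math.exp(a)
                         - (2.0 - a_l) * math.exp(a_l)) / rate
    reward = price * (area_on + area_off) - cost(rate_l) * t_on
    return reward / (t_on + t_off)
```

With $\alpha = 0.3$, $\alpha_l = 0.6$, $\lambda = 2$, $\lambda_l = 0.5$, $\pi = 1000$, $C^* = 1$ and zero cost, the sketch returns roughly 890, compared with about 803 when countermeasures are always on ($\alpha_l = \alpha$). Setting `rate_l == rate` recovers the always-on value for any `alpha_l`, which is a convenient consistency check.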

An optimal two-phase policy maximizes the long-run average reward by activating countermeasures that set optimal levels of $\lambda_l$ and … … respectively. These terms represent the probability distributions of the number of disruptions during the "off" and "on" phases, respectively. Since the mean disruption magnitude is strictly positive and $C^*$ (and hence $c_l$ and $c$) is finite, both of these terms go to zero as $n \to \infty$.

Since the disrupted capacity is regained instantaneously at the end of each cycle, the solution to an optimum policy … In what follows, we use the initial parameter values shown in Table 1.

The cost of countermeasures is assumed to be of the form … . The cost decreases as the disruption rate $\lambda_l$, which measures the effectiveness of the countermeasure technology, gets higher, and it increases in $r$, which models the marginal cost of installing a more effective technology. As Figure 3 illustrates, $\alpha_l^*(\alpha)$ decreases in $r$ at an increasing rate; that is, as $r$ gets larger, $\alpha_l^*(\alpha)$ becomes more sensitive to per-unit changes in $r$. In the flat regions of Figure 3, reducing the disruption rate is not economically sound: the expected increase in countermeasure costs over extended periods outweighs the benefits of better technology. … Table 1. Common wisdom, however, suggests that $C^*$ should be positively correlated with the optimal period of activated countermeasures.

If all items are sold, increasing $C^*$ leads to higher profits, and hence countermeasures should be engaged for longer periods (plot I in Figure 5, where $C^*$ takes values in [0, 0.1]). As $C^*$ approaches 0.1, the marginal increase in $\alpha_l^*(\alpha)$ falls off sharply.

This suggests that the optimal period of activated countermeasures is insensitive to changes in maximum capacity if $C^*$ is already high. Also, the region of sensitivity of $C^*$ is a function of the unit profit, as illustrated in the second plot of … . Figure 6 illustrates that $C^*$ is insensitive to changes in $\pi$ around the original parameter value of $\pi = 1000$, whereas at lower unit profit levels, marginal changes in $\pi$ cause larger perturbations in $\alpha_l^*(\alpha)$. Changes in system capacity for low-value items may require more radical changes in countermeasure policy. Nevertheless, the region of sensitivity is relatively small for both $\pi$ and $C^*$, which suggests that, on a larger scale, $\alpha_l^*(\alpha)$ is quite robust to changes in system profitability.

Behavior of $\alpha_l^*(\alpha)$ for different values of $\alpha$ and $\pi$.
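A numerical search for the optimal threshold can be sketched as a grid search over $\alpha_l \in [\alpha, 1]$. The cost form used here, $m(\lambda_l) = k\,(e^{r(\lambda/\lambda_l - 1)} - 1)$, is hypothetical: it is chosen only because it decreases in $\lambda_l$ and increases in $r$ (for $\lambda_l < \lambda$), as described for the cost function in this section, and it is not the authors' functional form. The reward expression likewise assumes that losses are $U(0,1)$ fractions of $C^*$, that countermeasure cost accrues only in the "on" phase, and that $\alpha = c/C^*$, $\alpha_l = c_l/C^*$. All names and the default `k = 10` are illustrative.

```python
import math


def reward_at_threshold(alpha_l, alpha, rate, rate_l, price, c_star, m):
    """Two-phase long-run average reward; losses are U(0, 1) fractions of c_star."""
    def g(x):
        # Antiderivative of (1 - s) * exp(s): d/dx[(2 - x) e^x] = (1 - x) e^x.
        return (2.0 - x) * math.exp(x)

    a, a_l = 1.0 - alpha, 1.0 - alpha_l
    t_on = math.exp(a_l) / rate_l                 # expected "on" time per cycle
    t_off = (math.exp(a) - math.exp(a_l)) / rate  # expected "off" time per cycle
    area = c_star * ((g(a_l) - 1.0) / rate_l + (g(a) - g(a_l)) / rate)
    return (price * area - m * t_on) / (t_on + t_off)


def hypothetical_cost(rate_l, rate, r, k=10.0):
    """Hypothetical cost per unit time: zero at rate_l == rate, increasing in r."""
    return k * (math.exp(r * (rate / rate_l - 1.0)) - 1.0)


def optimal_alpha_l(alpha, rate, rate_l, price, c_star, r, steps=200):
    """Grid search over alpha_l in [alpha, 1] for the reward-maximizing threshold."""
    m = hypothetical_cost(rate_l, rate, r)
    grid = [alpha + (1.0 - alpha) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: reward_at_threshold(
        x, alpha, rate, rate_l, price, c_star, m))
```

Sweeping `r` and recording `optimal_alpha_l(0.3, 2.0, 0.5, 1000.0, 1.0, r)` traces a curve of the same kind as those discussed for Figure 3, although it need not match them, since both the cost form and the loss model here are assumptions. A finer grid or a scalar optimizer could replace the grid search if the reward curve is smooth.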

In this manuscript, we presented one of the initial attempts to fill the gap in the existing literature on the development of active countermeasure policies for managing lean capacitated systems in the presence of random capacity disruptions. The system under consideration experienced stepwise partial capacity disruptions with exponentially distributed interarrival times and uniformly distributed magnitudes, followed by instantaneous recovery. Examples of such capacity dynamics include (i) shortage of repair personnel and performance degradation caused by failing equipment, with a full repair upon complete failure, (ii) non-self-announcing stepwise system failures, and (iii) gradual equipment phaseout and modernization.

We explored two different countermeasure policies for a risk-neutral decision maker who seeks to maximize the long-run average reward. The initial model considered a one-phase policy, where countermeasures were implemented during the entirety of a disruption cycle. The results of this model served as a basis for analyzing a two-phase strategy, where countermeasures were activated during only a fraction of a disruption cycle. For the latter model, we aimed to determine the optimal threshold at which the countermeasures should be disengaged. In this paper, we are primarily concerned with the analytics of the impact that countermeasure options can have on system performance. In practice, countermeasure options could range from purely technological solutions, such as installing fire-prevention water sprinkler systems, to non-technological decisions that could, for example, alleviate labor strikes or prevent terrorist attacks or political unrest. In this investigation, we considered two forms of the countermeasure cost function. Our sensitivity analysis for the two-phase policy reveals that as the system profitability increases and the costs of countermeasures become smaller, the optimal countermeasure policy becomes less sensitive to changes in the system parameter values.

In this paper, we did not address the question of the best critical threshold that initiates immediate capacity recovery, as we assumed that the cost associated with administering any level of $\alpha$ is zero. Therefore, the problem of obtaining the optimal pair $(\alpha^*, \alpha_l^*)$ has a trivial solution (i.e., set $\alpha^* = 0$). Rather, we aimed to find the optimal time in each regenerative cycle at which countermeasures should be terminated, given a capacity recovery threshold of $\alpha$.

Section 5 presented a numerical analysis to determine the optimal $\alpha_l^*(\alpha)$ that maximizes the long-run average reward under various parametric settings. We presented the results of our sensitivity analysis for an exponential cost function. In general, $\alpha_l^*(\alpha)$ was found to be quite sensitive to exponentially increasing cost, as well as to capacity and unit profit changes, if the system was already operating with low profit margins. However, as the profitability of the system increased, $\alpha_l^*(\alpha)$ responded robustly to system parameter changes.

In general, capacity disruption risk can be mitigated by reducing the probability of hazardous events as well as their severity. In this paper, we considered countermeasures that mainly affect the probability of hazardous events rather than their severity. For risks that cause partial capacity disruptions, the model recommends implementing countermeasures during only a fraction of the operational cycle. In many cases, partial capacity disruptions are caused by risks associated with daily operations, such as small fires and stoppages due to machine failures. For such events, our results support the view that certain countermeasures may be cost prohibitive even when they offer a significant reduction in the disruption rate. For example, in a manufacturing facility, installing costly fire extinguishing systems may be less attractive than employee training programs that raise awareness of overall factory cleanliness.

This paper provides one of the initial attempts at deriving closed-form solutions for optimal countermeasure policies to mitigate random disruptions in capacitated systems. We hope that the presented models will be further generalized to address similar questions for capacitated systems evolving under more complex capacity dynamics. We also believe that such single-facility models will form a basis for approaching capacity management issues in large enterprise networks.

We can now derive expected capacity per cycle. Equation (1) gives 

and Control

A discrete-time Markov production-inventory model with machine breakdowns

Managing Business Risk in 2006 and Beyond. In: Protecting Value

Inventory management under random supply disruptions and partial backorders

Coordination and flexibility in supply contracts with options

SARS Morphs new concerns for Asian IT industry

Stochastic Models of Manufacturing Systems

Inventory policy in the presence of market disruptions

An integrated analysis of U.S. oil stockpiling policies

The inventory benefit of shipment coordination and stock rebalancing in a supply chain

Shared-savings contracts for indirect materials in supply chains: channel profits and environmental impacts

Production lot sizing with machine breakdowns

Production batching with machine breakdowns and safety stocks

The structure of periodic review policies in the presence of random yield

Optimal inventory control in a production flow system with failures

Hope in West Coast port talks

Lot sizing to reduce capacity utilization in a production process with defective items, process corrections, and rework

Information distortion in supply chain: the bullwhip effect

Worm exposes apathy, Microsoft flaws

"Slammer" attacks may become way of life for Net

Decisions with Multiple Objectives: Preferences and Value Trade-offs

The authors would like to thank the referees for their help in improving the quality of the paper.

$E(C)$ in Equation 3. We begin by conditioning on . for cross-regional pandemic outbreaks.