A Simulation Model for Purchasing 
Duplicate Copies in a Library 

W. Y. ARMS: The Open University, and T. P. WALTER: Unilever 
Limited. At the time this study was undertaken the authors were at 
the University of Sussex. 


Provision of duplicate copies in a library requires knowledge of the 
demand for each title. Since direct measurement of demand is difficult, a 
simulation model has been developed to estimate the demand for a book 
from the number of times it has been loaned and hence to determine the 
number of copies required. Special attention has been given to accurate 
calibration of the model. 

INTRODUCTION 

A common difficulty in library management is deciding when to buy dupli-
cate copies of a given book and how many copies to buy. A typical research 
library has several hundred thousand different works; many are lightly 
used but all are potential candidates for duplication. The problem which 
we faced at Sussex University was how to obtain reliable forecasts of the 
demand for each title and to translate this into a purchasing policy. At 
present Sussex spends between £10,000 and £20,000 ($22,000-$44,000) per 
year on duplicate copies, and as the university grows this amount is increas-
ing steadily. 

Because of the large number of books in a library relatively little data 
are available about each title. Records are kept of books on loan or re-
moved from the library, but frequently these are the only routine data col-
lected. Few large libraries even manage inventory checks. We therefore 
looked for a system that could be implemented with the minimum of data 
collection, preferably one based on existing records. 

FORECASTS OF DEMAND 

If the demand for a particular book is known, it is possible, though not 
necessarily easy, to determine how many copies of that book are needed to 
achieve a specified level of service, such as a copy being available on 80 
percent of the occasions that a reader requires the book. Unfortunately 
demand cannot be measured directly, even retrospectively. Records of the 


74 Journal of Library Automation Vol. 7/2 June 1974 

number of times that a book is issued from the library contain no infor-
mation about how many times the book was used within the library, nor 
how many readers failed to find a copy and went away unsatisfied. Since 
both these factors are extremely difficult to measure, one of the central 
parts of our work was to develop a method of estimating them from data 
readily available. 

To forecast demand two lines of approach seemed reasonable: subjec-
tive estimation based on faculty reading lists; and forecasts based on the 
number of loans in previous years. In the past, Sussex Library has made 
extensive use of reading lists provided by faculty to decide how many 
copies to buy of each title. As the books most in demand are those recom-
mended for undergraduate courses this seemed a sensible approach, 
though the number of copies required is not obvious even if the demand 
is known. Webster analysed the effectiveness of these lists in predicting de-
mand for specific titles and evaluated the purchasing rule being used, one 
copy for every ten students taking a course. 1 Restricting his attention to 
books known to be in demand and marked in the catalog, he drew a ran-
dom sample of 673 titles, about 4 percent of the books falling into this 
category. He compared the number of loans of each of these titles over a 
term with data from the reading lists supplied at the beginning of the 
term. As the library had made a special effort to obtain reading lists for all 
courses taught that term, he had data on the number and type of students 
taking each course, the importance given to each text, and the subject areas 
involved. Yet despite a thorough analysis of these data Webster was able 
to find very little relationship between observed demand and reading list 
information. His work shows that faculty at the university have remark-
ably little knowledge of the books that their students read. In the sample 
some books strongly recommended to large groups of students were hardly 
used and some of the most heavily used works appeared on no reading list. 
The results of this study are fascinating from an educational viewpoint 
but less satisfying as operational research. 

The failure of this approach led us to predicting demand from records 
of the number of past loans. This divides into two parts: using the num-
ber of loans over a period to estimate what the total demand was during 
that period; and using this estimate of the demand in one period to fore-
cast the demand in another. Various evidence suggests that the latter is a 
sensible thing to do. The main demand for heavily used books comes from 
undergraduate courses. Most faculty are loyal in their reading habits, rec-
ommending books they know rather than new ones, and each course tends 
to be repeated year after year with a syllabus that changes only gradually. 
The use of past circulation to forecast future use is fundamental to a 
Markov model of book usage developed by Morse and Elston and tested 
with data from the M.I.T. Engineering Library. 2 For our work we have 
used the number of loans in a given term to predict the demand in the cor-
responding term a year later. 



Estimating the total demand in a period from the number of loans in 
that period is more difficult. This requires a model of the circulation sys-
tem. 

MATHEMATICAL APPROACH 

Several attempts have been made to apply the methods of inventory con-
trol or queueing theory to the problem of buying duplicates. For example, 
Grant has recently described an operational system using the simple rule 
that the number of copies required to satisfy 95 percent of the demand is 

$$n(\mu_s + 2\sigma_s)/t$$

where n is the number of times that the book is issued during a period of 
t days and $\mu_s$ and $\sigma_s$ are the mean and standard deviation of the time that 
each book is off the shelf when on loan. 3 

This type of approach has the advantage of being straightforward to 
use. Periodically a simple computer program analyzes the circulation histo-
ry of each book in the library and prints a list of books requiring duplica-
tion. However, the method suffers from difficulties both mathematical and 
practical. To obtain the simple mathematical expression given above, sev-
eral simplifying assumptions have to be made. For example, the expres-
sion ignores use of a book within the library, and identifies demand in a 
period with the number of loans within that period. Practical difficulties 
in arriving at a more exact mathematical expression are discussed in the 
next section. 
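
As a worked illustration of the rule quoted above, the short Python sketch below evaluates n(μ_s + 2σ_s)/t. The function name and the numerical values are hypothetical, chosen only to show the arithmetic; they are not data from Grant or from Sussex, and rounding up to whole copies is our own choice.

```python
import math

def grant_copies(n_issues, period_days, mean_days_out, sd_days_out):
    """Evaluate n(mu_s + 2*sigma_s)/t, the rule attributed to Grant in the text."""
    return n_issues * (mean_days_out + 2 * sd_days_out) / period_days

# Hypothetical book: 30 issues in a 70-day term, off the shelf for
# 14 days on average with a standard deviation of 3 days.
estimate = grant_copies(30, 70, 14, 3)
copies = math.ceil(estimate)  # round up to whole copies (our choice)
print(round(estimate, 2), copies)  # 8.57 9
```

Because only the number of issues enters the calculation, the result inherits the limitations noted above: use within the library and readers who went away unsatisfied are not counted.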

DIFFICULTIES IN CONSTRUCTING A MODEL 

The following are the main difficulties that we found in constructing 
a model, either mathematical or using simulation: 

1. The most useful measure of the effectiveness of a duplication policy 
is satisfaction level, the proportion of readers who on approaching 
the shelves find a copy of the book there, but satisfaction level is al-
most impossible to measure directly since, although some unsatisfied 
readers ask that the book be held for them, most go away without 
comment. More or less equivalent is the percentage time on shelf, the 
proportion of time that at least one copy of the book is available. 
This can be measured directly, though a visit to the shelves is needed, 
and was found useful in validating our model. If the underlying de-
mand is random these two measures of effectiveness have the same 
value. 

2. Use of books within the library is also difficult to measure. At Sussex, 
as in most libraries, data are available only on the number of times 
that a book is lent out of the library. If a reader does not find a copy 
on the shelves or if he uses a book within the library but does not 
take it away then no record is generated. Since various studies, notably 
that of Fussler and Simon, suggest that the amount of use within libraries 
often exceeds the number of loans recorded by a factor of three or more, 
a reasonable knowledge of within-library use is essential if the number 
of loans is used to estimate demand.4 

3. The number of copies required to achieve a specified satisfaction lev-
el does not go up linearly with demand. Since a reader is satisfied if 
he finds a single copy on the shelves, proportionately fewer duplicates 
are needed of the books most in demand. At Sussex more than twenty 
copies are provided of several books and this nonlinearity is very no-
ticeable. 

4. The demand for a title is erratic, changing from term to term, from 
week to week, and from day to day, even if the mean demand is con-
stant. Over a period such as a term three different effects might be ex-
pected: a background random demand independent of university 
courses; sudden peaks when a book is required for a course taken by 
several students; and feedback caused by previously unsatisfied read-
ers returning. 

5. The circulation of books is surprisingly complicated. At Sussex some 
books are designated short term loan and can be borrowed for up to 
four days only; the remainder are long term loan books and can be 
borrowed for up to six weeks. Circulation data show that the time 
for which a book is off the shelf is not the same as the period for 
which it is lent, but has a heavily skewed distribution. Few books are 
returned until near the due date; just before the book is due back 
there is a peak when most books are returned but many become over-
due and the tail of the distribution dies away slowly. 
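
The nonlinearity described in point 3 can be illustrated with a standard queueing result that is not part of the authors' method: the Erlang loss formula. The sketch assumes purely random (Poisson) demand, readers who leave at once if no copy is on the shelf, and no use within the library. The name `offered_load` (mean daily demand multiplied by mean days off the shelf) and the three load values are our own.

```python
def erlang_b(offered_load, copies):
    """Probability that all copies are out (Erlang loss formula, stable recursion)."""
    blocked = 1.0
    for c in range(1, copies + 1):
        blocked = offered_load * blocked / (c + offered_load * blocked)
    return blocked

def copies_needed(offered_load, target=0.8):
    """Smallest number of copies giving at least the target satisfaction level."""
    copies = 1
    while 1.0 - erlang_b(offered_load, copies) < target:
        copies += 1
    return copies

# Each load is four times the previous one.
for load in (2.0, 8.0, 32.0):
    c = copies_needed(load)
    print(f"load {load:5.1f}: {c:2d} copies, {c / load:.2f} copies per unit of load")
```

Under these simplified assumptions a fourfold increase in load from 2 to 8 raises the requirement from 4 copies to 9 rather than 16, which is the kind of less-than-proportional growth the text describes.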

SIMULATION 

As these various factors seemed too complex to derive usable mathe-
matical results, we decided to use computer simulation of the book circula-
tion. Simulation of book circulation is not new. In particular it has been 
used at Lancaster University by Mackenzie et al. to decide loan periods.5 
Their report includes a good description of the general approach. 

The object of our simulation was to model the circulation process so 
that we could study the relationship between three groups of parameters: 

1. Observed data 
   Number of copies available 
   Number of loans 

2. Total underlying demand 

3. Measures of effectiveness 
   Satisfaction level 
   Percentage time on shelf 

The results obtained from any simulation are only as accurate as the 
values given to the variables used to calibrate the model. As several of 
these values were not known at all accurately when the work was begun, 
special efforts were put into careful validation and calibration of the mod-



el. A separate study was made for a small sample of books, to com-
pare the percentage time on shelf estimated by the simulation with the ac-
tual time for which a copy was available, found by looking at the shelves. 
The results of this study were used to check the amount of use within the 
library. By this means we were able to verify the simulation model and 
calibrate it to a highly satisfactory level of accuracy. 

DESCRIPTION OF PROGRAM 

The basic layout of the simulation is shown in Figure 1. This is a time 
advance model with a period of one day. The program has been coded in 
FORTRAN and, running on the ICL 1904A computer at Sussex, takes 
about one second of machine time to simulate two years. This speed 
has enabled us to try a wide range of values for most parameters and to 
experiment with a variety of distributions of arrival times and book 
return dates. 

1. Satisfaction level 
At the beginning of each day the number of demands for that day 
is generated. The satisfaction level is taken as the proportion of these 
requests which can be satisfied from the books left on the shelf from 
the previous day and those returned during the simulated day. 

2. Within-library use 
The proportion of use that takes place within the library was a key 
parameter in calibrating the model. The first version of the simula-
tion program assumed a figure of 25 percent use within the library. 
This was based on a small survey of the type of books being studied, 
standard texts used for undergraduate courses. The weakness of this 
survey was that it used a count of those books that were left lying in 
the library at the end of the day and did not make sufficient allow-
ance for books reshelved by readers or by library staff during the day. 

The validation experiment showed a consistent difference between 
predicted and observed percentage time on shelf which could be cor-
rected by changing the value of the within-library use parameter to 
60 percent. 

3. Distribution of demand 
Two distributions of demand have been used, Poisson arrivals with 
a specified mean, and a step demand superimposed on a Poisson pro-
cess. In both cases provision is made for a proportion of unsatisfied 
readers to return later. As the effect of this feedback is to introduce 
sharp peaks of demand, the two distributions have proved surprising-
ly similar in the results produced and most of the runs of the pro-
gram have been done with random demand. 

A recent survey showed that 69 percent of readers who fail to find 
a book intend to return, but we do not know how many actually come 
back nor what the time interval is before they return. 6 The simulation 
proved to be insensitive to moderate changes of these parameters 
and for most runs 25 percent of unsatisfied readers were deemed to 
return after a delay which averaged two days. 

[Fig. 1. Outline flowchart of simulation program. Box labels recoverable from the original include: Advance clock one day; Add returned books; Generate requests; Generate return date.]

4. Period for which the book is off the shelf 
The simulation allows for a book to be borrowed within the library, 
in which case it is available again the next day, or to be lent from the 
library. If the book is lent, the return date is generated from one of 
two histograms which respectively refer to books available on short 
and long term loan. These histograms were derived from an analysis 
of all books returned during one week in autumn 1970, modified to 
reflect changes in the circulation system. 
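
The four points above can be sketched as a daily time-advance loop. The Python below is an illustration only, not the authors' FORTRAN program: the function names, the loan histogram and the demand figure are invented. It also assumes that availability is sampled once at the start of each day, that returning readers count as fresh requests, and that the return delay is one day plus a Poisson variate.

```python
import math
import random

def poisson(rng, mean):
    """Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate(copies, daily_demand, days, loan_days, loan_weights,
             within_library=0.60, return_prob=0.25, mean_return_delay=2.0, seed=1):
    """Simulate one title day by day; mean_return_delay must be at least 1."""
    rng = random.Random(seed)
    on_shelf = copies
    due_back = {}  # day -> number of copies returning that day
    retries = {}   # day -> number of unsatisfied readers coming back that day
    requests = satisfied = loans = days_available = 0

    for day in range(days):
        # Add returned books.
        on_shelf += due_back.pop(day, 0)
        # Assumption: availability is sampled once, at the start of the day.
        if on_shelf > 0:
            days_available += 1
        # Generate requests: fresh Poisson demand plus returning readers.
        todays_requests = poisson(rng, daily_demand) + retries.pop(day, 0)
        for _ in range(todays_requests):
            requests += 1
            if on_shelf > 0:
                satisfied += 1
                on_shelf -= 1
                if rng.random() < within_library:
                    back = day + 1  # used in the library, available next day
                else:
                    loans += 1      # only these uses leave a loan record
                    back = day + rng.choices(loan_days, weights=loan_weights)[0]
                due_back[back] = due_back.get(back, 0) + 1
            elif rng.random() < return_prob:
                # Delay of at least one day, averaging mean_return_delay.
                again = day + 1 + poisson(rng, mean_return_delay - 1.0)
                retries[again] = retries.get(again, 0) + 1

    return {
        "loans": loans,
        "satisfaction": satisfied / requests if requests else 1.0,
        "time_on_shelf": days_available / days,
    }

# Invented short-loan histogram: days off the shelf and relative frequencies,
# peaking at the four-day limit with a tail of overdue returns.
SHORT_LOAN_DAYS = [1, 2, 3, 4, 5, 6, 7]
SHORT_LOAN_WEIGHTS = [5, 10, 20, 40, 15, 7, 3]

for n in (1, 3, 6):
    print(n, simulate(n, 2.0, 140, SHORT_LOAN_DAYS, SHORT_LOAN_WEIGHTS))
```

Running the loop for several stock levels at a fixed demand gives the three groups of quantities the simulation was built to relate: observed loans, underlying demand, and the two measures of effectiveness.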

VALIDATION EXPERIMENT 

Although the structure of the simulation is fairly straightforward sev-
eral parameters used in the model have been estimated indirectly. Valida-
tion of the model took two forms. Firstly we ran the program with a wide 
range of values for the main parameters to see which most influence the 
results. Secondly a small study was set up to measure the percentage time 
on shelf of a number of books. For each book, the actual availability was 
compared with that estimated by the simulation from the number of loans 
during the same period. 

Twenty-eight books known to be in heavy demand were selected, half in 
physics and half in sociology. Over a period of eight weeks the shelves 
were inspected once per day, at random times during the day, to see if a 
copy was available. The number of loans of each copy of each book dur-
ing the period was noted and the library staff carried out a thorough check 
to determine whether any copies shown in the catalog had been lost, stolen, 
or had their loan category altered. The simulation was used to estimate the 
percentage time on shelf and this was plotted on a graph against the ob-
served percentage. 

Figure 2 shows the graph for the original values of the parameters. In 
this graph the x axis shows the percentage time on shelf predicted by the 
simulation; the y axis shows the percentage observed. If the model were 
perfect the points would lie near the line y = x, deviations being caused by 
y being a random variable. The graph in Figure 2 is clearly convex 
downwards, showing a consistent error in the model with these values of the 
parameters. Knowing that the simulation is sensitive to the parameter giving 
the proportion of use that takes place within the library, and that our 
estimate of its value was not precise, we prepared a series of graphs varying 
this parameter. Figure 3 shows the same observations plotted against 
predictions assuming 60 percent use within the library, the value which best 
predicts the observations. This graph is much closer to being linear than 
Figure 2. 

[Fig. 2. Observed percentage time on shelf against predicted (25 percent use within library). Axes: predicted availability (percent time on shelf) against observed availability (percent time on shelf).]

[Fig. 3. Observed percentage time on shelf against predicted (60 percent use within library) with 95 percent probability curves. Axes: predicted availability (percent time on shelf) against observed availability (percent time on shelf).]

The next question is whether the nonlinearities in Figure 3 are the type 
to be expected from y being a random variable. A very rough calculation 
helps to answer this question. If we make the dubious assumption that 
availability of a copy on a given day is independent of the days before 
and afterwards, then, for x given, y should be approximately normally 
distributed with mean x and variance $x(1 - x)/n$, where n is the number of 
days in the study (forty). If this calculation were exact, 95 percent of the 
observations of y would lie within two standard deviations of x, but, since 
the assumption of independence is definitely false, we would expect the 
number of observations which fall within the range to be less than 95 
percent. 

The curves 

$$y = x \pm 2\{x(1 - x)/n\}^{1/2}$$

have been added to Figure 3. Two points lie well off all graphs and cannot 
be explained except as the result of books being stolen or lost during the 
period of the study. Of the remaining twenty-six all but three lie within 
the curves. This shows that the simulation model as finally calibrated gives 
a very reasonable description of the situation. 
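
The two-standard-deviation curves can be evaluated directly. The sketch below is ours, not the authors' code; it assumes x and y are expressed as proportions between 0 and 1 rather than percentages, since the variance x(1 - x)/n only makes sense on that scale. The function names are hypothetical.

```python
import math

def band(predicted, n_days=40):
    """Limits x -/+ 2*sqrt(x(1-x)/n), with x a proportion between 0 and 1."""
    half_width = 2.0 * math.sqrt(predicted * (1.0 - predicted) / n_days)
    return predicted - half_width, predicted + half_width

def inside_band(predicted, observed, n_days=40):
    """True if the observed proportion lies between the two curves."""
    lower, upper = band(predicted, n_days)
    return lower <= observed <= upper

lower, upper = band(0.5)
print(f"{lower:.3f} {upper:.3f}")  # 0.342 0.658
```

With forty daily observations the band is widest at a predicted availability of 50 percent, where it spans roughly 34 to 66 percent, and it narrows towards 0 and 100 percent.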

OPERATIONAL EXPERIENCE 

The results of this simulation have been used by library staff since the 
middle of 1971, initially on an experimental basis. A two-stage process is 
involved. From the computer based circulation system can be found the 
number of times that each short term loan copy has been circulated. From 
these figures the library staff can estimate the demand for a title, over a 
given period. Once the demand has been estimated the staff can use the 
simulation again to determine how many copies would have been required 
to have achieved a specified satisfaction level, perhaps 80 percent. If fewer 
copies are held by the library orders are placed for extra copies. At present 
these procedures are done manually using tables, but the possibility exists 
of modifying the computer system to identify those titles which need 
extra duplication. The actual decision to purchase needs to be made by 
library staff who can take account of factors not included in the simulation, 
such as price and changes of undergraduate courses. 
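
The two-stage procedure can be sketched in Python as follows. The library used tables produced by the simulation; to keep the example self-contained we substitute a closed-form stand-in (the Erlang loss formula), so the numbers it produces are illustrative only. All function names, the parameter `lent_fraction` (the share of satisfied requests that become recorded loans) and the example figures are hypothetical.

```python
def erlang_b(load, copies):
    """Stand-in model: probability that every copy is off the shelf."""
    blocked = 1.0
    for c in range(1, copies + 1):
        blocked = load * blocked / (c + load * blocked)
    return blocked

def expected_loans(daily_demand, copies, days, mean_days_out, lent_fraction):
    """Loans the stand-in model predicts for a given underlying demand."""
    satisfaction = 1.0 - erlang_b(daily_demand * mean_days_out, copies)
    return daily_demand * days * satisfaction * lent_fraction

def estimate_demand(observed_loans, copies, days, mean_days_out, lent_fraction):
    """Stage 1: bisect for the daily demand that reproduces the observed loans."""
    args = (copies, days, mean_days_out, lent_fraction)
    low, high = 0.0, 1.0
    while expected_loans(high, *args) < observed_loans:
        high *= 2.0
        if high > 1e6:
            raise ValueError("more loans than these copies could supply")
    for _ in range(60):
        mid = (low + high) / 2.0
        if expected_loans(mid, *args) < observed_loans:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

def copies_for_target(daily_demand, mean_days_out, target=0.8):
    """Stage 2: smallest stock reaching the target satisfaction level."""
    copies = 1
    while 1.0 - erlang_b(daily_demand * mean_days_out, copies) < target:
        copies += 1
    return copies

# Hypothetical title: 3 copies, 24 recorded loans in a 70-day term.
demand = estimate_demand(24, copies=3, days=70, mean_days_out=2.5, lent_fraction=0.4)
needed = copies_for_target(demand, mean_days_out=2.5)
print(f"estimated demand {demand:.2f} per day, copies needed {needed}")
```

In this invented case the stand-in suggests five copies for an 80 percent satisfaction level against three held, so two extra copies would be put forward for the librarian's decision.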

CONCLUSION 

Although this work was carried out during 1971, we shall have little op-
erational experience of the method in action until the computer circula-
tion system is reorganized. In the past, different copies of the same book 
have been processed entirely independently, meaning that the total num-
ber of loans of a given title can only be found by manually adding up the 
number of loans of each copy. In the revised computer system this will be 
done automatically. Experience will probably show that the best procedure 
combines use of the simulation model with reading lists and the skill of 
a librarian. One possible feature of a computer based system is that it 
could automatically indicate which books appear to require duplication. 

The method used here would seem to apply equally well to other li-
braries. Naturally the circulation patterns of other libraries are different, 
which means that a different simulation would be needed, but this work 
has shown that it is possible to calibrate a simulation accurately enough to 
examine the circulation of individual books. 

ACKNOWLEDGMENTS 

We would like to thank the many members of the University of Sussex 
library staff who have helped at various stages, particularly P. T. Stone who 
was closely involved throughout. 

REFERENCES 

1. P. F. Webster, Provision of Duplicate Copies in the University Library, Final year 
project report (University of Sussex, 1971). 

2. P. M. Morse and C. R. Elston, "A Probability Model for Obsolescence," Operations 
Research 17:36-47 (1969). 

3. R. S. Grant, "Predicting the Need for Multiple Copies of Books," Journal of Library 
Automation 4:64-71 (June 1971). 

4. H. H. Fussler and J. L. Simon, Patterns in the Use of Books in Large Research 
Libraries (Chicago: Univ. of Chicago Pr., 1969). 

5. A. G. Mackenzie et al., Systems Analysis of a University Library. Report to OSTI 
on Project Sl/ 52/02, 1969. 

6. J. Urquhart, Private discussion, 1971.