College and Research Libraries

MARIANNE GOLDSTEIN and JOSEPH SEDRANSK

Using a Sample Technique to Describe Characteristics of a Collection

A sampling procedure is presented which may be employed to identify characteristics of a collection and which then can be used in an evaluative statement and in the description of the scope of the collection. The main results obtained by applying this sampling technique to the Jewish history collections in each of seven university libraries are described in detail. Comparisons among these seven collections relate to the percentage distribution of titles by language and by publication date.

Marianne Goldstein is reference librarian, and Joseph Sedransk is professor, statistical science and social sciences, in the State University of New York at Buffalo.

Librarians who are involved in collection building are regularly called upon to make statements about the quality of their collections. Subject librarians seek ways to identify and describe subject strengths. The traditional ways of collection evaluation have included both quantitative and qualitative descriptions of the holdings in subject fields.

The quantitative statement is generally based on one of the following methods of measuring library holdings: (1) measuring linear feet of library materials on shelves, (2) a physical volume count, or (3) use of shelflist measurements, i.e., converting card holdings in inches or centimeters into number of titles.

For a qualitative evaluation, the librarian attempts to support the quantitative statement by (1) the checking of appropriate bibliographies, (2) the consideration of the levels of programs the collection supports, and (3) the size of student body and faculty that uses it. Sometimes, by the use of formulas,1 a quantitative expression of the quality of the collection is arrived at, based on the number of books, periodicals, and documents a specific subject field should have.
The results of bibliographic checking are expressed in number or percentage of titles held out of the number of titles in the list. The problem of providing a qualitative evaluation is aptly expressed in the statement that "no easily applicable criteria have been developed for measuring quality in library collections, and this is a subject which should be vigorously pursued."2

College & Research Libraries • May 1977

In this paper we present a technique to identify collection characteristics that can be used in an evaluative statement and in the description of the scope of the collection. Characteristics of books, such as (1) their publication dates; (2) their countries of origin; (3) the languages in which they are written; (4) their publishers (whether private, commercial, or academic); (5) their formats (i.e., book, nonbook, serial, document); and (6) the editions (original, reprint, facsimile, etc.) tell the subject specialist something about the nature of the collection. For example, students of history and the humanities generally rely on the availability of library materials with more varied imprint dates than students in the social sciences or natural sciences. In the sciences, recency of publications is usually critical; in history, philosophy, and the humanities, research more often depends on the availability of primary sources for the period or topic under investigation.

The characteristics to be identified are available for each title on the shelflist copy of the catalog card. The catalog card gives, in addition to the author and title, the edition, imprint (place, publisher, and date), collation, illustration, the subject tracings (headings), and, often, the format. If the book has been translated, this is also indicated.
Which characteristics a librarian chooses to identify with the sample technique presented in this paper depends on the specific characteristics which enhance that subject area and which, when identified in the particular collection, can lend weight to an evaluative statement. The number of characteristics to be recorded depends on the size of the sample, the size of the collection, and the total staff time available to record and analyze the information.

What follows is a description of the sample design and estimation methods used in selecting and analyzing a sample of titles from the Jewish history collection in each of seven university libraries. The seven libraries are those at Cornell University, University of Rochester, Syracuse University, and the four university centers of the State University of New York (SUNY): Albany, Binghamton, Buffalo, and Stony Brook. The sample taken was limited to the shelflist for Library of Congress classification numbers DS 101-151. It did not attempt to include all titles in the collections which deal with the history of the Jewish people (e.g., exclusions include Jews in the U.S. under E184.J4; or World War, 1939-1945--Jews under D810.J4; or Bibliography under Z). Statistics are available for titles held in these areas of Jewish history.3

The project was part of a survey conducted in the fall of 1974 to evaluate the Judaic Studies resources of the seven university libraries. While the survey also concerned itself with resources in Jewish literature, Bible studies, and Jewish philosophy and religion, the Jewish history collections were chosen for the example. Because of time limitations, the characteristics sought in the sample were limited to age of publication and language. The original date of publication was used when the book was a reprint.

A systematic sample design was employed in each university's Jewish history collection:

a. The total number of cards in the collection was measured in inches (X).
b. Using the relationship 100 cards = 1 inch, there are estimated to be N = 100X titles in the desired category. For example, if X = 14 inches, N = 1,400.
c. The sampling interval, i, for the selection of sample cards is defined by i = N/n, where n is the desired sample size. That is, after a random start, every i-th card is sampled. For each sampled title, the date and country of publication, language, format, etc., are recorded.

For example, if we wish to have n = 200, then i = 1400/200 = 7, and we would select every seventh card. If we desire n = 300, i = 1400/300 = 4.67, and we would round the interval down and select every fourth card to ensure that our sample size is at least 300.

The recording procedure is as follows: Once the size of the sample is determined, a lined sheet is numbered from 1 to n. A column is drawn for each characteristic to be recorded and given a heading. A sample sheet is shown as Figure 1.

Title  Call No. (optional)  Country  Language  Date  Format  Scope, Treatment
1      DS-                  U.S.A.   English   1920  Book    History
2      DS-                  Germany  German    1910  Serial  Bibliography

Fig. 1. Sample Sheet for Recording Characteristics

When all the titles in the sample have been recorded, a count is taken of each characteristic of interest (e.g., book with date prior to 1900). For the given collection the proportion, P, of titles in the entire collection having a specified characteristic is estimated using $\hat{P}$, where $\hat{P}$ equals the proportion of titles in the sample having the specified characteristic. The values of $\hat{P}$ are the primary analytical tool and are presented in Tables 1, 2, and 3. Additional information can be obtained by forming a $100(1-\beta)$ percent confidence interval for P, which is a range of values likely to contain the true, but unknown, value of P. Methods to form a confidence interval for P are presented by Cochran.4

The number of cards to be sampled from a given collection is a function of the time available to carry out the sampling (and recording) and the desired precision of estimation of population characteristics. Because the time available to carry out the sampling was unknown initially, the sample size n was arbitrarily set at about 150 for the first two universities visited (Cornell and SUNY at Binghamton). However, after the experience of the trips to Cornell and Binghamton, we were able to determine sample sizes that are feasible in terms of time available to complete the task and which would yield a desired level of precision.

As described above, the proportion, P, of titles with a specified characteristic is estimated using $\hat{P}$. It is desired to select a large enough sample that with high probability the difference between $\hat{P}$ and P will be sufficiently small. More precisely, the investigator specifies two numbers, $1 - \alpha$ and d. Although the value of P is unknown, the investigator may specify the maximum deviation, d (e.g., d = .06), between the sample estimate, $\hat{P}$, and P that one would "like" to have. While it cannot be guaranteed in advance of sampling that $\hat{P}$ and P will differ by no more than d units, the investigator may specify the value $1 - \alpha$ (e.g., $1 - \alpha = .95$), representing the probability that the deviation will be at most d units. Then, given values for d and $1 - \alpha$, one may find the value of the sample size n required to ensure that, with probability $1 - \alpha$, $\hat{P}$ and P will differ by no more than d units.

In the Appendix the formulas to determine the required sample sizes are given. In addition, the derivation of the sample sizes used in this investigation is described.

The three tables that follow give the percentages, $\hat{P}$, for the collections for the characteristics outlined above. Comments are provided for each table.
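The sampling and estimation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure as the authors carried it out by hand: the function names, the representation of the shelflist as a Python list, the made-up demonstration data, and the use of the simplest normal-approximation interval (with no finite population or continuity correction) are all assumptions made for the example.

```python
import math
import random


def estimate_title_count(inches, cards_per_inch=100):
    """Steps a-b: convert shelflist inches (X) to an estimated title count N."""
    return round(inches * cards_per_inch)


def sampling_interval(total_titles, desired_sample_size):
    """Step c: i = N/n, rounded down so the sample has at least n titles."""
    return max(1, total_titles // desired_sample_size)


def systematic_sample(cards, interval, rng=None):
    """Take every interval-th card after a random start within the first interval."""
    rng = rng or random.Random()
    start = rng.randrange(interval)
    return cards[start::interval]


def sample_proportion(sample, has_characteristic):
    """P-hat: share of sampled titles for which has_characteristic(card) is true."""
    return sum(1 for card in sample if has_characteristic(card)) / len(sample)


def normal_interval(p_hat, n, z=1.96):
    """Simple normal-approximation interval for P (assumption: no corrections)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    # Hypothetical shelflist: 14 inches of cards, every fifth title in German.
    N = estimate_title_count(14)
    shelflist = [{"language": "German" if k % 5 == 0 else "English"} for k in range(N)]
    i = sampling_interval(N, 200)
    sample = systematic_sample(shelflist, i, random.Random(1977))
    p_hat = sample_proportion(sample, lambda card: card["language"] == "German")
    print(N, i, len(sample), round(p_hat, 3), normal_interval(p_hat, len(sample)))
```

With X = 14 inches and a desired sample of 200, the interval is 7, as in the example in the text; asking for 300 gives an interval of 4. Rounding the interval down always yields at least the desired number of titles, at the cost of some extra recording time.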
TABLE 1
DS 101-151 JEWISH HISTORY (LANGUAGE DISTRIBUTION)
Percentage of Collection in English and Other Languages

University   N      n    English  German  French  Hebrew  Others  Distribution
Albany       1,489  355  85%      6%      4%      2%      3%      S, R, P, Pol
Binghamton   2,525  151  45       9       3       42      1       A
Buffalo      1,455  269  77       9       9       1       4       S, L, A
Cornell      4,760  158  49       10      4       26      11      .03 R, L; also S, A
Rochester    1,180  264  83       8       5       1       3       R, S, L, I
Stony Brook  1,237  284  70       21      3       1       5       .03 S; also P, R
Syracuse     1,438  214  81       9       1       3       6       A, P, Y, R, L

Abbreviations: N = total number of titles; n = number of titles in sample; A = Arabic; I = Italian; L = Latin; P = Portuguese; Pol = Polish; R = Russian; S = Spanish; Y = Serbo-Croatian.

COMMENTS ON TABLE 1

The following percentages represent the largest and smallest sample percentages held in the various languages in the Jewish history collections of the seven university libraries.

Largest percentage
In English: Albany 85%
In German: Stony Brook 21%
In French: Buffalo 9%
In Hebrew: Binghamton 42%

Smallest percentage
In English: Binghamton 45%
In German: Albany 6%
In French: Syracuse 1%
In Hebrew: several 1%

Among the seven universities studied, Binghamton and Cornell have the largest percentages of their titles in Hebrew, as would be expected since they were both participants in the Israel PL-480 Program.5 The percentages of holdings of English-language titles in the Jewish history collections seem larger where there have been no other influencing factors in collection building, i.e., in the Albany, Buffalo, Rochester, Stony Brook, and Syracuse libraries.

Stony Brook reflects to a noticeable extent the impact of faculty and research interests in German Judaica. With the exception of Stony Brook, the percentages of titles in German held in the university libraries are similar enough to suggest the holdings of many German titles in common.6 In each collection the sample percentage of French titles is no larger than that of German titles.
From Table 1, it may be seen that Albany, Buffalo, and Rochester have similar distributions of titles among the various languages, offset only by Buffalo's larger percentage of French and German titles.

TABLE 2
DS 101-151 JEWISH HISTORY (CHRONOLOGIC DISTRIBUTION)
Percentage of Collection in Publication Periods Given

University   N      n    Pre-1900  1901-1950  1951-1960  1961-1974
Albany       1,489  355  6%        26%        15%        53%
Binghamton   2,525  151  2         17         9          72
Buffalo      1,455  269  6         24         16         54
Cornell      4,760  158  11        18         11         60
Rochester    1,180  264  9         36         15         40
Stony Brook  1,237  284  7         22         10         61
Syracuse     1,438  214  8         27         17         48

See Table 1 for explanation of abbreviations.

COMMENTS ON TABLE 2

Pre-1900

Cornell, Rochester, and Syracuse have significant special collections and, generally, each has acquired more pre-1900 publications than the other universities. In particular, Syracuse has acquired the collection of the nineteenth-century German historian Leopold von Ranke.

1901-1950

Rochester, with 36 percent of its collection dated 1901-50, has the largest percentage in this publication period.

1951-1960

The holdings of titles with 1951-60 publication dates range from 9 to 17 percent. These percentages are substantially less than those for the 1961-74 period.

1961-1974

Each library has the largest percentage of its imprints in this period, accounting for 40 percent or more of the titles in each library's Jewish history collection. Binghamton, Stony Brook, and Cornell have at least 60 percent of their titles bearing 1961-74 publication dates, indicating sizeable acquisitions in these years. The reasons for this are: (1) the general publication explosion, (2) relative affluence, (3) similar patterns of acquisitions, e.g., approval plans, and (4) the impact of the Israel PL-480 Program in the cases of Cornell and Binghamton.

Similarities

From Table 2 it is seen that Cornell and Stony Brook have similar percentage distributions (over the four time periods).
The similarity with Cornell may reflect Stony Brook's apparently successful acquisition of a balanced collection for the study of Jewish history. This is surprising, considering the recent development of Stony Brook's collection. While Albany, Buffalo, and Syracuse may be seen to have similar percentage distributions, they differ from Cornell and Stony Brook in their pattern of acquisition.

Dissimilarities

From Table 2 it is clear that Binghamton and Rochester have quite dissimilar percentage distributions. Binghamton has an unusually high percentage (72 percent) of 1961-74 publications, and Rochester has an unusually low percentage (40 percent) of 1961-74 publications. Further, Rochester has a significantly higher percentage of 1901-60 publications (51 percent total) when compared with the other six university libraries. Rochester's distribution suggests a selective acquisition policy and the acquisition of titles with pre-1961 imprints through gifts or special collections. Binghamton experienced very little growth until 1961-74.

TABLE 3
DS 101-151 JEWISH HISTORY
Percentage of Collection in Various Languages in Years 1961-1974

University   N      n    English  German  French  Hebrew  Others  Dist.             Total Percentage
Albany       1,489  355  46%      3%      2%      1%      1%      S                 53%
Binghamton   2,525  151  33       1       1       36      1       -                 72
Buffalo      1,455  269  37       7       7       1       2       A, I, S           54
Cornell      4,760  158  27       3       3       21      6       A, R, S           60
Rochester    1,180  264  33       3       2       1       1       R                 40
Stony Brook  1,237  284  43       12      2       1       3       P, R, S           61
Syracuse     1,438  214  37       3       1       1       6       A, I, L, S, R, Y  48

See Table 1 for explanation of abbreviations.

COMMENTS ON TABLE 3

For the years 1961-74, publications in English make up the largest part of each collection (except for Binghamton). In particular, Albany and Stony Brook have the largest percentages corresponding to English titles. German and French titles are approximately equal in number, except at Stony Brook, which shows strength in German Judaica. Buffalo's relatively large percentages of German and French titles reflect an acquisition policy based on recognized research interests. At both Binghamton and Cornell there are large percentages of titles in Hebrew. These reflect the impact of participation in the Israel Public Law-480 Program. Note that Cornell has a more widespread distribution of titles in various languages than does Binghamton, which has concentrated primarily on English and Hebrew titles.

To compare the distribution of titles by language for two periods, pre-1961 and post-1961, two new tables may be constructed. For example, a table for Albany for the post-1961 period would show the following:

English  87 percent
German   5
French   4
Hebrew   2
Others   2
TOTAL    100

This information is derived from Table 3, where 87 percent (= .46/.53) is the percentage of titles in English in the post-1961 period among all titles in that period.

We have constructed the aforementioned tables but include only the following comparisons of holdings with pre-1961 and post-1961 publication dates: Albany, Rochester, and Stony Brook show very little alteration in distribution. Binghamton's distribution has changed from (pre-1961) one having extensive representation for both English and German titles to (post-1961) one with about equal percentages of English and Hebrew titles. Cornell exhibits a similar shift from English and German to English and Hebrew, but at Cornell there is, in each period, a moderate representation of titles in the "other" languages. For Buffalo the sample percentage of titles in each of German and French changes substantially from 5 percent of the collection in pre-1961 publications to 12 percent in post-1961.
As a corollary of this, the percentage of titles in English is 86 percent in pre-1961 and 70 percent in post-1961. At Syracuse there are some changes in distribution: a smaller representation for English and a larger representation for "other" languages in the post-1961 period.

CONCLUSIONS

In the university libraries at Albany, Buffalo, Rochester, Stony Brook, and Syracuse the preponderance of titles is in English, with German and French titles ranking second and third. By contrast, both Binghamton and Cornell have substantial percentages of titles in both English and Hebrew. Of particular note at Stony Brook is the high percentage of German titles in relation to its rather small collection. This indicates specialized interest concerning the history of German Jewry in the nineteenth and twentieth centuries.

The percentage distributions of titles by language are quite similar for Albany, Buffalo, and Rochester. However, each of these distributions is substantially different from those at Cornell and Binghamton, where there are large percentages of titles in Hebrew.

Books with pre-1900 imprints are found more extensively at Cornell, Rochester, and Syracuse. It is likely that many of these holdings were acquired by gift or by purchase of scholarly collections. In addition, Rochester has a larger relative percentage of titles with 1901-60 imprints than the other six libraries. Thus, Rochester's distribution suggests a more gradual acquisition of selected titles over a considerable time period.

One may note Stony Brook's similarity to Cornell in the percentage distribution of titles over the time periods shown, this despite the fact that Stony Brook is the youngest of the university libraries. Strong similarities in distribution of titles by publication date appear for Albany, Buffalo, and Syracuse. Distinct dissimilarities in distribution are observed between Rochester and Binghamton, which are not surprising since most of Binghamton's growth has occurred since 1950. Binghamton's pre-1961 holdings are relatively weak.

The heaviest acquisition period for all seven university libraries was 1961-74. Except at Binghamton and Cornell, English titles were acquired primarily, with German- and French-language titles ranking next in the number of acquisitions. The relatively large percentage of German-language titles acquired at Stony Brook in relation to its small collection is unusual. At Binghamton, Hebrew titles predominate with English second, while at Cornell, English and Hebrew rank first and second respectively. The importance of Hebrew titles at Cornell and Binghamton is, of course, the result of participation in the Israel PL-480 Program, which operated between 1964 and 1973.

Finally, the study indicates that strengths of collections, special interests, periods of heavy acquisitions and/or publishing, and book selection policies can be identified by sampling a library's collections. The sample technique used in this study would be particularly useful in a comparative evaluation of the holdings in one subject area at a number of similar libraries.

APPENDIX

Sample Size Determination

It is assumed that it is desired to use $\hat{P}$ to estimate P so that, with probability $(1 - \alpha)$, the difference between $\hat{P}$ and P will be less than d units. The formulas7 for the required sample size n are shown as formulas A and B:

A. $$n_0 = \frac{z_{(1-\alpha/2)}^{2}\, P(1-P)}{d^{2}}$$

B. $$n = \frac{n_0}{1 + n_0/N}$$

In these formulas, P is the proportion of titles in the given collection with the specified characteristic; d is the margin of error (specified by the investigator); N is the number of titles in the entire collection; and $z_{(1-\alpha/2)}$ is a number completely determined by a specification of the value of the probability, $(1 - \alpha)$. The value of $z_{(1-\alpha/2)}$ can be read from tables of the normal probability distribution. For example, for $\alpha = 0.05$, $z_{(1-\alpha/2)} = 1.96$, while for $\alpha = 0.10$, $z_{(1-\alpha/2)} = 1.65$.

The sample size n given by formula B will never be larger than $n_0$. Thus, if the sample size n is chosen as

$$n = n_0 = \frac{z_{(1-\alpha/2)}^{2}\, P(1-P)}{d^{2}}$$

the selected sample will certainly be large enough to achieve, with probability $1 - \alpha$, the specified margin of error, d. When planning a study, this is often a useful procedure since use of both formulas A and B to determine n requires knowledge of N, the total number of titles in the collection.

Suppose that it is desired to have $\alpha = 0.05$ and d = 0.06. Then,

$$n_0 = \frac{(1.96)^{2}\, P(1-P)}{(0.06)^{2}}$$

Now note the relationship of P(1 - P) with P, as shown in Figure 2.

P         .1   .2   .3   .4   .5   .6   .7   .8   .9
P(1 - P)  .09  .16  .21  .24  .25  .24  .21  .16  .09

Fig. 2. Relationship of P(1 - P) with P

Thus, P(1 - P) assumes its largest value when P = 0.5. Taking P(1 - P) = (0.5)(0.5) = 0.25, the sample size

$$n = n_0 = \frac{(1.96)^{2}(0.25)}{(0.06)^{2}} = 267$$

will be sufficient to ensure with probability 0.95 a margin of error not larger than 0.06 irrespective of the proportion, P, being estimated.

The sample size calculated in this manner may, however, be larger than necessary because the proportion, P, for the characteristic of interest may differ from 0.5, and because formula B has not been used to determine n. To illustrate the latter point assume $\alpha = 0.05$, d = 0.06, P = 0.5, and N = 1400. Then $n_0 = 267$ and, using B, n = 224.
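The worked figures above can be checked numerically. The sketch below is an illustration rather than the authors' own computation: it assumes that formula B takes the standard finite-population form n = n0/(1 + n0/N), which reproduces the value 224 given in the text; it obtains z from Python's `statistics.NormalDist` instead of printed tables; and it rounds to the nearest whole title to match the article's figures (rounding up would be the more cautious choice). The function names are hypothetical.

```python
from statistics import NormalDist


def z_value(alpha):
    """Normal quantile z_(1 - alpha/2), e.g. about 1.96 for alpha = 0.05."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def initial_sample_size(alpha, d, p=0.5):
    """Formula A: n0 = z^2 * P(1 - P) / d^2; p = 0.5 is the worst case."""
    z = z_value(alpha)
    return z * z * p * (1 - p) / (d * d)


def corrected_sample_size(n0, total_titles):
    """Formula B (assumed form): n = n0 / (1 + n0 / N)."""
    return n0 / (1 + n0 / total_titles)


if __name__ == "__main__":
    n0 = initial_sample_size(alpha=0.05, d=0.06)
    n = corrected_sample_size(n0, total_titles=1400)
    print(round(n0), round(n))
```

For alpha = 0.05, d = 0.06, and N = 1400 this gives 267 and 224, the values used in the text. Because the corrected size can only shrink as N gets smaller, using formula A alone is the safe choice when N is not yet known.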
Thus, if it were known prior to sampling that N = 1400 for a specific collection, a sample of size 224 rather than one of size 267 would be selected. Since a sample of size 224 is all that is needed, there is a reduction in sample size of 267 - 224 = 43 titles because of knowing the value of N (N = 1400, here).

Calculations such as those made above indicated that, for most collections, and for d = 0.06, $\alpha = 0.05$, a sample of about 250 titles would be adequate. The actual sample sizes differ from 250 because (1) there were differential amounts of time available for sampling and (2) there was rounding error. The latter point can easily be demonstrated by considering a collection with N = 1400 titles and a desired sample size n = 250. Then i (the sampling interval) = 1400/250 = 5.6. If i = 5, the actual sample size will be 1400/5 = 280 titles, while if i = 6, the actual sample size will be 1400/6 = 233 titles.

REFERENCES

1. Formulas referred to are the following: Verner W. Clapp and Robert T. Jordan, "Quantitative Criteria for Adequacy of Academic Library Collections," College & Research Libraries 26:371-80 (Sept. 1965). Interinstitutional Committee of Business Officers, University of Washington, A Model Budget Analysis System for Program 05 Libraries (Washington State Univ., March 1970). "Standards for College Libraries," College & Research Libraries News 36:277-79, 290-301 (Oct. 1975).
2. Report of the Advisory Committee on Planning for the Academic Libraries of New York State (Albany: The University of the State of New York, State Education Department, 1973), p.vii.
3. Marianne Goldstein, A Survey of Library Resources in Judaic Studies in the FAUL and SUNY Center Libraries, With Recommendations Toward Formulating Plans for Possible Areas of Cooperative Collection Development (Buffalo: SUNY at Buffalo, Lockwood Reference Dept., 1976). Available as an ERIC publication (ED 125651).
4. W. G. Cochran, Sampling Techniques, 2d ed. (New York: Wiley, 1963), Section 3.6.
5. Israel Public Law-480 Program. "Within the framework of what is commonly referred to as the Public Law-480 Program, the United States Government supplied some 25 American Research Libraries with a copy of virtually every monograph, book and periodical then published in Israel that was, or might eventually be, of research value. From 1964-1973, approximately 1,665,000 items were supplied, with an average of 65,000 for each full participant." See Charles Berlin, "Library Resources for Jewish Studies in the United States," American Jewish Yearbook 75:10 (1974/75).
6. In the sampling process, books in German on Jewish history listed in major bibliographies were recognized. The Survey mentioned above in reference 3 also included some bibliographic checking in Jewish history bibliographies. Moreover, most libraries had some approval plan arrangements with the German book firm, Harrassowitz.
7. Note that formulas A and B presume the use of simple random sampling. While we have used systematic sampling, the two sampling methods should be essentially the same for the populations being sampled. (See Cochran, Sampling Techniques, Section 8.5, p.214.) Further, the "normal approximation" used to derive formulas A and B should be appropriate for most cases since the sample sizes are large.