A Study on User Satisfaction with CJK
Romanization in the OCLC WorldCat System
도서관 서지정보의 한중일 로마자표기법에 대한 이용자 만족도 연구
YooJin Ha*
ABSTRACT
The purpose of this study is to investigate how individuals assess Chinese, Japanese, and Korean
(CJK) transliterated bibliographic information on current library catalogs. Two separate studies,
a survey and an experiment, were conducted using the WorldCat system. Users noted that Romanization
has many issues that can inhibit users’ ability to understand the transliterated bibliographic
information, even when it is in their own native language and even when they have
extensive experience with transliteration systems. The experimental results also supported these
findings: participants had better results and satisfaction when looking for information written in
English than when searching for transliterated information written in their native language.
Implications for future research suggest a need to investigate user preferences for translation vs.
transliteration of bibliographic information. This study proposes that English translation be
considered as a parallel link with CJK Romanization for bibliographic information.
초 록
이 연구의 목적은 정보이용자가 한중일 언어의 음역 표기된 도서 서지정보를 어떻게 이해하고 평가하는지를
알아보는 데 있다. 이용자조사와 실험이 각각 실행되었고, OCLC의 WorldCat시스템이 실험도구로 사용되었다.
조사결과 로마법표기에 여러 가지 문제가 있어 이용자가 음역화된 문장을 이해하기가 어려웠던 것으로 분석되었고,
심지어 음역서지목록에 익숙한 이용자도 자국어의 목록임에도 불구하고 이해하는 데 어려움을 나타냈다. 실험결과
또한 이러한 조사 결과를 입증했다. 이용자가 음역화된 자국언어로 된 자료를 찾을 경우보다 영어로 기술된 자료를
찾을 때 더 나은 검색 결과와 만족도를 보였다. 향후 번역된 서지정보와 음역화된 서지정보 중 이용자가 어느
것을 더 선호하는지 비교하는 연구의 필요성을 제안하며, 한중일 로마법 표기의 서지정보에 영어번역을 병렬하는
것을 고려해야 함을 제시한다.
Keywords: user study, transliteration, romanization, WorldCat, CJK language
이용자연구, 음역, 로마자표기, 월드캣, 한중일 언어
* Assistant Professor, Department of Library Science at Clarion University of Pennsylvania
(yha@clarion.edu)
■ Received : 15 May 2010 ■ Revised : 4 June 2010 ■ Accepted : 19 June 2010
■ Journal of the Korean Society for Information Management, 27(2): 95-115, 2010.
[DOI:10.3743/KOSIM.2010.27.2.095]
1. Introduction
The ultimate test of a translation system is that
a human in one language would have the same under-
standing of text that a human would have in another
language. Currently, translation can be used to access
information in other languages and the avenues to
do this are limited but expanding. Translation, how-
ever, only partly addresses the complexities of going
from one alphabet to another. Transliteration, the
isomorphic linking of one alphabetic sound symbol
to a symbol in another alphabet, is similar to
translation in that both are attempts to provide bridges
for users to get from one language to another.
Transliteration, however, as the substitution of
characters from one alphabet for those of another,
succeeds when a reader can use the new text as a
transparent replacement for the original text written in a
different alphabet. Extensive transliteration efforts have been
undertaken for decades at the national libraries of
the world. Using manually produced efforts, humans
have transliterated millions of bibliographic records
with the assumption that end users could traverse
from a non-Roman script to a Roman script. It was
reported in 2007 that WorldCat, the cooperatively
produced online catalog, has over 3.35 million re-
cords with transliteration from Chinese, Japanese,
and Korean (CJK) to Roman script (Wang 2007).
Machine transliteration could add tens of millions
of transliterated records to this store of information.
Yet, all of this assumes that a user can understand
the transliterated records. This paper explores that
possibility by inquiring of CJK users how well they
understand transliterated records. This study also ex-
amines the underlying reasons that transliterated re-
cords may not be transparent to users, even those
knowledgeable in both the original CJK language
and in English.
The transliteration problem includes nontrivial
language considerations. For example, if someone
sought information by Korean poet Sowol Kim (김소
월), the query entry might be either the author’s
name or the exact title of a book, if known, in either
English or Korean. However, if the searcher is not
familiar with Korean, it becomes difficult to identify
an exact query word, such as the author’s name,
Sowol Kim, because of variability in the transliteration
of alphabetic characters: a non-native speaker of Korean
could equally render the name as “So Weol Kim” or
“Souwol Kim.” Other examples could be
offered, going from one language to another, where
ambiguities are present due to contextual meanings
that fail to translate across languages and across
cultures. Even if the system found the correct entry,
the user would not be able to judge whether it was
suitable for his or her needs, because transliteration
only links the Korean sound isomorphically to the
written Latin alphabet. For example, if a poetry
book were titled “진달래꽃” in Korean, its translation
would be “azalea flower,” whereas its transliteration,
“Chindalaekot,” is just the sound of the Korean
written in the Latin alphabet.
Without an English translation, users, even native
speakers of Korean as in the case above, may not be
able to understand what a transliterated word means.
The transliterated term no longer maps to the meaning
implied in the original title and is not understandable
in the target language. Any automatic transliteration
software would need to be sensitive enough
to capture the original meaning, and it may need
to provide translation for users who may be naïve
in such subtleties. The problem extends beyond software
and character set conversions, and it transcends
language and format issues, since it is only resolvable
as each user confronts a text reality specific to a
particular query. The emphasis taken here is on the
way that humans interpret transliterated text, not on
how database systems or machine models rank
transliterated output. Cultural and idiomatic nuances
can change meaning in transliterated or translated
information, which can create confusion among those
seeking information.
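The spelling variability described above can be made concrete with a small sketch. It is an illustration only and not part of the study: it uses Python's standard `difflib` module to compare the three Romanized spellings of the poet's name mentioned earlier. The helper names `normalize` and `similarity`, and the choice of a similarity ratio as the measure, are assumptions made for this example.

```python
from difflib import SequenceMatcher

# Romanized spellings of the same Korean name, taken from the discussion above.
VARIANTS = ["Sowol Kim", "So Weol Kim", "Souwol Kim"]


def normalize(name: str) -> str:
    """Lowercase a name and drop spaces (a hypothetical, deliberately naive key)."""
    return "".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio (0.0 to 1.0) for two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Exact matching on the normalized key still treats the spellings as different.
keys = {normalize(v) for v in VARIANTS}
print(len(keys), "distinct keys for one author")

# Approximate matching shows how close the spellings nevertheless are.
for other in VARIANTS[1:]:
    print(other, round(similarity(VARIANTS[0], other), 2))
```

A catalog that matches only exact strings would treat these spellings as three different names, which is the kind of access problem this paper is concerned with. Approximate matching and explicit cross-references are two possible remedies; neither is evaluated in this study.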
Papers presented in Bossmeyer and Massil (1987)
showed requisite attention to the need for stand-
ardization with all types of transliterated scripts. Of
particular concern were ideographical scripts and the
need for technical systems to support vernacular data.
This concern with standardization continued to be
of importance as transliteration and translation ma-
tured within cross language information retrieval
(Oard and Diekema 1998). Today, there is a new
emphasis on comparing machine transliterations using
grapheme, phoneme, hybrid, or correspondence-based
transliteration models (Oh et al. 2006). Within the
evolution of these different approaches it can be posited
that a focus on end-user understanding of transliterated
information becomes a necessary pre-condition for
determining the efficacy of the system’s performance.
2. CJK Romanization Issues
Since the early 1980s, when bibliographic records
were entered with original vernacular data by the two
major bibliographic utilities, the Online Computer
Library Center, Inc. (OCLC) and the Research Libraries
Information Network (RLIN), non-Roman scripts in
OPACs were used according to an agreement on the
process/procedures for transliteration, where symbols
would be transliterated to alphabetic characters and
vice-versa (Taylor 2000, 462-472; Shaker 2002, 3).
In 1987, a meeting that discussed non-Roman alphabet
problems was held by the International Federation
of Library Associations and Institutions (IFLA) in
Tokyo. The results of that meeting were summarized
and published in a work titled Automated Systems
for Access to Multilingual and Multiscript Library
Materials: Problems and Solutions (Bossmeyer and
Massil 1987). The main topic discussed was the
need for standardization with all types of scripts. Of
particular concern were ideographical scripts and the
need for technical systems to support vernacular data.
IFLA continues to have its meetings address these
concerns to cover multilanguage and multi-scripts
practices in the provision of catalog information.
In 1993, the three IFLA Sections held a joint meeting
to combine these separate groups: Information
Technology, Library Services to Multicultural
Populations, and the Division on Bibliographic
Control. The meeting’s main theme was to focus
on multilingual and multi-script problems in organ-
izing and providing access to catalog information.
Unicode issues were discussed in a 1995 meeting
to solve the standardization problems in different
character sets (IFLA 1993, 1995).
Research has continued to explore the role of
Romanization in cataloging and the increased use
of vernacular records. Studies in this area have fo-
cused on the development of logical principles with
concomitant attention to cataloging rules and stand-
ardization guidelines. Among non-Roman scripts
issues, there has been active research done on
Romanization by the Library of Congress (LC) and
OCLC in Chinese, Japanese, and Korean (CJK)
scripts (Arsenault 2005; Shin 2003; Zeng 1992;
Zhang and Zeng 1998).
There were fewer studies conducted that consid-
ered standardization issues as they related to specific
language areas where international scholars wanted
a uniform system to describe a published work. Zhang
and Zeng (1998) examined practical problems using
the Unicode Standard in library applications to
examine standardization in bibliographic description,
specifically in CJK information processing practices.
Zeng (1991) conducted research comparing the
OCLC CJK system with the RLIN CJK system. The
conclusion of this study focused on the CJK thesaurus
used in the creation of records and it emphasized
the need for strict adherence to standards.
Another evaluation of the OCLC CJK Plus system
was done by Jeong (1998). He conducted an experi-
ment with 32 participants from Chinese, Taiwanese,
Japanese and Korean language backgrounds. Jeong
tried to focus on end users’ searches using three
different versions of the OCLC CJK Plus’ search
mechanism (Roman-derived search, Roman ti-
tle-phrase search and vernacular search). Note that
these were all cataloger specific systems not available
to end-users. Even so, the transliteration issues of
the catalog users were not a focal point of this
research. The experiment did not allow system users
to access the database using their preferred language.
Park (2001) also addressed the Romanization issue,
with special attention to the use of McCune-Reischauer
(MR), a Romanization scheme for Korean that is still
in use, in current bibliographic utilities such as OCLC.
She claimed that using MR for the Korean language in
these utilities causes many problems, and she
illustrated the difficulty of creating a system with
less ambiguity using several real Korean bibliographic
examples Romanized with MR. There have been attempts
to make new software available, but it has not been
released yet. This, too, may lead to another
standardization issue.
Shaker’s dissertation (2002) investigated how cur-
rent academic library systems can support non-Roman
materials and what should be considered in order
to make it possible to have vernacular characters
in those systems. That work covers various trans-
literation issues related to current cataloging practice
as well as examining many different languages used
in bibliographies (i.e., Cyrillic, Arabic, Hebrew, and
CJK). Ha (2008) examined problems in accessing and
using multilanguage materials from the end-users’
perspective. Users indicated that transliterated
(Romanized) information could not be understood
and that gaining access to records was inconsistent.
This could indicate that the good intentions of those
who created mechanisms to facilitate access had, in
effect, created systems which increased user con-
fusion and frustration. Clearly, this also pointed to
the need to conduct additional research to find out
what users were experiencing when their searches
involved transliterated information.
3. Methodology
Two separate studies, a survey and an experiment,
were conducted to investigate how individuals access
and evaluate transliterated information. The studies
focused on individuals’ seeking and understanding
information obtained from WorldCat. Attention was
given to task, topic and how individual characteristics
and experiences might influence dependent measures
such as usefulness and satisfaction with retrieved
information. The ultimate purposes here are: (1) to
determine if manually produced transliterated sys-
tems are understandable by users, and (2) to inform
researchers on the importance of specific variables
when designing and conducting similar research. The
survey and later experimental procedures were ap-
proved by the Rutgers University Institutional
Review Board for the Protection of Human Subjects
in Research.
3.1 Survey
3.1.1 Sample Information
A convenience sample of 20 individuals who are
fluent in English and another language was
constructed through a network of colleagues.
Participants were knowledgeable about, and experienced
in, seeking information using multiple languages.
These individuals live in the United States,
Korea, China, and Taiwan. Japanese individuals did
participate but none were living in Japan at the time
of the survey.
The native languages and the number of the in-
dividuals participating in the survey were: Chinese
simplified (n=5); Chinese traditional (n=3); Japanese
(n=2); English (n=3); and, Korean (n=7). All re-
spondents had online searching experience with half
having nine or more years of such experience.
Japanese speakers had the least experience with on-
line searching but this was due more to sample limi-
tations than actual use.
Most subjects in this research had direct experi-
ences with bilingual and multilanguage Online Public
Access Catalog (OPAC) systems. Respondents were
affiliated with universities in various countries or
states, such as Yonsei University Central Library
in Korea and Peking University Library system, and
these all have English-based catalog systems. Systems
mentioned by respondents include EBSCOHost,
ProQuest Digital Dissertations, Web of Science,
Innopac (HK research libraries), the Library of Congress,
Syracuse University Library, and the National
Library of Korea.
3.1.2 Questionnaire
The questionnaire was constructed after extensive
interviews with two experienced individuals who
provided the framework for the content areas of
the survey. Subjects were required to respond in
English. The survey asked about the participants’
basic demographic information and their experience
with online searching including library systems and
other information retrieval tools. A part of the survey
required individuals to conduct searches in WorldCat
for an assigned and a self-generated topic using
known and unknown languages. Then, participants
were asked about their experience using WorldCat.
The survey questionnaire is provided in this paper
as an appendix.
3.1.3 Survey Results
Subjects noted the difficulty in inputting CJK char-
acters and the lack of links among various forms
of transliterated entries; this reinforces the sugges-
tions for modeling proposed by Lindén (2006) to
alleviate such variants. One Chinese subject ex-
pressed annoyance when required to guess the correct
transliterated information using the Pinyin system.
One subject mentioned if there is a non-English jour-
nal and it supposedly has an English name then the
two should be connected to each other. The subject
suggested using some tools such as ‘see also’ so
that if a user only knows one of the two names,
then the other title for the same journal could be
found. Respondents also commented that there were too
few English translations for abstracts in CJK journals.
In addition, although English was seen as too dominant,
most respondents pointed to the lack of stabilized
cross-language support in existing bibliographic systems.
Such systems can provide the window to multi-million
volume collections of monographs and journals.
When asked, “Could you please indicate
why you might need information written in other
languages, which might include the language you
cannot read?”, respondents gave answers that can be
roughly classified into five categories, as shown in Table 1.
When asked about the ‘efficiency of searching
for a different language,’ three respondents answered
‘don’t know’, and nine indicated that they thought
the system was ‘not efficient.’ Almost 60% of the
subjects judged the system as not efficient in its
language supportiveness. This again, in this prelimi-
nary study, supports the overall consensus that trans-
lation capabilities are important to users of multi-
language retrieval systems. Figure 1 shows the effi-
ciency scores of users and it can be seen that few
individuals place WorldCat as highly efficient at the
time of the study. It might be expected that as
WorldCat evolves, there could be a concomitant in-
crease in satisfaction with its transliterated records.
Nonetheless, the study reported here also addresses
how users confront the inherent characteristics of
transliteration while addressing fundamental alter-
natives to such a system.
Individuals noted that translation of abstracts was
of critical importance but that such information was
often lacking. WorldCat was noted as a core biblio-
graphic utility providing access to Chinese and
Korean information. Japanese access focused on
NACSIS-WebCat at the time of the survey.
Table 1. Multilingual information system needs

| Category | Comment |
| Better access | finding a book or information in other languages |
| research | “Related to my research, there might be good source written in other languages” |
| | “In order to expand the list of the literature that I can utilize” |
| | “when doing cross cultural searches” |
| lost translation | “Because some of the message in the original language cannot be translated into other languages and therefore becomes a loss of value. It is worthwhile to go back to the source language and try to understand the meanings of the work that is true to the author's intent” |
| | “when trying to verify the accuracy of information (factual or interpretive) presented in a translated text” |
| the only one | strong information need: “when it is the only source of information or when the information in my own language is not sufficient, which is often the case” |
| curiosity about and respect for other languages | “the information written in languages I am not familiar with is as important as the one in familiar languages because it might be crucial to someone” |
| | “material from different language version might carry additional/different content” |
| | “To look from the different point of views and supplement each other” |
| | “when seeking different interpretations and perspectives other than English-speaking countries. E.g., reading stuff about 9/11 and Iraq War from the other countries' perspectives” |

Figure 1. Users’ self-report on efficiency of WorldCat
Respondents pointed out their concerns with
Romanization issues:
∙Difficulty getting from the Romanized language
to the target or native language.
∙Meaning is lost under the current system since
it is not transparent on how to move across
languages.
∙Romanized titles were reported as particularly
difficult to understand; respondents noted that
the system works best when the user knows
both conventions in use.
∙Typing the correct, exact query becomes tanta-
mount to mastering the Romanization problem.
∙Use of Chinese characters in bibliographic
descriptions for Korean and Japanese materials.
∙Korean material using Korean Hangul would
be more accessible if titles also carried Chinese
characters linked to Romanization.
∙Linking of original native language, English
translation, and Romanization would facilitate
understanding of bibliographic records.
∙Addition of an English language abstract would
allow users to assess if bibliographic records
meet the original information need for topic
searches.
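The linking that respondents asked for (original script, Romanization, and English translation tied to one record) can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a description of how WorldCat stores records: the class `LinkedTitle`, the functions `build_index` and `lookup`, and the case-insensitive matching are all hypothetical. The sample title is the one used in the Introduction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LinkedTitle:
    """One title held in three parallel forms (hypothetical record layout)."""
    vernacular: str  # title in the original script
    romanized: str   # transliterated form
    english: str     # English translation


def build_index(records: List[LinkedTitle]) -> Dict[str, LinkedTitle]:
    """Index every form of every title so that any one form finds the record."""
    index: Dict[str, LinkedTitle] = {}
    for record in records:
        for form in (record.vernacular, record.romanized, record.english):
            index[form.casefold()] = record
    return index


def lookup(index: Dict[str, LinkedTitle], query: str) -> Optional[LinkedTitle]:
    """Return the linked record for a query in any of the three forms."""
    return index.get(query.casefold())


records = [LinkedTitle("진달래꽃", "Chindalaekot", "Azalea flower")]
index = build_index(records)

hit = lookup(index, "chindalaekot")
print(hit.english if hit else "not found")
```

With such a link in place, a user who retrieves only the Romanized form can still see the translation, and a user who knows only the English title can reach the vernacular record.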
The survey provided a framework to define the
secondary access problem: how do individuals get
information about information (the bibliographic
problem) as they move from one language to another
and from one alphabet to another? The survey con-
firmed the importance of topic, task, and display
and it offered specific information on how each of
these might be assessed when individuals conduct
searches for information. Thus, the survey funneled
and focused these issues allowing for the design
of an experiment to explore how individuals might
seek such information in a realistic but controlled
environment.
3.2 Experiment
A separate experiment was conducted to explore
the use of transliterated information when searching
for bibliographic information using the WorldCat
system.
3.2.1 Sample
This study used a non-probability convenience
sample of nine individuals whose native languages
were Chinese, Japanese, or Korean, and whose sec-
ond language is English. There were three in-
dividuals who were native speakers from each lan-
guage group. The subjects were selected to include
librarians from Rutgers University Libraries and stu-
dents from three academic departments: Library and
Information Science (LIS), Communication, and
Journalism and Media studies. The subjects were
purposively selected to accommodate the ex-
perimental design; for example, one librarian from
each language group and two students from LIS
and non-LIS areas were selected.
3.2.2 Experimental Design
The main focus of this experiment is to examine
how sensitive the system is to a person’s particular
needs, especially when seeking information across
different languages. Subjects were observed conduct-
ing three searches using the WorldCat system and
this was followed by a personal interview.
The three different search tasks assigned to each
user served as the unit of analysis for this study:
within each language group, three individuals each
searched three tasks with different topics. The three
topics were chosen from the areas of health, information
science, and business because these areas were assumed
to be relatively important to the subjects conducting
the searches, given their professional or academic
positions. After the subject areas were chosen, the
actual topics were set up. Although search results and
satisfaction levels vary with subjects’ interest in
these subject areas, topic knowledge, and search
experience with these topics, all subjects were required
to search all three topics. Subjects recorded their own
search satisfaction levels, which were then reflected in
each individual’s overall satisfaction results.
This design resulted in 27 cases (3 subjects x
3 Tasks x 3 Topics). Embedded within the design
is the use of three different languages, CJK, in addi-
tion to English. Incorporated within the search proto-
col is the use of different languages available through
transliterated records in WorldCat.
The Tasks (T) are defined as follows:
T1: Do a search looking for information written
in your native language.
T2: Do a search looking for information written
in English.
T3: Do a search looking for information written
in a language you do not know.
The Topic was assigned as follows.
Topic1: Food nutrition business in the United
States.
Topic2: Socio-cognitive concept in Information
Science.
Topic3: Globalization in industry.
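One reading of the design that yields 27 cases is nine subjects (three per language group), each performing the three tasks with a different topic per task. The sketch below enumerates the cases under that reading. The rotation used to pair topics with tasks is an assumption introduced for illustration; the paper does not state how topics were paired with tasks, and the subject labels are hypothetical.

```python
LANGUAGES = ["Chinese", "Japanese", "Korean"]
TASKS = ["T1", "T2", "T3"]
TOPICS = ["Topic1", "Topic2", "Topic3"]
SUBJECTS_PER_LANGUAGE = 3


def build_cases():
    """Return one (subject, task, topic) tuple per search."""
    cases = []
    for language in LANGUAGES:
        for person in range(SUBJECTS_PER_LANGUAGE):
            subject = f"{language}-{person + 1}"  # hypothetical subject label
            for task_idx, task in enumerate(TASKS):
                # Assumed rotation: shift the topic by the person's position
                # so every subject meets each task and each topic exactly once.
                topic = TOPICS[(task_idx + person) % len(TOPICS)]
                cases.append((subject, task, topic))
    return cases


cases = build_cases()
print(len(cases), "searches in total")
```

Each tuple corresponds to one search, the unit of analysis used in the findings that follow.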
3.2.3 Hypotheses for the Experiment
A fundamental premise underlying transliteration
from CJK to Romanized script is that seekers would
be able to interpret the Romanized version which
requires knowledge of two languages. Also tested
were individuals’ searches in a language they did
not know to provide preliminary data on how trans-
literation serves those not knowing one of the
languages.
H1: Users will have better results and greater sat-
isfaction when looking for information writ-
ten in English than when searching for trans-
literated information written in their native
language. (T2 > T1)
H2: Users will have better results and greater sat-
isfaction when looking for information writ-
ten in their native language than when search-
ing for information written in a language they
do not know. (T1 > T3)
3.2.4 Data Analyses and Findings for the
Experiment
A profile of the subjects was obtained to capture
demographic information in a pre-test questionnaire
and this revealed that 56% of the subjects have experi-
ence with WorldCat and have an average online
searching experience spanning three to five years.
Note that one librarian was assigned to each language
group and this increased the dispersion in the experi-
ence variable when compared to the experience of
non-librarians. Variables used in this experiment
could be cast as follows: task, subject, and topic
as independent measures and user satisfaction as the
dependent measure.
Overall Satisfaction is a measure encompassing
assessments of Results, Relevance with expectations,
Understanding level, Efficiency of the system, and
Friendliness of the system. The users’ Overall
satisfaction value was obtained through a factor analysis
of the scores that users gave each search when evaluating
task and system performance. Table 2 reports the rotated
component matrix from the principal components analysis,
which revealed that two vectors could be used for each
search to represent overall user satisfaction: one vector
representing Task based satisfaction, which included
Results, Relevance, and Understanding; and the other
vector reflecting System based satisfaction, which
encompassed users’ search assessments of the Efficiency
and Friendliness of the system. Overall satisfaction was
then computed as the summation of the two individual
factor scores for each of the 27 searches, which
represented the unit of analysis. Separate analyses of
each factor were conducted as well.
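A short derivation, offered as a reading aid rather than as part of the original analysis, shows what summing the two factor scores implies. It assumes that the scores are standardized principal component scores from an orthogonal (varimax) rotation, so that across the n = 27 searches each has zero mean and unit variance and the two are uncorrelated. The symbols F_task, F_sys, and S are introduced here for illustration only.

```latex
\begin{align*}
S_i &= F_{\mathrm{task},i} + F_{\mathrm{sys},i}, \qquad i = 1,\dots,27 \\
\sum_{i}\bigl(F_{\mathrm{task},i}-\bar F_{\mathrm{task}}\bigr)^2
  &= \sum_{i}\bigl(F_{\mathrm{sys},i}-\bar F_{\mathrm{sys}}\bigr)^2
   = n - 1 = 26 \\
\sum_{i}\bigl(S_i-\bar S\bigr)^2
  &= 26 + 26 + 2\sum_{i}\bigl(F_{\mathrm{task},i}-\bar F_{\mathrm{task}}\bigr)
     \bigl(F_{\mathrm{sys},i}-\bar F_{\mathrm{sys}}\bigr)
   = 26 + 26 + 0 = 52
\end{align*}
```

These values agree with the corrected totals reported in the tests of between-subjects effects: 26.000 for each of the two factor-based satisfaction measures and 52.000 for Overall satisfaction.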
By using a Generalized Linear Model (GLM), the
three tasks, three topics and nine subjects were
partitioned to identify users’ Overall satisfaction with the
results they achieved. The GLM test revealed a
task effect indicating that T2>T1 and T1>T3 (T2:
Beta = 1.770, T1: Beta = 1.142, and T3: Beta = 0,
all at p <.001). That is, both hypotheses are supported
by differences that are not likely to occur by chance.
Tests of between-subjects effects uncovered statistically
significant results for Subject and Task (p < .05)
and non-statistically significant results for
Topic. The entire model is presented in Table 3.
The model explains 85% of
the variance in the Overall satisfaction score.
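The figures in the between-subjects table for Overall satisfaction can be re-derived from its sums of squares and degrees of freedom. The sketch below applies the standard analysis-of-variance identities (mean square = sum of squares / df, F = effect mean square / error mean square, R squared = model sum of squares / corrected total). The input numbers are copied from that table; the variable and function names are chosen here for illustration.

```python
# Sums of squares and degrees of freedom as reported for Overall satisfaction.
EFFECTS = {
    "Subject": (26.351, 8),
    "Task": (14.497, 2),
    "Topic": (3.286, 2),
}
SS_ERROR, DF_ERROR = 7.866, 14
SS_TOTAL, DF_TOTAL = 52.000, 26  # corrected total


def f_ratio(ss_effect: float, df_effect: int) -> float:
    """F = (effect mean square) / (error mean square)."""
    return (ss_effect / df_effect) / (SS_ERROR / DF_ERROR)


# In this table the three effects add up to the corrected model sum of squares.
ss_model = sum(ss for ss, _ in EFFECTS.values())
r_squared = ss_model / SS_TOTAL
adjusted_r_squared = 1 - (SS_ERROR / DF_ERROR) / (SS_TOTAL / DF_TOTAL)

for name, (ss, df) in EFFECTS.items():
    print(f"{name}: F = {f_ratio(ss, df):.3f}")
print(f"R squared = {r_squared:.3f}, adjusted = {adjusted_r_squared:.3f}")
```

The recomputed values match the reported F ratios (5.863, 12.901, 2.925) to within rounding and reproduce R squared = .849 and adjusted R squared = .719, which is the basis for the statement that the model explains about 85% of the variance.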
A one-way analysis of variance model with multi-
ple group comparisons was performed to explore
users’ satisfaction ratings by the three tasks to de-
termine if statistically significant differences existed
across and between groups. Results revealed that
there were statistically significant differences among
the groups, F(2, 24) = 14.063, p<.001. Post hoc
comparisons using a Scheffé test showed that there were
statistically significant mean differences (p≤.05) be-
tween all pairs of tasks: task 1 with task 2, task
2 with task 3, and task 1 with task 3. These results
affirm the importance of task when individuals per-
form multilingual information searches.
Table 2. Rotated component matrix for overall satisfaction variable

| Overall satisfaction variable | Component 1 | Component 2 |
| Satisfaction with the results | .930 | .089 |
| Relevance with users' expectation | .880 | .188 |
| Catalog understand level | .759 | -.085 |
| Efficiency of the system | .099 | .916 |
| Friendly system | .011 | .927 |
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation
converged in 3 iterations.

Table 3. Tests of between-subjects effects. Dependent variable: Overall satisfaction

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model* | 44.134(a) | 12 | 3.678 | 6.546 | .001 |
| Intercept | .000 | 1 | .000 | .000 | 1.000 |
| Subject* | 26.351 | 8 | 3.294 | 5.863 | .002 |
| Task* | 14.497 | 2 | 7.248 | 12.901 | .001 |
| Topic | 3.286 | 2 | 1.643 | 2.925 | .087 |
| Error | 7.866 | 14 | .562 | | |
| Total | 52.000 | 27 | | | |
| Corrected Total | 52.000 | 26 | | | |
* statistically significant at p < .05.
R Squared = .849 (Adjusted R Squared = .719)

A separate GLM analysis on System based
satisfaction and Task based satisfaction was conducted
to partition the impact of subject, task, and topic
on the original factor derived satisfaction variables.
Table 4 reports the results for System based
satisfaction, showing that statistical significance for this
model rested on the differences among the individuals
participating in the experiment: Chinese, Japanese,
or Korean. The model accounted for 92% of the variance
in System based satisfaction. These results might
be used to inform the design of future research, which
could consider developing separate models for each
CJK bibliographic environment. It would be important
in future research to separate the perceived
effectiveness of the system from its friendliness.
A one-way ANOVA examined subjects’ background
as an explanatory variable for System based
satisfaction. The results are based on small sample
subgroups, but they do show that native language has
a statistically significant effect, F (2,9.25) = .001
(p < .05). In other words, the individuals’ first
language corresponds to their level of satisfaction with
how user friendly they perceive the system to be. The
Chinese group reported higher degrees of satisfaction
than the Japanese group, and the Japanese group
reported greater satisfaction than the Korean group
in both interface satisfaction and system
cross-language ability satisfaction. These results correspond
to linguistic issues covered later in this report (see
Section 4 Transliteration issues).
Table 4. Dependent variable: System based satisfaction

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | 23.860(a) | 12 | 1.988 | 13.011 | .0001 |
| Intercept | .000 | 1 | .000 | .000 | 1.000 |
| Subject* | 23.379 | 8 | 2.922 | 19.122 | .0001 |
| Task | .284 | 2 | .142 | .931 | .417 |
| Topic | .197 | 2 | .099 | .646 | .539 |
| Error | 2.140 | 14 | .153 | | |
| Total | 26.000 | 27 | | | |
| Corrected Total | 26.000 | 26 | | | |
* significant at p < .05
R Squared = .918 (Adjusted R Squared = .847)
Table 5 provides the GLM results for Task based
satisfaction and indicates that Task and Topic are
statistically significant influences, with the model
explaining 79% of the variance in this dependent measure.
This result is not surprising, but it does affirm the
importance of task and topic when individuals retrieve
information from a multi-language bibliographic
system. These results, based on a small non-random
sample, would need further testing in a larger study
so that the individual effects of topic and task can
be removed systematically to create separate ex-
planatory models.
3.2.5 Observation and Interview Data
Search patterns were noted for each respondent
to assess differences by language, by country, and
by status of the individual. At the beginning of the
search, most individuals spent three to four minutes
exploring the design of the search page. Although
the page appears simple, it has features that give
it more power when searching. In particular, even
in Task 1, when searching for information written in
their native language, participants asked how to
specify the language they were looking
for (the target language). Since there are a number
of different options on the first screen and also on
the “advanced search” page, users needed some time
to gain familiarity with the system.
The subjects for this experiment were all Asians
who said they were most comfortable searching in
English, a second language that all of them
knew in addition to their native language.
The potential pool of relevant hits in the database
could be perceived as larger when searching
in English, which is the dominant language
of the database. In Task 3, when looking for
information written in a language that they did not
know, most subjects sought an English word when
they browsed the bibliographic description and
reported that they viewed English as a common link
which should span all records in the database. Most
of the bibliographic records retrieved, however, did
not provide English words, and this prevented subjects
from continuing their search. There is one exception
to this pattern: when subjects searched in languages
with an alphabet similar to that of English, such
as French, they could sometimes guess the
meanings of particular words, and this encouraged
them to continue their search.
Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   20.495(a)                 12   1.708         4.344    .005
Intercept         .000                      1    .000          .000     1.000
Subject           1.378                     8    .172          .438     .879
Task*             15.890                    2    7.945         20.205   .0001
Topic*            3.227                     2    1.614         4.104    .040
Error             5.505                     14   .393
Total             26.000                    27
Corrected Total   26.000                    26
* significant at p < .05
(a) R Squared = .788 (Adjusted R Squared = .607)
Dependent variable: Task based satisfaction
After completing the three tasks, a short follow-up
interview was held to assess how users viewed the
search process they had completed. Chinese,
Japanese and Korean individuals expressed serious
reservations about using Romanized transliteration systems
when creating or interpreting a search. All but one
individual reported great difficulty searching biblio-
graphic records across languages. Most subjects com-
mented that WorldCat may be well designed for
searching for known items in a known language but
that it is less effective when searching for information
by topic and even less effective when searching or
retrieving information in unknown languages.
4. Discussion
Most Chinese and Korean native subjects claimed
that it is very difficult to understand the descriptive
Romanized text without prior knowledge of the record
or special expertise in the original language. The
problems were less pronounced for Japanese subjects, who
were better able to read the Romanization of Japanese
materials. For Korean native subjects, especially those
with more extensive search experience using Korean
words, some confusion might have arisen during the
survey and experiment due to changes in the Korean
Romanization system and differences between the
Romanization systems used in Korea and in foreign
countries. This is one example of how needs differ
across languages, and more issues related to such
cultural and language differences will likely need
to be addressed when structuring
a Cross Language Information Retrieval system for
target users. It is noteworthy that the respondents
in this study began by preferring to input their query
in their own language but came to prefer input
in English.
When users expressed confusion, it became evident
that certain functions would have aided them, such
as query expansion with suggestions of other words,
synonyms, and thesauri, or help distinguishing homophonic
words. Most users want to have an abstract or summary
of a document or book in their own language as well
as in English. Thus, the respondents here preferred
a system whose bibliographic description included
three features: original language, Romanization, and
English.
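The three-part description that respondents asked for can be pictured as a simple record structure. The following is a minimal sketch, not a description of how WorldCat stores records: the class `ParallelTitle`, its field names, and the sample entry are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelTitle:
    """One title carried in three parallel forms (hypothetical structure)."""

    original: str   # vernacular script
    romanized: str  # Romanized form
    english: str    # English translation

    def matches(self, query: str) -> bool:
        # Case-insensitive substring match against any of the three forms,
        # so a user can search in whichever form is most comfortable.
        needle = query.casefold()
        forms = (self.original, self.romanized, self.english)
        return any(needle in form.casefold() for form in forms)


# Illustrative entry only; it is not taken from the study data.
record = ParallelTitle(original="도서관", romanized="tosŏgwan", english="Library")

print(record.matches("도서관"))   # vernacular query
print(record.matches("library"))  # English query
```

Keeping the three forms side by side means a query in any one of them can reach the same record, which is the behavior the respondents described.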
4.1 Study Limitation
This study has several limitations. First, it focuses
on limited language choices involving Chinese,
Japanese, and Korean (CJK). Next, this study used
two convenience samples of individuals whose native
languages are Chinese, Japanese, or Korean. Sample
selection was achieved by identifying individuals
using a network of colleagues. The sample was not
randomly selected, and the sample cannot be said
to be representative of a larger population. This,
then, reduces the generalizability of the study
and limits its validity beyond the sample.
4.2 Transliteration Issues
The CJK languages differ from languages written
in a Latin alphabet in that they have unique writing
and phonological systems. For example, there are
400 syllables in Chinese, written with Chinese logograms;
110 different moras or syllables in Japanese, written with
kana or kanji; and 2,000 syllables in Korean, written with
the Korean alphabetic syllabary.
One common characteristic shared by these three
languages is the use of Chinese characters, although
the frequency of their use differs in each language.
Each Chinese character represents a meaning, and
readers from Japan or Korea can approximate the
meaning of a Chinese character even though its
specific meaning may change depending on the context.
Japanese and Koreans use about 2,000 Chinese
characters (Taylor and Taylor 1995, 17).
The biggest challenge of Romanization is making
accurate isomorphic representations using a Roman
script. Most Romanization systems have attempted
to decode the original script through one of
two methods, transliteration or transcription.
The former tries to map each character one by one
based on the original written script of the language,
whereas the latter tries to transcribe the sound of
the language. Each Romanization system has its own
defining principles and each causes some confusion
and difficulty of use which, from the results presented
here, is exacerbated during topic searches. Japanese
users experienced fewer problems in this study than
others; yet, as Kudo (2010) reports, Japanese
Romanization still confuses users with word division
issues and lack of application of standardized proce-
dures for transliteration.
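The difference between transliteration and transcription can be made concrete with Korean, because every precomposed Hangul syllable in Unicode decomposes arithmetically into its letters. The sketch below is an assumption-laden illustration, not the scheme applied in WorldCat records: the letter tables follow the letter values of the Revised Romanization of Korean, the function name `transliterate` is hypothetical, and refinements such as hyphens at ambiguous syllable boundaries are omitted.

```python
# Letter values for the 19 initial consonants, 21 vowels, and 28 final
# positions (including "no final"), listed in Unicode order.
# Assumption: values follow the Revised Romanization of Korean.
LEADS = ["g", "kk", "n", "d", "tt", "l", "m", "b", "pp", "s",
         "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
TAILS = ["", "g", "kk", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb",
         "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "ng", "j", "ch",
         "k", "t", "p", "h"]

HANGUL_BASE = 0xAC00        # first precomposed Hangul syllable
HANGUL_COUNT = 19 * 21 * 28  # 11,172 precomposed syllables


def transliterate(text: str) -> str:
    """Map each Hangul letter to Roman letters, ignoring pronunciation."""
    out = []
    for ch in text:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < HANGUL_COUNT:
            lead, rest = divmod(offset, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)  # leave non-Hangul characters unchanged
    return "".join(out)


print(transliterate("독립"))  # letter-by-letter form
print(transliterate("한글"))
```

For the word 독립 ("independence"), the letter-by-letter result is "doglib", whereas a transcription that follows the spoken form gives "dongnip". A searcher who knows only one of the two forms will not recognize the other. The second example also shows why word and syllable division matters: the output "hangeul" no longer shows whether the syllables divide as han + geul or hang + eul.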
The data from the survey and from the experiment
with interviews led to an examination of the under-
lying linguistic structure for Romanization. That ef-
fort then led to areas of concern which might be
tested in research settings in order to provide better
access to the CJK materials in current online database
systems. From user interviews the following emerged
as core topics for further investigation: standardization,
simplification, a Rosetta Stone approach, and provisions
for a vernacular search. These might include:
∙Exploration of a single standardization system
complete with transparent rules which can be
applied by those seeking information, both
native and non-native speakers.
∙Studies of traditional vs. simplified Romanization
for Chinese and Korean languages to assess
user satisfaction and ability to retrieve pertinent
information.
∙Studies of translation as an alternative to
transliteration. Over half the users requested that a
standard language, English, be used in parallel with the
Romanized script and that English language abstracts
be provided, a Rosetta Stone preference.
5. Conclusion
Different native languages often engender differ-
ent perspectives and these may express themselves
in unstated needs for those using bibliographic
systems. Language also embodies culture and this,
too, emerged in the findings as a concern when trans-
literation attempts to mimic spoken language which
includes cultural nuances and regional differences.
Future continuation of this research can take two
directions: (1) providing more in-depth research on
the three countries and three languages using a more
representative sample; (2) expanding the countries
surveyed, the languages used, and the number of
individuals contacted in each country. It would also
be appropriate to explore a third area: comparing how
users with different language backgrounds use different
types of multilanguage systems, such as
those used by Amazon.com and/or online catalog
systems. Of special note will be the socio-cognitive and
cultural perspectives of the individuals from each country.
Another future area for exploration would be the
potential sharing of bibliographic information
across borders. This would include some
exploration of the cooperative work now being done,
much of it under the leadership of the Online
Computer Library Center (OCLC), which currently
directs the WorldCat effort.
Future research might also address culture in terms
of its influence on user satisfaction and retrieval
effectiveness.
Currently, WorldCat represents one of the largest
multilanguage databases in existence and its im-
pressive size and content expand our information
boundaries. OCLC continues to advance the features
and friendliness of WorldCat. Transliteration is a
bridge to knowledge but it currently needs more
transparency if it is to satisfy the needs of those
seeking information.
References
Arsenault, Clément. 2002. “Pinyin romanization for
OPAC retrieval – Is everyone being served?”
Information Technology and Libraries, 21(2):
45-50.
Bossmeyer, Cristine, Willian R. H. Koops, and Stephen
W. Massil. Ed. 1987. Automated Systems
for Access to Multilingual and Multiscript
Library Materials: Problems and Solutions.
Paper presented at the Pre-Conference held
at Nihon Daigaku Kaikan, IFLA, August
21-22, 1986, in Tokyo, Japan.
Gao, Mobo C. F. 2000. Mandarin Chinese: An
Introduction. Victoria: Oxford University Press.
Ha, YooJin. 2008. Accessing and Using Multilanguage
Information by Users Searching in Different
Information Retrieval Systems. Ph. D. diss.,
Rutgers University.
Jeong, Wooseob. 1998. “A pilot study of OCLC CJK
plus as OPAC.” Library & Information Science
Research, 20(3): 271-292.
Kim, Kyongsok. 1999. “Standardizing romanization
of Korean Hangeul and Hanmal.” Computer
Standards & Interfaces, 21(5): 441-459.
Kudo, Yoko. 2010. “A study of romanization practice
for Japanese language titles in OCLC WorldCat
records.” Cataloging & Classification
Quarterly, 48(4): 279-302.
Lindén, Krister. 2006. “Multilingual modeling of
cross-lingual spelling variants.” Information
Retrieval, 9(3): 295-310.
Oard, Douglas W. and Anne R. Diekema. 1998. “Cross
language information retrieval.” In Annual
Review of Information Science and Technology
(ARIST), 33: 223-256.
Oh, Jong-Hoon, Key-Sun Choi, and Hitoshi Isahara.
2006. “A comparison of different machine
transliteration models.” Journal of Artificial
Intelligence Research, 27: 119-151.
Park, Jung-ran. 2001. “Information retrieval of Korean
materials using the CJK bibliographic system:
Issues and problems.” In Proceedings of the
Second KSAA Biennial Conference: Korean
Studies at the Dawn of the Millennium,
245-255.
Shaker, A. K. 2002. Bibliographic Access to Non-
Roman Scripts in Library OPACs: A Study
of Selected ARL Academic Libraries in the
United States. Ph. D. diss., University of
Pittsburgh.
Shin, Hee-sook. 2003. “Quality of Korean cataloging
records in shared databases.” Cataloging &
Classification Quarterly, 36(1): 55-90.
Sohn, Ho-min. 1999. The Korean Language.
Cambridge: Cambridge University Press.
Taylor, Insup, and M. Martin Taylor. 1995. Writing
and Literacy in Chinese, Korean and Japanese.
Philadelphia: John Benjamins Publishing
Company.
Wang, Andrew H. 2007. OCLC update. OCLC Online
Computer Library Center, Inc. [cited May 10,
2008].
Zeng, Lei. 1992. An Evaluation of the Quality of
Chinese-Language Records in the OCLC
OLUC Database and the Study of a Rule-
Based Data Validation System for Online
Chinese Cataloging. Ph. D. diss., University
of Pittsburgh.
Zeng, Lei. 1991. “Automation of bibliographic control
for Chinese materials in the United States.”
International Library Review, 23: 299-319.
Zhang, Foster and Marcia Lei Zeng. 1998. “Multi-
script information processing on crossroads:
Demands for shifting from diverse character
code sets to the Unicode™ Standard in library
applications.” IFLA Journal, 25(3): 162-167.
Zhu, Xiaojin. 2001. “Chinese languages: Mandarin.”
In Garry, J. and C. Rubino (Eds.). Facts about
the World’s Languages: An Encyclopedia of
the World’s Major Languages, Past and
Present. 146-150. New York: H.W. Wilson
Company.
Appendix A : Survey questionnaire
Ⅰ. Background question
1. What is your native language?
2. Please indicate all other languages you know.
3. Could you please indicate your current position?
student: (please indicate your major, degree and place) _____________________
librarian (please specify your library’s area and your subject area) ___________
Others: ______________________________________________________________
4. When was the last day you used a library system to search for information in a language other than
your own? Please respond to ONE of the below:
a. ____ days ago
b. ____ weeks ago
c. ____ months ago
d. ____ years ago
5. Please indicate below your use of online library systems which can provide information in languages
other than your own language. Include the extent to which you have used such systems.
___________________________________________
6. Have you ever tried to use OCLC’s WorldCat online library catalog?
Yes ____________ No _______ (If yes, could you please comment on your use of this system? If
you have not used WorldCat, then please skip the next question 7.)
Your comment about the WorldCat system: ___________________________________________
7. When you conduct a search, which of the below factors are your greatest concern?
a. misspelling b. ambiguity of a term c. hard to understand a term d. no problem
e. other: ___________________________________________
8. Imagine if you could design a new information system which had the ability to support cross language
information searching and retrieval. Which of the below would be the most helpful to search your
query?
a. system would provide translation dictionary in query
b. translation would be available of the abstract in the target language
c. provide highlighting of the indexing words
d. support synonyms with a top down menu
9. Could you please indicate why you might need information written in other languages, which might
include a language you cannot read?
10. Overall, for how many years have you been doing online searching? _____________ years.
Ⅱ. WorldCat usage
Please conduct a search on any topic of interest to you using OCLC’s WorldCat system. For purposes
of this study, you are being asked to make sure that your search results are written in a language different
from the country where you now live. For example, if you are in the US, please try to find certain
information written in languages other than English. Please record your search experience by responding
to the following questions.
(If you belong to Rutgers University, you can visit the library website at
http://www.libraries.rutgers.edu/rul/rr_gateway/catalogs.shtml and then find WorldCat.)
1. Query you searched for (Please type in the same language that you used in the search) _______
2. Translate to English if your topic statement was not in English (if possible)
3. What language you were looking for and from what language? (i.e. Korean–English)
a. From what language _____________ b. To what language ___________
4. How long did it take you to get a satisfactory response to your original question?
__________ minutes. (Please fill in number of minutes)
5. How satisfied are you with the description of each retrieved document? (circle appropriate response)
a. not satisfied b. somewhat satisfied c. I don’t know d. satisfied e. very satisfied
6. Was the retrieved document relevant to your information needs?
a. not relevant b. somewhat relevant c. I don’t know d. relevant e. very relevant
7. Do you think this system is efficient, especially when searching for documents in different languages?
a. not efficient b. somewhat efficient c. I don’t know d. efficient e. very efficient
8. Is there any word that you could not understand even if it was in your native language?
Yes ____ No _____ (If yes, please give an example.)
(example: _______________________________________________________)
9. When you conduct a search, which of the below factors are of your greatest concern?
a. misspelling b. ambiguity of a term c. hard to understand a term d. no problem
e. other: ___________________________________________
10. All things considered, I am satisfied with the system services.
a. Strongly Agree b. Agree c. Undecided d. Disagree e. Strongly Disagree
11. Please describe in detail any difficulties you encountered.
Appendix B : Experiment Questions
Ⅰ. Background questions
1. What is your native language? (please circle)
1: Chinese 2: Japanese 3: Korean
2. Have you ever tried to use OCLC’s WorldCat online library catalog?
0: No 1: Yes
3. Overall, for how many years have you been doing online searching?
0: none 1: 1-2 years 2: 3-5 years 3: 6-8 years 4: 9-10 years 5: more than 11 years
4. Are you a librarian?
0: No 1: Yes
Ⅱ. Task questions
3 Tasks will be assigned with different topics.
Task 1: Do a search looking for information written in your native language.
Topic will be given at the experiment.
T11. How familiar are you with the topic
0: I don’t know 1: none 2: little 3: somewhat 4: familiar 5: very familiar
T12. How many queries did you retrieve to find the final answer for this task? _______
T13. How much time did this task take to get the result? ___________ Minutes
T14. How many catalog records did you examine? _______
T15. How many catalog records did you save? _______
R1: Are you satisfied with the result?
0: I don’t know 1: not at all 2: little 3: satisfied 4: very satisfied
R2: How much related information did you retrieve?
0: I don’t know 1: not related at all 2: slightly related 3: Fairly related 4: Perfect match
R3: Was the information on the retrieved catalogs understandable to you?
0: I don’t know 1: not at all 2: little 3: understandable 4: very understandable
Task 2: Do a search looking for information written in English (2nd language).
Topic will be given at the experiment.
T21. How familiar are you with the topic
0: I don’t know 1: none 2: little 3: somewhat 4: familiar 5: very familiar
T22. How many queries did you propose to find the final answer for this task? _______
T23. How much time did this task take to get the result? ___________ Minutes
T24. How many catalogs did you examine? _______
T25. How many catalogs did you save? _______
R21: Are you satisfied with the result?
0: I don’t know 1: not at all 2: little 3: satisfied 4: very satisfied
R22: How much related information did you get on what you were looking for?
0: I don’t know 1: not related at all 2: slightly related 3: Fairly related 4: Perfect match
R23: Was the information on the retrieved catalogs understandable to you?
0: I don’t know 1: not at all 2: little 3: understandable 4: very understandable
Task 3: Do a search looking for information written in language you don’t know.
Topic will be given at the experiment.
T31. How familiar with the topic?
0: I don’t know 1: none 2: little 3: somewhat 4: familiar 5: very familiar
T32. How many queries did you ask to find the final answer for this task? _______
T33. How much time did this task take to get the result? ___________ Minutes
T34. How many catalogs did you examine? _______
T35. How many catalogs did you save? _______
T36. What language of materials you were looking for?
0: English 1: Chinese 2: Chinese (traditional) 3: Japanese 4: Korean 5: French 6: Arabic
7: Parisian 8: Spanish 9: Other languages
R31: Are you satisfied with the result?
0: I don’t know 1: not at all 2: little 3: satisfied 4: very satisfied
R32: How much relevant, related information did you get on what you were looking for?
0: I don’t know 1: not related at all 2: slightly related 3: Fairly related 4: Perfect match
R33: Was the information on the retrieved catalogs understandable to you?
0: I don’t know 1: not at all 2: little 3: understandable 4: very understandable
Ⅲ. Overall questions
R4: Do you think this system is efficient, especially when searching for documents in a different language?
0: I don’t know 1: not at all 2: little 3: efficient 4: very efficient
R5: Do you think this system is user friendly?
0: I don’t know 1: not at all 2: little 3: friendly 4: very friendly