
Japanese Character Input: 
Its State and Problems 

Ichiko MORITA: Ohio State University, Columbus. 

Computer processing of information is highly advanced in Japan, and it continues
to be researched and improved by the cooperative efforts of the government,
private corporations, and individual scientists, who are among the best
in the world. This paper introduces various approaches to the computer input
of information currently developed in Japan, and discusses the possibility of
their application to the processing of East Asian vernacular language materials
in large research libraries in this country.

Processing of catalog information through an on-line shared-cataloging 
system has become a part of American libraries' common practice, and 
its financial and temporal savings have been proven. However, there are 
some materials not yet considered appropriate for computer processing. 
The Library of Congress' plans for romanizing catalog information for all 
non-roman language materials and putting them on MARC tapes for 
quick distribution of information have been objected to by a large num-
ber of specialists in the field. The opponents' reason has been that com-
puterization of vernacular languages by means of transliteration is not 
satisfactory. Such materials are best handled in their own writing sys-
tems (the languages in this category include Chinese, Japanese, Korean, 
Hebrew, Arabic, and various languages in India). Those specialists in 
the field who see systems working for roman-alphabet materials general-
ly agree that automated systems are very efficient and useful for their 
research. It would be best if non-roman language materials could be 
processed through computers using their own writing systems. 

As far as technology goes, it is possible to process such materials in 
their original form. Systems that have the capability of handling those 
languages directly have been developed; among the most advanced are 
the Japanese systems. Japan has overcome numerous difficulties in de-
veloping systems that are capable of handling Japanese characters. 
Although automation of libraries is not as widespread as in the United 
States (due perhaps to a delay in the development of computers), some 
Japanese libraries already have a decade of experience with advanced

Manuscript received August 1980; accepted December 1980. 


Japanese Character Input/MORITA 7

systems. Many others have recently started to adopt them. Wide utiliza-
tion of these systems seems to be just a matter of time. 

It will be beneficial to review Japanese methods and consider possible 
adaptation of them to our systems. In the following sections, various 
Japanese approaches to inputting the Japanese language are explained 
with an eye to future automation of non-roman language materials in 
this country. 

THE JAPANESE LANGUAGE AND THE COMPUTER 

It should be noted, first of all, that the Japanese language is an entire-
ly different language from Chinese, although they are often confused be-
cause they both use the same Chinese ideographs in writing. Each 
Chinese ideograph, or character, symbolizes a certain object or denotes
a certain meaning. The Japanese use them in the Japanese language 
with its own pronunciation in the context of its own grammar, whereas 
the Chinese use them in the Chinese language with its own pronuncia-
tion in the context of its own grammar. This means that a Chinese ideo-
graph could mean the same thing in both languages, but be pronounced 
or read differently and used in different grammatical environments. The 
Chinese ideographs used in Japanese are referred to as Kanji, which 
are, to complicate the matter, used along with Japanese syllabaries 
called Kana. Kana, in two styles called Hiragana and Katakana, total 
about 170 characters. Depending on whether a Kanji is used with
another Kanji or Kana, the reading of it varies. At different times one 
set of Kanji may be read in two or three different ways. 

The total number of Kanji is about 50,000. In comprehensive dic-
tionaries, about 40,000 or more Kanji are included. Medium-sized ones, 
such as Ueda's Daijiten, include about 15,000; concise ones about 8,000 
to 10,000. 1 According to several tests on frequency of Kanji occurrence 
made in various Japanese institutions, approximately 3,000 Kanji appear 
in high frequency, 3,000 are of moderate frequency, and several 
thousand more are of infrequent occurrence. As for geographical names, 
2,279 Kanji will cover most of Japan and 1,500 Kanji will suffice to cover 
personal names, except for very unusual names. 2 Approximately 6,300 
characters are needed for major newspapers such as The Asahi and The 
Nikkei. 

The trends in the use of Kanji are to simplify the characters them-
selves, and not to use difficult Kanji with many strokes. In 1946, the 
Japanese government established 1,850 Kanji as those for daily use, 3 and 
today newspapers and official documents use only those Kanji, except 
for some personal and geographical names. The implication of this trend 
for computerization of Kanji is that, depending on the documents to be 
covered, the need in number and kind of Kanji varies. That is, institu-
tions that deal with scientific or current information do not need as 
many Kanji as other types of institutions that handle documents covering


8 Journal of Library Automation Vol. 14/1 March 1981

longer periods and larger areas of knowledge. For example, Japan
Information Center for Science and Technology, which mainly handles 
the latest scientific information, claims that with approximately 6,000 
Kanji it can function satisfactorily. An example from the other extreme 
is the National Institute of Japanese Literature, whose collection covers 
older historical periods, during which a great number of Kanji were 
used and many Kanji went through changes, mostly simplification in
style. The latter institute is constantly adding new Kanji to its system. 

It is obvious then that the first problem in the computerization of 
Japanese materials is the number and kind of Kanji to be included in 
the system. This is a problem of hardware. 

The other problem concerns software. When Japanese is written, its 
words are not divided as in English, for combination of Kanji and Kana 
helps visually to make sentences understandable without word division. 
Also, compound nouns are made by adding other words to a noun, so 
that, if a set of Kanji represents one noun, one can expand its meaning 
by adding another Kanji to it. Though word division has long been a problem
in transliteration and is not new to computerization, both arbitrarily divided
words and undivided words in particular become serious problems
in the computer files and in the retrieval of information.

A question may be raised as to why we need Kanji processing in spite 
of these problems; why isn't computer handling of alphanumerics and 
Kana, which is in use today, sufficient? The answer to this is mainly that 
Kanji possess a definite visual effect. Also, if only romanized languages 
or Kana alone are used, many homonyms may make the meaning am-
biguous. While it is quite possible to write Japanese only in Kana or in 
the romanized forms, as proven by the systems in use, it is better, for
efficiency and precision, to express the language in the way it is actually 
written. 

As for the problem of word division, study is in progress on methods 
of dividing words systematically and automatically, incorporating the 
latest research in the field of applied linguistics. This is more concerned 
with the development of software, and this paper will not delve into it. 

INPUTTING 

Various Japanese approaches to inputting Kanji and Kana are organized 
below into six major groupings according to different inputting devices. 
They are: (1) full keyboard, (2) component pattern input, (3) Kana
keyboard, (4) stenotype, (5) optical character recognition, and (6) voice
recognition. These six methods are further divided into subvariations as
shown in table 1. 4

Full keyboard 

The main feature of this approach is use of a full character keyboard 
as the inputting device. The operator uses the full character keyboard 


Table 1. Input Systems

Major                 Variations            Subvariations            Training          Characters/  Characters
Approaches                                                           Needed            Minute       Accommodated

Full keyboard         Kanji teletypewriter                           Medium-Extensive  40-100       2,300-4,000
                      Japanese typewriter   Character location       Medium            30-50        2,205
                                            Coded-plate scanning                                    2,863
                                            Coded typeface                                          2,200-3,000
                                            Modified coded typeface                                 3,000-4,096
                      Tablet style          Electromagnetic          Medium-Small      30-70        2,800-4,000
                                            Electrostatic                                           2,800-4,000
                                            Photoelectric
Component pattern
  input
Kana keyboard         Two-key stroke        Location correspondence  Extensive         60-120       4,096
                                            Association memory
                      Display selection                              Small             20-30
                      Kana-Kanji            Word conversion
                        conversion          Sentence conversion
Stenotype
Optical character                                                                                   1,000-2,500
  recognition
Voice recognition

rather than codes or other symbols. The keyboard varies depending on 
models, usually consisting of frequently used Kanji and both sets of 
Kana, supplemented by Arabic numerals, Roman, Cyrillic, and Greek 
alphabets in upper and lower cases, often with italics, signs, and diacrit-
ical marks. To each character, a two-byte binary code (expressed by a 
four-digit numeral) is assigned, so that when the inputter types a charac-
ter the code for the character is punched on paper or cassette tape. 

Kanji Teletypewriter 

The oldest method for Kanji inputting, still widely in use, is the Kanji 
teletypewriter system or multishift system. One variation of this 
approach, developed by the National Diet Library at an early stage of 
its computerization, has 192 character keys, each having fourteen char-
acters in three columns and five lines, as shown in figure 1. In addition, 
there are fourteen selection keys arranged in three columns and five 
rows on the lower left of the keyboard to correspond to the pattern of 
characters on each character key. When an operator strikes the character
key B with the right hand and the selection key A with the left hand
at the same time, the code for the character C is punched on the tape.
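
The two-handed selection described above amounts to picking one position out of a grid of keys and per-key characters. The following Python sketch is illustrative only: the function name `multishift_position`, the zero-based numbering of keys, and the rule that positions are counted key by key are assumptions, not the National Diet Library's actual code assignment.

```python
# Illustrative sketch only: key numbering and position layout are assumed,
# not taken from the National Diet Library keyboard.
CHARACTER_KEYS = 192        # character keys on the keyboard (from the text)
CHARACTERS_PER_KEY = 14     # characters carried by each key (from the text)


def multishift_position(character_key: int, selection_key: int) -> int:
    """Return a zero-based keyboard position for a two-handed key chord."""
    if not 0 <= character_key < CHARACTER_KEYS:
        raise ValueError("character key out of range")
    if not 0 <= selection_key < CHARACTERS_PER_KEY:
        raise ValueError("selection key out of range")
    return character_key * CHARACTERS_PER_KEY + selection_key


capacity = CHARACTER_KEYS * CHARACTERS_PER_KEY
print(capacity)                      # 2688
print(multishift_position(2, 5))     # 33
```

Under these assumptions the keyboard offers 2,688 distinct positions, slightly more than the 2,650 characters reported for this keyboard.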


[Figure 1: keyboard diagram marking selection key A, character key B, and character C.]

Fig. 1. Kanji Teletypewriter Keyboard of the National Diet Library.

Included on this keyboard are:

Kanji                2,006
Kana                    90
Western alphabets      144
Numerals                20
Symbols and marks      210
Kanji patterns          40
Kanji components       139
Space                    1

Total                2,650 6

By using shift keys on the upper left of the keyboard, Kana in both 
styles and alphabets in upper and lower cases can be input. For satisfactory
operation, the keyers must be professionally trained, and it is said
that one to three months are necessary for them to be fully trained and 
able to input an average of fifty to sixty Kanji per minute. This is not as 
fast as most other methods discussed. 

Japanese Typewriter 

The second of the full keyboard approaches is the Japanese typewriter 
method, which uses a modification of the standard Japanese typewriter 
with a tray filled with Kanji printing types. The operator finds a charac-
ter in the tray and punches it by moving a metal handle as the type bar 
is punched down to print the character. This is rather primitive and 
different in its operation from the English typewriter, which uses the 
ten-finger touch method. There are four variations: 

Character Location Method. Kanji are arranged on a keyboard by 
their codes, so that when a key is punched, the Kanji is typed on regu-
lar paper as if it had been done by a regular Japanese typewriter. At the 
same time, the code is automatically read from the location of the key 
and is punched on tape. 

Code-plate Scanning Method. Each type bar has a plate attached on 
its side, and the code for the character is marked on its plate. When a
key is typed, the Kanji is printed on paper and the code from the plate 
is optically scanned at the same time. 

Coded Typeface Method. Each typeface is made with a character on 
the upper half and a code for it on the lower half. When a key is typed,
both the character and code are printed. The code on the bottom half is 
optically scanned from the printed paper. 

Modified Coded Typeface Method. Instead of typing both characters 
and codes on the paper, this method prints only the characters on the 
front of the paper and, at the same time, prints a bar code on the back 
of the paper. The machine capable of doing this is complicated. The size 
of the character on a typeface can be bigger than in the variation above, 
and the bar code can be larger to make the scanning of the code easier 
and more precise. 

As the discussion of the four variations indicates, the Japanese type-
writer offers the advantage of being able to monitor input at the time of 
keying. 

Since the Japanese typewriter has been in use for a long time in 
offices where a quantity of official documents are dealt with, and since 
ordinary Japanese typists can use this system without any additional 
training, the use of equipment similar in operation was considered 
advantageous. However, it should be noted that Japanese typewriters
have never become as prevalent as English typewriters, and the demand
for computers comes from more areas than just those where
Japanese typewriters are used. For this reason, the use of Japanese
typewriters is not as advantageous as its proponents claim. An obvious
disadvantage is its slow speed of operation: thirty to fifty characters per
minute on the average. Another disadvantage is that the number of 
characters on the keyboard is limited to about 3,000. 

Tablet Style 

This method, also known as pen-touch method, was recently developed.
Each character has a key, and characters are arranged in a certain order.
The location of the characters on a matrix sheet determines the two-byte
binary code, which consists of a two-digit numerical abscissa and a two-digit
numerical ordinate. The operator touches the key with a pen-shaped
detector and the code for the character is punched on the paper
tape. The operation is one-handed, requiring only a light touch of the 
key by a detector. Keys are on one flat keyboard and are color-coded by 
sections to make it easier for the operator to locate them. Light touch 
operation reduces operator fatigue. This method does not require spe-
cial training. However, the number of Kanji on a keyboard of reasonable 
size is limited to approximately 3,500. By shifting, twice as many characters
can be handled, though not all characters are indicated on the
keyboard. Speed of input is not very high: thirty to seventy characters
per minute. This system, already used in many libraries, is becoming 
increasingly popular because of its easy operation. There are three differ-
ent technologies used: electromagnetic, electrostatic, and photoelectric. 
There are no differences in actual input operation for those electronical-
ly different methods. 
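
The way a position on the matrix sheet yields a code can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `tablet_code` and `tablet_bytes` are hypothetical, the coordinates are taken to be decimal, and the abscissa is assumed to come first; the source specifies only that the code consists of a two-digit abscissa and a two-digit ordinate.

```python
def tablet_code(abscissa: int, ordinate: int) -> str:
    """Join a two-digit abscissa and a two-digit ordinate into a four-digit code."""
    if not (0 <= abscissa <= 99 and 0 <= ordinate <= 99):
        raise ValueError("each coordinate must fit in two digits")
    return f"{abscissa:02d}{ordinate:02d}"


def tablet_bytes(abscissa: int, ordinate: int) -> bytes:
    """Store the same pair as two bytes, one per coordinate (assumed layout)."""
    tablet_code(abscissa, ordinate)   # reuse the range check
    return bytes([abscissa, ordinate])


print(tablet_code(7, 42))            # 0742
print(list(tablet_bytes(7, 42)))     # [7, 42]
```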

Component Pattern Input 

Although not a full keyboard method, component pattern input is 
closely related to these methods. 

The idea behind this approach is that most Kanji are composed of one 
or more basic component units, two or more of which can be put 
together into one Kanji according to one predetermined pattern out of 
forty general patterns. The inputting device has keys for those forty 
patterns along with keys for individual components on a special key-
board. To compose a Kanji, a key for an appropriate pattern is selected 
and typed, and components are chosen to fill each individually num-
bered block of the selected pattern, following the established order as 
shown below. 7 Each pattern has a code, and so does each component.
When a key is typed, the code is punched on a paper tape as shown in 
figure 2. There are cases where a Kanji with two components can be a 
component of another Kanji, as shown in the first and second examples 
in figure 2. A Kanji is constructed by punching at least three codes: one
for a pattern and at least two for components. Then, a Kanji dictionary 
consisting of several thousand master-code combinations (see figure 3) is 
stored in a magnetic drum, and the several codes to compose a Kanji 
punched on paper or cassette tapes are converted through this dictionary
to a two-byte binary code assigned to that particular Kanji. These
are then handled as other Kanji with an individual code.

[Figure 2: four example Kanji not on the keyboard, each shown with its pattern, its component parts (radicals), and the corresponding pattern and component codes.]

Fig. 2. Component Pattern Input.

[Figure 3: dictionary rows, each listing a pattern code and component codes padded with 0000, followed by the code assigned to the resulting Kanji.]

Fig. 3. Kanji Dictionary.
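
The dictionary step can be pictured as a lookup from a padded sequence of codes to a single Kanji code. The Python sketch below is an illustration under stated assumptions rather than the actual system: every code value is invented, and the fixed key length of six slots padded with zeros, as well as the names `make_key` and `compose`, are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical values throughout: none of these codes come from the figures.
MAX_CODES = 6          # assumed key length: one pattern plus up to five components
PAD = 0x0000           # assumed filler for unused component slots


def make_key(pattern: int, components: Sequence[int]) -> Tuple[int, ...]:
    """Build a fixed-length dictionary key from a pattern code and component codes."""
    if len(components) < 2:
        raise ValueError("a composed Kanji needs at least two components")
    codes = (pattern, *components)
    if len(codes) > MAX_CODES:
        raise ValueError("too many components for one key")
    return codes + (PAD,) * (MAX_CODES - len(codes))


# Master dictionary: padded code sequence -> single two-byte Kanji code.
KANJI_DICTIONARY: Dict[Tuple[int, ...], int] = {
    make_key(0x0101, [0x0201, 0x0202]): 0x9001,
    make_key(0x0102, [0x0201, 0x0203, 0x0204]): 0x9002,
}


def compose(pattern: int, components: Sequence[int]) -> Optional[int]:
    """Return the Kanji code for the typed sequence, or None if it is not listed."""
    return KANJI_DICTIONARY.get(make_key(pattern, components))


print(hex(compose(0x0101, [0x0201, 0x0202])))   # 0x9001
print(compose(0x0101, [0x0202, 0x0201]))        # None
```

Because the components fill numbered blocks in an established order, the sketch treats the same components typed in a different order as a different key.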

Though this can be a stand-alone approach to inputting Kanji, the 
principle has been adopted by the National Diet Library to supplement 
the inputting of Kanji on the full keyboard Kanji teletypewriter. The 
National Diet Library uses this system when inputting Kanji that are 
not included in its keyboard. Instead of having a special separate 
keyboard, the Kanji teletypewriter of the National Diet Library inte-
grates patterns and components as equivalents to other characters. Its 
keyboard includes forty patterns and approximately 140 components. 

This was the most elementary approach to computerizing Kanji. Conceived
in the early developmental stage of Kanji processing, it used one
of the characteristics of Kanji, the composition from several components. 
In actual situations, this technique requires at least three key strokes for 
one Kanji and consumes time to locate the needed component on the
keyboard. Furthermore, it requires the complicated extra step of putting
input codes through a Kanji dictionary to combine component
codes into a code per Kanji. No library is currently using this system by 
itself. 

Kana Keyboard System 

The keyboard of a Japanese syllabary typewriter has adapted the con-
ventional English typewriter keyboard and has standard roman alphabet 
keys that contain Katakana in shift (figure 4). Since the number of Kata-
kana exceeds that of roman letters, the Katakana keys are extended to 
keys for numerals and punctuation marks. This means that this typewrit-
er can be used either for Kana or roman letters by changing its mode. 

Fig. 4. Kana Typewriter Keyboard. 

Two-key Stroke Method 

This variation of the Kana keyboard system is referred to as the two-key
stroke system, and uses Kana as codes, not as letters. Roman letters
can be used as codes, too. There are two different subvariations. They
are:

Location Correspondence. Keys are divided into two sections: one for
right hand, and the other for left hand. If two keys are to be stroked,
there will be four possible combinations of key strokes: (1) left hand
twice, (2) left and right, (3) right and left, and (4) right twice. The keyboard
is accompanied by a Kanji table in which characters are arranged
in several blocks and in a certain order within each block. Each block,
which contains twenty-four Kanji in a four-by-six arrangement, is made
according to each combination of strokes: first block is left and left;
second block is left, right, etc. Within each block, the ordinate consists
of keys for the first stroke and the abscissa for the second. A Kanji
which is at the intersection of the above indicates which keys are to be
typed. When Kanji A is to be typed (see figure 5), since it is in block A
indicating the stroke combination as left and left, the operator types A
and W by left hand. If Kanji B is to be typed, the operator types key A
by left hand and key P by right. Each key has a one-byte code, and a combination
of two key strokes makes a composite, a two-byte binary code,
for a Kanji. The bit may be changed by shifting, and different Kanji can
be typed if another table is prepared for Kanji with different bits.

[Figure 5: two blocks of the Kanji table, each four rows by six columns with rows for the keys 1, Q, A, and Z. Block A (for left, left) has columns for the keys Q, W, E, R, T, and Y, with Kanji A marked at row A, column W. Block B (for left, right) has columns for the keys U, I, O, P, and two further right-hand keys, with Kanji B marked at row A, column P.]

Fig. 5. Kanji Table for Location Correspondence Method.

Association Memory Method. In this method, each Kanji is given two
Kana which usually represent a reading of that Kanji. The operator
associates a Kanji to be input with two Kana assigned to that Kanji, and
types them with two strokes using the Kana keys.
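
Both two-key stroke variations reduce to a pair of key strokes selecting one Kanji. The Python sketch below illustrates this under stated assumptions: the one-byte key codes, the Kana pairings, and the names `two_stroke_code` and `associated_kanji` are invented for illustration, since the source gives neither the real key codes nor the real Kana assignments.

```python
# Hypothetical one-byte key codes; the source does not give the real assignments.
KEY_CODES = {"A": 0x41, "W": 0x57, "P": 0x50}


def two_stroke_code(first_key: str, second_key: str) -> int:
    """Location correspondence: join two one-byte key codes into a two-byte code."""
    return (KEY_CODES[first_key] << 8) | KEY_CODES[second_key]


# Association memory: two Kana recalling a reading select the Kanji.
# These pairings are invented for illustration.
ASSOCIATION_TABLE = {("ヤ", "マ"): "山", ("カ", "ワ"): "川"}


def associated_kanji(first_kana: str, second_kana: str) -> str:
    """Return the Kanji assigned to a pair of Kana strokes."""
    return ASSOCIATION_TABLE[(first_kana, second_kana)]


print(f"{two_stroke_code('A', 'W'):04X}")   # 4157
print(f"{two_stroke_code('A', 'P'):04X}")   # 4150
print(associated_kanji("ヤ", "マ"))          # 山
```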

Both of the key-stroke methods are economical as well as convenient 
because of the wide availability of Kana typewriters. Mainly for that
reason, both of these systems have been well accepted and are expected
to grow further. Since this touch method does not require the operator 
to look for the character on the keyboard to input, it is the fastest to 
operate and is considered suitable for input in quantity. It is possible to 
input 60 to 120 characters per minute. The only drawback is that the 
operator must get acquainted with the arrangement of Kanji in the first 
variation, and must memorize all the associated Kana spelling for many 
Kanji in case of the second variation. In either case, the operator must 
be professionally trained. 

The Japan Information Center for Science and Technology, which in-
dexes many scientific publications, employs a vendor who uses the loca-
tion correspondence variation of this system for inputting information. 

Display Selection 

This also uses a Kana typewriter with a screen in front. When a word
is typed in Kana, a group of Kanji with that sound are displayed on the
screen. The operator chooses the right Kanji with a light pen, a slow
but accurate operation. The operator does not have to be specially 
trained for this. 
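
Display selection can be thought of as a table from a Kana reading to the Kanji sharing that reading, followed by a choice made by the operator. The Python sketch below is a toy illustration: the table holds only two readings, and the names `HOMOPHONES`, `candidates`, and `select` are hypothetical rather than part of any system described here.

```python
# Tiny illustrative table; a real system would hold thousands of entries.
HOMOPHONES = {
    "はし": ["橋", "箸", "端"],
    "かみ": ["紙", "神", "髪"],
}


def candidates(reading: str) -> list:
    """Return the Kanji to display for a Kana reading (empty if unknown)."""
    return list(HOMOPHONES.get(reading, []))


def select(reading: str, choice: int) -> str:
    """Return the candidate the operator points at; choice is zero-based."""
    return candidates(reading)[choice]


print(candidates("はし"))   # ['橋', '箸', '端']
print(select("はし", 1))    # 箸
```

The need to choose among several candidates for one reading is the same homonym problem that makes Kana-only text ambiguous.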

Kana-Kanji Conversion 

In contrast to the conventional approach of full keyboard inputting, an 
entirely new method for inputting Kanji is gaining popularity as the
availability of sophisticated software increases. This uses a Kana typewriter
keyboard to input Japanese in syllabary or romanized form, converting
it to Kanji by software. There are two ways of conversion:
one that converts word by word, and the other sentence by sentence.
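
The source does not describe how the conversion software works. As a rough illustration of word-by-word conversion, the Python sketch below uses a greedy longest-match lookup, a technique chosen here for simplicity and not attributed to any of the systems discussed. The three-entry dictionary and the names `WORD_DICTIONARY` and `convert_words` are hypothetical.

```python
# Toy dictionary from Kana spelling to Kanji word; entries are illustrative.
WORD_DICTIONARY = {
    "とうきょう": "東京",
    "としょかん": "図書館",
    "とし": "都市",
}


def convert_words(kana: str) -> str:
    """Greedy longest-match conversion; unmatched Kana are copied unchanged."""
    longest = max(len(word) for word in WORD_DICTIONARY)
    result = []
    i = 0
    while i < len(kana):
        # Try the longest possible dictionary entry first, then shorter ones.
        for length in range(min(longest, len(kana) - i), 0, -1):
            chunk = kana[i:i + length]
            if chunk in WORD_DICTIONARY:
                result.append(WORD_DICTIONARY[chunk])
                i += length
                break
        else:
            result.append(kana[i])
            i += 1
    return "".join(result)


print(convert_words("とうきょうのとしょかん"))   # 東京の図書館
```

A dictionary lookup of this kind cannot by itself choose among homonyms, which is one reason sentence-level conversion and operator confirmation are of interest.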

Stenotype 

The stenotype is a typewriterlike device. The operator must be able 
to take shorthand. When the stenotype is used, it punches words in 
paper tapes. Therefore, inputting is high speed. However, the operator 
must receive proper training. 

Optical Character Recognition 

This system, developing quickly and expected to gain wider use, can 
scan a maximum of 2,500 printed Kanji. 8 One variation connects a writ-
ing tablet to a computer so that as the operator writes Kanji on the tab-
let, the computer scans them in stroke order. This function of scanning 
by the stroke order is considered to be an advantage for processing 
some types of Japanese documents. The drawbacks are that the system 
is still very expensive, and the number of recognizable characters is few-
er than 2,000. 

Voice Recognition 

This is an oral-visual system, in which the human voice is read by a 
computer. Obviously the most difficult to develop, this system is still in 
an experimental stage. However, a prototype has been demonstrated at
various exhibitions, and the system apparently possesses great potential. 

Summary 

Pattern configuration and output devices for Japanese characters are 
basically the same as those for English. However, the pattern genera-
tion of characters is mechanically more complicated than that of the ro-
man alphabet, because Kanji has a more complicated structure than the 
roman alphabet and the number of components is greater. Each Kanji is 
represented by a two-byte binary code rather than one byte as in roman 
alphabet. Because of this, the efficiency of retrieval is low. Presently, 
hard copy and typesetting for printing of hard copy are the major output 
forms, and very little on-line retrieval of information with Kanji is in 
current operation. 

PROBLEMS PARTICULAR TO KANJI PROCESSING 

Among numerous problems in processing Kanji through computers, 
major ones are: (1) which Kanji are to be included; (2) how many charac-
ters are to be handled; (3) what code should be assigned and how it 
should be arranged on the keyboard or table; and (4) how the Kanji not 
included on the keyboard should be treated. 

In the early stage of Kanji computer development, different institutions
handled the problems in ways best suited to their individual
needs, according to the nature of the literature covered, the amount of
literature processed, and the kinds of output needed. They experimented
with the then best available capabilities. As a result, the
finished systems are all independent and mutually incompatible. Standardization
is obviously necessary for exchange of information among the
systems.

In order to set standards for selection of characters and assignment of 
codes, JIS (Japanese Industrial Standard) C6226-1978 has been compiled by
the Japan Association for Development of Information Processing. This 
is a table of characters designed for information exchange (a portion of 
which is shown in figure 6). It has a one-byte code as its abscissa and 
another as its ordinate. Characters are arranged so that the intersection 
of abscissa and ordinate determines a Kanji whose code consists of four 
numerals, two from the abscissa and two from the ordinate. Included in 
the table are Kana in both styles, Roman, Greek, and Cyrillic alphabets 
in upper and lower cases, diacritical marks, numerals, and punctuation 
marks, as follows: 

1. Special characters      108
2. Numerals (Arabic)        10
3. Roman alphabets          52
4. Hiragana                 83
5. Katakana                 86
6. Greek alphabets          48
7. Cyrillic alphabets       66
8. Kanji                 6,349

Total                    6,802 9

In the first section of the table, numerals, alphabets, Kana, and special
characters are grouped. In the second section, the total of 2,965 frequently
used Kanji are arranged as the first priority group, and an additional
3,384 Kanji are selected as the second group 10 in the bottom half
of the table. Kanji are printed in the preferred style for printing typeface.
This table will resolve problems 1 to 3 mentioned above. Institutions
that had arranged their own codes for Kanji, including the National
Institute of Japanese Literature, are now automatically translating
their own codes into JIS codes.
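
The relation between a position in the table and its two-byte code can be sketched in Python. The sketch rests on assumptions not stated in the text: rows and columns are taken to run from 1 to 94 and each byte is taken to be the row or column number plus hexadecimal 20, as in later editions of the standard. The names `jis_code` and `to_character` are hypothetical, and Python's `euc_jp` codec follows a later revision of the character set, so a few characters may differ from the 1978 table.

```python
ROW_COLUMN_OFFSET = 0x20   # assumed offset, as in later editions of the standard


def jis_code(row: int, column: int) -> int:
    """Return the two-byte code for a table position (rows and columns 1 to 94)."""
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("row and column must be between 1 and 94")
    return ((row + ROW_COLUMN_OFFSET) << 8) | (column + ROW_COLUMN_OFFSET)


def to_character(code: int) -> str:
    """Decode via EUC-JP, which stores the same two bytes with the high bits set."""
    raw = bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80])
    return raw.decode("euc_jp")


code = jis_code(16, 1)
print(f"{code:04X}")          # 3021
print(to_character(code))     # 亜
```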

In cases where needed Kanji are not included on the keyboard, handling
varies. With the Japanese typewriter, because each Kanji is inscribed
on a typeface, only the Kanji on that typeface is printed when
the type bar is stroked. Therefore, only Kanji that have typefaces can be
input in this system, while some other handling is possible in other
methods.

[Figure 6: portion of the code table. One byte of the code (bits b7 to b1) selects the row and the other selects the column; rows 1 to 21 and columns 1 to 13 are shown, with special characters, Hiragana, Katakana, Greek, and Cyrillic letters in the opening rows and Kanji beginning at row 16.]

Fig. 6. Code of the Japanese Graphic Character Set for Information Interchange.

While the number of characters that can be accommodated on
keyboards is limited to 2,000 to 3,500, depending on the type of equipment,
character generators have the capability of outputting more than
the number of characters on the keyboard. Figure 7 shows their rela-
tionship. Characters that are in the generator but not on the keyboard 
must be frequently processed, because the number of characters needed 
for most documents could reach 6,000 to 6,500. Using a shift key to en-
ter another mode is a fairly common technique for inputting uncommon 
Kanji. The keyboard may not have a character but, if the character 
generator has it, the code for that character can be input by shifting. 
For example, if a character on the keyboard has a code 0117, a bit is 
changed so the code 8117 can be typed by shifting and typing that key. 
If the code 8117 is assigned to another Kanji not on the keyboard but 
indexed in the dictionary, it can be input. This applies for the Kanji 
teletypewriter, tablet style, and the two-key stroke variations of the 
Kana typewriter. 
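
The shift example above can be reproduced with a single bit operation. The Python sketch below assumes that the four-digit codes are hexadecimal and that shifting sets the top bit of the two-byte code, which is the one-bit change that turns 0117 into 8117; the names `shifted_code` and `unshifted_code` are hypothetical.

```python
SHIFT_BIT = 0x8000   # assumed: the example 0117 -> 8117 sets the top bit


def shifted_code(keyboard_code: int) -> int:
    """Return the code produced when the key is typed with the shift applied."""
    return keyboard_code | SHIFT_BIT


def unshifted_code(code: int) -> int:
    """Return the code of the keyboard character that shares the same key."""
    return code & ~SHIFT_BIT & 0xFFFF


print(f"{shifted_code(0x0117):04X}")     # 8117
print(f"{unshifted_code(0x8117):04X}")   # 0117
```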

[Figure 7: nested regions labeled, from outermost to innermost: outside system capability, system capability, character generator capability, and keyboard characters.]

Fig. 7. Kanji Creating Capability.

In the Kanji teletypewriter system used by the National Diet Library,
the keyboard accommodates 2,650 characters, while its character generator
has the capability for 5,717. Operators in the National Diet Library
input Kanji that are not on the keyboard by using component pattern 
input method. Or, if the operator finds the Kanji code in the specially 
compiled dictionary in which codes for Kanji are indexed, a shift key is 
used to change the bit, thus creating the code for Kanji not on the 
keyboard. Most other tablet systems use code dictionaries. In the two-
key stroke variations of Kana typewriters, tables of Kanji for second and 
third or more shifts can be built, especially when the location correspondence
method is used.

The handling of Kanji that are not in character generators is more dif-
ficult. Only the digital character generator, the kind that uses either dot 
or stroke, can add characters fairly easily. In the flying spot system, 
characters can be added, but it must be done professionally with an 
additional character cylinder and is very costly. The National Diet Li-
brary, which now uses flying spot, limits addition of Kanji to a mini-
mum. Because its output is solely in printed book form, the National 
Diet Library inputs a fill character for Kanji not in the system. When


20 Journal of Library Automation Vol. 14/1 March 1981

the phototypeset masters are made, the fill characters are replaced by
typeset characters. The use of a fill character suffices only when the
output is phototypeset, because there is a step to replace fill characters
by typeface. However, as long as the data base includes many fill
characters on the magnetic tapes, the on-line retrieval of information or
later utilization of tapes becomes unsatisfactory.
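
Why fill characters hurt retrieval can be shown with a small sketch. The fill character, the character repertoire, and the sample title below are all hypothetical; the text does not say which fill character the National Diet Library uses.

```python
# Illustrative sketch: a record stored with fill characters cannot be
# found by a search on the original text. All names are hypothetical.
FILL = "〓"  # assumed fill character; the source does not name one


def store(text: str, repertoire: set) -> str:
    """Replace every character outside the system's repertoire by FILL."""
    return "".join(ch if ch in repertoire else FILL for ch in text)


def matches(query: str, stored_text: str) -> bool:
    """Naive retrieval: exact substring match against the stored record."""
    return query in stored_text


repertoire = set("図書館の歴史")            # toy repertoire; it lacks 圖
stored = store("圖書館の歴史", repertoire)  # title written with an older form of 図
print(stored)
print(matches("圖書館", stored))
```

The printed page can still be corrected by hand at the phototypesetting step, but the stored record keeps the fill character, so a search on the original spelling finds nothing.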

The National Institute of Japanese Literature uses a dot matrix and 
prints by wire-dot impact. If a Kanji is not in the character generator,
the institute's staff composes the Kanji in an enlarged dot matrix and 
creates the capability for printing in the generator. If the Kanji made in 
such a way is used only once, the Kanji pattern is not stored in the 
character generator, so that the generator does not reach its full capacity 
quickly. The enlarged dot composite for Kanji created in the institute is 
filed and indexed for future use. 

Most other institutions simply do not use those less commonly used 
Kanji, and substitute Kana for them.

In addition to the problems common to any character output, such as 
size and number of dots, the problem of the space for Kanji in relation 
to other characters and the choice of vertical or horizontal printing of 
Japanese sentences with Kanji must be considered. 

Kanji have many strokes and, as mentioned before, are expressed by
two-byte codes. Each Kanji needs a double space when displayed on
screens or printed. When a Kanji is used with numerals or Kana, the
Kanji part looks fine but the numerical part has too much space between
numerals. Therefore, Kanji are input in a Kanji mode, and Kana, roman
letters, and numerals are input in a Kana-numerical mode. In this way a
multidigit figure looks like one whole figure rather than a line of
one-digit figures.
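
The spacing problem can be illustrated with modern tools. The sketch below uses Python's standard `unicodedata.east_asian_width` to count display columns; it assumes that wide and full-width characters occupy two columns and everything else one, which is a present-day convention and not a description of the 1981 equipment.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Count display columns, assuming wide/full-width characters take two."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


kanji_mode = "１９８１年"  # full-width digits: every numeral gets a Kanji-sized cell
mixed_mode = "1981年"      # single-width digits next to a double-width Kanji

print(display_columns(kanji_mode))
print(display_columns(mixed_mode))
```

When every numeral is given a Kanji-sized cell the year spreads over ten columns; with the numerals in a single-width mode it takes six and reads as one figure.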

Some formal documents must be printed in the traditional vertical 
arrangement. To cope with this situation, some line printers have the 
capability to precompose a vertical page before printing it. 
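
Precomposing a vertical page amounts to rearranging the text into columns before any line is printed. The following is a minimal sketch, assuming columns run top to bottom and are ordered right to left, with an ideographic space as padding; the function name and parameters are hypothetical and do not describe any particular printer.

```python
def precompose_vertical(text: str, column_height: int, pad: str = "\u3000") -> list:
    """Return the horizontal print lines of a vertically set page.

    The text is cut into columns of `column_height` characters. The first
    column is placed at the right edge, as in traditional Japanese layout.
    """
    columns = [
        text[i:i + column_height].ljust(column_height, pad)
        for i in range(0, len(text), column_height)
    ]
    # Line r of the printed page holds character r of each column,
    # with the columns taken from last (leftmost) to first (rightmost).
    return [
        "".join(column[row] for column in reversed(columns))
        for row in range(column_height)
    ]


for line in precompose_vertical("国立国会図書館", 4):
    print(line)
```

A line printer can only emit one horizontal line at a time, which is why the whole page has to be composed in memory first.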

There are multicolor CRTs on the market that can be used for the
retrieval of library-related information, e.g., main entry in red, series
statement in yellow.

One last problem that must be considered is that most of these sys-
tems require trained operators, or else the operation is very slow. The 
information is edited and compiled by the editors and prepared for in-
put in the form of worksheets. So are the revisions. At various stages of 
revising the text, the information must be printed, given to the editors, 
and revised. Further developments in simplifying input and revising
texts for efficient flow are to be expected. 

APPLICATION OF KANJI SYSTEMS 

Processing of vernacular-language materials in their own writing systems
is considered vital for research libraries in this country. In adopting
the Kanji systems in such libraries, there are three major factors
that must be considered: the objectives and needs of the institution, the
cost, and the personnel.

First, the institution must know what it must accomplish by means of
such a system. The needs may not be the same for all institutions. Is
the system for retrieving catalog information, or for inputting catalog and
other information? Is it for internal processing or patron use? Is it for a
large bibliographic utility to distribute information to its subscribers, or
for an individual institution to process its own information? Could the
system be shared by the department of Asian studies in any way? The
character set needs of the institution are a major factor in choosing the
system.

Since input and output devices are different, i.e., one cannot input
Kanji on a CRT and retrieve Kanji from the same CRT, the institution
must consider how much it will need to input, or whether it can rely on
available data bases. Some institutions may not need any input equipment
if they utilize available data bases. If Japan MARC and other tapes
are made accessible by a large bibliographic utility in this country, the
institutions will be able to obtain bibliographic information in Kanji on
the screen. If they want only catalog cards or a COM catalog, they will
not need any equipment except the terminals supported by the utility.
If they want to input, they must consider, in addition to which system to
choose, what form or forms of output they need and how to create the
characters not included in the system.

Second, cost is an important factor. Is the expense justified in terms
of the other needs of the library? What can be accomplished per dollar
spent? The Kanji systems are still expensive, though the cost will
eventually be reduced. How much can be spent and how much continuing
support can be expected are factors that modify system expectations.
The budget must include not only the one-time hardware cost, but also
the software, maintenance, and personnel.

Third, the availability of personnel will affect the choice of system. 
What degree of language expertise does the system require in each 
stage of operation, such as inputting, maintenance, and programming?
Does it need terminal operators trained in those languages? What other 
personnel does the system need as far as language-related qualification 
is concerned? 

Apart from the three major factors discussed above, there are some 
technical aspects that must be adjusted to library situations in this coun-
try. Since Japanese, Chinese, and Korean use the same Chinese ideo-
graphs to different degrees and in different ways, libraries considering 
automated processing of these language materials are probably expected 
to handle all three languages by the same system, to say nothing about 
the other non-roman scripts. Problems will arise in selecting characters 
for inclusion in the system. As pointed out earlier with regard to 



Japanese character processing, there are simply too many characters for 
the present capacity of any computer. If Korean and Chinese languages 
are to be handled by the same computer, this problem multiplies. The 
Korean alphabet, called Hangul, would have to be included. Chinese 
has more characters than Japanese. Worse yet is the fact that some Kanji
are simplified in different ways in Japan and China, so that they are
neither recognizable nor interchangeable between them. It will be an
enormous task to accommodate both in the same system.
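
The divergence is easy to see in a modern character code such as Unicode, where the Japanese and Chinese simplified forms of the same traditional character are separate characters. The three examples below are well-known cases; the table and function names are hypothetical, and Unicode is used only for illustration, not because it was available to the systems discussed here.

```python
# Traditional form -> (form used in Japan, form used in mainland China).
# Three well-known cases; a real system would need a far larger table.
VARIANTS = {
    "廣": ("広", "广"),  # "wide"
    "氣": ("気", "气"),  # "spirit, air"
    "發": ("発", "发"),  # "depart, emit"
}


def interchangeable(traditional: str) -> bool:
    """True only if Japan and China use the same character for this form."""
    japanese, chinese = VARIANTS[traditional]
    return japanese == chinese


for traditional, (japanese, chinese) in VARIANTS.items():
    print(traditional, japanese, chinese,
          f"U+{ord(japanese):04X}", f"U+{ord(chinese):04X}",
          interchangeable(traditional))
```

A system meant to serve both languages would have to carry all three forms of each such character, together with a table like this one to relate them.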

Another problem is the arrangement and indexing of Kanji. If a full
keyboard, a Japanese typewriter keyboard, or a two-key stroke system
(especially the location association method on a Kana typewriter) is
considered for Japanese, Chinese, and Korean, the characters must be
arranged and indexed so that they can be accessed in all three
languages, and the multiple readings found in Japanese must also be
accommodated. For example, Kanji on the Japanese keyboard are usually
arranged by the initial sound of the Japanese reading of the Kanji. This
arrangement will be useless for Chinese and Korean, because Japanese
readings are not the same as Chinese or Korean readings. The arrangement
of Kanji on the keyboards must be based on some new principle common to
these languages.
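
A small sketch shows why a keyboard arranged by Japanese reading does not transfer. The readings below are common ones for four elementary characters, romanized without diacritics. Choosing one reading per character is itself a simplification, alphabetical order of the romanization stands in for the Kana order a real keyboard would use, and the table is hypothetical.

```python
# One common reading per language for four characters, romanized without
# diacritics. The table is illustrative and hypothetical.
READINGS = {
    "山": {"japanese": "yama", "chinese": "shan", "korean": "san"},
    "水": {"japanese": "mizu", "chinese": "shui", "korean": "su"},
    "人": {"japanese": "hito", "chinese": "ren", "korean": "in"},
    "木": {"japanese": "ki", "chinese": "mu", "korean": "mok"},
}


def arrange(language: str) -> list:
    """Order the characters by their reading in the given language."""
    return sorted(READINGS, key=lambda ch: READINGS[ch][language])


for language in ("japanese", "chinese", "korean"):
    print(language, "".join(arrange(language)))
```

Even with four characters the three orders differ, so an operator who knows only the Chinese or Korean reading could not predict where a character sits on a keyboard laid out by Japanese reading.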

Even if the Kana-Kanji conversion is used, and roman alphabet-Kanji 
conversion software is adopted, software to handle those three languages 
must be developed. Such software would have to be highly sophisticated.
The presence of many homonyms in Chinese will cause a great
problem to the extent that the system relies on transliterated or
romanized forms of the language. Recognition of the many identical spellings
in different language contexts will be extremely difficult.
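
The homonym problem can be sketched as a lookup from a romanized spelling to candidate characters. The entries below form a tiny hypothetical dictionary with a few well-known readings; tone marks and vowel length are dropped, as they often are in romanized input.

```python
# Romanized spelling -> candidate characters, per language.
# A toy dictionary; real conversion dictionaries are vastly larger.
CANDIDATES = {
    "japanese": {"hashi": ["橋", "箸", "端"], "san": ["三", "山", "参"]},
    "chinese": {"shi": ["是", "十", "市", "事"], "san": ["三", "散", "伞"]},
}


def convert(spelling: str, language: str) -> list:
    """Return every character the spelling could stand for in one language."""
    return CANDIDATES[language].get(spelling, [])


def ambiguous_across_languages(spelling: str) -> bool:
    """True if the same spelling is valid in more than one language."""
    return sum(spelling in table for table in CANDIDATES.values()) > 1


print(convert("hashi", "japanese"))
print(convert("shi", "chinese"))
print(ambiguous_across_languages("san"))
```

Each spelling already maps to several characters within one language, and a spelling such as "san" is valid in both, so conversion software would first have to know which language a given entry is in.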

The above discussion is based on what is currently available in Japan.
The combination of existing inputting, generating, and outputting
equipment developed by Japanese technology opens up various possibilities
for us to build effective systems in this country.

ACKNOWLEDGMENT 

This article is based on a study the author conducted in Japan as a Japan
Foundation professional fellow and as a visiting research fellow of the
Center for Research on Information and Library Science, University of Tokyo.

REFERENCES 

1. National Institute of Japanese Literature, Implementation of a Computer System and
a Kanji Handling System at NIJL (Tokyo: NIJL, 1978), p.16.

2. Toshio Ishiwata, "Kanji Shori Kenkyu ni Motomerareru Mono" ["Requirements for
Study on Kanji Processing"] Computopia no.9 (1977), p.35.

3. Gendai Yōgo no Kiso Chishiki, 1980 [Basic Knowledge on Current Terms, 1980]
(Tokyo: Jiyukokuminsha, 1980), p.999.

4. Figures are taken from the following two sources and compiled by the author:
Hasegawa, Jitsurō. "Kanji Shori Sochi" ["Kanji Processing Devices"] Jōhō Shori
[Information Processing] 19, no.4:353 (April 1978).
Sugai, Kazurō. "Kanji Nyū-shutsuryoku Sochi no Kaihatsu Doko" ["A Trend in
Development of Kanji Input-Output Devices"] Business Communication 16, no.7:41
(1979).

5. Used for the pattern input mentioned in the following component pattern input
system.

6. National Diet Library, Library Automation in the National Diet Library (Tokyo:
The Library, 1979), p.4.

7. Ibid., p.7.

8. Asia Business Consultants is using an optical character recognition system that can
scan handwritten Kana and numerals in a small scale to input and process catalog
information for a library collection.

9. "Jōhō Kokan no Tame no Kanji Fugō no Hyōjunka" ["Standardization of Kanji Code
for Information Interchange"] Kagaku Gijutsu Bunken Sābisu [Scientific and
Technical Documents Service] no.50 (1978), p.29.

10. Ibid., p.28.

Ichiko Morita is assistant professor in library administration and head, Automated
Processing Division, the Ohio State University Libraries.

EDITOR'S NOTES 

Most JOLA readers are aware of significant delays in publication in the last
volume. Susan K. Martin, a former editor of JOLA, and Richard D. Johnson,
a former editor of College & Research Libraries, gave freely of their
time and energy to bring the journal back on schedule. Mary Madden,
Judith Schmidt, and the members of the Editorial Board under the leadership
of Charles Husbands all worked closely with Sue and Richard in this
effort. This was a second time around for Sue, who undertook a similar task
when she assumed the JOLA editorship in 1972. The JOLA readership and
this editor owe debts of gratitude to Sue, Richard, and all the others who
helped.

We do not foresee major changes in the format of the journal as estab-
lished principally under the editorships of Kilgour and Martin. We look for 
increased strength in our Book Reviews section under the editorship of 
David Weisbrod. The addition of Tom Harnish as assistant editor for Video 
Technologies indicates our recognition of the growing importance of video-
based information systems. 

We encourage reader suggestions. We welcome brief communications of
successes or failures that might be of interest to other readers. Letters to 
the editor about any of our feature articles or communications are solicited.