

An application of expert systems to botanical taxonomy

W. Fajardo Contreras (a), E. Gibaja Galindo (a,*), A. Bailón Morillas (b), P. Moral Lorenzo (a)

(a) Universidad de Granada, E.T.S. de Ingeniería Informática, Departamento de Ciencias de la Computación e Inteligencia Artificial, C/Periodista Daniel Saucedo Aranda, 18071 Granada, Spain.
(b) Universidad de Almería, Escuela Politécnica Superior, Departamento de Lenguajes y Computación, Carretera Sacramento s/n, 04120 La Cañada de San Urbano, Almería, Spain.

Abstract

Intelligent systems are not widely used in the field of Botany, and even less so on the Internet. At present, we can find only hypertext documents or databases which store unprocessed information. The GREEN (Gymnosperms Remote Expert Executed Over Networks) System is presented as an application of Artificial Intelligence techniques to the problem of botanical identification. GREEN is an Expert System for the identification of Iberian Gymnosperms which allows online queries to be made. It can be consulted at: http://drimys.ugr.es/experto/index.html

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Gymnosperms; Identification keys; Expert Systems; Artificial Intelligence; World Wide Web; Iberian Peninsula

1. Introduction

Plant Taxonomy is a complex, meticulous science which allows taxa to be identified by retrieving the information held about them in a classification system. There are various ways in which this identification may be carried out, although the most commonly used one employs dichotomous keys (a process which requires knowledge of botanical terminology and organography). As a result of the complexity of this process, botany-related activities are not particularly automated. In fact, the systems which exist are basically databases which store records on the specimens. Artificial Intelligence can offer a more productive approach to these systems by processing the information they contain in order to obtain knowledge which has not been stored explicitly in the database.

Within the wealth and variety offered by the plant kingdom, we have addressed scientific outreach using Artificial Intelligence techniques, focusing specifically on the group of Gymnosperms (Gymnospermae) in the Iberian Peninsula. This group was chosen because it contains important forest species. In addition, many of these species provide resources or are cultivated as ornamentals, which makes their identification useful for users who are not botanical experts.

This has all given rise to GREEN (Gymnosperms Remote Expert Executed Over Networks), a pioneering system in the application of Artificial Intelligence techniques to the field of botany. GREEN is an online decision aid system; being online allows knowledge to be disseminated more widely and quickly and to reach a broader audience.

2. Material and methods

We have divided this study on GREEN into five parts:

• A first part (Sections 2.1 and 2.2), in which we describe the structure of the system and define the main modules which comprise it and the knowledge it gathers.
• A second part (Sections 2.3, 2.4, 2.5 and 2.6), in which we develop the process for acquiring and validating the knowledge available on the problem domain until a knowledge base is finally obtained. In this part, the processing of imprecise information, common to this type of problem, is also discussed.
• A third part (Section 2.7), devoted to the reasoning process which the System uses.

Expert Systems with Applications 25 (2003) 425–430
doi:10.1016/S0957-4174(03)00066-6
www.elsevier.com/locate/eswa

* Corresponding author. Tel.: +34-958240468; fax: +34-958243317.
E-mail address: gibaja@decsai.ugr.es (E. Gibaja Galindo).


• A fourth part (Section 2.8), in which we discuss other important features of the System.
• Finally, we finish (Section 3) with conclusions drawn directly from what has been presented in this article and from the bibliography used.

2.1. System structure

The system structure is directly derived from the way in which botanical experts work. Dichotomous keys of the IF-THEN type are used for the classification and recognition of plant species; that is to say, each key leads either to another key or to a plant species. In this way, when a botanist wants to classify a particular specimen, it is possible to distinguish:

• A source of knowledge comprising all the available information on each plant species in the form of dichotomous rules.
• A process which uses this knowledge in order to solve the particular problem (keys are followed until a particular species is identified).

This description coincides perfectly with that of a

Knowledge-Based System and more specifically with that

of a rule-based Expert System (Luger and Stubblefield,

1993) with: a Knowledge Base which stores knowledge

about the domain of the problem in the form of rules and an

Inference Engine which extracts information from the

Knowledge Base.

In addition to the two essential modules described in the previous paragraph, which reflect the ideal structure of a Knowledge-Based System, the System has:

• An uncertainty processing module suited to the nature and subjectivity of the observer.
• A justification module which explains the results reached by the System in a language close to natural language.
• User support modules:
  • A multimedia database to reference known species.
  • A glossary of scientific terms to make the System more accessible to users who are not botanical experts.

In addition, a server must be designed and implemented to handle user (or client) requests and send the results over the Internet.

In Section 2.2 we outline the process for the design and

implementation of the System, detailing the Artificial

Intelligence techniques which have been applied.

2.2. Knowledge gathered by the system

The first stage is to determine the application domain, that is, the type of knowledge the System will manage. As mentioned above, the group of Gymnosperms was chosen, and information is provided on 46 taxa present in the Iberian Peninsula (Castroviejo et al., 1986), both autochthonous and cultivated.

In addition to the Knowledge Base, which has been optimized for answering queries, the System gathers information on its domain in other formats. These are incorporated into a multimedia database, which provides images and data about the distribution and ecology of each taxon, and a glossary of botanical terms, which make the arduous task of species identification easier and more enjoyable.

2.3. Knowledge acquisition and elicitation

The first problem when developing the System is that the information available on the problem domain does not have a structure which may be directly translated to a computer system. The information is dispersed, incomplete, imprecise and unstructured. In order to represent the knowledge appropriately, a process of knowledge acquisition and elicitation is needed, on which the final functions of the System depend to a large extent. We begin this process with different keys (Blanca and Morales, 1991; Font Quer, 1979; López González, 1982; García Rollán, 1983; Krüssmann, 1972). We gather and summarize their information, thereby producing a list of diagnostic characters (descriptors or attributes) at family, genus, species and subspecies level. This hierarchical organization of the information offers the advantage of multilevel answers so that, even with little information, some objective may be reached in the higher levels of the hierarchy. This has a simple explanation:

Generally, in order to reach an objective in the higher levels of the hierarchy only a small amount of information is needed, and this information is also the easiest to observe. Heuristically, this leads us to suppose that the minimum amount of information which the user knows will be that which allows inference in the highest levels. As more information becomes known, the response is refined until the lower and less general levels of the hierarchy are reached. The more information we have, the more specific the identification will be; nevertheless, results may generally be obtained with little information. All the information has subsequently been checked against observations in the field, herbarium documents and expert opinion.
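The idea of a multilevel answer can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the representation used in GREEN: each taxon is written as a hypothetical path (genus, species, subspecies), using names that appear in Table 1, and the answer is the deepest level shared by every taxon still compatible with the observations.

```python
def most_specific_common_taxon(candidates):
    """Return the deepest taxonomic level shared by all candidate taxa.

    candidates: list of paths such as ("Juniperus", "communis", "alpina").
    The path layout (genus, species, subspecies) is an assumption made
    for this sketch.
    """
    if not candidates:
        return ()
    common = []
    # zip(*candidates) walks the hierarchy level by level, top down
    for levels in zip(*candidates):
        if len(set(levels)) != 1:
            break  # candidates disagree from this level downwards
        common.append(levels[0])
    return tuple(common)


# Taxa still compatible with a bluish-black, pruinose 'berry' cone (Table 1)
compatible = [
    ("Juniperus", "communis", "communis"),
    ("Juniperus", "communis", "hemisphaerica"),
    ("Juniperus", "communis", "alpina"),
]
answer = most_specific_common_taxon(compatible)  # species level only
```

With only these observations the answer stops at species level; adding a descriptor that separates the subspecies would narrow the candidate list and push the answer one level further down.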

The most important taxonomical characters in Gymnosperms have been divided into different groups: general aspect of the taxon, characteristics of the leaf, of the branches, of the shoots, monoecious or dioecious, characteristics of the fructification (cone and ‘berry’ cone), of the seeds, and ecology of the taxon.

With these characters, decision tables have been compiled (Durkin, 1994), which gather the identifying diagnostic characters for each taxon (Table 1). As it is not advisable for these tables to have many empty cells, the missing values have been filled in; many of them had not been recorded because they were not needed when the taxa were identified using the traditional method. Although initially filling in a table of this type requires more effort than using dichotomous keys directly, this investment is easily repaid, since the tables enable us to apply Artificial Intelligence techniques in order to obtain keys which are different from the standard ones.

Botany uses identification keys, whereas the applied Artificial Intelligence techniques determine the minimum set of diagnostic characters needed to recognize the different taxa. Artificial Intelligence allows us to find determining characters which make other characters unnecessary, and this enables quicker identification than that provided by the traditional method.

2.4. Obtaining the Knowledge Base

A set of rules (represented in the Knowledge Base) is obtained automatically from the tables. For this, we apply Artificial Intelligence learning techniques (Machine Learning); in particular, we modify the ID3 algorithm proposed by Quinlan (Ignizio, 1991) so that it allows us to obtain more than one rule per objective. In doing so:

• We use Occam’s razor (simple explanations are preferable to more complex ones) as a heuristic for branching, quantifying this criterion through the concept of entropy. In this way, rules of minimum length are created which exclude irrelevant knowledge, since irrelevant descriptors are not taken into account.
• We obtain a Knowledge Base whose content is more complete than that of the dichotomous keys, since it contains all the consistent rules which may be obtained from the selected descriptors in order to determine the objectives.
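The entropy-based branching described above can be illustrated with a minimal Python sketch of standard ID3 applied to the rows of Table 1. It is not the modified algorithm used in GREEN, whose details are not given here. The descriptor names are our own, and the target is the species rather than the subspecies, an assumption made because the five descriptors in Table 1 do not separate the subspecies. The helper `best_attributes` is hypothetical: it only lists descriptors that tie for the highest information gain, which is one conceivable starting point for deriving several rules per objective.

```python
import math
from collections import Counter

DESCRIPTORS = ["arrangement", "color", "pruinose", "size", "seeds"]
# Values follow Table 1; subspecies are merged into species (assumption).
TABLE = [
    ("axillary", "bluish-black", "yes", "0.6-1 cm", "3", "Juniperus communis"),
    ("axillary", "bluish-black", "yes", "0.6-1 cm", "3", "Juniperus communis"),
    ("axillary", "bluish-black", "yes", "0.6-1 cm", "3", "Juniperus communis"),
    ("axillary", "brown", "no", "0.6-1 cm", "1-3", "Juniperus oxycedrus"),
    ("axillary", "brown", "no", ">1 cm", "1-3", "Juniperus oxycedrus"),
]
ROWS = [dict(zip(DESCRIPTORS + ["taxon"], row)) for row in TABLE]


def entropy(rows, target="taxon"):
    """Shannon entropy (in bits) of the target labels in rows."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attr, target="taxon"):
    """Entropy reduction obtained by splitting rows on attr."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder


def best_attributes(rows, attrs, target="taxon"):
    """Hypothetical helper: every descriptor tying for the highest gain."""
    gains = {a: information_gain(rows, a, target) for a in attrs}
    top = max(gains.values())
    return [a for a in attrs if math.isclose(gains[a], top)]


def id3_rules(rows, attrs, target="taxon", path=()):
    """Standard ID3, returned as a list of (conditions, taxon) rules."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:
        return [(path, labels.pop())]
    if not attrs:
        # Descriptors exhausted: report every taxon still possible
        return [(path, label) for label in sorted(labels)]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    rules = []
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        rules += id3_rules(subset, rest, target, path + ((best, value),))
    return rules


rules = id3_rules(ROWS, DESCRIPTORS)
ties = best_attributes(ROWS, DESCRIPTORS)
```

On this small table a single descriptor is enough to separate the two species, so each rule has one condition and the arrangement and size descriptors never appear, which is the sense in which irrelevant descriptors are excluded.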

The rules provide a structuring of the knowledge which the user can understand and which is similar to the dichotomous keys used by expert botanists. When the System presents its conclusions in the form of rules, the user can follow the reasoning used by GREEN and becomes familiar with the reasoning process followed by the human experts who have contributed their knowledge to the System (learning).

2.5. Treatment of uncertainty

Information about the domain is based on what normally happens, but every rule has its exceptions. Some data are not known with absolute certainty, expert knowledge is not always defined with complete certainty, and errors of measurement may be made. This does not mean that the information we have should be rejected: experts are able to work with uncertainty, and good results can still be obtained.

Given this large number of sources of uncertainty, GREEN incorporates a module to deal with it. Uncertainty is modeled using certainty factors (Shortliffe & Buchanan, 1975), since this is a simple computational model which allows experts to estimate confidence in each hypothesis and in the conclusion, facilitating the expression of subjective certainty estimations. This model also enables knowledge to be represented easily in the form of rules and has been used successfully in many other systems.
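As an illustration of how certainty factors behave, the following Python sketch uses the combination rules commonly associated with MYCIN-style systems. The exact formulas used in GREEN are not stated in this paper, so the functions below, their names and the numeric values are assumptions made for the example.

```python
def rule_cf(premise_cfs, rule_strength):
    """CF contributed by one rule.

    The rule strength is scaled by the weakest premise; premises that
    are disbelieved (CF <= 0) contribute nothing.
    """
    return rule_strength * max(0.0, min(premise_cfs))


def combine_cf(cf1, cf2):
    """Combine two certainty factors in [-1, 1] for the same hypothesis."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    denom = 1 - min(abs(cf1), abs(cf2))
    # Total contradiction (+1 against -1): returning 0.0 is a choice
    # made for this sketch, not part of the cited model.
    return 0.0 if denom == 0 else (cf1 + cf2) / denom


# The user is fairly sure of the color (0.9) and less sure of the size (0.7);
# the rule itself is given a strength of 0.8 (all values are illustrative).
from_rule_a = rule_cf([0.9, 0.7], 0.8)
# A second, independent rule supports the same taxon with CF 0.5
combined = combine_cf(from_rule_a, 0.5)

# Candidate taxa can then be ordered by their combined certainty
scores = {"taxon A": combined, "taxon B": 0.3}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Two pieces of supporting evidence reinforce each other without ever exceeding 1, and conflicting evidence pulls the result towards zero, which is what makes the model convenient for subjective observations.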

2.6. Consistency reinforcer

During the development of the Knowledge Base,

inconsistencies may arise mainly due to errors during the

knowledge acquisition and elicitation stage or during the

design or implementation of the technique for automatically

obtaining the rules.

Another important point is that GREEN is capable of accommodating uncertainty, which is why inconsistencies have an additional impact on the certainty of results. Consequently, GREEN needs to incorporate a consistency reinforcer which systematically analyzes each of the rules in the Knowledge Base in order to detect possible errors (Gonzalez and Dankel, 1993) introduced during the design process, thereby ensuring that the Knowledge Base has been correctly designed and implemented.

Table 1
Decision table

Taxon | Arrangement of the ‘berry’ cones | Color of the ‘berry’ cone | Pruinose ‘berry’ cone | Size of the ‘berry’ cone | No. of seeds in the ‘berry’ cone
Juniperus communis subsp. communis | Axillary | Bluish-black | Yes | Between 0.6 and 1 cm | 3
Juniperus communis subsp. hemisphaerica | Axillary | Bluish-black | Yes | Between 0.6 and 1 cm | 3
Juniperus communis subsp. alpina | Axillary | Bluish-black | Yes | Between 0.6 and 1 cm | 3
Juniperus oxycedrus subsp. oxycedrus | Axillary | Brown | No | Between 0.6 and 1 cm | 1–3
Juniperus oxycedrus subsp. badia | Axillary | Brown | No | More than 1 cm | 1–3
(···) | (···) | (···) | (···) | (···) | (···)
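A consistency check of this kind might look like the following Python sketch. The three categories it reports (conflicting, redundant and subsumed rules) are typical of rule-base verification in general; which checks GREEN actually performs is not detailed here, so the categories, the rule format and the example rules are all assumptions.

```python
from itertools import combinations


def check_rules(rules):
    """Report suspicious pairs of rules.

    rules: list of (premises, conclusion), where premises is a frozenset
    of (descriptor, value) pairs. Returns (issue, index_a, index_b) tuples.
    """
    issues = []
    for (i, (p1, c1)), (j, (p2, c2)) in combinations(enumerate(rules), 2):
        if p1 == p2 and c1 != c2:
            issues.append(("conflict", i, j))    # same premises, different taxa
        elif p1 == p2 and c1 == c2:
            issues.append(("redundant", i, j))   # exact duplicate
        elif c1 == c2 and (p1 < p2 or p2 < p1):
            issues.append(("subsumed", i, j))    # one rule is a special case
    return issues


# Illustrative rules only; they are not taken from GREEN's Knowledge Base
RULES = [
    (frozenset({("color", "brown"), ("size", ">1 cm")}), "J. oxycedrus subsp. badia"),
    (frozenset({("color", "brown"), ("size", ">1 cm")}), "J. oxycedrus subsp. oxycedrus"),
    (frozenset({("color", "bluish-black")}), "J. communis"),
    (frozenset({("color", "bluish-black"), ("pruinose", "yes")}), "J. communis"),
]
issues = check_rules(RULES)
```

Here rules 0 and 1 are flagged as a conflict, and rule 3 is flagged as subsumed by the shorter rule 2, which already reaches the same conclusion with fewer conditions.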

2.7. System reasoning

The Inference Engine provides the control mechanism and knowledge inference (the process used in an Expert System to derive new information from known information). It combines the input facts with the knowledge gathered in the Knowledge Base, thereby responding to user queries. In order to design the Inference Engine, Ignizio’s BASELINE with forward chaining has been taken as a model (Ignizio, 1991).

The Inference Engine incorporated into the System is a separate module from the Knowledge Base. This separation is important since:

1. Knowledge may be represented more naturally. A knowledge model combined with a separate inference process reflects the problem-solving mechanism followed by a human being better than a model which embeds knowledge within the inference process.
2. The System designers can focus on capturing and organizing the knowledge common to the problem domain independently of its implementation.
3. It enables the content of the Knowledge Base to be changed without the need to change the control system, so that (a) the Knowledge Base may be updated without changing the Inference Engine and (b) a single Inference Engine may be used to solve different problems.
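The separation between the two modules can be shown with a minimal forward-chaining loop in Python. This is a generic sketch, not Ignizio’s BASELINE and not GREEN’s engine: the function name, the fact strings and the two rules are hypothetical and were written only for this example.

```python
def forward_chain(rules, facts):
    """Generic forward chaining over any rule set.

    rules: list of (premises, conclusion), premises being a frozenset of facts.
    facts: iterable of facts known at the start.
    Returns the final set of facts and a trace of the rules that fired.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all its premises are known and it adds something new
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                trace.append((sorted(premises), conclusion))
                changed = True
    return known, trace


# The knowledge base is plain data, kept apart from the engine (illustrative rules)
BOTANY_RULES = [
    (frozenset({"fructification: 'berry' cone"}), "genus: Juniperus"),
    (
        frozenset({"genus: Juniperus", "'berry' cone color: brown", "'berry' cone size: >1 cm"}),
        "taxon: Juniperus oxycedrus subsp. badia",
    ),
]
observations = {
    "fructification: 'berry' cone",
    "'berry' cone color: brown",
    "'berry' cone size: >1 cm",
}
known, trace = forward_chain(BOTANY_RULES, observations)
```

Because `forward_chain` never mentions botany, the same function could be given a different rule list without modification, and the trace of fired rules is the raw material for an explanation of the reasoning.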

2.8. Other characteristics

As we have already mentioned, GREEN is extremely

easy to use (see Fig. 1). The specimen descriptors are

grouped into general categories (general appearance, leaf,

branch, cone, etc.) with names which are familiar to all

users. Within each category, users select the descriptor they

know and enter a value for the degree of belief.

The System has been provided with two modes for entering the query: basic and advanced. In the basic mode, the user chooses from a set of options, so that the use of certainty factors is transparent. In the advanced mode, the user must manually enter the certainty value of the observation.

Fig. 1. A screenshot of the user interface for data entry. Author: Eva Lucrecia Gibaja Galindo.
Fig. 2. A screenshot of the user interface for identification results. Author: Eva Lucrecia Gibaja Galindo.
Fig. 3. A screenshot of the user interface for additional information. Author: Eva Lucrecia Gibaja Galindo.

After the data have been entered, the inference process is executed and the System gives the user a set of results ordered according to how well they fit the query, together with an outline of the reasoning followed in order to reach these conclusions. If the user wishes, further information about the specimen can be obtained by accessing the multimedia database. GREEN is specifically designed to work on the Internet, which is why interaction with the user is carried out using forms which send the data and the queries to a remote server. The amount of information transferred online has been minimized so as not to overload the server and in order to obtain a satisfactory System response time for the user.

GREEN has been designed independently of the type of

botanical database on which it is employed, so that it may

be easily adapted in order to classify species other than

Gymnosperms. Figs. 1 – 4.

3. Conclusions

1. Computing offers new advantages to the popularization

of Botany, including the production of automatic keys or

computer-generated keys, which will make it possible for

non-experts to identify plants.

2. In this paper, an expert System is presented which will

offer the user a new ‘interactive’ species identification

method.

3. The GREEN System is a practical tool which may be used

online and which will enable different taxa comprising

the Iberian Gymnosperm flora to be recognized.

References

Blanca, G., & Morales, C. (1991). Flora del Parque Natural de la Sierra de Baza. Granada: Servicio de Publicaciones de la Universidad de Granada.

Castroviejo, S., Laínz, M., López González, G., Montserrat, P., Muñoz Garmendia, F., Paiva, J., & Villar, L. (1986). Flora Ibérica. Plantas vasculares de la Península Ibérica e Islas Baleares (Vol. 1). Lycopodiaceae-Papaveraceae. Madrid: Real Jardín Botánico.

Durkin, J. (1994). Expert systems. Design and development. London: Prentice Hall International.

Font Quer, P. (1979). Diccionario de Botánica. Barcelona: Labor.

García Rollán, M. (1983). Claves de la flora de España (Vol. I). Península y Baleares. Madrid: Mundi-Prensa.

Gonzalez, A. J., & Dankel, D. D. (1993). The engineering of knowledge-based systems. Theory and practice. Englewood Cliffs, NJ: Prentice-Hall International.

Ignizio, J. P. (1991). Introduction to expert systems. The development and implementation of rule-based expert systems. New York: McGraw-Hill.

Krüssmann, G. (1972). Manual of cultivated conifers. Portland: Timber Press.

López González, G. (1982). La Guía de Incafo de los árboles y arbustos de la Península Ibérica. Madrid: INCAFO.

Luger, G. F., & Stubblefield, W. A. (1993). Artificial intelligence. Structures and strategies for complex problem solving. The Benjamin/Cummings series in artificial intelligence. Redwood City: Benjamin/Cummings.

Shortliffe, E. H., & Buchanan, B. G. (1975). A model of inexact reasoning in medicine. Mathematical Biosciences, 23, 351–379.

Fig. 4. Another screenshot of the user interface for additional information. Author: Eva Lucrecia Gibaja Galindo.
