A knowledge-based multi-layered image annotation system

Marina Ivasic-Kos a,∗, Ivo Ipsic a, Slobodan Ribaric b

a Department of Informatics, University of Rijeka, Rijeka, Croatia
b Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia

Expert Systems With Applications 42 (2015) 9539–9553

∗ Corresponding author. Tel.: +38551584710. E-mail addresses: marinai@uniri.hr (M. Ivasic-Kos), ivoi@uniri.hr (I. Ipsic), slobodan@zemris.fer.hr (S. Ribaric).

Keywords: Image annotation; Multi-layered image annotation; Knowledge representation; Fuzzy Petri Net; Fuzzy inference engine

Abstract

A major challenge in automatic image annotation is bridging the semantic gap between the computable low-level image features and the human-like interpretation of images. The interpretation includes concepts on different levels of abstraction that cannot simply be mapped to features but require additional reasoning with general and domain-specific knowledge. The problem is even more complex since knowledge in the context of image interpretation is often incomplete, imprecise, uncertain and ambiguous in nature. Thus, in this paper we propose a fuzzy-knowledge based intelligent system for image annotation, which is able to deal with uncertain and ambiguous knowledge and can annotate images with concepts on different levels of abstraction, which makes the annotation more human-like. The main contributions are associated with an original approach of using a fuzzy knowledge-representation scheme based on the Fuzzy Petri Net (KRFPN) formalism. The acquisition of knowledge is facilitated in such a way that, besides the general knowledge provided by the expert, the computable facts and rules about the concepts, as well as their reliability, are produced automatically from data. The reasoning capability of the fuzzy inference engine of the KRFPN is used in a novel way for inconsistency checking of the classified image segments, automatic scene recognition, and the inference of generalized and derived classes. The results of image interpretation of Corel images belonging to the domain of outdoor scenes achieved by the proposed system outperform the published results obtained on the same image base in terms of average precision and recall. Owing to the fuzzy-knowledge representation scheme, the obtained image interpretation is enriched with new, more general and abstract concepts that are close to the concepts people use to interpret these images.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Digital images have become unavoidable in the professional and private lives of modern people. In recent years, the frequent use of digital images has become necessary in different fields like medicine, insurance and security systems, geo-informatics, advertising and commerce, as well as in other business areas. In private life, digital images are used for documenting people close to us, pets, sights and events such as birthdays, parties, trips, excursions and sporting activities. This widespread use has caused a rapid increase in the number of digital images that, today, on specialized websites, can be counted in the millions. However, a large number of images leads to problems with searching and retrieval, as well as with organizing and storing.

As the majority of images are barely documented, it is believed that we could retrieve and arrange images simply if they were
automatically annotated and described with words that are used in an intuitive image search. However, the task of mapping image features that can be extracted from raw image data to the words that users normally use for articulating their requirements is not a trivial one. For example, it seems natural to use a destination name when retrieving holiday images, or some terms that describe a scene, such as the coast or mountains, or activities like diving, skiing, etc. A major research challenge is bridging the semantic gap between the low-level image features available to a computer and the interpretation of the images in the way that humans do (Smeulders, Worring, Santini, Gupta, & Jain, 2000). In addition, one should take into account that image interpretation inherent to humans includes concepts associated with the content of the image on different levels of abstraction. This is referred to as the multi-layered interpretation of image content. To systematically describe the visual content of an image and its semantics, we have defined a knowledge-based image representation model consisting of multiple layers of image representation. Layers are organized according to the amount of knowledge needed to automatically interpret the image using inference about concepts belonging to the layer.

According to the defined image representation model, an intelligent system for multi-layered image annotation is proposed. The first layer of the image interpretation contains concepts obtained by the classification of image segments using a conventional supervised classification method. Higher levels of image interpretation involve concepts that are more abstract. These concepts are difficult to infer directly based on low-level features and without knowledge relevant to the problem domain. Therefore, we have defined a fuzzy knowledge-representation scheme based on the fuzzy Petri net (KRFPN) formalism to represent knowledge about concepts that can appear in an image. Fuzzy Petri nets combine fuzzy set theory and Petri net theory to provide the representation of knowledge, which is, in the context of image interpretation, often incomplete, imprecise, uncertain and ambiguous in nature.

The KRFPN formalism is originally supported with a fuzzy inference engine that deals with approximate reasoning. The reasoning capability of the inference engine was used in an original way to draw conclusions about classes of image scenes and more abstract classes. The system can handle the ambiguity and uncertainty about concepts and relations, so decisions about more abstract concepts can be made even when the input information about the concepts present in an image is imprecise and vague.
To reduce the propagation of errors through the hierarchical structure of concepts and to increase the reliability of conclusions, as well as to improve the precision of image annotation, a consistency-checking procedure is proposed.

The acquisition of knowledge used by the inference engine is facilitated in such a way that all the facts and rules about the composition and distribution of concepts, as well as their reliability, are produced automatically from data. Both new relationships and new concepts with an appropriate measure of reliability are stored into the knowledge base and used by the inference engine.

The paper is organized as follows. First, in Section 2, different approaches to image-content interpretation are explained and a detailed overview of related work is given. The layers of the multi-layered image representation with respect to the amount of knowledge needed for the image interpretation are given in Section 3. A system for the multi-layered image annotation is proposed in Section 4. A fuzzy-knowledge representation scheme adapted for the outdoor image domain is presented in Section 5. Inputs to the scheme are concepts obtained as the results of an image-segments classification using a Bayesian classifier. The application of the fuzzy inference engine for checking the consistency of the obtained results of the image-segment classification and for the recognition of scene context is given in Sections 6 and 7, respectively. The fuzzy inference algorithm used to derive more abstract concepts associated with the image is described in Section 8. The experimental results of the image interpretation at the layer that corresponds to automatic image annotation are given and compared to previously reported methods in Section 9. Additionally, in Section 9, an improvement to the results of the automatic image annotation after checking the inconsistency of the concepts obtained during the image-segments classification is presented and discussed.

2. Related work

Image interpretation is a complex task that strongly depends on the purpose of annotation. Moreover, human interpretation is limited by the knowledge, culture, experience and point of view of the person. Therefore, in the development of an automatic image annotation system, the types of concepts that will be used for image interpretation should be decided first, depending on the purpose of the annotation.

Among the oldest models for image annotation is Shatford's image-content classification of general-purpose images, drawing on theory from art history, which classifies image content into general, specific and abstract concepts (Shatford, 1986). Additionally, the contents of an image are associated with aspects of objects, with spatial and temporal aspects, and with aspects of activities or events. In (Eakins & Graham, 2000), a multilayer interpretation of the image content is considered in the context of image search. The authors defined three semantic layers of image interpretation. At the first level, image interpretation is based on the presence of certain combinations of features, such as color, texture or shape, while at the second level, image interpretation deals with the presence and distribution of certain types of objects. At the third level, image interpretation includes a description of specific types of events or activities, locations and emotions that one can associate with the image.
The authors of (Hare, Lewis, Enser, & Sandom, 2006) provide a simplified hierarchical view between the two extremes, the image itself and its full semantic interpretation. At the lowest level are the image and its "raw" data. The second level consists of low-level features related to a part of an image or to the whole image. A combination of prototype feature vectors is part of the third level. If these image parts can be associated with the corresponding objects, then this makes the fourth level. The top level of image interpretation, referred to as full semantics, includes concepts that describe the events, actions, emotions and a broader context of the image. This model, particularly in the layers related to visual image content, mostly influenced the image representation model that we propose. The main difference is in the higher layers used to model the image semantics.

There are two major approaches widely used for image annotation, one using statistical methods and the other mostly using knowledge-based methods belonging to the field of artificial intelligence. Both approaches are used in our system: the statistical approach in the first layer of the image interpretation and the knowledge-based approach in the higher layers.

In the statistical approach, most methods can be grouped into translation or classification models. In the translation model of (Duygulu, Barnard, de Freitas, & Forsyth, 2002), the co-occurrence of image regions and annotation words is used to model the relationship between annotation words and images or image regions. In classification methods, such as (Barnard et al., 2003; Li & Wang, 2003; Hu & Lam, 2013), the words used for image annotation correspond to class labels for which classifiers are trained. Due to the intra-class variability and inter-class similarity, class labels usually correspond to objects in an image, but they can correspond to scenes as well. In (Fei-Fei & Perona, 2005), natural scenes were learned by a Bayesian hierarchical model in an unsupervised way from local image regions. In (Yin, Jiao, Chai, & Wang, 2015), discriminant scene features were learned using a single-layer sparse autoencoder (SAE), and then an SVM classifier is used for scene classification.

Some methods use multi-label learning for solving the problem of annotating images with more than one word (Feng & Xu, 2010). To improve the accuracy of the multi-label classification algorithm, in (Yu, Pedrycz, & Miao, 2014) the correlation among the labels and the uncertainty of classification between feature space and label space have been considered, and in (Hong et al., 2014) a selection of discriminative features has been proposed. Lately, deep neural networks have been examined for the task of multi-label image annotation. In (Chengjian, Zhu, & Shi, 2015), a multimodal deep neural network pre-trained with convolutional neural networks is proposed.

Such statistical methods commonly use quite simple vocabularies that can be large but are generally not structured, because no relations are defined between the concepts in the vocabulary. On the other hand, methods that rely on knowledge bases use sophisticated, structured vocabularies in which geometrical, hierarchical or other relations between concepts are established (Tousch, Herbin, & Audibert, 2012). We have defined a vocabulary of this kind that is suitable for image retrieval to be used in our system.
A few approaches have explored the dependence of words on image regions (Blei & Jordan, 2003) or exploit the ontological relationships between annotation words, demonstrating their effect on automatic image annotation and retrieval (Maillot, 2005).

Fig. 1. Examples of images and their annotation at different levels of abstraction: (a) objects: sand, sea, sky; scene: Coast Scene; more general (abstract) concepts: Natural scene, Outdoor; derived concepts: Beach, SeaShore, Tallinn, Estonia, Meeting; (b) objects: plane, sky, trees, building; scene: Plane Scene; more general concepts: Vehicle, Man-made object, Outdoor; derived concepts: Transportation; (c) objects: snow, polar bear; scene: Polar bear; more general concepts: Wildlife, Mammal, Outdoor, Natural scene; derived concepts: Arctic.

A comprehensive survey of research made in the field of statistical automatic image annotation methods can be found in (Liu, Zhang, Lu, & Ma, 2007; Datta, Joshi, & Li, 2008; Zhang, Islam, & Lu, 2012).

For multi-layered image annotation, several approaches that use models for knowledge representation and reasoning have been proposed. The authors of (Benitez, Smith, & Chang, 2000) described a semantic network to represent the semantics of multimedia content (images, video, audio, graphics and text). The basic components of the semantic network are concepts that correspond to real-world objects and the relations among them, such as generalization, aggregation and perceptual relationships based on the similarities of their low-level features.

The authors of (Marques & Barman, 2003) propose a model with three levels. The lowest level contains vectors of low-level features extracted from images. The feature vectors are classified into concepts from a flat vocabulary using Bayesian networks. On the highest level is the RDF ontology that contains knowledge about the keywords and information about the relations between different concepts.

The authors of (Srikanth, Varner, Bowden, & Moldovan, 2005) proposed using a hierarchical dependency between annotation words to improve translation-based automatic image annotation and retrieval. The hierarchy is derived from the text ontology WordNet and represents the various levels of generality of the concepts expressed in image regions and words. To predict the likelihood of assigning a class label given an image, statistical language models defined on a visual vocabulary of blobs, represented by region feature vectors, are used.

In (Ivasic-Kos, Ribarić, & Ipsic, 2010), an image content analysis framework based on a Fuzzy Petri Net is proposed for the classification of image segments into objects. Also, a formal description of hierarchical and spatial relationships among concepts from the outdoor image domain is given. Fuzzy formalism was also applied in (Nezamabadi-pour & Kabir, 2009), where a fuzzy k-NN classifier with relevance feedback was used to assign semantic labels to database images.

In (Athanasiadis et al., 2009; Simou, Athanasiadis, Stoilos, & Kollias, 2008), an ontology and the inference engine FIRE (Fuzzy Inference Reasoning Engine) (Stoilos, Stamou, Tzouvaras, Pan, & Horrocks, 2005) were used for analyzing image content belonging to the beach domain. Later, the same group of authors (Papadopoulos et al., 2011) compared different approaches attempting to use spatial information for semantic image analysis.
Unlike the approaches described above, we propose a model of a knowledge-based multi-layered image annotation system. We have merged the statistical approach for the classification of image segments into objects and the knowledge-based approach to infer concepts that are more abstract. We took advantage of statistical methods to facilitate the knowledge acquisition, so that computable facts and rules about the concepts, as well as their reliability, are automatically generated from data.

The key components of the proposed multi-layered image annotation system are the KRFPN scheme based on the Fuzzy Petri Net formalism and the integrated fuzzy inference engine. We have exploited the capability of the KRFPN inference engine for reasoning with uncertainty to infer scenes and concepts that cannot be mapped to images without using the domain knowledge. In addition, to refine the image annotation and to reduce the propagation of errors through the hierarchical structure of concepts, we have included a novel, knowledge-based, consistency-checking procedure in the proposed system. For image annotation refinement, the correlation between annotated keywords has been used in previous research, and lately graph-based algorithms for image analysis have been investigated as well. A comprehensive survey of image annotation refinement techniques is given in (Dong, 2014).

3. Multi-layered image representation

An image representation includes the visual content and the annotation of an image. The visual content of an image refers to the information that may be collected by analyzing low-level image features, while the image annotation includes concepts that may describe both the content and the context of an image. The task of automatic image annotation is challenging because the number of possible concepts that one can use to describe most images is large and highly dependent on the application, the user's knowledge, needs, cultural background, etc., and it is hard to choose the right type of concepts that would be universally appropriate. For instance, to annotate the images in Fig. 1, one can use concepts that are related to the objects that appear in the image (sand, sea, sky, snow), concepts that represent the scene (beach, coast, coastline, shore, seashore), more general scene concepts (wildlife, outdoor, natural scene) or activities (walking, get wet feet). If the user is familiar with the context of an image, its description will be more subjective and will probably include the name of a place (e.g. Tallinn, Estonia for Fig. 1a), the names of the people appearing in it, a description of the relevant event (e.g. Meeting for Fig. 1a), evoked emotions, etc.

Although different people will most likely use different concepts to annotate the same image, the used concepts can be organized according to the amount of knowledge needed to reach each abstraction level of image interpretation (Ivasic-Kos, Pavlic, & Pobar, 2009). Therefore, we propose a multi-layered image representation model in which the layers correspond to concepts at different levels of abstraction. The layers reflect the increase of the amount of knowledge included in the automatic image annotation (Fig. 2) from the lower to the higher layers, where the lower layers (V1-V2) represent the visual content and the layers MI1-MI4 represent the image semantics.

Fig. 2. Layers of image representation in relation to the knowledge level.
The initial layer of an image representation is the layer V0, and it represents the raw image. The image is usually segmented (layer V1) for analysis, and the low-level features are extracted from the image segments (layer V2). The amount of knowledge required for segmentation (layer V1) and feature extraction (layer V2) is low.

It is assumed that a multi-layered image annotation includes concepts ranging from elementary classes EC (layer MI1), into which image segments are classified, and scene classes SC (layer MI2) that describe the scene, to generalized classes GC (layer MI3) and derived classes DC (layer MI4). For instance, the proposed multi-layered image annotation related to Fig. 1c is EC = {snow, polar bear}; SC = {Scene-Polarbear}; GC = {Wildlife, Mammal, Outdoor, Natural scene}; DC = {Arctic}.

Elementary classes are obtained as the results of image-segments classification and are used as a flat vocabulary for automatic image annotation. It is assumed that instances of elementary classes correspond to objects in the real world. Spatial relations, spatial locations and co-occurrence relations can be defined for elementary classes, like EC1 is-above EC2, EC1 is-on-top, or EC1 occurs-with EC3.

Scene classes are used to represent the context or semantics of the whole image, according to common sense and expert knowledge. A part-of relation or its inverse, the relation consists-of, can be defined between an elementary class and a scene class, e.g. EC1 is-part-of SC2 or SC2 consists-of EC1.

Generalized classes are defined as a generalization of scene classes. The is-a relation can be defined between a scene class and a generalized class, e.g. SC2 is-a GC1. There can be multiple levels of generalization, so the relation is-a can be defined between generalized classes too, e.g. GC1 is-a GC3 is-a GC5. Derived classes include abstract concepts, activities, events or emotions that can be associated with an image. Different types of relations, such as the associate-to or is-synonym-of relation, can be defined between derived classes and generalized or scene classes.
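To make the layered representation concrete, the following minimal Python sketch (our own illustration, not code from the paper) encodes the multi-layered annotation of Fig. 1c; the dictionary keys mirror the layers MI1-MI4 defined above.

```python
# A minimal sketch of the four semantic layers for the image in Fig. 1c.
annotation = {
    "MI1": {"snow", "polar bear"},                              # elementary classes (EC)
    "MI2": {"Scene-Polarbear"},                                 # scene classes (SC)
    "MI3": {"Wildlife", "Mammal", "Outdoor", "Natural scene"},  # generalized classes (GC)
    "MI4": {"Arctic"},                                          # derived classes (DC)
}

# The multi-layered interpretation of the image is the union of all layers.
interpretation = set().union(*annotation.values())
```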
4. A multi-layered image annotation system

The architecture of our intelligent multi-layered image annotation system (MIAS) is depicted in Fig. 3. The system deals with all the layers of image representation given in Fig. 2, ranging from the segmented image at layer V1 to the multilayer image interpretation at layer MI4. The input to the system is an image belonging to the V0 layer of the image representation, and the system output is a multi-layered interpretation of the image that consists of concepts obtained from four layers of image interpretation, i.e., layers MI1, MI2, MI3 and MI4.

A raw image I at layer V0 is first segmented with a normalized-cuts algorithm (Shi & Malik, 2000). The segmented image corresponds to the V1 layer of the image representation. Formally, the relationship between the raw image I and the image segments s_i, i = 1, ..., m, may be written as V1(I) = {s1, s2, ..., sm}. From each image segment, low-level features are extracted (such as size, position, height, width, colour, shape, etc.), which should represent the geometric and photometric properties of a segment. Each image segment is then represented by the k-component feature vector x = (x1, x2, ..., xk)^T. Accordingly, an image at the V2 layer of the image representation is described with as many feature vectors as there are image segments. Thus, the relationship between the raw image I and the feature vectors x_i, i = 1, ..., m, obtained from the image segments s_i, i = 1, ..., m, is given as V2(I) = {x1, x2, ..., xm}.

Each image segment is then classified using the Bayes classifier into one of the elementary classes EC_i ∈ EC according to the maximum posterior probability (c_MAP). The Bayes classifier was trained on a training set of image segments annotated with labels corresponding to natural and artificial objects. For each occurrence of the feature vector x, the classification is based on the Bayes theorem:

c_{MAP} = \arg\max_{EC_i \in EC} \frac{P(x|EC_i) P(EC_i)}{P(x)}.  (1)

The conditional probability P(x|EC_i) of a feature vector x for the given elementary classes EC_i ∈ EC and the prior probability P(EC_i), ∀EC_i ∈ EC, are estimated according to the data in a training set. It is taken into account that the evidence factor P(x) is a scale factor that does not influence the classification results.
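The classification step of Eq. (1) can be sketched in a few lines of Python (our own illustration, assuming the class-conditional densities and priors have already been estimated from the training set; the function names are hypothetical):

```python
def classify_segment(x, likelihood, prior):
    """MAP classification of a segment feature vector x (Eq. (1)).

    likelihood: {EC_i: callable returning the estimate of P(x | EC_i)}
    prior:      {EC_i: P(EC_i)}, both estimated from the training set.
    The evidence P(x) is a common scale factor and is omitted from the argmax.
    """
    return max(likelihood, key=lambda ec: likelihood[ec](x) * prior[ec])

def annotate_mi1(v2, likelihood, prior):
    """Layer MI1 is the union of the predicted elementary classes over
    all feature vectors of the segmented image (layer V2)."""
    return {classify_segment(x, likelihood, prior) for x in v2}
```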
The result of the image-segments classification is m annotated segments of the image I, in such a manner that each one is annotated with one of the elementary classes. The union of the elementary classes obtained by the classification of the image segments forms an automatic image interpretation at layer MI1, often referred to as automatic image annotation. The classes or elements of the interpretation set MI1(I) ⊆ EC are also called labels, annotation words, or keywords.

A knowledge-representation scheme based on the Fuzzy Petri Net formalism (Ribarić & Pavešić, 2009) and the fuzzy inference engine are the key components of the proposed multi-layered image annotation system MIAS. The defined fuzzy knowledge-representation scheme represents the knowledge about concepts and relations in the image domain, which is often uncertain, imprecise and ambiguous.

Fig. 3. Architecture of a multi-layered image-interpretation system (MIAS).

The fuzzy knowledge base contains the following main components: fuzzy relationships between elementary classes, fuzzy relationships between elementary classes and scene classes, and fuzzy relationships between scene classes and generalized or derived classes. The fuzzy relationships are defined using the training set and expert knowledge. One of the components of the system MIAS is an inference engine (IE) used for image interpretation on the layers MI1-MI4. The inference engine supports the fuzzy inheritance and fuzzy recognition procedures. The fuzzy inheritance is used for inconsistency checking and for class generalization, and the fuzzy recognition is applied for scene recognition.

The facts in the fuzzy knowledge base, particularly those related to relationships among elementary classes, are used to check the consistency of the set MI1(I). An elementary class for which it is concluded that it does not belong to a likely context, obtained e.g. due to inaccurate segmentation, can be discarded or replaced with another elementary class that has similar properties and fits the context. The elementary classes of an image that have passed the inconsistency checking are the inputs into the MI2 image-interpretation layer for scene recognition. Each scene in the knowledge base is defined, based on a training set, as an aggregation of typical elementary classes. Thus, it is possible to conclude which scene is the most likely one from the elementary classes from the set MI1(I). The recognised scene class makes the image interpretation at the layer MI2, MI2(I) ⊆ SC.

Based on the scene class from the set MI2(I), more abstract generalized classes are inferred by the inference engine (see Section 5.1) using generalization relationships from the fuzzy knowledge base. Generalization relationships, as well as heuristics about this particular domain, are explicitly specified by a human expert. Once determined, the generalized classes can be further generalized to a more abstract generalized class. The inferred generalized classes form the image interpretation at the layer MI3, so for a given image I, MI3(I) ⊆ GC. The analogous inference procedure can be applied on generalized and scene classes to obtain derived classes related to a given image I, MI4(I) ⊆ DC.

The outputs from the proposed system are classes at different levels of abstraction that include elementary classes, scene classes and generalized classes, as well as derived classes.

The defined KRFPN scheme can be independently used, modified and connected with other KRFPN schemes in a hierarchical structure to expand the knowledge base with new concepts.

5. A knowledge-representation scheme

To model objects and their relationships in an image, some knowledge-representation formalism has to be used and domain knowledge needs to be included. However, considering that image segmentation is often imprecise and subject to errors, and that knowledge about the concepts is often incomplete, an ability to draw conclusions from imprecise, fuzzy knowledge is necessary. For this purpose, a knowledge-representation scheme based on the KRFPN formalism (Ribarić & Pavešić, 2009) is defined for a multi-layered annotation of images.

5.1. Definition of the KRFPN scheme for the multi-layered image annotation

We have defined the KRFPN scheme to present the elements of the knowledge base used for inferring concepts on the higher layers of image interpretation. The KRFPN scheme for multi-layered image annotation is defined as a 13-tuple:

KRFPN = (P, T, I, O, M, Ω, μ, f, c, α, β, λ, Con),  (2)

where the first ten components are those of the marked Fuzzy Petri Net (FPN) (Li & Lara-Rosano, 2000):

P = {p1, p2, ..., pn}, n ∈ N, is a set of places; a function α: P → D maps a place from the set P to a concept from a set D used for multi-layered image annotation. It is set that D = EC ∪ SC ∪ GC ∪ DC, where the subset EC includes 28 elementary classes such as {Airplane, Train, Shuttle, Ground, Cloud, Sky, Coral, Dolphin, Bird, Lion, Mountain, etc.}, the subset SC includes 20 scene classes such as {Seaside, Inland, Sea, Space, Airplane Scene, Train Scene, Tiger Scene, Lion Scene, etc.}, the subset GC includes generalized classes such as {Outdoor Scenes, Natural Scenes, Man-made Objects, Landscape, Vehicles, Wildlife, etc.}, and the subset DC includes {Savannah, Africa, Safari, Vacation, etc.}.
T = {t1, t2, ..., tm}, m ∈ N, is a set of transitions; a function β: T → Σ maps a transition from the set T to a relationship from a set Σ defined according to expert knowledge. The set Σ includes the relationship occurs_with between elementary classes, which models the common occurrence of elementary classes in the image, and its negation not_occurs_with; the aggregation relationship consists_of, defined between a scene class that has the role of the aggregation and elementary classes that have the role of the components of the aggregation; the generalization relationship is_a, defined either between a scene class and a generalized class or between generalized classes or derived classes; and, in addition, an is_synonym_of relation defined between synonyms of concepts. For the relationship consists_of, an inverse relationship -(consists_of) = is_part_of is defined.

I: T → P∞ is an input function, while O: T → P∞ is an output function for a transition. In our scheme, the co-domain of the input and output functions is the set P instead of a bag P∞ as defined in (Peterson, 1981).

M = {m1, m2, ..., mr}, 1 ≤ r < ∞, is a set of tokens used by the inference engine. The inference procedure is based on the dynamic properties of the Petri Net, i.e. on the firing of the transitions (Peterson, 1981). The tokens' distribution within the places is given as Ω(p) ∈ P(M), where P(M) is the power set of M. The initial distribution of tokens defines the initial marking vector μ0 = (μ1, μ2, ..., μn), with μi = μ(pi) ∈ {0, 1}, i.e. in the initial marking a place can have no token or at most one token. In the case of scene recognition, μ0 corresponds to the elementary classes obtained at the layer MI1.

c: M → [0, 1] is an association function that gives a token value corresponding to the degree of truth of the concept mapped to the place marked with that token. The value of a token in an initial distribution can be set to the estimated posterior probability of the concept associated with that marked place, or set to 1.

f: T → [0, 1] is an association function that gives a transition value corresponding to the degree of truth of the relationship mapped to a transition. The measure of truthfulness of the relationship depends on the kind of relationship; it is computed using the data in the training set in the case of pseudo-spatial and spatial relationships based on the co-occurrence of elementary classes in images. The function f can also be defined by an expert in the case of more abstract classes (SC, GC and DC).

λ ∈ [0, 1] is a threshold value related to the firing of transitions. If the threshold value λ is set, the truth value c(m1) of each token must exceed the value of λ if the transition is to be enabled.

Con ⊆ (Σ × Σ) is in this scheme defined as a set of pairs of mutually contradictory relations. It is defined on the set of relations occurs_with and not_occurs_with between elementary classes. It can also be defined between concepts if necessary.
The KRFPN scheme can be visualized by a directed graph containing two types of nodes: places and transitions. Graphically, the places pi ∈ P are represented by circles and the transitions tj ∈ T by bars. The directed arcs between the places and transitions, and between the transitions and places, represent the transition input I(tj) ⊆ P and output O(tj) ⊆ P functions, respectively (Fig. 4). In a semantic sense, each place from the set P corresponds to a concept di ∈ D and any transition from the set T to a relation rk ∈ Σ. A dot within a place represents a token m1 ∈ M.

Fig. 4. A generic form of a chunk of knowledge in the Fuzzy Petri Net formalism.

To a token at the input place pi ∈ I(tj) and to the transition tj ∈ T, the values c(m1) and f(tj) are assigned, respectively. The assigned values implement uncertainty and fuzziness in the scheme and can be expressed on truth scales, where 0 means "not true" and 1 "always true". Semantically, the value c(m1) expresses the degree of uncertainty of a concept di ∈ D mapped to a particular place pi ∈ P, and the value f(tj) corresponds to the degree of uncertainty of a relationship ri ∈ Σ mapped to a transition tj ∈ T.

A place that contains one or more tokens is called a marked place. The tokens give dynamic properties to the Petri Net and define its execution by the firing of an enabled transition. A transition is enabled when every input place of the transition is marked, i.e., if each of the input places of the transition has at least one token and if each token value exceeds the threshold value λ.

An enabled transition tj can be fired. By firing, a token moves from all its input places pi ∈ I(tj) to the corresponding output places pk ∈ O(tj). In Fig. 4, there is only one input place for the transition tj, I(tj) = pi, and only one output place, O(tj) = pk. After the transition firing, a new token value c(m2) at the output place is obtained as c(m2) = c(m1)·f(tj) (Fig. 5).

Fig. 5. A new token value is obtained in the output place after firing.

The dynamic properties of the scheme are important for the inference-engine definition. The inference engine on the KRFPN scheme consists of two automated reasoning processes: fuzzy inheritance and fuzzy recognition. All the steps of the inference algorithms are given in (Ribarić & Pavešić, 2009). The complexity of both inference algorithms is O(nm), where n is the number of places (concepts) and m is the number of transitions (relations) in the knowledge base. The algorithms are used in novel and original ways to check the consistency of classes, for scene recognition, and for reasoning on more abstract classes, as described in detail below.
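The sketch below (a deliberately simplified rendering, not the authors' implementation; it assumes one input and one output place per transition) shows how the execution-relevant components of the 13-tuple and the firing rule c(m2) = c(m1)·f(tj) can be represented:

```python
from dataclasses import dataclass, field

@dataclass
class KRFPNSketch:
    alpha: dict                 # place -> concept, e.g. {"p45": "Seaside"}
    beta: dict                  # transition -> relation, e.g. {"t86": "consists_of"}
    inp: dict                   # transition -> its input place I(t)
    out: dict                   # transition -> its output place O(t)
    f: dict                     # transition -> degree of truth f(t)
    lam: float = 0.0            # firing threshold λ
    tokens: dict = field(default_factory=dict)  # marked place -> token value c(m)

    def fire(self, t):
        """Fire transition t if it is enabled; the token moves to the
        output place with the new value c(m2) = c(m1) * f(t)."""
        c_m1 = self.tokens.get(self.inp[t])
        if c_m1 is None or c_m1 <= self.lam:
            return None          # transition not enabled
        del self.tokens[self.inp[t]]
        self.tokens[self.out[t]] = c_m1 * self.f[t]
        return self.out[t], self.tokens[self.out[t]]
```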
5.2. Modeling the truth value of relationships

Given that the mapping between concepts and image features is often unreliable, and due to incomplete knowledge about the concepts, the uncertainty is implemented in the scheme by associating a value with a transition and with a token in a marked place. A transition value expresses the degree of truth or the reliability of the related relationship, while a token value corresponds to the truth value or the reliability of the concept. The degree of truth of the relationships depends on the type of the relationship and is set according to expert knowledge or computed using the data in the training set. For example, the degrees of truth of the relationships that model the generalization of classes are determined by the expert, while the truth values of the relationships consists_of and occurs_with are computed using the data in the training set, as explained below.

5.2.1. Relationship consists_of

To define the truth value of the aggregation relationship consists_of, it is assumed here that a scene may contain several characteristic elementary classes. Thus, the relation among the scene and elementary classes is an aggregation relationship where the scene plays the role of the aggregation and the elementary classes have the role of the components of the aggregation. By analyzing the data in the training set, the common occurrence of elementary classes in the scene class is determined and used for the creation of the rules on relationships between scenes and elementary classes. Instead of choosing an elementary class with a maximum posterior probability, a modified Bayes rule is used to form a set MS that corresponds to the specific scene class. A set MS_{SC_i} for a specific scene class SC_i, ∀i, is given by:

MS_{SC_i} = \left\{ EC_k : \arg_i P(SC_i|EC_k) \approx \arg_k \frac{P(EC_k|SC_i)}{P(EC_k)} \geq \varepsilon \right\}.  (3)

Eq. (3) mirrors the idea of finding the most representative set of elementary classes for a given scene class. MS_{SC_i} is the set of all those elementary classes EC_k, k = 1, 2, ..., that participate in a scene class SC_i with the posterior probability P(SC_i|EC_k), ∀k EC_k, exceeding the marginal value ε ≥ 0.05. The marginal value is determined experimentally. The prior probability P(EC_k) for a given elementary class EC_k is computed from the training set to bring in the degree of discrimination of each elementary class for a given scene class.

The truth value attached to the aggregation relationship consists_of between the elementary classes and the scene class was determined using the Bayes rule for the posterior probability P(SC_i|EC_k), ∀k EC_k ∈ MS_{SC_i}, for the specific scene:

P(SC_i|EC_k) = \frac{P(EC_k|SC_i) P(SC_i)}{\sum_{j=1}^{s} P(EC_k|SC_j) P(SC_j)},  (4)

where s = |SC| is the number of scene classes.

Fig. 6. Relations among the scene "Seaside" and its components.

In Fig. 6, a part of the knowledge base is presented, showing the relationships among a particular scene class Seaside and its components that correspond to the elementary classes from the set MS_seaside = {sky, cloud, water, grass, tree, rock, sand, building} defined by Eq. (3). The degree of truth f(tj) of the transition tj that corresponds to the relation consists_of between the particular scene class "Seaside" and its components is given by P(Seaside|EC_k), EC_k ∈ MS_seaside, and is determined by Eq. (4). For instance, the truth value of the relation consists_of mapped to the transition t86 between the scene class "Seaside" of place p45 and the elementary class "water" of place p26 is f(t86) = 0.95.
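A sketch of how the consists_of truth values could be estimated from an annotated training set (our own encoding of the data as (scene label, set of elementary classes) pairs; only Eqs. (3) and (4) are implemented):

```python
def consists_of_truth(images, eps=0.05):
    """Estimate f(t) for consists_of relations (Eqs. (3) and (4)).

    images: list of (scene_label, set_of_elementary_classes) pairs.
    Returns {SC: {EC: P(SC|EC)}} restricted to the representative sets MS_SC.
    """
    n = len(images)
    scene_n, joint_n = {}, {}
    for sc, ecs in images:
        scene_n[sc] = scene_n.get(sc, 0) + 1
        for ec in ecs:
            joint_n[(sc, ec)] = joint_n.get((sc, ec), 0) + 1

    truth = {sc: {} for sc in scene_n}
    for (sc, ec), n_joint in joint_n.items():
        # Bayes rule (Eq. (4)): P(SC|EC) = P(EC|SC)P(SC) / sum_j P(EC|SC_j)P(SC_j)
        num = (n_joint / scene_n[sc]) * (scene_n[sc] / n)
        den = sum((joint_n.get((s, ec), 0) / scene_n[s]) * (scene_n[s] / n)
                  for s in scene_n)
        posterior = num / den
        if posterior >= eps:          # marginal value of Eq. (3)
            truth[sc][ec] = posterior
    return truth
```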
5.2.2. Relationship occurs_with

To create the rules on relationships between elementary classes and to define the truth value of the relationship occurs_with, the mutual occurrence of the classes EC_j and EC_i in each image in the training set is analyzed. This can be formally defined as:

P(EC_j|EC_i) = \frac{P(EC_j \cap EC_i)}{P(EC_i)}.  (5)

If P(EC_j|EC_i) is less than the threshold value τ = 0.1, then the relationship not_occurs_with is defined between the elementary classes EC_j and EC_i, i ≠ j, with a truth value of 0.9. Otherwise, the truth value of the not_occurs_with relationship is 1 − P(EC_j|EC_i). The occurs_with relationship is used in the proposed inconsistency-checking procedure to validate the results of the image-segment classification and to check whether the results obtained on all the image segments are consistent.
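The pairwise rules of Section 5.2.2 can be derived with a short sketch; the code below follows the stated thresholding (0.9 below τ, 1 − P(EC_j|EC_i) otherwise), while using P(EC_j|EC_i) itself as the occurs_with truth value is our assumption:

```python
def pairwise_relations(images, tau=0.1):
    """Derive occurs_with / not_occurs_with truth values (Eq. (5)).

    images: list of sets of elementary classes, one set per training image.
    """
    classes = set().union(*images)
    count = {ec: sum(ec in img for img in images) for ec in classes}
    relations = {}
    for ec_i in classes:
        for ec_j in classes - {ec_i}:
            both = sum(ec_i in img and ec_j in img for img in images)
            p = both / count[ec_i]                # P(EC_j | EC_i), Eq. (5)
            if p < tau:
                relations[(ec_i, "not_occurs_with", ec_j)] = 0.9
            else:
                relations[(ec_i, "not_occurs_with", ec_j)] = 1.0 - p
                relations[(ec_i, "occurs_with", ec_j)] = p  # assumed truth value
    return relations
```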
5.2.3. Spatial relationships

Spatial relationships like at the top and at the bottom have not been used in this experiment, since the relationships between the objects in the images differed from the natural relations. In the images from the domain of natural scenes that we have used, the sky, trees, grass and water can appear both at the bottom and at the top of the image; so, for example, water can appear above the grass and trees, as in Fig. 7e. The ellipses in Fig. 7a-e show the positions of the segments that are not in line with the common knowledge about the spatial relationships of objects in nature. For example, the grass segment in Fig. 7c is above the tiger segment.

Fig. 7. Position of objects sky, water, grass, trees and the spatial relations between the objects in the image.

If it turns out to be useful, spatial relationships, as well as fuzzy temporal relationships or new concepts, can be added to the scheme and used by the inference engine.

6. Knowledge-based approach to inconsistency checking

It is to be expected that some of the elementary classes obtained using the Bayes classification rule (Eq. (1)) do not fit the image context. To check for the inconsistency of the obtained elementary classes, an inconsistency-checking procedure is proposed that uses the facts in the knowledge base related to the occurs_with and not_occurs_with relations. The relations occurs_with and not_occurs_with for each obtained elementary class can be analyzed using the fuzzy-inheritance algorithm. Based on the results of the fuzzy inheritance, the classes which are elements of the domain of the relation not_occurs_with are eliminated from the set MI1, the first semantic layer.

In order to illustrate the proposed procedure for inconsistency checking, an example follows. Let the image I in Fig. 8 be given for a multi-layered image annotation. After the segmentation, using a normalized-cuts algorithm, the image is segmented into 7 areas: V1(I) = {s1, s2, ..., s7}. For each image segment, the low-level features are extracted and a feature vector is formed, so the image is represented at level V2 by the set of feature vectors: V2(I) = {x1, x2, ..., x7}. Then, using the Bayes classification method, each feature vector is classified into one of the elementary classes EC_i ∈ EC according to the maximum posterior probability (c_MAP, Eq. (1)). For the image I in Fig. 8, the obtained result, after the classification of all the image segments, is: "sky, water, water, shuttle, rock, water, sand". Thus, the set of obtained elementary classes forms an automatic image interpretation at the layer MI1 of the image I, as MI1(I) = {sky, water, shuttle, rock, sand}. Note that the elementary class shuttle is a result of misclassification, because a shuttle is not present in the image.

Fig. 8. Example of image representations at layers V0, V1, MI1.

Every obtained elementary class can be checked for inconsistency using the not_occurs_with relationships defined between the elementary classes in MI1(I) and the fuzzy-inheritance algorithm. For instance, to check the inconsistency of the elementary class shuttle, the fuzzy-inheritance algorithm is used as follows. The appropriate place in the knowledge-representation scheme is determined by the function α−1(shuttle) = p19, shuttle ∈ EC (Fig. 9). In Fig. 9, those not_occurs_with relations are presented for which shuttle is the input place. The initial token distribution is Ω0 = (∅, ∅, ..., {m1}, ..., ∅), i.e. the initial token is placed only at the place p19. For inconsistency checking, only the relations with outputs from the set MI1(I) are useful, and they are shown in black. According to the original FPN algorithm, all transitions related to these relations are enabled and can be fired, because the number of tokens in the input place (shuttle) is equal to the number of input arcs of the transitions. The transition values are obtained from the training set using Eq. (5). After firing, the token is removed from the input place (shuttle) and new tokens are created and distributed to the output places (sand, rock, water, …) as shown in Fig. 9b.

Fig. 9. A part of the KRFPN scheme related to the elementary class shuttle and the relationship not_occurs_with (a) before firing and (b) after firing the transitions.

The inheritance tree is formed starting from the root node, which is for this example π0(p19, {1.0}). Firing of the transitions creates new frontier nodes of the inheritance tree that correspond to the output places of the transitions. This step is repeated until the condition for stopping the algorithm is satisfied or the desired depth of the inheritance tree is reached. The frontier nodes are converted by the inheritance-tree algorithm into a frozen node (marked F), a k-terminal node (marked k-T) or an identical node (marked I), or into one of the types of nodes defined for the reachability tree (terminal, duplicate and interior). The inheritance tree of the KRFPN is similar to the concept of the reachability tree of Petri Nets (Chen, Ke, & Chang, 1990), except for the stopping conditions that are integrated in the KRFPN scheme (by the set Σ\{is_a}) or defined by the desired number of tree levels.

Fig. 10. Inheritance tree for the class "shuttle" (Fig. 9).

Fig. 10 shows a 1-level inheritance tree on the KRFPN scheme and the appropriate semantic interpretation of the inheritance paths for the elementary class shuttle. The nodes of the inheritance tree have the form (p_j, c(m_l)), j = 1, 2, ..., p, l = 1, 2, ..., r, 0 ≤ r ≤ |M|, where c(m_l) is the value of a token m_l in the place p_j, computed as the product of the token value at the input place and the corresponding value f(t_j). The arcs of the inheritance tree are marked with the value f(t_j) and the label of a transition t_j ∈ T, where, for example, t396 = 0.8 means f(t396) = 0.8. For each of the inheritance paths, the measure of truth is determined by the token value in the leaf node (the node in which the algorithm stops).

The obtained inheritance tree for the concept shuttle leads to the conclusion that the class shuttle does not occur with the elementary classes from the set MI1(I), so it can be concluded that the class shuttle most likely does not match the context of the image depicted in Fig. 8 and can therefore be discarded. Accordingly, after checking for the inconsistency, the refined image interpretation at the semantic layer MI1 is: MI1(I) = {sky, rock, sand, water}.
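A simplified rendering of this check (not the full fuzzy-inheritance algorithm of the KRFPN; the cut-off value below is our assumption) could look as follows:

```python
def check_consistency(mi1, relations, strength=0.5):
    """Discard elementary classes that contradict the rest of MI1(I).

    mi1:       set of elementary classes from layer MI1
    relations: {(EC_i, 'not_occurs_with', EC_j): truth value}
    A class is dropped when it has a strong not_occurs_with relation to
    every other class in the set, as 'shuttle' does in the Fig. 8 example.
    """
    refined = set(mi1)
    for ec in mi1:
        others = mi1 - {ec}
        if others and all(relations.get((ec, "not_occurs_with", o), 0.0) > strength
                          for o in others):
            refined.discard(ec)
    return refined

# Fig. 8 example: {'sky', 'water', 'shuttle', 'rock', 'sand'} -> 'shuttle'
# is discarded, leaving MI1(I) = {'sky', 'water', 'rock', 'sand'}.
```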
7. Scene recognition

For the task of scene recognition for a new, unknown image, the fuzzy-recognition algorithm based on the inverse KRFPN scheme (marked as –KRFPN) is used in an original way. The –KRFPN scheme is obtained by interchanging the positions of the input function I and the output function O for the transitions T in the 13-tuple (Ribarić & Pavešić, 2009). Additionally, by changing the position of the input and output functions, the relation mapped to a transition is transformed into its corresponding inverse relation. For example, for the relation consists_of in the KRFPN scheme, its inverse relation is_part_of is used in the –KRFPN scheme, i.e., –(consists_of) = is_part_of. Also, the co-domain of the association function c: M → [0, 1] that assigns values to the tokens (see Section 5.1) is expanded to c_r: M → [−1, 1], so that in the case of an exception a token may be associated with a negative value.

The proposed procedure for the scene recognition is as follows. The results of the image interpretation at layer MI1, after the inconsistency checking, are the input to the scheme used for further image interpretation at the layer MI2. The obtained elementary classes EC_i from MI1(I) are treated as components of an unknown scene class X. The elementary classes EC_i are mapped to the places {p1, p2, ..., pn} using the function α−1: EC_i → p_k. If defined, the reliability based on the posterior probability of each elementary class EC_i can be used as the token value c_r(m_l) in the place p_k whose interpretation corresponds to the given class EC_i. Otherwise, the token value is set to 1.

For instance, let us take the image I depicted in Fig. 8. The results of the image interpretation at the layer MI1(I) are the elementary classes {sky, rock, sand, water} that exist in the knowledge base. Based on the Bayes classification rule (Eq. (1)), the degrees of truth are assigned: sky ({0.5}), sand ({0.7}), rock ({0.4}), water ({0.6}). By using the function α−1, the initially marked places are determined (α−1(sky) = p20, α−1(sand) = p18, α−1(rock) = p17, α−1(water) = p26). A small part of the –KRFPN scheme with the initially marked places and the corresponding token values is given in Fig. 11.

Fig. 11. A small part of the –KRFPN scheme for the scene recognition for the image depicted in Fig. 8.

According to the initially marked places and the corresponding degrees of truth, four root nodes π_0^i, i = 1, ..., 4, of the recognition trees will be formed: π_0^1(p20, {0.5}), π_0^2(p18, {0.7}), π_0^3(p17, {0.4}), π_0^4(p26, {0.6}).

Fig. 12 shows the four corresponding recognition trees in the –KRFPN scheme with enabled transitions, starting from the root nodes. By firing the enabled transitions on the –KRFPN scheme, new nodes at the next higher level of the recognition tree are created and the appropriate values of the tokens are obtained:

c_r(m_{k+1}) = c_r(m_k) f(t_l),  (6)

where t_l is the transition between the concepts EC_i and SC_l, c_r(m_k) is the reliability of the elementary class EC_i, and f(t_l) is computed by Eq. (4). Due to the simplicity of the example, only one level of the recognition tree is generated. Note that only the recognition tree with the root node π_0^2 (Fig. 12b) directly corresponds to the small part of the –KRFPN depicted in Fig. 11.
The other recognition trees (Fig. 12a, c and d) also contain leaf nodes corresponding to the scene classes that are part of the knowledge base but are not depicted in Fig. 11.

Fig. 12. Recognition trees with enabled transitions for each root node.

The following steps of the scene recognition are as follows. Each leaf node π_i^k in the recognition tree k = 1, 2, ..., b is represented by a vector of dimension |P|, where P is the set of places, so that the index of a node in the recognition tree corresponds to the index of the vector component and the value of the node is assigned to the value of the vector component. For example, the node π_1^2 = (p45, 0.595) (Fig. 12b) is represented by the vector π_1^2 = (0, 0, ..., 0, 0.595, 0, ..., 0), so that all the vector components are assigned the value 0, except the 45th vector component, to which the node value of 0.595 is assigned. Accordingly, the total sum Z of all leaf nodes in all recognition trees is computed:

Z = \sum_{k=1}^{b} \sum_{i=1}^{o_k} \pi_i^k,  (7)

where π_i^k is the ith leaf node in the kth recognition tree, b ≤ |M| is the number of recognition trees, and o_k ≤ |P| is the total number of leaves in the recognition tree k. In this example there are b = 4 recognition trees, the corresponding numbers of leaves are o1 = 12, o2 = 2, o3 = 9, o4 = 11, and the total sum is:

Z = \sum_{k=1}^{4} \sum_{i=1}^{o_k} \pi_i^k = \sum_{i=1}^{12} \pi_i^1 + \sum_{i=1}^{2} \pi_i^2 + \sum_{i=1}^{9} \pi_i^3 + \sum_{i=1}^{11} \pi_i^4 = (0, ..., 0, 0.36, 0.44, 0.09, 0.05, 0, 0.03, 0, 0.04, 0, 0, 0.05, 0.03, 0.03, 0.04, 0, 0.16, 1.80, 0.05, 0.05, 1.11, 0, ..., 0).

For example, the 30th component of the vector Z, with the value 0.44, is obtained by summing all the values of the nodes in all the recognition trees that correspond to the place p30 (i.e. π_7^1, π_2^2, π_8^3, π_9^4): 0.115 + 0.175 + 0.052 + 0.102 = 0.44.

Then, the index of the element with the highest sum in Z = (Z1, Z2, ..., Z_{|P|}) among all of the nodes in all the recognition trees is selected as:

i^* = \arg\max_{i=1,...,|P|} \{Z_i\}.  (8)

In the case that there are several i for which the same maximum value of {Z_i} is obtained, the set I^* is created:

I^* = \{i_1^*, i_2^*, ...\}.  (9)

A scene class assigned to a place with the max argument p_i: i ∈ I^* is chosen as the best match for the given set of elementary classes obtained during the image interpretation at the layer MI1. In this example, the 45th component of the vector Z has the maximum value 1.80. Therefore, the set of max arguments consists of only one element, i_1^* = 45, so only one scene class is chosen as the best match, i.e., the one that is assigned to the place with that max argument, α(p45) = Seaside. The next scene candidate is α(p48) = Inland, with a value of 1.11.

By merging all the classes that are so far associated with the image, from the elementary classes to the scene class, the multi-layered interpretation of the image is formed. For example, the multi-layered interpretation of the image I (in Fig. 8) includes the results of the image interpretation at the layers MI1 and MI2: MI(I) = MI1(I) ∪ MI2(I) = {sky, rock, sand, water} ∪ {Seaside}.
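Flattening the recognition trees into a single weighted sum gives a compact approximation of Eqs. (6)-(8); the encoding of the inverse relations as a dictionary is our own:

```python
def recognize_scene(mi1_truth, is_part_of):
    """Score scene classes by summing leaf-node token values (Eqs. (6)-(8)).

    mi1_truth:  {EC: degree of truth c_r}, e.g. {'sky': 0.5, 'sand': 0.7, ...}
    is_part_of: {(EC, SC): f(t)} - truth values of the inverse relation
                -(consists_of) = is_part_of in the -KRFPN scheme.
    """
    z = {}
    for (ec, sc), f_t in is_part_of.items():
        if ec in mi1_truth:
            # Eq. (6): c_r(m_{k+1}) = c_r(m_k) * f(t); Eq. (7): sum over leaves
            z[sc] = z.get(sc, 0.0) + mi1_truth[ec] * f_t
    return max(z, key=z.get), z                    # Eq. (8): argmax over Z_i

# For Fig. 8, the inputs {'sky': 0.5, 'sand': 0.7, 'rock': 0.4, 'water': 0.6}
# give Z(Seaside) = 1.80 as the maximum, so 'Seaside' is recognized.
```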
8. Inference of more abstract classes

The obtained scene classes can be used as root nodes for the next inheritance process that infers more abstract concepts from higher semantic levels (here referred to as generalized and derived classes), either because they are directly linked with the concept or because they may be inferred by means of concepts at a higher level of abstraction (parents).

To determine the related, more abstract classes for a given scene class, the relations with its parents at higher levels of abstraction are inspected using the inheritance algorithm. The proposed procedure by which the more abstract classes are concluded will be illustrated using the example of the scene class Seaside ∈ SC that was the result of the recognition algorithm in Section 7. In Fig. 13, a part of the knowledge base is shown that includes information about the components of the class "Seaside" and its more abstract classes defined by the expert.

Fig. 13. A part of the knowledge base that shows the properties of the class "Seaside" and its parents.

At the first step of the algorithm, the appropriate place is determined by the function α−1(Seaside) = p45. A token value c(m_l) is set to 1, so the corresponding root node of the inheritance tree is π0(p45, {1.0}). Fig. 14 shows a 3-level inheritance tree on the KRFPN scheme for the class "Seaside" that shows its more abstract classes (nodes within the ellipses) as well as its properties.

Fig. 14. The inheritance tree for the concept "Seaside".

To determine the more abstract classes associated with the given class, the key nodes are those in the parent-child relationship with the given class. The nodes in the parent-child relationship for the class "Seaside" are: π_4^1(p59, {0.7}), π_5^1(p60, {1.0}), π_1^2(p53, {1.0}) and π_1^3(p55, {1.0}), and the following applies: α(p59) = Vacation, α(p60) = Landscape, α(p53) = Natural scene, α(p55) = Outdoor scene. The classes "Landscape", "Natural scene" and "Outdoor scene" are generalizations of the class "Seaside", while the class "Vacation" is a derived class that one can associate with the class "Seaside" using the relation is_associated_to.

Thus, the result of the multi-layered image annotation for the image I given in Fig. 8, after the generalization and the derived-concepts inference, is: MI(I) = MI1(I) ∪ MI2(I) ∪ MI3(I) ∪ MI4(I) = {sky, rock, sand, water} ∪ {Seaside} ∪ {Landscape, Natural scene, Outdoor scene} ∪ {Vacation}.

Also, new concepts can be added to the knowledge base. Some examples of such an extension are synonyms of the concepts defined in the scheme, like Seacoast for Seaside, or terms that are colloquially understood as synonyms, like Forest or Logs for Trees. In these cases, the is_synonym_of relation is defined between a class that is already defined in the knowledge base (e.g. Seaside) and the synonym that should be added (e.g. Seacoast). Fig. 15 shows the fuzzy-inheritance tree for the concept Seacoast, for which α−1(Seacoast) = p57 applies, so the corresponding root node of the inheritance tree is π0(p57, {1.0}).

Fig. 15. The inheritance tree for Seacoast, a synonym of the concept Seaside.

Inclusion of concepts at different levels of abstraction maps the organization of concepts from natural language to the image annotation and facilitates the adjustment of the system to the user's needs and expectations.
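The inheritance step of this section reduces to a bounded traversal of the is_a and is_associated_to links, multiplying the token value along each path; the sketch below uses a hypothetical encoding of the knowledge base:

```python
def infer_abstract(concept, links, depth=3):
    """Collect generalized and derived classes reachable from a concept.

    links: {concept: [(relation, parent, truth), ...]}, e.g.
           {'Seaside': [('is_a', 'Landscape', 1.0),
                        ('is_associated_to', 'Vacation', 0.7)],
            'Landscape': [('is_a', 'Natural scene', 1.0)]}
    """
    found, frontier = {}, [(concept, 1.0)]
    for _ in range(depth):                     # bounded depth of the tree
        nxt = []
        for node, value in frontier:
            for rel, parent, truth in links.get(node, []):
                v = value * truth              # token value in the new node
                if v > found.get(parent, 0.0):
                    found[parent] = v
                    nxt.append((parent, v))
        frontier = nxt
    return found  # e.g. {'Landscape': 1.0, 'Natural scene': 1.0, 'Vacation': 0.7}
```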
The inclusion of concepts at different levels of abstraction maps the organization of concepts from natural language onto image annotation and facilitates the adjustment of the system to the user's needs and expectations.

9. Experiments and discussion

To evaluate the proposed model of multi-layered image annotation, an experiment was performed on a part of the Corel image dataset related to outdoor scenes (e.g., Landscape, Vehicles, Animals, Space). The images were automatically segmented, based on the visual similarity of pixels, using the normalized-cut algorithm. Most of the images were segmented into approximately 10 regions. Every image segment was characterized by a set of 16 visual features based on the color in the CIE L∗a∗b∗ color model and on the size, position, height, width and shape of the region (Duygulu et al., 2002).

Also, each image segment of interest was annotated with the first keyword from the set of corresponding keywords provided by Carbonetto, Freitas, and Barnard (2004). The vocabulary used to annotate the image segments has 28 keywords related to natural and artificial objects such as 'airplane', 'bird', 'lion', 'train', etc. and landscapes like 'ground', 'sky', 'water', etc. The keywords from the vocabulary correspond to the elementary classes. The visual features and keywords of each image segment make up a data set.

The data set used for the experiment consists of 3960 segments obtained from 475 images of outdoor scenes. For supervised learning, the data was divided into training (2772) and test (1188) subsets by a 10-fold holdout cross-validation. Data in the training set were used to learn a classification model of each elementary class using a Bayes classifier. The features from the test set were used to test the model, using the corresponding elementary classes as ground truth.

To evaluate the MIAS system at layer MI1, the results of image classification at that layer are compared to the ground truth. The performance of the MIAS system at layer MI1 is expressed with the measures of recall (10) and precision (11). Average scores after 10 runs are shown in Fig. 16.

The recall is the ratio of the correctly predicted elementary classes (tp, true positives) to all elementary classes in the ground-truth data (tp + fn; fn, false negatives):

$$\mathrm{Recall} = \frac{tp}{tp + fn}. \qquad (10)$$

The precision is the ratio of correctly predicted elementary classes (tp) to the total number of elementary classes obtained from the automatic image interpretation at layer MI1 of the MIAS (tp + fp; fp, false positives):

$$\mathrm{Precision} = \frac{tp}{tp + fp}. \qquad (11)$$

The proposed MIAS system for image interpretation at the layer MI1 achieves an average precision of 32.6% and an average recall of 27.5%. The average precision is calculated as the average of all the values of precision obtained for each elementary class in the test set using the 10-fold cross-validation; the average recall is calculated analogously. Each elementary class in the graph (Fig. 16) is marked with a class ID, so that ID 1 corresponds to the elementary class 'airplane', ID 2 to 'bear', ID 3 to 'bird', and so on up to ID 28, which corresponds to 'zebra'. The highest precision, over 56%, was obtained for the elementary classes 'grass' - ID 11, 'polar bear' - ID 15, 'rock' - ID 16, 'sky' - ID 20 and 'tracks' - ID 23.
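For concreteness, Eqs. (10) and (11) can be computed per class and then macro-averaged, which is how the average precision and recall above are obtained. The following sketch uses invented toy labels rather than the Corel data; the function and variable names are ours, not from the MIAS implementation.

```python
from collections import Counter

def per_class_precision_recall(y_true, y_pred, classes):
    """Eq. (10): recall = tp / (tp + fn); Eq. (11): precision = tp / (tp + fp),
    computed per elementary class from segment-level labels."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    truth = Counter(y_true)    # tp + fn per class
    pred = Counter(y_pred)     # tp + fp per class
    recall = {c: tp[c] / truth[c] if truth[c] else 0.0 for c in classes}
    precision = {c: tp[c] / pred[c] if pred[c] else 0.0 for c in classes}
    return precision, recall

# Toy ground-truth and predicted labels for six segments (invented data).
y_true = ["sky", "rock", "sky", "water", "grass", "sky"]
y_pred = ["sky", "sky",  "sky", "water", "rock",  "sky"]
classes = ["sky", "rock", "water", "grass"]
precision, recall = per_class_precision_recall(y_true, y_pred, classes)
avg_precision = sum(precision.values()) / len(classes)  # macro average
avg_recall = sum(recall.values()) / len(classes)
```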
The highest recall, over 57%, was obtained for the elementary classes 'grass' - ID 11, 'ground' - ID 12, 'rock' - ID 16, 'sky' - ID 20, 'train' - ID 24 and 'trees' - ID 25. For the elementary classes 'bear' - ID 2, 'building' - ID 4, 'cheetah' - ID 5, 'coral' - ID 7, 'dolphin' - ID 8, 'fox' - ID 10 and 'zebra' - ID 28, the obtained value for both precision and recall is zero. Some of the reasons for this outcome are the small number of samples available for a particular elementary class (e.g., for the class building we had only 24 segments, for the class dolphin only 20 segments), the large diversity of features within a class (e.g., instances of the class coral differ significantly in color), as well as errors in segmentation.

Fig. 16. A precision/recall graph for the automatic image interpretation with MIAS at layer MI1.

The obtained results, given as outputs of MIAS at layer MI1, are compared to the results of the models published in Carbonetto et al. (2004). The results of the automatic image annotation obtained for the mentioned set of images with the dMRF model defined in Carbonetto et al. (2004) and the dInd model from Duygulu et al. (2002) are published in Carbonetto et al. (2004). The dMRF model uses the method of Markov random fields for automatic image annotation, while the dInd model is an example of a translation model that treats image annotation as the translation between two discrete languages. The authors have reported the precision for the task of automatic image annotation for each of the 28 keywords in the vocabulary achieved by both models. Comparing the results of the automatic annotation on images related to outdoor scenes, the dMRF model achieves an average precision of 21%, while the dInd model achieves an average precision of 20%. As specified in Table 1, the average precision of MIAS at layer MI1 exceeds the precision of both models, although our system has correctly predicted fewer classes, 21 out of 28 possible.

Table 1
Comparison of the results achieved with MIAS at MI1, dInd, and dMRF.

Models                                   MIAS-MI1   dInd    dMRF
Number of correctly predicted classes    21         23      24
Average precision                        32.6%      19.9%   21%

Comparing the results presented in Table 1, it should be noted that, for learning, the models dInd and dMRF used image labels, while our approach uses labels of the image segments. Because of the supervised learning approach, somewhat better results were expected. However, the achieved difference in average precision is significant, even though the dMRF model took the context into account. Note that the given results of our system MIAS at layer MI1 (presented in Table 1) are without any inconsistency checking.

Generally, the results achieved by automatic image annotation models on outdoor image domains are relatively poor, and the question is whether they can meet user requirements when retrieving or organizing images.
Often the results of automatic annotation depend on the quality of the segmentation, so when an image has a lot of segments and when an object is over-segmented, the results can include labels that do not correspond to the object or context of an image. Here, to mitigate this problem, the obtained results of the image interpretation at the layer MI1 are analyzed with the fuzzy inheritance algorithm. The aim is to purify the classification results of class labels that do not match the likely context of the image. To do so, the facts from the knowledge base related to the relationships between elementary classes, as well as the reliability of the relationships computed with (5), are used with the fuzzy inheritance algorithm (a simplified sketch of this filtering step is given after Table 2). Using inconsistency checking, those elementary classes that are obtained as a result of the image interpretation at layer MI1 and do not fit the likely context are discarded. As a consequence, the precision of the image interpretation at layer MI1 is increased up to 43%. A further improvement of the precision could be achieved by defining additional relationships between the elementary classes.

Afterwards, automatic image interpretation at layer MI2 of the MIAS is performed by the fuzzy-recognition algorithm, using the elementary classes obtained as results of image interpretation at layer MI1 and using knowledge about the particular domain. To define the relationships between the scenes and the elementary classes automatically, and to determine their reliability, we have analyzed the elementary classes and scenes related to each image. Therefore, we have supplemented the existing vocabulary with 20 classes related to the scenes, such as 'Airplane Scene', 'Bird Scene', 'Sea', etc. Then we have used these classes of 475 images to make a data set to be used for scene recognition. The data was divided into training and test subsets (70:30) by a 5-fold holdout cross-validation. Data in the training set were used to produce the rules about relationships between scenes and elementary classes according to (3), and to learn a classification model of each scene class using a Bayes classifier, according to (4).

The obtained precision of automatic image interpretation at layer MI2 is 61% and the recall is 55%. The results at the layer MI2 depend on the results at layer MI1. For those scenes for which there is one main object class that is highly discriminative for that scene (e.g., train for Train Scene), it is crucial to detect that object. In this kind of scene, background objects that are common to most scenes do not play an important role, but in scenes without one prominent object (e.g., Sea, Inland) they are important. Additionally, the inheritance algorithm is used to infer generalized classes related to a scene class, which make up the interpretation at the layer MI3, and derived classes at the layer MI4. In Table 2, some examples of multi-layered image annotation obtained by MIAS are shown.

Table 2
Examples of multi-layered image annotation by MIAS (four example images, one per column; the images themselves are not reproduced here).

MI1: 'shuttle' - ID 19 | 'train' - ID 24, 'tracks' - ID 23, 'sky' - ID 20 | 'grass' - ID 11, 'tiger' - ID 22 | 'water' - ID 26, 'sand' - ID 18, 'sky' - ID 20, 'rock' - ID 16
MI2: 'Shuttle Scene' | 'Train Scene' | 'Tiger Scene' | 'Seaside'
MI3: 'Vehicle', 'Man-Made Object', 'Outdoor' | 'Vehicle', 'Man-Made Object', 'Outdoor' | 'Wildcat', 'Wildlife', 'Natural Scenes', 'Outdoor Scene' | 'Natural Scenes', 'Outdoor Scene'
MI4: 'Space' | 'Transport' | 'Savannah' | 'Vacation'
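The sketch below illustrates the inconsistency check described above, assuming the knowledge base is reduced to pairwise occurs_with reliabilities between elementary classes. The values and the simple thresholding rule are illustrative stand-ins for the fuzzy inheritance analysis, not the authors' algorithm.

```python
# Pairwise occurs_with reliabilities between elementary classes
# (symmetric; all values invented for illustration).
COOCCURS = {
    ("sky", "water"): 0.90, ("sky", "rock"): 0.80,
    ("water", "sand"): 0.85, ("rock", "sand"): 0.70,
    ("sky", "train"): 0.20,
}

def support(a, b):
    return COOCCURS.get((a, b)) or COOCCURS.get((b, a)) or 0.0

def drop_intruders(labels, threshold=0.3):
    """Keep a label only if it is sufficiently consistent with at least
    one other label found in the image -- a thresholded stand-in for
    the fuzzy inheritance analysis described above."""
    kept = []
    for lab in labels:
        best = max((support(lab, other) for other in labels if other != lab),
                   default=0.0)
        if best >= threshold:
            kept.append(lab)
    return kept

print(drop_intruders(["sky", "water", "sand", "train"]))
# ['sky', 'water', 'sand'] -- 'train' does not fit the seaside context
```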
10. Conclusion

The aim of the present research was to automatically annotate images with words that are used in an intuitive image search. These words correspond to concepts on different levels of abstraction, in order to enable simple retrieval and organization of images. These concepts cannot be simply mapped to features but require additional reasoning with general and domain-specific knowledge, which in the context of image interpretation can often be incomplete, imprecise, and ambiguous. Therefore, the ability to handle uncertainty and to reason with fuzzy knowledge turned out to be an important property. We have developed the fuzzy-knowledge-based intelligent system MIAS for multi-layered image annotation, supported by a fuzzy inference engine that is capable of approximate reasoning and uses the available knowledge to draw conclusions about image semantics.

In order to bridge the semantic gap between the visual content of an image and the image semantics, the MIAS system deals with the visual content of images (low-level features) and with image semantics (elementary, scene, generalized and derived classes) that are inspired by human image interpretation and represented on layers MI1–MI4. We have merged the statistical approach for classification of image segments with a knowledge-based approach to infer more abstract concepts. For classification of image segments into elementary classes below the MI1 layer, a Bayesian classifier is used. The architecture of the MIAS system facilitates its compatibility with various classification methods, so other classification methods can be used as well. The fuzzy knowledge base is built using a fuzzy knowledge-representation scheme based on the Fuzzy Petri Net (KRFPN) formalism. The hierarchical arrangement of the KRFPN schemes used in MIAS allows the schemes to be independently used, modified and connected with each other into a new hierarchical structure, e.g., to expand the knowledge base with new concepts that may be synonyms or with concepts on different semantic levels.

The acquisition of knowledge was facilitated so that all the facts and rules on the composition and distribution of concepts, as well as their reliability, are produced automatically from data in a training set. A human expert explicitly specifies only the facts about general knowledge and heuristics about the particular domain. Both new relationships and new concepts with appropriate reliabilities can be stored in the knowledge base and used by the inference engine.

The approximate-reasoning capability of the inference engine supported in the KRFPN scheme was used in an original way for automatic scene recognition, for inference of more abstract classes, as well as for inconsistency checking of the classified image segments. The concepts obtained by classification of image segments at the layer MI1 (elementary classes) were treated as components of scenes at the layer MI2 that can be inferred by further analysis with the inference engine. The concepts thus obtained were then used for inferring generalized and derived classes related to the image at the layers MI3 and MI4.

Since the decisions about more abstract concepts can be made even when the input information about the concepts present in an image is imprecise and vague, errors can be propagated through the hierarchical structure of concepts and affect the inference on higher levels.
To reduce this problem, we have proposed a novel consistency-checking procedure that checks the consistency of the obtained elementary classes at the layer MI1 with the determined image context and discards the intruder classes, to increase the reliability of conclusions as well as to improve the precision of image annotation.

The results of image annotation at layer MI1 of the MIAS were compared with the published results of automatic image annotation (Carbonetto et al., 2004; Duygulu et al., 2002) on the same set of images and using the same image features. It has been shown that the supervised learning approach provides significantly better results than the unsupervised methods used in Carbonetto et al. (2004) and Duygulu et al. (2002), even when they take the context into account (Carbonetto et al., 2004). After the inconsistency checking is performed, the results of image annotation at layer MI1 are significantly improved in terms of average precision. Additionally, the proposed MIAS system supports the recognition of scenes and reasoning about the related concepts at different levels of abstraction, to mimic the way people interpret images and to enrich the image annotation with concepts that people would most likely use when searching for these images.

The main contributions of the presented research in the field of expert and intelligent systems are related to the definition of the fuzzy knowledge-representation scheme KRFPN for automatic multi-layered image annotation and to the novel and original use of the approximate-reasoning capabilities of the inference engine for inferring the semantics of images. The main advantage of the proposed multi-layered image annotation system MIAS is the fusion of low-level image features and knowledge-based concepts related to the semantics of an image. Another advantage stems from the connection between the statistical and knowledge-based approaches, which exploits their respective strengths: statistical methods are used to facilitate knowledge acquisition, to automatically generate relationships between concepts, and to compute their reliability. Other strengths of the MIAS system arise from the original use of the fuzzy inference engine for scene recognition and for reasoning about more abstract concepts, as well as the novel use of the inference engine for checking the consistency of concepts to reduce error propagation through the hierarchical structure of the scheme. Thanks to the KRFPN formalism, the MIAS system proved to be successful in coping with incomplete, imprecise, uncertain and ambiguous knowledge. The rules in the knowledge base of MIAS can be visualized using Fuzzy Petri Nets, and conclusions can be directly understood using the inference trees. Another advantage of the proposed system that arises from the KRFPN formalism is the ability to be extended by adding new rules and to be adapted to a new domain by acquiring new facts and adapting the fuzzy knowledge base.

The proposed system architecture facilitates the knowledge-acquisition phase, but due to the automatic generation of rules, a larger training set of images is needed. The automatically generated rules strongly depend on the data set used, so when images in the training set are not representative, the automatically generated rules on spatial relationships between objects in the images and on the relationships between objects and scenes may not be general enough, and their reliability may not be properly set. Therefore, after development, the system should be additionally tuned for accuracy. Although the
architecture of the MIAS system is general, the limitation is that it cannot be immediately used for new applications or domains. The images should be preprocessed to obtain low-level features, a new vocabulary should be defined, and new rules created, either automatically using the training data set or provided by an expert.

This research was oriented to the domain of outdoor images, so we plan to implement and test the proposed system in new domains and with very large image databases. Since the architecture of the MIAS system facilitates its compatibility with various classification methods, for the first layer of image interpretation we will examine different classification methods and methods of probability estimation, as well as optimized mechanisms for extracting visual features. In future research, we plan to expand the proposed model and to examine the possibilities of its adaptation for the annotation of videos, for the recognition of activities in image or video content, and for the prediction of future actions. Therefore, we will examine the possibility of including fuzzy spatial and temporal relations in the MIAS system, as well as explore the required adaptations of the formalisms to be used in the system.

Acknowledgment

This work has been fully supported by the Croatian Science Foundation under the project 6733 De-identification for Privacy Protection in Surveillance Systems (DePPSS).

References

Athanasiadis, T., et al. (2009). Integrating image segmentation and classification for fuzzy knowledge-based multimedia. In Proceedings of the MMM2009.
Barnard, K., Duygulu, P., Forsyth, D., Freitas, N., Blei, D. M., & Jordan, M. I. (2003). Matching words and pictures. Journal of Machine Learning Research, 3, 1107–1135.
Benitez, A. B., Smith, J. R., & Chang, S. F. (2000). MediaNet: A multimedia information network for knowledge representation. In Proceedings of IS&T/SPIE: vol. 4210. MA.
Blei, D., & Jordan, M. (2003). Modeling annotated data. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 127–134).
Carbonetto, P., Freitas, N. de, & Barnard, K. (2004). A statistical model for general contextual object recognition. In Proceedings of ECCV 2004, Czech Republic (pp. 350–362).
Chen, S. M., Ke, J. S., & Chang, J. F. (1990). Knowledge representation using fuzzy Petri nets. IEEE Transactions on Knowledge and Data Engineering, 2(3), 311–319.
Chengjian, S., Zhu, S., & Shi, Z. (2015). Image annotation via deep neural network. In Proceedings of the IEEE 14th IAPR International Conference on Machine Vision Applications (MVA) (pp. 518–521).
Datta, R., Joshi, D., & Li, J. (2008). Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 20, 1–60.
Dong, P. T. (2014). A survey of refining image annotation techniques. International Journal of Multimedia & Ubiquitous Engineering, 9(3).
Duygulu, P., Barnard, K., de Freitas, J. F. G., & Forsyth, D. A. (2002). Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proceedings of the European Conference on Computer Vision (pp. 97–112).
Eakins, J., & Graham, M. (2000). Content-based image retrieval. Technical Report JTAP-039, JISC, Institute for Image Data Research, University of Northumbria, Newcastle.
Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR): vol. 2 (pp. 524–531). IEEE.
Feng, S., & Xu, D. (2010). Transductive multi-instance multi-label learning algorithm with application to automatic image annotation. Expert Systems with Applications, 37(1), 661–670.
Hare, J. S., Lewis, P. H., Enser, P. G. B., & Sandom, C. J. (2006). Mind the gap: Another look at the problem of the semantic gap in image retrieval. In Proceedings of Multimedia Content Analysis, Management and Retrieval, San Jose, California, USA.
Hong, R., Wang, M., Gao, Y., Tao, D., Li, X., & Wu, X. (2014). Image annotation by multiple-instance learning with discriminative feature mapping and selection. IEEE Transactions on Cybernetics, 44(5), 669–680.
Hu, J., & Lam, K. M. (2013). An efficient two-stage framework for image annotation. Pattern Recognition, 46(3), 936–947.
Ivasic-Kos, M., Pavlic, M., & Pobar, M. (2009). Analyzing the semantic level of outdoor image annotation. In Proceedings of the 32nd IEEE International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 293–296). Opatija, Croatia.
Ivasic-Kos, M., Ribarić, S., & Ipsic, I. (2010). Image annotation using fuzzy knowledge representation scheme. In Proceedings of the IEEE 2010 International Conference of Soft Computing and Pattern Recognition (pp. 218–223). Paris, France.
Li, X., & Lara-Rosano, F. (2000). Adaptive fuzzy Petri nets for dynamic knowledge representation and inference. Expert Systems with Applications, 19(3), 235–241.
Li, J., & Wang, J. Z. (2003). Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1075–1088.
Liu, Y., Zhang, D., Lu, G., & Ma, W. Y. (2007). A survey of content-based image retrieval with high-level semantics. Pattern Recognition, 40(1), 262–282.
Maillot, N. E. (2005). Ontology based object learning and recognition (PhD thesis). Université de Nice-Sophia Antipolis.
Marques, O., & Barman, N. (2003). Semi-automatic semantic annotation of images using machine learning techniques. In Proceedings of the International Semantic Web Conference (pp. 550–565).
Nezamabadi-pour, H., & Kabir, E. (2009). Concept learning by fuzzy k-NN classification and relevance feedback for efficient image retrieval. Expert Systems with Applications, 36(3), 5948–5954.
Papadopoulos, G. T. H., Saathoff, C., Escalante, H. J., Mezaris, V., Kompatsiaris, I., & Strintzis, M. G. (2011). A comparative study of object-level spatial context techniques for semantic image analysis. Computer Vision and Image Understanding, 115(9), 1288–1307.
Peterson, J. L. (1981). Petri net theory and the modeling of systems. Prentice Hall.
Ribarić, S., & Pavešić, N. (2009). Inference procedures for fuzzy knowledge representation scheme. Applied Artificial Intelligence, 23, 16–43.
Shatford, S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging & Classification Quarterly, 5(3), 39–61.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Simou, N., Athanasiadis, T., Stoilos, G., & Kollias, S. (2008). Image indexing and retrieval using expressive fuzzy description logics. Signal, Image and Video Processing, 2, 321–335. Springer.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.
Srikanth, M., Varner, J., Bowden, M., & Moldovan, D. (2005). Exploiting ontologies for automatic image annotation. In Proceedings of SIGIR '05 (pp. 552–558).
Stoilos, G., Stamou, G., Tzouvaras, V., Pan, J. Z., & Horrocks, I. (2005). The fuzzy description logic f-SHIN. In Proceedings of the International Workshop on Uncertainty Reasoning for the Semantic Web.
Tousch, A. M., Herbin, S., & Audibert, J. Y. (2012). Semantic hierarchies for image annotation: A survey. Pattern Recognition, 45(1), 333–345.
Yin, H., Jiao, X., Chai, Y., & Fang, B. (2015). Scene classification based on single-layer SAE and SVM. Expert Systems with Applications, 42(7), 3368–3380.
Yu, Y., Pedrycz, W., & Miao, D. (2014). Multi-label classification by exploiting label correlations. Expert Systems with Applications, 41(6), 2989–3004.
Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346–362.