key: cord-0036379-k9j83xtr authors: Goh, Ong Sing; Fung, Chun Che title: Automated Knowledge Extraction from Internet for a Crisis Communication Portal date: 2005 journal: Fuzzy Systems and Knowledge Discovery DOI: 10.1007/11540007_162 sha: a38b26fb278e03f180789a74b38dd5b64067da75 doc_id: 36379 cord_uid: k9j83xtr
This paper describes the development of an Automated Knowledge Extraction Agent (AKEA), which was designed to acquire online news and documents from the Internet in order to establish a knowledge-based crisis communication portal. In times of crisis, an effective communication mechanism is essential to maintain peace and calm in the community by providing timely and appropriate information. It is proposed that incorporating software agents into the crisis communication portal will enable it to send news alerts to subscribed users via Internet and mobile services. The proposed system consists of a crawler, a wrapper, a named-entity tagger and AIML (Artificial Intelligence Markup Language); an animated character is used in the front-end for human-computer communication.
With the acceptance of and increasing reliance on the Internet, it has now become "the" repository of human knowledge and information for the 21st century. At the same time, advancements in Internet and mobile communication technologies have provided effective and cheap means of communication for modern society. The global implications of such technologies are unparalleled in the history of human civilization. Hence, the Internet now serves two of the most important functions in the modern world: as a giant virtual storehouse of data, information and knowledge, and as the true information superhighway through which all kinds of data and information can be delivered cheaply and quickly. The effective use of these two aspects is particularly important in times of crisis.
Within the context of this paper, a crisis refers to an event or incident that has the potential to cause national panic, confusion, unrest and possible catastrophe. Such crises may be due to health epidemics, natural disasters or man-made tragedies such as terrorist attacks. Recent examples include Severe Acute Respiratory Syndrome (SARS), bird flu, mad cow disease, September 11, earthquakes and tsunamis. In these cases, accurate information delivered in the shortest time and at the lowest cost is essential for informing the affected communities and the relevant authorities. In particular, quick and appropriate decisions can reduce the potential damage and lead to better management of the situation. This paper reports the development of an Automated Knowledge Extraction Agent (AKEA), which was designed to establish the knowledge base for a global crisis communication system called CCNet. CCNet was proposed during the height of the SARS epidemic in 2003. It aims to provide up-to-date information to its users via a conversational software robot called AINI (Artificial Intelligence Neural-network Identity). The purpose of AINI is to deliver essential information from trusted sources and to interact with its users through animated characters. The idea is to rely on a human-like communication approach, thereby providing a sense of comfort and familiarity. The functionalities of AINI have been reported previously, and development of AINI is ongoing [1]. It is foreseeable that the combination of AINI and AKEA will produce a more natural means of communication and computing in the near future.
The architectural design of the proposed system is shown in Figure 1. The CCNet Portal can be divided into two main parts plus a middle tier of multiple knowledge bases.
The two main parts are the Front-End, which is responsible for interaction with the user, and the Back-End, which establishes the knowledge bases in the background. The AINI Server and Mobile Gateway are located in the middle. They link the Front-End (Client) and the Back-End (Server) of the system and process the communication between the users and the CCNet Portal. AINI's engine comprises an intelligent agent framework. All communication with AINI is carried out through a natural human-machine interface that uses natural language processing and speech technologies via a 3D animated character. AINI's engine carries out the sophisticated decision-making process based on the information it interprets from the knowledge bases. These decision-making capabilities are based on the knowledge embedded in the XML specifications. The inputs and outputs of the modules in the AINI knowledge bases, such as Expression Emotion, Customers and AlertNews, are stored in XML-encoded data structures. These modules represent the conceptualized knowledge in XML format. From the users' perspective, the CCNet system accepts questions and requests and processes the queries based on the information contained in AINI's knowledge base.
The Front-End provides the necessary interaction between the user and the system. Three modes of communication are provided: web chat, PDA chat and mobile chat. The web chat sessions allow interaction between a user and the software robot. The communication can be text-based or voice-based, with the animation of a 3D character. If voice is desired, Text-to-Speech technology is used to convert the text to voice using synthesizer hardware and software. This is particularly useful for someone who has difficulty with, or is unfamiliar with, a conventional keyboard.
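As an illustration of the XML-encoded knowledge modules described above, the following minimal sketch shows how an AlertNews-style record could be stored and queried. The element names, attribute names and sample values are hypothetical placeholders chosen for this example; the paper does not specify the CCNet schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical AlertNews records. The <alertnews>/<item> layout and all
# values are illustrative assumptions, not the schema used by CCNet.
ALERT_NEWS_XML = """
<alertnews>
  <item id="1" topic="SARS">
    <headline>Placeholder headline A</headline>
    <source>Placeholder trusted source</source>
  </item>
  <item id="2" topic="bird flu">
    <headline>Placeholder headline B</headline>
    <source>Placeholder trusted source</source>
  </item>
</alertnews>
"""


def load_alerts(xml_text):
    """Parse the XML text and return one dict per <item> element."""
    root = ET.fromstring(xml_text.strip())
    alerts = []
    for item in root.findall("item"):
        alerts.append({
            "id": item.get("id"),
            "topic": item.get("topic"),
            "headline": item.findtext("headline", default=""),
            "source": item.findtext("source", default=""),
        })
    return alerts


def alerts_for_topic(alerts, topic):
    """Return the alerts whose topic matches, ignoring letter case."""
    return [a for a in alerts if a["topic"].lower() == topic.lower()]
```

A front-end component could call `alerts_for_topic(load_alerts(ALERT_NEWS_XML), "sars")` to retrieve the records it should present to a subscribed user; how CCNet actually dispatches alerts is not described in this section.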
In terms of the animated character, users may customize the interface as required. They can also enter questions and receive responses directly on the website. In addition, users may navigate through all the information on the topics or issues of interest to them. If necessary, guidance may also be provided to assist the users. The main objective of AINI is to intelligently offer related information on various topics in a virtual environment in which no live agents or specialists need to be physically involved. AINI uses natural language parsing in Artificial Intelligence Markup Language (AIML) to search the predefined knowledge base as well as other data sources located in other systems via the Internet. Users interact with the virtual advisor through the WebGuide, WebTips and WebSearch engines. WebGuide guides users through the entire portal. The WebTips engine provides tips or hints to the users. The WebSearch system is an integrated search engine that can search local sites as well as the Internet and online databases. At the same time, users can interact and chat with the AINI chatterbot, or Virtual Agent. The chatterbot is based on natural language processing and aims to initiate conversations with users [1]. AINI also offers messaging, email and phone services to the users.
Deploying AINI on Personal Digital Assistant (PDA) devices is a recent approach that provides an alternative, personalized and human-like interface between the computer and the user. The PDA chat has the same functions as the web chat but with mobile capability. It is designed to combine mobile technology with a natural language interface so that users can interact naturally with mobile devices. The PDA chat connects to the knowledge base using WiFi technology.
In this paper, the focus is on the development of a knowledge base which forms the "brain" of the CCNet portal.
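The AIML-based lookup mentioned above, in which a user's question is matched against patterns stored in the knowledge base, can be sketched roughly as follows. This is a simplified first-match illustration with a single `*` wildcard, not AINI's engine or a full AIML interpreter; the categories, the `{star}` placeholder and the function names are assumptions made for this example.

```python
import re

# Hypothetical (pattern, template) pairs, listed from most to least specific.
# Real AIML interpreters use a richer matching order; this sketch simply
# returns the first pattern that matches.
CATEGORIES = [
    ("WHAT IS SARS", "SARS stands for Severe Acute Respiratory Syndrome."),
    ("WHAT IS *", "I do not have information about {star} yet."),
    ("*", "Could you rephrase your question?"),
]


def normalize(text):
    """Uppercase the input and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", text).upper().split())


def pattern_to_regex(pattern):
    """Turn a pattern such as 'WHAT IS *' into an anchored regular expression."""
    parts = [r"(.+)" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def respond(user_input, categories=CATEGORIES):
    """Return the template of the first category whose pattern matches."""
    text = normalize(user_input)
    for pattern, template in categories:
        match = pattern_to_regex(pattern).match(text)
        if match:
            # Text captured by the wildcard, if the pattern contains one.
            star = match.group(1).lower() if match.groups() else ""
            return template.format(star=star)
    return ""
```

With these sample categories, `respond("What is SARS?")` selects the exact pattern, while a question about any other topic falls through to the wildcard patterns.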
This knowledge base contains the domain knowledge for crisis communication, organized by specific discipline or topic. All the information in this knowledge base is extracted by AKEA, which is explained in detail in the next section. In the proposed system, AINI's knowledge base consists of a common knowledge base, an expression emotion database, a customer knowledge base and an AlertNews knowledge base. In the literature, START (SynTactic Analysis using Reversible Transformations), developed by Boris Katz at MIT's Artificial Intelligence Laboratory, is a natural language understanding system, and Omnibase is a virtual database that provides uniform access to heterogeneous and distributed Web sources via a wrapper-based framework [2]. A simplified version of the natural language annotation technology is employed here as the database access schemata to mediate between natural language and database queries. A detailed description of each component is provided in the following sections.
AIML is used to represent AINI's common knowledge base. It is an XML specification for programming chat robots created by the ALICE Artificial Intelligence Foundation. A typical way of representing knowledge in an AIML file is as follows: <aiml> <category> <pattern>PATTERN</pattern> [...] </category> </aiml> The <aiml> tag indicates that this file describes the way in which the knowledge is stored. The <category> tag indicates an AIML category, which is the basic unit of the chatterbot's knowledge. Each category has a <pattern> and a corresponding