An overview of event extraction and its applications
Jiangwei Liu, Liangyu Min, Xiaohong Huang
2021-11-05

With the rapid development of information technology, online platforms have produced enormous text resources. As a particular form of Information Extraction (IE), Event Extraction (EE) has gained increasing popularity due to its ability to automatically extract events from human language. However, there are limited literature surveys on event extraction. Existing review works either spend much effort describing the details of various approaches or focus on a particular field. This study provides a comprehensive overview of the state-of-the-art event extraction methods and their applications from text, including closed-domain and open-domain event extraction. A trait of this survey is that it provides an overview of moderate complexity, avoiding too many details of particular approaches. This study focuses on discussing the common characteristics, application fields, advantages, and disadvantages of representative works, ignoring the specificities of individual approaches. Finally, we summarize the common issues, current solutions, and future research directions. We hope this work could help researchers and practitioners obtain a quick overview of recent event extraction.

Closed-domain event extraction. From the view of the techniques used, existing approaches can be divided into four categories: pattern matching, machine learning, deep learning, and semi-supervised learning methods. It is worth noting that semi-supervised learning methods are treated as a separate category because much recent research has used semi-supervised or distant learning methods to enhance corpora, and this line of work has become a research hotspot. From the view of how a model is trained, existing approaches can be categorized into pattern matching, pipelined training, and joint training methods. Which manner is chosen mainly depends on how researchers treat the subtasks of event extraction. From the view of whether much expert knowledge is needed, existing approaches can be divided into knowledge-driven, data-driven, and hybrid methods [20]. Knowledge-driven methods usually need expert knowledge to design delicate patterns. Data-driven approaches mainly exploit knowledge from big data through statistics or deep learning methods. Hybrid approaches combine the above-mentioned methods. From the corpus level on which the event extraction tasks are performed, existing research can be divided into sentence-level, document-level, and cross-document-level extraction.

Open-domain event extraction. Open-domain event extraction is highly different from closed-domain event extraction because it focuses on detecting new or unexpected events from texts. There are no predefined event types, so event schema induction is a critical subtask of open-domain event extraction. From the view of the technologies used, existing approaches can be divided into Bayesian-based [21], clustering-based [11], parsing-based [8], lexicon-based [22], semi-supervised [19] and distant supervision based [15], and adversarial domain adaptation based [23] methods. From the view of the task target, the existing research can be categorized into new event detection, event generation, and event tracking.
Despite the importance and popularity of event extraction, there are limited comprehensive reviews and summaries of recent event extraction research [20, 24, 25]. Most existing surveys focus on a specific field, for example, deep learning schema-based event extraction [26], multilingual event extraction [27], event extraction from social networks [28], biomolecular event extraction [29, 30], event extraction for decision support systems [2], etc. Another limitation is that most existing surveys, including comprehensive reviews, lack a summary of recent open-domain event extraction research. From this view, we review and provide an overview of the recent event extraction literature. Different from previous surveys, we summarize the contributions of this study as follows: (1) We systematically review the event extraction literature from the technique view, covering both closed-domain and open-domain event extraction. In each section, we review the models, techniques, event levels, datasets, and application fields of the representative research and summarize them in a corresponding table by year. (2) A trait of this survey is that we try to provide an overview of moderate complexity. We ignore the specificities of individual research and avoid discussing the details of individual approaches. We focus on discussing the common characteristics, application fields, advantages, and disadvantages of representative works. We hope this work could help researchers and practitioners obtain a quick outline of recent event extraction. (3) We summarize the common issues and challenges that hinder the generalization and industrial application of event extraction; the corresponding current solutions and research directions are also discussed.

The remainder of this paper is organized as follows. We first introduce the event extraction task definition, commonly used corpora, and evaluation metrics. We then review and summarize the literature from the technique view, with closed-domain event extraction in Section 3 and open-domain event extraction in Section 4. Section 5 summarizes and discusses the current common research issues and future directions. Conclusions are given in Section 6.

As a particular form of information extraction, event extraction involves named entity recognition (NER) and relation extraction (RE), and mostly depends on the results of these tasks. As an interdisciplinary subject, event extraction is closely related to computer science, statistics, and natural language processing. We demonstrate the relations from its fundamentals to its applications in Figure 2.

Figure 2: Demonstration of the relationship between event extraction and other interdisciplinary subjects and techniques.

Following the event extraction task definition in ACE 2005, an event is frequently described as a change of state, indicating a specific occurrence of something that happens at a particular time and in a specific place, involving one or more participants. It can help answer the "5W1H" questions, i.e., "who", "when", "where", "what", "why", and "how", about an event. ACE employs the following terminology to describe an event extraction task:

Event mention: An event mention is usually a phrase or sentence that describes an event, in which a trigger and the corresponding arguments are included.

Event trigger: It is usually a verb or a noun that most clearly expresses the core meaning of an event.
Argument role: An argument role is the function or position that an event argument plays in the relationship between the event argument and the trigger.

For example, there are two event types involved in sentence S1: "Die" and "Attack", triggered by "died" and "fired", respectively. For the Die event, "Baghdad", "cameraman", and "American tank" are its arguments with the corresponding roles Place, Victim, and Instrument, respectively. For the Attack event, "Baghdad", "cameraman", "American tank", and "Palestine Hotel" are its arguments with the corresponding roles Place, Victim, Instrument, and Target, respectively. This is a somewhat more complex example with three shared arguments, which is more challenging than the simple case of one event type in one sentence. Figure 3 shows the event extraction annotation and the syntactic parser results.

• S1: In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel.

The closed-domain event extraction task can be divided into four subtasks: trigger identification, event type classification, argument identification, and argument role classification. From the manner of organizing these subtasks, most existing closed-domain event extraction methods fall into two mainstream categories: pipeline-based methods and joint-based methods. The pipeline-based method follows the idea of divide-and-conquer algorithms; its advantage is that it simplifies each subtask and can provide information for subsequent subtasks. In contrast, its disadvantage is that it propagates cascading errors, and the overall performance dramatically relies on the earlier subtasks. The joint-based method handles the subtasks jointly and thus does not propagate errors among the subtasks. Accordingly, its disadvantages are that it cannot utilize information from previous subtasks and needs larger-scale, delicately labeled data to train the models.

Event extraction corpora are annotated by professionals or experts with domain knowledge and are used to train or evaluate models. This section mainly introduces some representative event extraction corpora provided by public evaluation programs or mentioned in previous literature. We summarize these popular corpora in Table 1.

• The Factbank corpus [33] is built on TimeBank 1.2 and part of the AQUAINT TimeML corpus. The difference is that the Factbank corpus is supplemented with additional information concerning the factuality of events. It consists of 208 documents and contains a total of 9488 manually annotated events.

• The GENIA corpus [34] is a semantically annotated corpus of biological literature. The GENIA corpus 3.0 consists of 1999 abstracts taken from the MEDLINE database. The current GENIA event annotation covers 1,000 of the 1999 abstracts of the primary GENIA corpus, marking 36114 events in 9372 sentences. More detailed event annotation information can be found at http://www.geniaproject.org/genia-corpus/eventcorpus.

• The TDT corpora [35, 36].

The event extraction task, especially the closed-domain event extraction task, can be regarded as a classification task or a sequence labeling task. Most existing literature uses classification metrics to evaluate event extraction performance. In line with IE and text mining (TM), performance is generally measured by counting true positives and true negatives, as well as false positives and false negatives.
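To make the counting concrete, the following minimal sketch (a hypothetical helper, assuming per-mention binary decisions for a single event type) computes the three scores defined formally in the next paragraph:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from raw counts.

    tp: predicted mentions that match a gold mention
    fp: predicted mentions with no matching gold mention
    fn: gold mentions the system missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: 40 correct triggers, 10 spurious, 20 missed
p, r, f = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.800 R=0.667 F1=0.727
```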
The most used metrics, e.g., precision, recall, and F1 score, are calculated as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

These performance measures are based on the confusion matrix: true positives (TP) and true negatives (TN) are the observations that are correctly predicted, whereas false positives (FP) and false negatives (FN) are the cases where the predicted class contradicts the actual class.

Open-domain event extraction aims to detect unreported events or track the progress of previously spotted events. In most cases, it has no predefined schemas and event types. However, with the help of an annotated corpus, it can still be transformed into a classification problem and thus use the evaluation metrics mentioned above. Many works conduct open-domain event extraction with clustering algorithms, and therefore some clustering evaluation metrics, such as mutual information or Chi-Square, are often employed. For example, normalized pointwise mutual information (nPMI) can be used to measure slot coherence [37]:

$$nPMI(x, y) = \frac{\log \dfrac{f(x, y) \cdot W}{f(x) \cdot f(y)}}{-\log \dfrac{f(x, y)}{W}}$$

where W is the total number of words in the corpus, f(x) and f(y) are the frequencies of x and y in the corpus, and f(x, y) is the co-occurrence frequency of the word pair (x, y) in the corpus. Other variants, e.g., cPMI (corpus-level significant PMI) and PMI^2, are also used in the literature [40, 41].

This section categorizes closed-domain event extraction approaches into pattern matching, machine learning, deep learning, and semi-supervised learning methods. The categorical arrangement also considers and follows the time when each technique became popular and mainstream. We focus on providing an overview of closed-domain event extraction by concentrating on the most common characteristics, including the main idea, common framework, application area, advantages, and disadvantages. Many peculiarities of individual approaches are not considered in this study.

One characteristic of pattern matching based methods is that they depend on domain-specific event templates, which require a great deal of manual knowledge engineering to construct elaborately designed features. The earliest event extraction methods were mainly based on syntax trees or regular expressions. The typical representative work might be the AutoSlog system, developed by Ellen Riloff in 1993 [42]. It first defines 13 linguistic patterns with the help of a conceptual sentence analyzer. These linguistic patterns are used to automatically build a domain-specific dictionary of concepts. AutoSlog then uses the trigger word dictionary to detect a potential event. Lastly, it associates the event patterns with linguistic features, e.g., part-of-speech (POS) tags generated by the sentence parser, to assemble each argument and its corresponding role. We summarize this typical process in Figure 4.

Due to its outstanding performance in specific domains, research on pattern matching based event extraction has expanded into various fields, such as biomedicine [43, 44, 16], general information extraction [45, 46], finance and economics [47], etc. Akane et al. [43] design a program to extract events from biomedical papers using a full parser. Halil et al. [44] use syntactic dependencies and rules to perform biological event extraction. Ekaterina et al. [16] incorporate manually curated dictionaries and machine learning methodologies to extract event triggers and arguments on trimmed dependency graph structures.
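As a highly simplified, hypothetical illustration of this pattern-matching paradigm (not AutoSlog itself; the trigger dictionary, surface patterns, and role names are invented for the example sentence S1), consider:

```python
import re

# Hypothetical hand-crafted resources: a trigger dictionary and lexico-syntactic
# surface patterns with named groups for argument roles (role names are
# simplified and do not follow the ACE inventory).
TRIGGERS = {"fired": "Attack", "attacked": "Attack", "died": "Die"}
PATTERNS = [
    re.compile(r"(?P<Attacker>\w+ \w+) (?P<trigger>fired|attacked) on (?P<Target>[\w ]+)"),
    re.compile(r"(?P<Victim>\w+ \w+) (?P<trigger>died)"),
]

def extract_events(sentence):
    """Return (event_type, trigger, {role: argument}) tuples matched by the patterns."""
    events = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            event_type = TRIGGERS.get(match.group("trigger"))
            if event_type is None:
                continue
            args = {role: text.strip() for role, text in match.groupdict().items()
                    if role != "trigger" and text}
            events.append((event_type, match.group("trigger"), args))
    return events

print(extract_events("a cameraman died when an American tank fired on the Palestine Hotel"))
# [('Attack', 'fired', {'Attacker': 'American tank', 'Target': 'the Palestine Hotel'}),
#  ('Die', 'died', {'Victim': 'a cameraman'})]
```

Real systems operate on parser output rather than raw strings and maintain far larger pattern inventories, which is precisely what makes them costly to build and port.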
Roman et al. [45] propose an automatic event pattern discovery approach that can identify a set of relevant documents and a set of event patterns from un-annotated text, starting from a small set of "seed scenario patterns". Chang et al. [46] propose a method that can effectively summarize Chinese e-news with four main components: a Chinese POS tagger, a Chinese term filter, an event ontology filter, and a summarization agent. Jethro et al. [47] propose the use of lexico-semantic patterns for financial event extraction from RSS news feeds.

The typical characteristics lie in two aspects: (1) utilizing lexical features, e.g., part-of-speech (POS) tags, entity information, and morphological features (token, lemma, etc.); (2) utilizing delicate event patterns normally designed by experts with domain knowledge. Several advantages of pattern-based approaches are summarized as follows. First, they need less corpus than data-driven methods. Second, they have better interpretability because their patterns are manually designed and maintained. Third, they can achieve high extraction accuracy in a specific domain once the patterns are well designed. We summarize the disadvantages of pattern-based approaches from the design and generalization views. First, developing and maintaining delicate event patterns is rather time-consuming and labor-intensive. Second, because pattern design strongly depends on the surface form of the text, much effort is needed to transfer patterns from one domain to another. The low reusability of designed patterns or templates limits their generalization.

To alleviate the difficulty of designing delicate event patterns, many researchers have explored machine learning methods to extract events. In this section, we first review the typical machine learning based event extraction literature and summarize it in Table 2 from the view of the year, model, paradigm, technique, datasets used, event level performed, and application area. We also summarize and plot the typical abstract process in Figure 5. We then focus on discussing the common characteristics of typical research in terms of feature engineering, paradigm, technique, and application fields, without spending much effort describing the details of specific methods. We finally summarize the advantages and disadvantages of machine learning based event extraction methods.

The features reported in previous machine learning based event extraction methods can be categorized into lexical and contextual features. Lexical features contain part-of-speech (POS) tags, entity information, and morphological features (e.g., token, lemma, etc.) [3]. Contextual features include local information (sentence level), global information (document level), and external dictionaries. These features are complementary, and various studies have combined global evidence from related documents with local decisions [59, 60, 61]. For example, to overcome the shortcoming of analyzing sentences in isolation, Huang and Riloff [51] present a bottom-up architecture that considers a view of the larger context. It is implemented by integrating sequential sentence classifiers that capture textual cohesion, including lexical associations and discourse relations across sentences. To resolve the ambiguities of sentence-level event extraction relying on local information, Liao and Grishman [59] use document-level statistical information to improve sentence-level event extraction and achieve document-level within-event and cross-event consistency.
Patwardhan and Riloff [60] combine phrasal and sentential evidence into a probabilistic framework to enhance accuracy. Hong et al. [53] use blind cross-entity inference to improve sentence-level ACE event extraction by considering the consistency and distribution of entities and roles.

Considering the complexity of the event extraction task, researchers typically divide the task into four subtasks: event trigger identification, event type classification, argument detection, and role classification. Much research trains the classifiers in a pipelined manner, with the advantage that earlier classifiers can provide information to later classifiers [63, 62, 61, 4, 59, 37, 57, 56, 54, 53, 52, 5, 14, 17]. For example, Peng et al. [14] propose an automatic pipeline to extract adverse drug events (ADE) by using Naïve Bayes and Support Vector Machine (SVM) classifiers to detect drug-related tweets and perform sentiment analysis before mapping the biomedical text into drug events. However, the shortcoming of pipelined training is also obvious: error propagation (cascading defects). To deal with this problem, researchers adopt a joint training manner that treats the event extraction task as a multi-classification problem [60, 51, 50, 49, 48]. For example, Chen and Ng [49] employ joint learning for Chinese event extraction and investigate (1) various linguistic features that exploit the results of zero pronoun resolution and noun phrase coreference resolution, and (2) features that exploit trigger probability and trigger type consistency.

From the technique view, support vector machines (SVM), maximum entropy (ME), Naive Bayes (NB), conditional random fields (CRF), integer linear programming (ILP), and hierarchical agglomerative clustering (HAC) are the most used machine learning algorithms. Lu and Roth [48] present a semi-Markov CRF approach for automatic event extraction and further develop a novel learning approach called structured preference modeling (PM) that allows structured knowledge to be incorporated effectively in a declarative manner. Björne and Salakoski [52] use SVMs to extract biomedical events (detailed descriptions of biomolecular interactions) from research articles in a pipelined manner.

From the application field view, these machine learning based event extraction models have been applied in many areas, including general information extraction and biomedicine; for example, [55] reviews the current event extraction methods for systems biology. Much research targets a specific domain or aims to improve extraction accuracy. Henn et al. [17] perform case studies on how visualization techniques enhance automated event extraction. Naughton et al. [62] merge and extract events from heterogeneous news sources. There is also much research on event extraction in other languages, for example, Chinese event extraction [64, 65, 50]. Li et al. [50] employ joint learning for Chinese event extraction and address the high ratio of pseudo trigger mentions to true ones by using trigger filtering schemas.

We end this section by summarizing the advantages and disadvantages of machine learning based event extraction compared with pattern matching based methods. The benefits are twofold: machine learning methods save much of the effort of designing delicate patterns and have better generalization and reusability. The disadvantages are threefold. First, supervised methods need more labeled data to train the model. Second, feature engineering is a time-consuming but critical step that affects extraction accuracy. Third, traditional machine learning methods have limitations in learning deep or complex nonlinear relations.
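To make the feature-engineering and classification pipeline of this section concrete, a minimal sketch with scikit-learn follows (toy hand-made training examples and features; a real system would use a proper corpus, richer lexical and contextual features, and separate classifiers per subtask):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def token_features(tokens, i):
    """Simple lexical/contextual features for the candidate trigger tokens[i]."""
    return {
        "token": tokens[i].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        "suffix3": tokens[i][-3:].lower(),
        "is_capitalized": tokens[i][0].isupper(),
    }

# Hypothetical toy training data: (sentence tokens, candidate trigger index, event type).
train = [
    ("a cameraman died in the attack".split(), 2, "Die"),
    ("the tank fired on the hotel".split(), 2, "Attack"),
    ("rebels attacked the convoy".split(), 1, "Attack"),
    ("two soldiers died yesterday".split(), 2, "Die"),
]
X = [token_features(tokens, i) for tokens, i, _ in train]
y = [label for _, _, label in train]

classifier = make_pipeline(DictVectorizer(), LinearSVC())
classifier.fit(X, y)

test = "a protester died when police fired tear gas".split()
print(classifier.predict([token_features(test, 2)]))  # candidate trigger: "died"
```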
Feature engineering is the main challenge of traditional machine learning based event extraction, and such methods have limitations in learning deep or complex nonlinear relations. Deep learning based methods can alleviate these shortcomings thanks to two distinguishing characteristics. First, the embedded representation of the input is suitable for big data. Second, specific deep architectures can better capture complex nonlinear features. This section first reviews the recent deep learning based event extraction literature and summarizes it in Table 3 from the view of the year, model, paradigm, technique, datasets used, event level performed, and application area. We then focus on discussing the common characteristics of typical research in terms of features, techniques, and application fields, without spending much effort describing the details of specific methods. We finally summarize the advantages and disadvantages of deep learning based event extraction methods.

RNN & LSTM based. RNN and LSTM architectures are good at capturing long-term and short-term memory information and are thus suitable for sequence labeling and text with long dependencies; event extraction can also be regarded as a sequence labeling task. For example, Nguyen et al. [80] use two bidirectional RNNs to learn a richer representation of the sentences. This representation is then utilized to predict event triggers and argument roles jointly. Wei et al. [12] propose a Bi-LSTM-CRF-RNN-CNN approach to extract medications and associated adverse drug events (ADEs) from clinical documents. Specifically, in the named entity recognition phase, the Bi-LSTM layers calculate scores of all possible labels for each token in a sequence. Then the CRF layer predicts a token's label using its neighbors' information. In the relation classification phase, all possible candidate relation pairs are generated by a structure that integrates CNN and RNN. To deal with the error propagation issue, Wei et al. [12] treat medication and adverse drug event extraction as a joint task.

Attention & Transformer based. Attention mechanisms allow deep learning models to learn the most important information and ignore noise by allocating different weights to different embeddings. According to the object the attention mechanisms work on, there are word-level, sentence-level, document-level, and channel-level attentions. The Transformer is, in essence, a multi-head self-attention architecture. Much attention-based or Transformer-based event extraction research has emerged. For example, Zheng et al. [7] propose an end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic graph to fulfill document-level event extraction. The difference between Doc2EDAG and the classic Bi-LSTM-CRF method is that Doc2EDAG employs the Transformer instead of the original LSTM encoder. The Transformer layers encode a sequence of embeddings with the multi-headed self-attention mechanism to exchange contextual information among the token sequence. Lu et al. [66] also propose a sequence-to-structure generation paradigm that can directly extract events from text in an end-to-end manner. Compared with [7], a distinguishing difference is that [66] uses the event schemas as constraints to control event record generation.
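Across these architectures, trigger detection is commonly cast as token-level sequence labeling. A toy bidirectional LSTM tagger (made-up vocabulary size, tag set, and dimensions; not any specific published model) illustrates the formulation:

```python
import torch
import torch.nn as nn

class BiLSTMTriggerTagger(nn.Module):
    """Toy BiLSTM token classifier: each token gets a trigger label (e.g. O / Attack / Die)."""

    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):              # (batch, seq_len)
        embedded = self.embedding(token_ids)   # (batch, seq_len, emb_dim)
        encoded, _ = self.encoder(embedded)    # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(encoded)        # (batch, seq_len, num_tags)

# Made-up ids and tags, just to show the shapes and a single training step.
model = BiLSTMTriggerTagger(vocab_size=5000, num_tags=3)
tokens = torch.randint(0, 5000, (2, 12))       # batch of 2 sentences, 12 tokens each
gold_tags = torch.randint(0, 3, (2, 12))       # random gold labels for illustration
logits = model(tokens)                         # (2, 12, 3)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 3), gold_tags.reshape(-1))
loss.backward()
print(logits.shape, float(loss))
```

In the Bi-LSTM-CRF variants discussed above, a CRF layer would replace the independent per-token classification so that label transitions are modeled jointly.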
GCN based. Multiple events in the same sentence, arguments of one event spread across more than one sentence, and document-level event extraction all face one challenge: long-range dependencies. A common solution for leveraging dependency structures is to use universal dependency parses. Syntactic Graph Convolutional Networks (GCNs), with nodes representing tokens and edges representing directed syntactic arcs, help alleviate this challenge. To handle the difficulty of multiple events existing in the same sentence, Liu et al. [78] propose a novel Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments by introducing an attention-based GCN to model the dependency graph information. Ahmad et al. [67] use a Graph Attention Transformer Encoder (GATE) to learn long-range dependencies and apply it to cross-lingual relation and event extraction.

Bert based. Pre-trained semantic representations, such as ELMo and Bert, have been widely used in multiple NLP tasks and have shown performance improvements across them. Bert is a bi-directional Transformer architecture that has been trained on massive corpora; it has learned fairly good semantic representations conditioned on token context and retains rich textual information [6]. Recently, much research has used Bert pre-trained representations as shared textual input features. For example, Liu et al. [75] explicitly cast the event extraction task as a machine reading comprehension problem and use question-answering techniques to perform event extraction. Min et al. [71] propose an event extraction framework, ExcavatorCovid, which extracts COVID-19 related events and the relations between them from news and scientific publications. These events are used to build a Temporal and Causal Analysis Graph, which helps the government sort out the information and adjust related policies in a timely manner. The framework uses Bert, pooling, and linear layers to extract temporal and causal relations.

Other new methods. Beyond the deep learning based models mentioned above, new paradigms of event extraction have emerged, such as question-answering based approaches [75]. Many works adopt strategies to improve extraction accuracy [74]. Many existing models seldom consider the relationships between event mentions and event arguments in different sentences. To handle this challenge, Huang and Peng [74] propose a document-level event extraction framework, DEED, leveraging Deep Value Networks (DVN) to capture cross-event dependencies and coreference resolution.

From an application perspective, these deep learning based event extraction models have been applied in many areas, including general information extraction [3, 80, 78, 77, 75, 74, 66], biomedical [79, 12, 69, 68], financial [7, 6], multimedia [76], legal [15], social [73, 71, 70], political [72], cross-lingual [67], etc.

We close this section by summarizing the advantages and disadvantages of deep learning based event extraction compared with traditional methods. Deep learning is, essentially, an extension and development of machine learning, so it shares many pros and cons with machine learning. Here, we focus on the distinguishing strengths and weaknesses. The benefits are threefold.
First, deep learning methods have more powerful nonlinear expressive ability and can capture more complex relations between features, avoiding much feature engineering. Second, each deep learning method has its own specialty and strong point in capturing syntactic and semantic features; for example, LSTM and Transformer architectures are both skilled at capturing long-range dependencies. Third, pre-trained models, especially Bert, can provide rich context information and have been widely used as standard input features. The weaknesses of deep learning methods are as follows. First, due to their complex deep architectures, deep learning based models mainly rely on huge labeled corpora to train the model. Second, numerous parameter settings may affect the performance, such as the learning rate, training epochs, etc. To alleviate the difficulty of obtaining labeled corpora, however, many researchers have explored semi-supervised and unsupervised learning methods.

Most event extraction systems are trained with supervised learning and rely on a collection of annotated data. Due to the domain-specificity of the tasks, event extraction systems must be retrained with new, massive annotated data for each domain [81]. However, human-labeled training data is expensive to produce. Recently, some researchers have explored new methods, such as semi-supervised and distant supervision methods, to automatically produce more training data.

Semi-supervised methods. Semi-supervised learning (SSL) has attracted considerable attention for achieving strong generalization by making use of both unlabeled and labeled data [13, 82, 83, 84, 85, 86, 87, 88, 89]. Much research has used various SSL methods to help generate or augment data for event extraction: role-identifying nouns [81], linear discriminant analysis [86], Vector Quantized Variational Autoencoders [85], multi-modal Generative Adversarial Networks [89], etc. Huang and Riloff [81] use role-identifying nouns to learn extraction patterns through a bootstrapping solution; the role-identifying nouns and patterns are then used to create training data for event extraction classifiers. Mansouri et al. [86] first use a convolutional neural network to extract explicit features from text and images, then use linear discriminant analysis (LDA) to predict the classes of unclassified data. Once the required prediction accuracy is met, the explicit features and predicted labels are used to finally predict whether a piece of news is fake or real. The labeled and unlabeled instances are incorporated to train the semi-supervised learning model. Chen et al. [89] extend the multi-modal Generative Adversarial Network (mmGAN) model to a semi-supervised architecture, which attempts to discriminate whether the data is real or generated and categorizes it into one of two classes: traffic event or non-traffic event. As shown in Figure 7, the multi-modal feature learning architecture consists of three components: a generator G, a discriminator D, and a classifier C. Different from the above methods focusing on data generation and data augmentation, Zhou et al. [88] design a novel semi-supervised framework, DualQA (dual question answering), to solve event argument extraction in low-resource scenarios.

Distant supervision methods. Distant supervision is a successful paradigm that gathers training data for event extraction systems by automatically aligning vast databases of facts with text [90, 91, 92, 93, 94].
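The core alignment step behind distant supervision can be sketched in a few lines (hypothetical trigger dictionary and corpus; real systems, including those described next, add argument matching, negative sampling, and noise handling):

```python
# Hypothetical event-trigger dictionary distilled from a knowledge base.
TRIGGER_KB = {
    "acquired": "Acquisition",
    "resigned": "Resignation",
    "bankruptcy": "Bankruptcy",
}

def distant_label(sentences):
    """Auto-label sentences: a sentence mentioning a known trigger becomes a
    positive example of that event type; all others become negative examples."""
    labeled = []
    for sent in sentences:
        tokens = sent.lower().split()
        types = {TRIGGER_KB[t] for t in tokens if t in TRIGGER_KB}
        if types:
            labeled.extend((sent, event_type) for event_type in types)
        else:
            labeled.append((sent, "None"))
    return labeled

corpus = [
    "Company A acquired Company B for $2 billion.",
    "The CFO resigned after the announcement.",
    "Shares rose slightly in early trading.",
]
for sentence, label in distant_label(corpus):
    print(label, "|", sentence)
```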
For example, Reschke et al. [90] present a new publicly available dataset and apply the distant supervision approach to plane crash events. Yang et al. [91] first use distant supervision (DS) to automatically generate labeled data, then a sequence tagging model to extract document-level events from financial announcements. The data generation contains two steps. First, event triggers can be automatically marked by querying a pre-defined dictionary (a financial event knowledge base); thus, event mentions can be automatically identified, after which the event trigger and the event arguments are labeled. Second, once an event mention is identified, it is labeled as a positive example; the rest of the sentences in the announcement are then marked as negative examples, which together constitute the document-level data. The deep event extraction architecture has a Bi-LSTM-CRF module for sentence-level and a CNN module for document-level event extraction. Zuo et al. [92] first design a Lexicon Enhanced Annotator (LexiAnno) to extract a large number of causal event pairs based on linguistic knowledge and employ them to automatically label sentences via distant supervision. Experimental results show the proposed data augmentation framework outperforms other benchmark methods. To address the lack of data and the imbalanced coverage of crisis types, Alrashdi and O'Keefe [93] utilize distant supervision to automatically generate large-scale labeled tweet data for crisis response.

Every single event extraction approach has its own merits and demerits. Combining different techniques can help integrate the advantages of multiple methods and significantly enhance performance. An increasing number of researchers therefore employ multiple approaches, i.e., hybrid models. We review the existing literature and discuss it in two scenarios: single event extraction tasks and comprehensive systems.

Integrating different paradigms. As discussed above, we have divided the research into four paradigms: pattern matching methods, machine learning methods, deep learning methods, and data augmentation methods. Many researchers have combined more than one paradigm to enhance the accuracy of event extraction. For example, Reschke et al. [90] extend the distant supervision approach to template-based event extraction and construct a new corpus, then use a linear-chain CRF model to test the performance on this dataset. Yang et al. [91] use pattern-based methods to annotate the sentence-level and document-level corpus, then use a deep learning method to perform event extraction.

Integrating different techniques. CRF and Bi-LSTM-CRF are widely used in NER tasks, while SVM and RNN-CNN are widely used in relation classification tasks; RNN is good at capturing global features, whereas CNN is good at capturing local features. Wei et al. [12] propose a Bi-LSTM-CRF-RNN-CNN approach to extract medications and associated adverse drug events (ADEs) from clinical documents. Li et al. [58] incorporate three supervised machine learning models, CRF, AdaBoost, and SVM, to automatically extract medication events from clinical text. GCN is good at modeling long dependency parses, and the Transformer is good at capturing the most important information; Ahmad et al. [67] propose a deep model integrating GCN and Transformer to generate structured contextual representations based on dependency parse results. Pre-trained models, such as Bert, can represent contextual semantic information well and have been used as standard input features. Other deep learning architectures can then be stacked on top of this input layer, fine-tuned, and trained to execute related tasks.
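A minimal sketch of this Bert-as-feature-extractor design, assuming the HuggingFace transformers and PyTorch packages (the model name and the linear head are illustrative, not the setup of any particular work cited here):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # used here as a frozen feature extractor; no fine-tuning in this sketch

# Illustrative task head: per-token trigger logits over three made-up labels (O / Attack / Die).
trigger_head = torch.nn.Linear(encoder.config.hidden_size, 3)

sentence = "A cameraman died when an American tank fired on the Palestine Hotel."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    token_embeddings = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_size)

logits = trigger_head(token_embeddings)  # (1, seq_len, 3): contextual features feed the stacked layer
print(logits.shape)
```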
Lybarger et al. [69] extract COVID-19 diagnoses and symptoms from clinical text. In this work, Bert, Bi-LSTM, and attention are used to generate span representations. Specifically, Bert first maps the input sentence into contextualized word embeddings. These representations are then fed to a Bi-LSTM without fine-tuning Bert. Lastly, each span is represented as the attention-weighted sum of the Bi-LSTM hidden states.

In recent years, event-related comprehensive systems have emerged. Their remarkable characteristic is that these systems extract multiple categories of information (e.g., entities, relations, and events) from multiple sources, multiple languages, and heterogeneous data modalities (speech, text, images, and videos). Li et al. [76] present a comprehensive, open-source multimedia knowledge extraction system (GAIA) and create a coherent, structured knowledge base. The GAIA system supports complex graph queries and retrieves multimedia evidence, including text, images, and videos. Specifically, the authors extract coarse-grained events and arguments using a Bi-LSTM-CRF model and a CNN-based model in the Text Knowledge Extraction (TKE) branch. Wen et al. [18] also propose a comprehensive extraction system (RESIN) that can automatically construct temporal event graphs. RESIN extends sentence-level event extraction to cross-document, cross-lingual, cross-media event extraction, coreference resolution, and temporal event tracking. These event-related comprehensive systems have greatly enhanced the accuracy of information retrieval. The hybrid method integrates the advantages of multiple techniques, multiple sources, multiple languages, and heterogeneous data modalities, making it a mainstream paradigm in the future, especially in industrial applications.

The most distinguishing characteristic of open-domain event extraction is that it does not assume predefined event types and schemas. It usually focuses on detecting new or unexpected events [19, 11, 95], event text generation [96, 97], specific field applications (e.g., energy prediction [98]), and other general information extraction [99, 23, 37, 21, 100, 10, 9, 8, 101]. In this section, we first review the recent open-domain event extraction literature and summarize it in Table 4 (the recoverable fragment is reproduced below).

Table 4 (partial): Summary of representative open-domain event extraction research.

Year | Model | Paradigm | Technique | Dataset | Task
— | de Vroe et al. [22] | Lexicon & parsing | RotatingCCG parser, CoreNLP | de Vroe et al. [22] | Event fact modality
2021 | ETYPECLUS [99] | Parsing & clustering | Bert, EM, clustering algorithm | ACE2005, ERE, Pandemic | Event type induction
2021 | Peng et al. [11] | Clustering | GCN, DBSCAN | Collected from Weibo [11] | Event detection
2020 | Chau et al. [98] | Lexicon & parsing | LSTM, Convolution | Chau et al. [98] | Energy prediction
2020 | Naik and Rose [23] | Adversarial | (remainder of entry truncated) | |

Clustering-based. Social events are unique aggregations of various semantics, and related events or their evolutions tend to be cohesive. Thus, density-based clustering algorithms can be used for new event detection and evolution discovery. For each event group, an event schema can also be constructed with a slot-value structure through Event Schema Induction (ESI). Peng et al. [11] propose a streaming social event detection and evolution discovery framework. Specifically, an event-based heterogeneous information network (HIN) and a novel Pairwise Popularity Graph Convolutional Network (PP-GCN) are first constructed.
A parallel heterogeneous clustering algorithm (H-DBSCAN) is then proposed for streaming event detection and evolution discovery.

Parsing-based. Syntactic parsing results are widely used to enhance open-domain event extraction tasks. For example, the verb tag helps detect the event trigger, whereas the noun tag helps filter the event arguments, and syntactic dependencies help capture the roles and arguments of the same event when they appear across multiple sentences. Ritter et al. [8] present the first open-domain event extraction and categorization system (TwiCal) for Twitter. As shown in Figure 8, the processing pipeline contains POS tagging, temporal resolution, NER, event tagging, significance ranking, and event classification components. Shen et al. [99] present an open-domain event type induction framework (ETYPECLUS). The framework first selects predicates and object heads, then disambiguates the predicates, and lastly induces event types by embedding and clustering the resulting pairs. Chau et al. [98] use syntactic parsing, WordNet, and a word sense disambiguation tool to extract events from news headlines; the events are then fed to a deep neural network to predict the natural gas price.

Bayesian-based. Bayesian open-domain event extraction models assume that a sentence or document is a joint distribution over event types, slots, entities, and contextual features. For example, Wang et al. [21] propose an open event extraction model (AEM) based on Bayesian modeling and Generative Adversarial Nets. Specifically, a Dirichlet prior and a generator are used to capture the patterns of latent events, while a discriminator is used to distinguish documents reconstructed from the latent events from the original input documents. Unlike other GAN-based text generation approaches that model the generated text sequence, the generator in AEM learns the projection between an event distribution and the event-related word distributions; thus, it captures event-related patterns. Zhou et al. [9] propose a Bayesian model, called the Latent Event Model (LEM), to extract a structured representation of events from social media. The most striking characteristic of LEM is that it is a fully unsupervised approach, and no annotated data is required. The work in [37] extracts event types, schemas, and arguments using a neural latent variable network with Bayesian inference (ODEE) and obtains better results than other baseline models.

Adversarial Domain Adaptation. The adversarial domain adaptation (ADA) framework was initially proposed by Ganin and Lempitsky and has been widely used in multiple NLP tasks [105]. Naik and Rose [23] leverage the ADA framework to identify event triggers. This framework treats event trigger identification as a token classification problem. A representation learner is trained to generate token-level representations that are predictive for trigger identification but not for domain prediction, making them more domain-invariant. The obvious advantage is that there is no need to annotate target domain data.
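Stepping back from the individual paradigms, several of the approaches above ultimately group event mentions without predefined types. A minimal hypothetical sketch of this clustering view (TF-IDF vectors and DBSCAN over a few invented posts, far simpler than the HIN/PP-GCN/H-DBSCAN pipeline of [11]):

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "Earthquake of magnitude 6.1 hits the coastal region",
    "Strong earthquake shakes coastal towns, magnitude 6.1",
    "Tech giant announces acquisition of AI startup",
    "AI startup acquired by tech giant in surprise deal",
    "Local team wins the championship final",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(posts)

# Cosine distance groups lexically similar reports; label -1 marks outliers,
# i.e. posts that do not (yet) form an event cluster.
clusters = DBSCAN(eps=0.8, min_samples=2, metric="cosine").fit_predict(vectors)
for post, cluster_id in zip(posts, clusters):
    print(cluster_id, post)
```

Posts falling in the same cluster are treated as reports of the same newly detected event, while the noise label (-1) marks mentions that do not yet correspond to any detected event.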
Open Domain Event Text Generation. Automated Story Generation (ASG) has been a research problem of interest and an open-domain event extraction subtask. Fu et al. [96] perform an open-domain event text generation task with an entity chain as its skeleton. To build this dataset, a wiki-augmented generator framework containing an encoder, a retriever, and a decoder is proposed. The encoder encodes the entity chain into hidden representations, while the decoder decodes from these hidden representations and generates related stories. The retriever is responsible for collecting reliable information to enhance the readability of the generated text. Martin et al. [97] model the automated story generation task as a sampling problem, generating the next event by choosing the most probable event from the event distribution.

We close this section by discussing the advantages and disadvantages of the mentioned works compared with closed-domain event extraction methods. Most open-domain event extraction works focus on detecting new events and extracting related information. This information is beneficial for scenarios that require comprehensive knowledge of broad-coverage, fine-grained, and dynamically evolving event categories, e.g., stock price prediction based on news. However, from the literature reviewed above, we can see that the existing methods are mainly based on syntactic parsing, clustering, Bayesian modeling, lexicons, etc. The output of current methods is still not as polished as closed-domain event extraction results in two respects. First, since open-domain event extraction needs no predefined schemas, the extracted results are highly heterogeneous, which makes them harder to use downstream. Second, since open-domain event extraction has no predefined event types, some research uses the extracted event trigger to represent the event type. Although many researchers have tried to induce event types by clustering or latent event type inference, the results are not always convenient or understandable. Due to the usefulness of dynamically evolving event categories, we believe that more research will explore new paradigms and techniques in open-domain event extraction.

In this section, we summarize and discuss current common research issues in event extraction. Despite the considerable progress in event extraction, there are still challenges involving, but not limited to, the following aspects.

Datasets. Although various annotated corpora exist and many researchers have explored semi-supervised methods to automatically label data, the data size and category coverage still fall short of what data-hungry algorithms require. Another problem is category imbalance. For example, the existing corpus categories mainly focus on natural disasters, social relationships, biomedicine, etc., and some categories contain only a small number of samples. Even worse, there is no annotated corpus at all in some fields. Producing more high-quality annotated data requires further research on, e.g., semi-supervised or distant supervision methods.

Document-level and corpus-level event extraction. Most existing event extraction methods mainly extract event arguments within the sentence scope [7, 61]. However, the extraction results are not ideal in the following two cases. First, event arguments of the same event often scatter across different sentences. Second, multiple sentences or documents may characterize the same event. The former case makes the extraction results incomplete, while the latter makes them redundant. Document-level and corpus-level event extraction tasks face the following challenges: long-term dependencies and entity and event coreference. Researchers have started to address this problem with various mechanisms, such as end-to-end structured prediction [74], the sequence-to-structure generation paradigm [66], open-schema event profiling [106], etc.
Cross-linguistic. Researchers have contributed relatively rich event extraction corpora in English, whereas corpora in other languages remain scarce. Recently, cross-lingual transfer learning approaches have been used for event extraction [107, 67]. For example, Subburathinam et al. [107] use a GCN-based network to transfer an event extraction model from source-language annotations to the target language. However, GCN is not good at capturing long-range dependencies or relations between words that are not directly connected in the dependency tree. Ahmad et al. [67] improve this work by using attention mechanisms to learn the dependencies between words with different syntactic distances. Cross-linguistic event extraction can save much of the effort of constructing corpora in other languages, and it is beneficial for low-resource languages.

Event coreference. The same event is frequently reported in multiple documents; for example, it is common for different news media to report the same hot news. Even document-level event extraction may not alleviate this redundancy. Event coreference, or event merging, is crucial for information retrieval, especially in event-related comprehensive systems that involve multiple sources, multiple languages, and heterogeneous data modalities (speech, text, images, and videos).

Open-domain event extraction needs new schemas and techniques. The current research primarily focuses on closed-domain event extraction due to its plentiful corpora, mature methods, and acknowledged evaluation mechanisms.

In this paper, we review and summarize the literature on event extraction from text. Overall, we focus on providing a comprehensive overview of event extraction tasks while ignoring the peculiarities of individual approaches. Specifically, we first introduce the related concepts of event extraction, such as the EE taxonomy, task definition, corpora, and evaluation metrics. Then we summarize the literature from the technique view. In both the closed-domain and open-domain event extraction sections, we summarize the literature from the perspective of the year, common framework, technique, corpus, application field, advantages, and disadvantages. Last, we summarize and discuss current common issues and related progress in closed-domain and open-domain event extraction. Although many challenges remain, event extraction, especially open-domain event extraction, is attracting more and more attention due to its crucial role in information extraction. This survey provides a way to quickly understand up-to-date event extraction research at a moderate level of detail.
References

[1] The automatic content extraction (ACE) program – tasks, data, and evaluation
[2] A survey of event extraction methods from text for decision support systems
[3] Event extraction via dynamic multi-pooling convolutional neural networks
[4] Real-time news event extraction for global crisis monitoring
[5] Real-time event extraction for driving information from social sensors
[6] CasEE: A joint learning framework with cascade decoding for overlapping event extraction
[7] Doc2EDAG: An end-to-end document-level framework for Chinese financial event extraction
[8] Open domain event extraction from Twitter
[9] A simple Bayesian modelling approach to event extraction from Twitter
[10] Open-domain extraction of future events from Twitter
[11] Streaming social event detection and evolution discovery in heterogeneous information networks
[12] A study of deep learning approaches for medication and adverse drug event extraction from clinical text
[13] SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media
[14] Efficient adverse drug event extraction using Twitter sentiment analysis
[15] Events matter: Extraction of events from court decisions
[16] Event extraction from trimmed dependency graphs
[17] Visualization techniques to enhance automated event extraction
[18] RESIN: A dockerized schema-guided cross-document cross-lingual cross-media information extraction and event tracking system
[19] Augmenting open-domain event detection with synthetic data from GPT-2
[20] An overview of event extraction from text
[21] Open event extraction from online text using a generative adversarial network
[22] Modality and negation in event extraction
[23] Towards open domain event trigger identification using adversarial domain adaptation
[24] Survey on event extraction technology in information extraction research area
[25] A survey of event extraction from text
[26] Deep learning schema-based event extraction: Literature review and current trends
[27] A survey of multilingual event extraction from text
[28] A survey of textual event extraction from social networks
[29] An overview of biomolecular event extraction from scientific documents
[30] A short survey of biomedical relation extraction techniques
[31] The TimeBank corpus
[32] TimeBank 1.2 documentation
[33] FactBank: A corpus annotated with event factuality
[34] GENIA corpus – a semantically annotated corpus for bio-textmining
[35] Topic detection and tracking: Event clustering as a basis for first story detection
[36] Topic detection and tracking: Event-based information organization
[37] Open domain event extraction using neural latent variable models
[38] A dataset for open event extraction in English
[39] Reporting the unreported: Event extraction for analyzing the local representation of hate crimes
[40] Improving pointwise mutual information (PMI) by incorporating significant co-occurrence
[41] An extensive empirical study of collocation extraction methods
[42] Automatically constructing a dictionary for information extraction tasks
[43] Event extraction from biomedical papers using a full parser
[44] Syntactic dependency based heuristics for biological event extraction
[45] Automatic acquisition of domain knowledge for information extraction
[46] Ontology-based fuzzy event extraction agent for Chinese e-news summarization
[47] Semi-automatic financial events discovery based on lexico-semantic patterns
[48] Automatic event extraction with structured preference modeling
[49] Joint modeling for Chinese event extraction with rich linguistic features
[50] Joint modeling of trigger identification and event type determination in Chinese event extraction
[51] Modeling textual cohesion for event extraction
[52] Generalizing biomedical event extraction
[53] Using cross-entity inference to improve event extraction
[54] Complex event extraction at PubMed scale
[55] Event extraction for systems biology by text mining the literature
[56] Event extraction with complex event classification using rich features
[57] Extracting protein interactions from text with the unified AkaneRE event extraction system
[58] Lancet: A high precision medication event extraction system for clinical text
[59] Using document level cross-event inference to improve event extraction
[60] A unified model of phrasal and sentential evidence for information extraction
[61] Refining event extraction through cross-document inference
[62] Event extraction from heterogeneous news sources
[63] The stages of event extraction
[64] Research on Chinese event extraction
[65] Language specific issue and feature exploration in Chinese event extraction
[66] Text2Event: Controllable sequence-to-structure generation for end-to-end event extraction
[67] GATE: Graph attention transformer encoder for cross-lingual relation and event extraction
[68] A novel joint biomedical event extraction framework via two-level modeling of documents
[69] Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework
[70] Annotating social determinants of health using active learning, and characterizing determinants using neural event extraction
[71] ExcavatorCovid: Extracting events and relations from text corpora for temporal and causal analysis for COVID-19
[72] PROTEST-ER: Retraining BERT for protest event extraction
[73] Event prediction based on evolutionary event ontology knowledge
[74] Efficient end-to-end learning of cross-event dependencies for document-level event extraction
[75] Event extraction as machine reading comprehension
[76] GAIA: A fine-grained multimedia knowledge extraction system
[77] Exploring pre-trained language models for event extraction and generation
[78] Jointly multiple events extraction via attention-based graph information aggregation
[79] Biomedical event extraction using convolutional neural networks and dependency parsing
[80] Joint event extraction via recurrent neural networks
[81] Bootstrapped training of event extraction classifiers
[82] A semi-supervised learning framework for biomedical event extraction based on hidden topics
[83] Semi-supervised event extraction with paraphrase clusters
[84] Semi-supervised recurrent neural network for adverse drug reaction mention extraction
[85] Semi-supervised new event type induction and event detection
[86] A semi-supervised learning method for fake news detection in social media
[87] Conformer-based sound event detection with semi-supervised learning and data augmentation
[88] What the role is vs. what plays the role: Semi-supervised event argument extraction via dual question answering
[89] Multi-modal generative adversarial networks for traffic event detection in smart cities
[90] Event extraction using distant supervision
[91] DCFEE: A document-level Chinese financial event extraction system based on automatically labeled training data
[92] KnowDis: Knowledge enhanced data augmentation for event causality detection via distant supervision
[93] Automatic labeling of tweets for crisis response using distant supervision
[94] Biomedical relation extraction using distant supervision
[95] Open-domain event detection using distant supervision
[96] Open domain event text generation
[97] Event representations for automated story generation with deep neural nets
[98] A neural-based model to predict the future natural gas market price through open-domain event extraction
[99] Corpus-based open-domain event type induction
[100] Event extraction from Twitter using non-parametric Bayesian mixture model with word embeddings
[101] Automatically generated noun lexicons for event extraction
[102] Financial event extraction using Wikipedia-based weak supervision
[103] Learning latent personas of film characters
[104] Can Twitter replace newswire for breaking news?
[105] Unsupervised domain adaptation by backpropagation
[106] Open-schema event profiling for massive news corpora
[107] Cross-lingual structure transfer for relation and event extraction