Utilizing Temporal Information for Taxonomy Construction

Luu Anh Tuan (Institute for Infocomm Research, Singapore, at.luu@i2r.a-star.edu.sg)
Siu Cheung Hui (School of Computer Engineering, Nanyang Technological University, Singapore, asschui@ntu.edu.sg)
See Kiong Ng (Institute of Data Science, National University of Singapore, seekiong@nus.edu.sg)

Abstract

Taxonomies play an important role in many applications by organizing domain knowledge into a hierarchy of 'is-a' relations between terms. Previous work on automatic construction of taxonomies from text documents either ignored temporal information or used fixed time periods to discretize the time series of documents. In this paper, we propose a time-aware method to automatically construct and effectively maintain a taxonomy from a given series of documents pre-clustered for a domain of interest. The method extracts temporal information from the documents and uses a timestamp contribution function to score the temporal relevance of the evidence from source texts when identifying the taxonomic relations for constructing the taxonomy. Experimental results show that our proposed method outperforms the state-of-the-art methods by increasing F-measure by 7%-20%. Furthermore, the proposed method can incrementally update the taxonomy by adding fresh relations from new data and removing outdated relations using an information decay function. It thus avoids rebuilding the whole taxonomy from scratch for every update and keeps the taxonomy effectively up-to-date in order to track the latest information trends in rapidly evolving domains.

1 Introduction

The explosion in the amount of unstructured text data gives us the opportunity to explore knowledge in depth, but it also makes it challenging to recognize the information relevant to our interests. To provide access to information effectively, it is important to organize unstructured data in a structured and meaningful manner. Taxonomies, which serve as backbones for structured knowledge, are useful for many NLP applications such as question answering (Harabagiu et al., 2003) and document clustering (Fodeh et al., 2011). However, handcrafted, well-structured taxonomies such as WordNet (Miller, 1995), OpenCyc (Matuszek et al., 2006) and Freebase (Bollacker et al., 2008), which are publicly available, can be incomplete for new or specialized domains. As it is time-consuming and cumbersome to create a new one manually, methods for automatic domain-specific taxonomy construction from text corpora are highly desirable.

Previous work on automatic construction of domain-specific taxonomies from text documents assumed that the data sets (that is, the document sets) and the underlying taxonomic relations are static. However, the data sets for certain domains may evolve over time, as new documents are added while older documents are deleted or modified. As such, the taxonomic relations for these potentially fast-changing domains may not remain static but become dynamic over time as new domain terms emerge while some older ones disappear. For example, in World Health Organization reports about disease outbreaks, the term 'smallpox' used to be a hyponym of 'dangerous disease', but it has fallen out of use since 1980. On the other hand, since 2014 the term 'Ebola' has become an emerging hyponym of 'dangerous disease'.
As another example, up until 1992, in a collection of US yearly reports on terrorism, the term 'Palestine Liberation Organization' used to be a hyponym of 'terrorist group', but this is no longer the case; 'Palestine Liberation Organization' should now be classified as a 'national organization' of Palestine. When temporal information in data sets is not captured, the resultant taxonomy may be incomplete, or outdated and misleading. This can happen when the overwhelming evidence for older patterns/contexts outweighs the relatively small amount of evidence for newer, emerging relations. For example, in the taxonomy of US yearly reports on terrorism, many previous methods could fail to recognize the taxonomic relation between the two terms 'ISIS' and 'terrorist group' due to relatively infrequent mentions of 'ISIS' (which only appears in reports from 2014). Meanwhile, 'Palestine Liberation Organization' could still be classified as a hyponym of 'terrorist group' because of the relatively larger number of mentions in the documents from the earlier years.

In this paper, we propose a time-aware method for domain-specific taxonomy construction from a time series of text documents for a particular domain. We incorporate temporal information into the process of identifying taxonomic relations by computing evidence scores for the data sources weighted by a timestamp contribution function (Efron and Golovchinsky, 2011; Li and Croft, 2003) to capture the temporally varying contributions of evidence from various documents at a particular point in time. We assume that newer evidence is more important than older evidence. For example, the evidence that 'Palestine Liberation Organization' was a hyponym of 'terrorist group' in 1990 is less important now than the evidence that 'ISIS' is a hyponym of 'terrorist group' in 2014. In the proposed method, we incorporate the timestamp contribution function into the method of Tuan et al. (2014) to measure the weights of the evidence for both statistical and linguistic methods. With such built-in time-awareness for taxonomy construction, we ensure that the constructed taxonomies are up-to-date for the fast-changing domains found in newswire and social media, where users constantly search for updated relations and track information trends.

Most previous work requires re-running the taxonomy construction process whenever there are new incoming data. Our proposed method enables incremental update of the constructed taxonomies to avoid costly reconstructions. We incorporate an information decay function (Smucker and Clarke, 2012) to manage outdated relations in the constructed taxonomy. The decay function measures the extent to which a relation has become out-of-date over time, and we incorporate it into a time-aware graph-based algorithm for taxonomy update. The contributions of our research are summarized as follows:

• We propose a time-aware method for taxonomy construction that extracts and utilizes temporal information to measure the evidence weights of taxonomic relationships. Our method constructs an up-to-date taxonomy by adding new emerging relations and discarding obsolete and incorrect ones.
• We propose an incremental time-aware graph-based algorithm to update an existing taxonomy, instead of rebuilding a new taxonomy from scratch.

2 Related Work

Previous work on taxonomy construction can be roughly divided into two main approaches: statistical learning and linguistic pattern matching. Statistical learning methods for taxonomy construction include co-occurrence analysis (Lawrie and Croft, 2003), hierarchical Latent Dirichlet Allocation (LDA) (Blei et al., 2004), clustering (Li et al., 2013), term embedding (Tuan et al., 2016), linguistic feature-based semantic distance learning (Yu et al., 2011), and co-occurrence subnetwork mining (Wang et al., 2013). Supervised statistical methods (Petinot et al., 2011) rely on hierarchical labels to learn the corresponding terms for each label. The labeled training data, however, are costly to obtain and may not always be available in practice. Unsupervised statistical methods (Pons-Porrata et al., 2007; Li et al., 2013; Wang et al., 2013) are based on the idea that terms that frequently co-occur may have taxonomic relationships. However, these methods generally achieve low accuracy.

Linguistic approaches for taxonomy construction rely on lexical-syntactic patterns (Hearst, 1992) (e.g., 'A such as B') to capture textual expressions of taxonomic relations, matching them with given documents or Web information to identify the relations between a term and its hypernyms (Kozareva and Hovy, 2010; Navigli et al., 2011; Wu et al., 2012). These patterns can be manually created (Wu et al., 2012) or automatically identified (Navigli et al., 2011). Such linguistic pattern matching methods can generally achieve higher precision than the statistical methods, but they suffer from lower coverage. To balance precision and recall, Zhu et al. (2013) and Tuan et al. (2014) have combined both unsupervised statistical and linguistic methods to find taxonomic relations.

Figure 1: Workflow of the proposed time-aware taxonomy construction method.

The approach that is closest to our work is the one proposed by Zhu et al. (2013), which performs dynamic taxonomy update. To keep up with ever-changing social media data, new terms are mined from incoming data and added to the existing taxonomy. The data are divided into separate clusters using pre-defined time periods based on their document timestamps. The newly found taxonomic relations in each period are then added to the existing taxonomy. The use of a pre-defined time period to discretize the time series of the documents for taxonomy update can be problematic. If the chosen time period is too long, rapid changes of domain terms and their taxonomic relations that have occurred within the time period may not be reported. If the time period is too short, it may fail to identify valid taxonomic relations that need a longer time period to establish. The method also does not remove those relations in the existing taxonomy that may have become obsolete over time.

3 Problem Specification

We define the root term of a domain-specific taxonomy as a word or phrase that represents the domain of interest. It can be any informative concept such as an entity ('animal') or event ('Ebola outbreak').
Given a root term R, we define a corpus C as a pre-clustered set of a time series of text documents according to R. Given two terms t1 and t2, we denote t1 → t2 as a taxonomic relation where t1 is a hypernym of t2. In this work, we define a taxonomy as a triple H = (V, E, s), where:

• V is the set of the taxonomy's vertices, i.e., the set of terms, including the root term.
• E is the set of the taxonomy's edges, i.e., the set of taxonomic relations.
• s is the creation time of the taxonomy. It can be the current date or any specified time.

Our task is formally defined as follows: Given a root term R, a corpus C and an optional existing taxonomy H1 = (V1, E1, s1) constructed at time s1, we aim to build a new taxonomy H2 = (V2, E2, s2) at time s2, where s2 > s1, so that we can process the document set in C up to time s2 into the relevant terms in the taxonomy. If H1 does not exist, the problem becomes creating a taxonomy H2 for corpus C. Otherwise, the problem is to update the existing taxonomy with the newly obtained data for corpus C. Note that while the taxonomy construction method is not a totally unsupervised method, as it requires as input a corpus (C) pre-clustered by a domain of interest (i.e., R), the subsequent steps for constructing the taxonomy given the text corpus are unsupervised.

4 Methodology

Figure 1 shows the workflow of the proposed time-aware taxonomy construction method. There are two key processes in the proposed method: temporal information processing and taxonomy construction.

4.1 Temporal information processing

The aim of the temporal information processing process is to generate temporal information (or timestamp for short) for each sentence in the input document or Web data. Previous taxonomy construction methods (Zhu et al., 2013) only extract temporal information at the document level, i.e., all information in the document has the same timestamp as the document creation date. This assumption, however, is not always correct. Figure 2 shows a sample report about the flight MH370 created on 30 July 2015. In this report, the timestamp of each sentence is very different from the document creation date. If we were to simply use the temporal information at the document level, the timestamps for the search areas of MH370 at different periods would be incorrect.

Figure 2: A sample report about the flight MH370 created on 30 July 2015.

Thus, we propose a method to extract timestamps (i.e., temporal expressions) at the sentence level. The method comprises the following three steps:

Document creation date extraction: First, we extract the timestamp at the document level. The text corpus that we are using for this study consists of a collection of reports, scientific publications and Web search results. For the first two types of documents, the timestamp is the document creation date, which can be extracted directly from the data source, i.e., the date of the report or the date of the publication. For Web search results, we use Google advanced search with a customized time range, which returns the search results together with their creation dates at the beginning of the search snippets.

Temporal expression extraction: Next, the second step extracts all temporal expressions (e.g., "2015 December 31") in the document. Here, we use SUTime (Chang and Manning, 2012), a library for recognizing and normalizing time expressions using a deterministic rule-based method.
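As a rough illustration of this step (not the authors' implementation), the sketch below uses a simple regular expression as a stand-in for SUTime, which handles far more expression types and also normalizes them; the function name extract_time_expressions is a placeholder of our own.

```python
import re
from typing import List, Tuple

# Crude stand-in for a temporal tagger: matches a few common date formats
# such as "2015 December 31", "December 2015" or "8 March 2014".
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_PATTERN = re.compile(
    r"\b(?:\d{4}\s+(?:%s)(?:\s+\d{1,2})?"   # 2015 December 31
    r"|(?:%s)\s+\d{4}"                       # December 2015
    r"|\d{1,2}\s+(?:%s)\s+\d{4})\b" % (MONTHS, MONTHS, MONTHS))

def extract_time_expressions(document: str) -> List[Tuple[str, int]]:
    """Return each matched temporal expression with its character offset."""
    return [(m.group(0), m.start()) for m in DATE_PATTERN.finditer(document)]

if __name__ == "__main__":
    text = "The plane lost contact on 8 March 2014. Debris was found in July 2015."
    print(extract_time_expressions(text))  # [('8 March 2014', 26), ('July 2015', 60)]
```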
The output of this step is a list of time expressions, together with their positions in the document.

Sentence timestamp extraction and normalization: Finally, in the third step, we assign each sentence in the document a time expression as follows:

• First, we initialize a temporal value τ as the document creation date.
• For each sentence s in the document:
  – If s contains a temporal expression τ1, assign τ1 as the timestamp of s and update τ = τ1.
  – Otherwise, assign τ as the timestamp of s.

Note that we use the format 'YYYY-MM-DD' for the temporal expression. If the DD or MM information is missing, it is replaced with the first day or first month respectively. For example, 'December 2015' will be normalized as '2015-12-01'. Using the proposed method, sentence (1) and sentence (2) in the example of Figure 2 will have the same timestamp of '2014-03-08', while sentence (3) will have the timestamp of '2014-03-17'. In Section 5.2, we will show that extracting timestamps at the sentence level improves the performance of the proposed taxonomy construction method compared to extracting timestamps at the document level.

4.2 Taxonomy construction

There are three general steps to constructing a taxonomy: domain term extraction, taxonomic relation identification, and taxonomy induction. We make use of the taxonomy construction method of Tuan et al. (2014) for the first step, incorporate timestamps into the second step of identifying taxonomic relations (Section 4.2.2), and propose an incremental taxonomy induction algorithm for the third step (Section 4.2.4). As domain term extraction does not affect the temporal aspects of taxonomy construction, the first step is not within the scope of this study. The reader can refer to Tuan et al. (2014) or Zhu et al. (2013), in which linguistic approaches to extracting domain terms are discussed. In this paper, we assume that the list of domain terms is available, and we focus only on the second and third steps of taxonomy construction.

4.2.1 Taxonomic relation identification

In this section, we give an overview of the method to identify taxonomic relations proposed in Tuan et al. (2014). Given an ordered pair of two terms t1 and t2, Tuan et al. (2014) calculate the evidence score that t1 → t2 based on the following three methods:

Syntactic contextual subsumption (SCS): This method derives evidence for t1 → t2 from their syntactic contexts, particularly from triples of the form (subject, verb, object). It is observed that if the context set of t1 mostly contains that of t2 but not vice versa, then t1 is likely to be a hypernym of t2. To implement this idea, the method finds the most common relation (or verb) r of t1 and t2, submits the queries "t1 r" and "t2 r" to a Web search engine, and collects all search results to construct two corpora, Corpus^Γ_{t1} and Corpus^Γ_{t2}, for t1 and t2. The syntactic context sets are then created from these contextual corpora using a non-taxonomic relation identification method. The details of Score_SCS(t1, t2) can be found in Tuan et al. (2014).

Lexical-syntactic pattern (LSP): This method measures how much more evidence for t1 → t2 than for t2 → t1 is found on the Web. Specifically, a list of manually constructed taxonomic patterns (e.g., "t2 is a t1") is queried with a Web search engine to estimate the amount of evidence for t1 → t2 from the Web.
The LSP measure is calculated as follows:

\[ Score_{LSP}(t_1, t_2) = \frac{\log(|C_{Web}(t_1, t_2)|)}{1 + \log(|C_{Web}(t_2, t_1)|)} \]

where C_Web(t1, t2) denotes the set of search results returned for the taxonomic patterns instantiated with t1 as the hypernym of t2.

String inclusion with WordNet (SIWN): This method checks the evidence for t1 → t2 using a combination of string inclusion and references in WordNet synsets. Score_SIWN(t1, t2) is set to 1 if there is such evidence; otherwise, it is set to 0.

Combined evidence: The three scores are then combined linearly as follows:

\[ Score(t_1, t_2) = \alpha \times Score_{SCS}(t_1, t_2) + \beta \times Score_{LSP}(t_1, t_2) + \gamma \times Score_{SIWN}(t_1, t_2) \]

If Score(t1, t2) is greater than a threshold value, then t1 is regarded as a hypernym of t2.

4.2.2 Incorporating temporal information into taxonomic relation identification

Previous studies of taxonomic relation identification treated all evidence equally, i.e., evidence from 1950 is treated equally with evidence from 2014. This assumption is not always appropriate, as discussed in Section 1. We propose a time-aware method to identify taxonomic relations by incorporating timestamps into the process of finding evidence, using the following timestamp contribution function:

Definition 1 (Timestamp contribution function). Given a text sentence d with timestamp s_d, the timestamp contribution of d at time s_0 is defined as:

\[ T_d(s_0) = \xi e^{-\xi (s_0 - s_d)}, \quad (1) \]

where ξ is a control rate, s_0 > s_d, and (s_0 − s_d) is the time lapse between s_d and s_0.

Equation (1) describes the timestamp contribution of a sentence at a specific time using an exponential distribution function T_d. The intuition behind this function is that evidence of taxonomic relations found in more recent sentences is of higher relevance than evidence found in older sentences. This function is inspired by the work of Efron and Golovchinsky (2011) and Li and Croft (2003), in which it was used to effectively rank documents over time intervals.

Using the timestamp contribution function, we incorporate temporal information into the three taxonomic relation identification methods described in Section 4.2.1 as follows:

LSP method: For each search result snippet d in C_Web(t1, t2) collected from the Web search engine, we calculate the timestamp contribution score of d using T_d(s_0) = ξe^{−ξ(s_0−s_d)}, where s_0 is a chosen specific time (i.e., the time of taxonomy construction) and s_d is the timestamp of d. Note that s_d has to be earlier than s_0. The unit of the time lapse (s_0 − s_d) depends on the nature of the corpus and can be, for instance, a day, a month or even a year. For example, if the corpus is from a fast-changing source such as social media, we can set the unit as a day to keep up with the change of data on a daily basis. In contrast, for a corpus from slower changing domains such as scientific disciplines, the unit can be a year. The time-aware score for the LSP method is calculated as follows:

\[ Score^{Time}_{LSP}(t_1, t_2) = Score_{LSP}(t_1, t_2) \times \frac{\sum_{d \in C_{Web}(t_1, t_2)} T_d(s_0)}{|C_{Web}(t_1, t_2)|} \quad (2) \]

In Equation (2), the original LSP evidence score is multiplied by the average timestamp contribution score of all evidence sentences for the taxonomic relation from the Web. If the number of returned search results is too large, we use only the first 1,000 results to estimate the average timestamp contribution of the evidence.

Note that the total timestamp contribution score of all evidence sentences, \sum_{d \in C_{Web}(t_1, t_2)} T_d(s_0), can be considered as the "weighted size" of C_Web(t1, t2), i.e., we weigh each evidence sentence using Equation (1) and sum all these weights.
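To illustrate how Equations (1) and (2) can be computed, the following is a minimal Python sketch. It assumes that each evidence snippet is represented simply by its date, that the base Score_LSP value is already available, and that the time lapse is measured in years; the function names are our own, and ξ defaults to the value reported in Section 5.1.

```python
import math
from datetime import date
from typing import List

def timestamp_contribution(s0: date, sd: date, xi: float = 0.15,
                           unit_days: float = 365.0) -> float:
    """Equation (1): T_d(s0) = xi * exp(-xi * (s0 - sd)), with the time lapse
    (s0 - sd) expressed in the chosen unit (years by default)."""
    lapse = (s0 - sd).days / unit_days
    return xi * math.exp(-xi * lapse)

def time_aware_lsp_score(base_lsp_score: float, snippet_dates: List[date],
                         s0: date, xi: float = 0.15,
                         unit_days: float = 365.0) -> float:
    """Equation (2): the original LSP score times the average timestamp
    contribution of the evidence snippets (capped at the first 1,000)."""
    dates = snippet_dates[:1000]
    if not dates:
        return 0.0
    weighted_size = sum(timestamp_contribution(s0, d, xi, unit_days) for d in dates)
    return base_lsp_score * weighted_size / len(dates)
```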
However, if we use only the "weighted size" of C_Web(t1, t2) for the time-aware score Score^Time_LSP(t1, t2), there will be some issues. Firstly, the score Score^Time_LSP will not be normalized with respect to the number of evidence sentences. This may lead to potential bias due to large amounts of past evidence: if there were an obsolete or incorrect taxonomic relation with many evidence sentences in the past, it may overwhelm the new taxonomic relations, which may only have a small number of recent evidence sentences. Secondly, if we normalize the score, the information on the number of evidence sentences, which is important for the LSP method to recognize true taxonomic relationships, will be lost. Therefore, we propose to use Equation (2), which combines both the information on the number of evidence sentences (embedded inside the original Score_LSP score) and the normalized "weighted size" of C_Web(t1, t2).

SCS method: Similarly, for each search result snippet d in Corpus^Γ_{t1} and Corpus^Γ_{t2}, we calculate the timestamp contribution score of d using the function T_d(s_0) = ξe^{−ξ(s_0−s_d)}, where s_0 is a specific time and s_d is the timestamp of d. The time-aware score for the SCS method is calculated as follows:

\[ Score^{Time}_{SCS}(t_1, t_2) = Score_{SCS}(t_1, t_2) \times \left( \frac{\sum_{d_1 \in Corpus^{\Gamma}_{t_1}} T_{d_1}(s_0)}{|Corpus^{\Gamma}_{t_1}|} + \frac{\sum_{d_2 \in Corpus^{\Gamma}_{t_2}} T_{d_2}(s_0)}{|Corpus^{\Gamma}_{t_2}|} \right) \quad (3) \]

In Equation (3), the original evidence score of t1 → t2 is multiplied by the average timestamp contribution scores of the returned search snippets. Similar to Equation (2), Equation (3) combines both the information on the number of evidence sentences (embedded inside the original score Score_SCS) and their normalized "weighted size".

SIWN method: Because WordNet does not contain information about timestamps, we set:

\[ Score^{Time}_{SIWN}(t_1, t_2) = Score_{SIWN}(t_1, t_2) \quad (4) \]

Combined evidence: The final combined evidence score for the time-aware method is calculated as:

\[ Score^{Time}(t_1, t_2) = \alpha \times Score^{Time}_{SCS}(t_1, t_2) + \beta \times Score^{Time}_{LSP}(t_1, t_2) + \gamma \times Score^{Time}_{SIWN}(t_1, t_2) \quad (5) \]

If the value Score^Time(t1, t2) is greater than a threshold value, we extract the relation t1 → t2.

4.2.3 Parameter learning

We need to estimate the optimal values for the parameters α, β and γ, which are used in Equation (5). For this purpose, we apply ridge regression (Hastie et al., 2009). First, we use the time-aware method to create taxonomies for the 'Animal', 'Plant' and 'Vehicle' domains using corpora constructed by a bootstrapping method (Kozareva et al., 2008). Then, we ask two annotators to construct gold standard taxonomies of the three domains (see Section 5.2 for more details) and use them to build the training sets. For each pair of terms (t1, t2) found in the gold standard taxonomies, its evidence score is estimated as (τ + 1), where τ is the threshold value for Score^Time. Finally, we use Equation (5) to learn the best combination of α, β and γ using the ridge regression algorithm. Note that we learn the parameters only once and use them subsequently for the other domains.

4.2.4 Incremental taxonomy induction

To avoid reconstructing a taxonomy whenever there is new incoming data, we propose a novel incremental graph-based algorithm to update an existing taxonomy with a given set of taxonomic relations. The proposed algorithm updates a taxonomy automatically over time based on the information decay function defined below.

Definition 2 (Information decay function).
Given a taxonomic relation r, the information decay of r over the period from time s1 to time s2 is computed by the information decay function:

\[ D_r(s_1, s_2) = e^{-\lambda (s_2 - s_1)}, \quad (6) \]

where λ is a decay rate and s2 > s1.

The intuition behind the information decay function is that the evidential value of a relation decreases over time at an exponential rate.

Given a root node R, a set of taxonomic relations T and, optionally, an existing taxonomy H1 = (V1, E1, s1) created at time s1 with vertex set V1 and edge set E1, the proposed graph-based algorithm constructs a new taxonomy H2 = (V2, E2, s2) created at time s2 with vertex set V2 and edge set E2. We denote by t1 → t2 the edge from t1 to t2 in a taxonomy, and by w(t1 → t2) the weight of this edge (i.e., its evidence score). Algorithm 1 consists of four steps:

Algorithm 1 Taxonomy induction algorithm
Input: R: root node of taxonomy; T: new taxonomic relation set; H1 = (V1, E1, s1): existing taxonomy created at time s1 with vertex set V1 and edge set E1
Output: H2 = (V2, E2, s2): new taxonomy created at time s2 with vertex set V2 and edge set E2
 1: Set V2 = V1 and E2 = E1
 2: for each edge (t1 → t2) ∈ E2 with t1 ≠ R and t2 ≠ R do
 3:     w(t1 → t2) = w(t1 → t2) × e^{−λ(s2−s1)}
 4: end for
 5: for each relation (t1 → t2) ∈ T do
 6:     if (t1 → t2) ∈ E2 then
 7:         w(t1 → t2) = w(t1 → t2) + Score^Time(t1, t2)
 8:     else
 9:         E2 = E2 ∪ (t1 → t2)
10:         w(t1 → t2) = Score^Time(t1, t2)
11:         if t1 ∉ V2 then
12:             V2 = V2 ∪ {t1}
13:         end if
14:         if t2 ∉ V2 then
15:             V2 = V2 ∪ {t2}
16:         end if
17:         if ∄ (t3 → t1) ∈ E2 with t3 ≠ R then
18:             E2 = E2 ∪ (R → t1)
19:             w(R → t1) = 1
20:         end if
21:         if ∃ (R → t2) ∈ E2 then
22:             E2 = E2 \ (R → t2)
23:         end if
24:     end if
25: end for
26: edgeFiltering(H2)
27: graphPruning(H2)

Step 1: Update existing taxonomy (lines 2-4). This step updates the existing taxonomy from time s1 to s2. The weight of each edge (t1 → t2) in E1 (except the edges connected to root R) is reduced using the information decay function:

w(t1 → t2) = w(t1 → t2) × D_{t1→t2}(s1, s2)

Step 2: Add new relations to existing taxonomy (lines 5-25). This step adds new taxonomic relations to the existing taxonomy and updates their weights. It adds each relation t1 → t2 as a directed edge from the parent node t1 to the child node t2 if this edge does not exist in the existing taxonomy; otherwise, the weight of this edge is updated with the new evidence score. If t1 does not have any parent node, t1 becomes a child node of root R. The edge's weight is updated as follows:

\[ w(t_1 \to t_2) = \begin{cases} 1 & \text{if } t_1 = R \\ w(t_1 \to t_2) + Score^{Time}(t_1, t_2) & \text{if } (t_1 \to t_2) \in E_1 \\ Score^{Time}(t_1, t_2) & \text{otherwise} \end{cases} \]

The result of this step is a weighted connected graph containing all taxonomic relations with root R.

Step 3: Edge filtering (line 26). The graph generated in Step 2 contains some edges with low evidence scores. The reason is that some relations in the existing taxonomy can become outdated during the period from s1 to s2 (according to the information decay function), and they do not appear in the new relation set. In this step, each edge t1 → t2 in the graph is revisited, and if its weight is lower than the threshold value of Score^Time, it is removed from the graph. If t2 does not have any parent node other than t1, then t2 is deleted from the vertex set, and edges from t1 to t2's children are added to the edge set with weights equal to the weights of the edges from t2 to t2's children. Then, all edges from t2 to t2's children are removed from the edge set.
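To make the update procedure concrete, the following is a simplified Python sketch of Steps 1-3, before the graph pruning of Step 4 described next. It represents the taxonomy as a plain dictionary mapping edges to weights; the function and variable names are our own, the default values of λ and the threshold follow the settings reported in Section 5.1, exempting root edges from filtering is our reading of the algorithm, and the re-attachment of a removed node's children (described in Step 3) is omitted for brevity.

```python
import math
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (t1, t2) stands for the taxonomic relation t1 -> t2

def update_taxonomy(root: str,
                    new_relations: Dict[Edge, float],  # edge -> Score^Time(t1, t2)
                    edges: Dict[Edge, float],          # existing taxonomy: edge -> weight
                    s1: float, s2: float,              # creation times, in time-lapse units
                    lam: float = 0.15,                 # decay rate (Section 5.1)
                    threshold: float = 2.2) -> Dict[Edge, float]:
    """Simplified sketch of Steps 1-3 of Algorithm 1 (no graph pruning)."""
    # Step 1: decay the weights of existing edges, except edges attached to the root.
    for (t1, t2), w in list(edges.items()):
        if t1 != root and t2 != root:
            edges[(t1, t2)] = w * math.exp(-lam * (s2 - s1))

    # Step 2: merge the newly identified relations into the graph.
    for (t1, t2), score in new_relations.items():
        if (t1, t2) in edges:
            edges[(t1, t2)] += score
        else:
            edges[(t1, t2)] = score
            # If t1 has no non-root parent, attach it directly to the root with weight 1.
            if not any(child == t1 and parent != root for (parent, child) in edges):
                edges[(root, t1)] = 1.0
            # t2 now has a non-root parent, so drop any direct edge from the root to t2.
            edges.pop((root, t2), None)

    # Step 3: filter out edges whose (decayed) weight falls below the threshold;
    # edges from the root are kept (an assumption, mirroring Step 1's exemption).
    return {e: w for e, w in edges.items() if e[0] == root or w >= threshold}
```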
Step 4: Graph pruning (line 27). The graph generated in Step 3 is not an optimal tree, as it may contain redundant or incorrect edges, for example edges that form a loop in the graph. This step aims to produce an optimal tree of the taxonomy from the weighted graph in Step 3. For this purpose, we apply Edmonds' algorithm (Edmonds, 1967) for finding the optimal spanning arborescence of a weighted directed graph. Using this algorithm, we can find a subset of the current edge set that forms a taxonomy where every non-root node has in-degree 1 and the sum of the edge weights is maximized.

5 Performance Evaluation

We have conducted two experiments for performance evaluation. The first experiment evaluates the performance of our proposed time-aware method on constructing a taxonomy from a given list of terms without any prior knowledge (i.e., without any existing taxonomy). The second experiment evaluates the performance of our proposed method on taxonomy update.

5.1 Datasets

We evaluate our method for taxonomy construction on the following four datasets of document collections obtained from different domains:

• Artificial Intelligence (AI) domain (Navigli et al., 2011): The corpus is about the root term 'Artificial Intelligence', consisting of 4,976 papers extracted from the IJCAI proceedings from 1969 to 2014 and the ACL archives from 1979 to 2014.
• MH370 domain: The corpus is about the root term 'Issues related to MH370 search'. MH370 is the missing flight that went down in the ocean on Saturday, 8 March 2014. The corpus is created by querying the Google search engine with the keyword "MH370" from March 08, 2014 to April 30, 2014 and collecting the first 300 documents from the search results each day. After removing duplicates, the corpus contains a total of 12,307 documents.
• Terrorism domain: The corpus is about the root term 'Terrorism'. It contains 293 reports from "Patterns of Global Terrorism (1988-2003)" (http://www.fas.org/irp/threat/terror.htm) and "Country Reports on Terrorism (2004-2014)" (http://www.state.gov/j/ct/rls/crt/index.htm) of the US State Department. Each report contains about 1,500 words.
• Disease domain: The corpus is about the root term 'Disease outbreak', created by collecting reports from the WHO archive "Disease outbreaks by year from 1996 to 2014" (http://www.who.int/csr/don/archive/year/en/) and the email archive of ProMed (http://www.promedmail.org/), an email-based reporting system dedicated to reporting on disease outbreaks that affect human health. The corpus contains a total of 25,370 reports/emails.

Parameter settings. For the rapidly changing 'MH370' domain, we choose 'day' as the unit of the time lapse, whereas for the other three domains we use 'year'. We set the threshold value of Score^Time in Equation (5) to 2.2, and the control rate ξ in Equation (1) and the decay rate λ in Equation (6) to 0.15. The setting of these parameters is discussed in Section 5.4.

5.2 Evaluation on taxonomy construction

5.2.1 Experiment

In this experiment, we compare our time-aware taxonomy construction method with other state-of-the-art methods in the task of constructing a new taxonomy from a given list of terms without any prior knowledge (i.e., without any existing taxonomy).
Three state-of-the-art methods in the literature are selected for comparison:

• Zhu's method (Zhu et al., 2013): It constructs the taxonomy using evidence from multiple sources such as WordNet, Wikipedia and Web search engines. Both statistical and linguistic approaches are used to infer taxonomic relations.
• Kozareva's method (Kozareva and Hovy, 2010): It constructs the taxonomy using evidence from a Web search engine by matching the search results with a predefined set of syntactic patterns.
• Tuan's method (Tuan et al., 2014): It is the non-time-aware method described in Section 4.2.1. This method ignores temporal information during taxonomy construction.

To evaluate the effectiveness of extracting timestamps at the sentence level (as described in Section 4.1), we also conduct an experiment on a setting that uses timestamps at the document level (i.e., all evidence in a document has the same timestamp as the document creation date). We use the subscript docstamp to denote this setting.

5.2.2 Evaluation metric

In this experiment, we evaluate the constructed taxonomies against manually created gold standard taxonomies. The gold standard taxonomies are created as follows. For each domain, two annotators are employed to independently create taxonomies using the list of terms obtained from the domain term extraction module, according to the following rules:

• Rule 1 (Relevancy): Every term in the taxonomy should be related to the root term.
• Rule 2 (Appropriateness): Each edge between two terms should be established at the time the taxonomy is created, if their relation is correct and not obsolete. A relation is obsolete if it is invalid at the time of consideration.
• Rule 3 (Hierarchical structure): The gold standard taxonomy of each domain should form a tree, without redundant paths or cycles.

The annotators then compare their constructed taxonomies. A taxonomic relation t1 → t2 is counted as an agreement if and only if both annotators have t1 and t2 in their taxonomies and there is a directed path from t1 to t2. If an annotator has a taxonomic relation with one vertex not in the other annotator's taxonomy, it is considered a disagreement. After evaluation, the average inter-annotator agreement on the edges of the constructed taxonomies is 87% in terms of Cohen's kappa coefficient. Finally, the two annotators discuss to come up with the gold standard taxonomies. The number of nodes and the average depth of the resulting taxonomies are summarized in Table 1.

Domain      Number of vertices   Average depth
MH370       257                  3.2
AI          925                  5.3
Terrorism   312                  3.6
Disease     459                  4.0
Table 1: Analysis of gold standard taxonomies.

We use precision, recall and F-measure to measure the performance of taxonomy construction. Let R and R_gold be the sets of taxonomic relations of our constructed taxonomy and the gold standard taxonomy respectively; the metrics are then given as follows:

\[ precision = \frac{|R \cap R_{gold}|}{|R|}; \quad recall = \frac{|R \cap R_{gold}|}{|R_{gold}|}; \quad F\text{-}measure = \frac{2 \times precision \times recall}{precision + recall}. \]

5.2.3 Experimental results

The experimental results are given in Table 2, which shows that our time-aware method achieves significantly better performance than Kozareva's method and Zhu's method in terms of F-measure (t-test, p-value < 0.05).
Our method shows slightly lower precision than Kozareva's method due to the SCS method, but much higher recall and F-measure. In contrast, our method shows slightly lower recall but much higher precision and F-measure than Zhu's method, which is based on statistical measures such as pointwise mutual information and cosine similarity. On average, our time-aware method improves the F-measure by 20% compared to Kozareva's method, and by 10% compared to Zhu's method.

Method                Domain      P     R     F
Kozareva              MH370       89%   36%   51%
Zhu                   MH370       60%   59%   59%
Tuan                  MH370       85%   49%   62%
Time-aware_docstamp   MH370       81%   54%   65%
Time-aware            MH370       86%   57%   69%
Kozareva              AI          90%   37%   52%
Zhu                   AI          59%   57%   58%
Tuan                  AI          84%   51%   63%
Time-aware_docstamp   AI          80%   55%   65%
Time-aware            AI          84%   57%   68%
Kozareva              Terrorism   91%   32%   47%
Zhu                   Terrorism   60%   64%   62%
Tuan                  Terrorism   86%   49%   62%
Time-aware_docstamp   Terrorism   80%   56%   66%
Time-aware            Terrorism   87%   60%   71%
Kozareva              Disease     90%   29%   44%
Zhu                   Disease     57%   56%   56%
Tuan                  Disease     83%   50%   62%
Time-aware_docstamp   Disease     81%   53%   64%
Time-aware            Disease     85%   55%   67%
Table 2: Experimental results for taxonomy construction. P stands for Precision, R for Recall, and F for F-measure.

Moreover, the incorporation of timestamps into the time-aware method also contributes to better performance, as it helps identify new taxonomic relations effectively while getting rid of obsolete and incorrect relations. As shown in the experimental results, the time-aware method performs significantly better than the non-time-aware method (i.e., Tuan's method) in all four domains in terms of F-measure (t-test, p-value < 0.05). On average, our time-aware method improves the F-measure by 7% compared to Tuan's method.

We further examine the taxonomic relations identified by the time-aware method but not by the non-time-aware method, and vice versa. We observe that around 91% of the relations found by the time-aware method but not by the non-time-aware method are recent relations (i.e., relations found in recent documents), while around 86% of the relations found by the non-time-aware method but not by the time-aware method are obsolete relations. The percentage of taxonomic relations that become obsolete in each of the datasets is summarized in Table 3.

Domain      Percentage of obsolete relations
MH370       21%
AI           5%
Terrorism   12%
Disease      7%
Table 3: Percentage of taxonomic relations that become obsolete.

For example, in the Terrorism domain, our method recognizes 'ISIS' as a hyponym of 'terrorist group', while the three state-of-the-art methods cannot. In addition, while the other three methods extracted the outdated taxonomic relation between 'Palestine Liberation Organization' and 'terrorist group', our method was able to ignore it. The reason is that the three state-of-the-art methods inferred taxonomic relations using co-occurrence frequency, but 'ISIS' has only appeared in reports since 2014. The occurrence frequency of 'ISIS' was very low compared to that of 'Palestine Liberation Organization', which had been mentioned over many past years. In contrast, by using the timestamp contribution function to better profile the relevance of evidence over time, our method can recognize the recent relationship of 'terrorist group' with 'ISIS' while getting rid of the obsolete and incorrect relation with 'Palestine Liberation Organization'.
From the experimental results of the time-aware and time-aware_docstamp methods, we also observe that the use of timestamps extracted at the sentence level is more effective than the use of timestamps at the document level. Sentence-level timestamps capture the temporal information of the facts in fast-changing domains more precisely than document-level timestamps. The results show that the use of sentence-level timestamps improves the precision and recall of our taxonomy construction method, improving the F-measure by 4% on average compared to the use of timestamps at the document level.

5.3 Evaluation on taxonomy update

5.3.1 Experiment

For fast-changing domains, taxonomies should be frequently and quickly updated. In this experiment, we examine how the proposed time-aware method can effectively update the constructed taxonomies over time to keep up with the latest information trends.

We use the 'MH370' domain as the case study for this experiment. During the search operation for the missing flight MH370, there were several turning points, which can be captured by the following phases (according to well-known news agencies such as CNN, BBC and the New York Times):

• Phase 1 (from March 08, 2014): The flight lost contact with the airport. The search started from the South China Sea and Gulf of Thailand, and was extended to the Strait of Malacca.
• Phase 2 (from March 13, 2014): Images from satellites indicated the plane might have fallen into the Indian Ocean. The search focus was moved from the South of Sumatra to the South-West of Perth in the Southern Indian Ocean.
• Phase 3 (from March 28, 2014): Estimation of the aircraft's remaining fuel and the radar track led the search to shift to a new area, the North-West of Perth in the Southern Indian Ocean.

We apply the proposed time-aware method to construct and update the taxonomy for 'MH370' incrementally every two days. We compare our time-aware method with the following three methods:

• Zhu's method (Zhu et al., 2013): It applies a graph-based algorithm to update taxonomies incrementally with timestamp information.
• Baseline 1: The taxonomy is updated with the newly obtained data every two days, but without using any temporal information in either taxonomic relation identification (Section 4.2.2) or taxonomy induction (Section 4.2.4). Specifically, Step 1 (update existing taxonomy) in Section 4.2.4 is excluded since no temporal information is used, so the weights of the existing taxonomy are not updated with the decay function.
• Baseline 2: We construct the taxonomies using temporal information every two days, but only with the new documents from these two days. This allows us to evaluate the effect of retiring all the taxonomic relations built from the previous documents instead of the gradual decay approach in our proposed method.

Here we have chosen the time period of two days because the 'MH370' domain was a truly fast-changing domain. As we shall see shortly, even using only the new documents within two days to build the taxonomy in baseline method 2, there were new taxonomic relations updated from the latest information (as shown in the example in Figure 4).

5.3.2 Evaluation metric

When constructing the gold standard taxonomies using the same rules described in Section 5.2, we asked the annotators to select for each parent term at most three sub-terms that are most related to it at the time of taxonomy construction.
We denote the set of gold-standard taxonomic relations as S_gold. In the same way, when applying the taxonomy construction methods, we select for each parent term at most three sub-terms with the highest evidence scores. We denote the set of those automatically extracted taxonomic relations as S. We use the following metrics to evaluate the taxonomy update:

\[ precision = \frac{|S \cap S_{gold}|}{|S|}; \quad recall = \frac{|S \cap S_{gold}|}{|S_{gold}|}; \quad F\text{-}measure = \frac{2 \times precision \times recall}{precision + recall}. \]

The intuition for limiting the number of sub-terms to three for the evaluation is that if a taxonomy can keep up with the newly updated data, it should be able to detect the emerging terms and relations and add them to the taxonomy with high evidence scores, so that the user can easily observe an emerging information trend in the domain as it occurs. In addition, the method should also have the capability to remove any obsolete relations from the taxonomy when they are no longer valid.

5.3.3 Experimental results

Figure 3: Performance results on taxonomy update over time in the 'MH370' domain.

From the results shown in Figure 3, we can see that our time-aware method achieves the best performance and significantly outperforms the two baseline methods and Zhu's method in terms of F-measure (t-test, p-value < 0.05). One interesting point to observe is that there are two periods when the time-aware method shows much higher F-measure than the baseline methods and Zhu's method: from March 12 to March 14, and from March 28 to March 30. During these periods, the performance of baseline method 1 (which does not use any timestamp information) and Zhu's method drops significantly, while our time-aware method's performance increases slightly.

One plausible explanation is that the turning points on March 13 and March 28 described above fall within these periods. During these periods, many new terms/relations such as search area, search focus and search device were added to the corpus. Our time-aware method was able to assign higher weights to the new taxonomic relations than to the older relations due to their recent timestamps, even though the frequencies of these new relations are lower than those of the older relations. In contrast, Zhu's method and baseline method 1 were unable to recognize these new relations due to their relatively low frequencies in the corpus. In addition, incorrect relations in the existing taxonomy were also removed from the new taxonomy through the information decay function by our time-aware method, whereas the other two methods still kept them in the taxonomy. In short, compared to the other methods, our time-aware method can update the taxonomy faster with the latest information trends, as well as remove incorrect relations effectively.

Also, from the experimental results of our time-aware method and baseline method 2, we can observe that updating the existing taxonomy with new taxonomic relations is more effective than rebuilding a new taxonomy using only the new data. The reason is that although some older taxonomic relations are mentioned only occasionally in the new data, they are still valid.
Therefore, if we ignore the older data, their taxonomic relations will be lost in the new taxonomy when it is constructed with only the new data. In addition, there are also many taxonomic relations that need a longer time period to become established.

Figure 4: Top three hyponyms of 'Search area' in 'Issues related to MH370 search' taxonomies over time (2014-03-12 to 2014-04-30) for Zhu's method, baseline method 1, baseline method 2 and the time-aware method, compared against the actual search areas of each phase (South China Sea, Malacca Strait and Gulf of Thailand; then Sumatra, South-West of Perth and the Southern Indian Ocean from 2014-03-13; then North-West of Perth and the Southern Indian Ocean from 2014-03-28).

Figure 4 shows an example of the changes of the hyponym list for the term 'search area' over time using different methods. We observe that both the time-aware method and baseline method 2, which utilize temporal information, can quickly update the relations with the latest information, as compared to Zhu's method and baseline method 1, which ignore temporal information for taxonomy construction. For example, in the taxonomy constructed on March 14, the time-aware method and baseline method 2 quickly recognize the change of the search area to 'Southern Indian Ocean' and 'Sumatra', thereby ranking them at the top of the hyponym list of 'search area', whereas Zhu's method and baseline method 1 both missed this update until March 26. Another interesting point is that, due to the lack of temporal information, both Zhu's method and baseline method 1 still ranked 'South China Sea' at the top in the taxonomies constructed on April 30, while this term had been removed earlier from the hyponym list of 'search area' by our time-aware method using temporal information.

5.4 Parameter tuning

In our method for taxonomy construction, some parameters are tuned to optimize performance. The threshold value for Score^Time in Equation (5) controls the number of extracted taxonomic relations. In general, the smaller this threshold value is, the more taxonomic relations are extracted, which recovers more true relations but also lets in more incorrect ones. From our experiments, we found that the threshold value for Score^Time can be set between 2.1 and 2.3 for the time-aware method to achieve the best performance.

The control rate ξ in Equation (1) and the decay rate λ in Equation (6) affect the relative contributions of old and new data.
Specifically, larger values of the control rate and decay rate make newer data contribute more evidence of taxonomic relations relative to older data, whereas smaller values cause old and new data to make similar evidence contributions. According to our experiments, the time-aware method shows the best performance when the values of these rates are set between 0.15 and 0.20.

6 Conclusion

In this paper, we have proposed a novel time-aware method for taxonomy construction given a time series of text documents from a domain that may be fast-changing with emerging concepts or events. By using timestamp contribution and information decay functions, our method can effectively utilize temporal information for both taxonomic relation identification and taxonomy update. The experimental results show that our method achieves better performance than the state-of-the-art methods. In addition, the proposed method can be used to update the taxonomy incrementally over time and keep the taxonomy up-to-date with the latest information trends for the domain. All the datasets, including the gold standards of the four domains and the outputs of our method, are publicly available at https://sites.google.com/site/tuanluu219/research/tacl1.

Acknowledgements

We are grateful to action editors Lillian Lee and Bo Pang and the anonymous reviewers for their helpful suggestions, which substantially improved the present paper.

References

David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B. Tenenbaum. 2004. Hierarchical Topic Models and the Nested Chinese Restaurant Process. Advances in Neural Information Processing Systems, pages 17–24.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1247–1250.
Angel X. Chang and Christopher D. Manning. 2012. SUTime: A Library for Recognizing and Normalizing Time Expressions. Proceedings of the Conference on Language Resources and Evaluation, pages 3735–3740.
Jack Edmonds. 1967. Optimum Branchings. Journal of Research of the National Bureau of Standards B, 71:233–240.
Miles Efron and Gene Golovchinsky. 2011. Estimation Methods for Ranking Recent Information. Proceedings of the 34th ACM SIGIR Conference, pages 495–504.
Samah Fodeh, Bill Punch, and Pang N. Tan. 2011. On Ontology-driven Document Clustering Using Core Semantic Features. Knowledge and Information Systems, 28(2):395–421.
Sanda M. Harabagiu, Steven J. Maiorano, and Marius A. Pasca. 2003. Open-domain Textual Question Answering Techniques. Natural Language Engineering, 9(3):231–267.
Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. 2009. The Elements of Statistical Learning. Springer-Verlag.
Marti A. Hearst. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. Proceedings of the 14th Conference on Computational Linguistics, pages 539–545.
Zornitsa Kozareva and Eduard Hovy. 2010. A Semi-supervised Method to Learn and Construct Taxonomies Using the Web. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1110–1118.
Zornitsa Kozareva, Ellen Riloff, and Eduard H. Hovy. 2008. Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs. Proceedings of the 46th Annual Meeting of the ACL, pages 1048–1056.
Dawn J. Lawrie and W. Bruce Croft. 2003. Generating Hierarchical Summaries for Web Searches. Proceedings of the 26th ACM SIGIR Conference, pages 457–463.
Xiaoyan Li and W. Bruce Croft. 2003. Time-based Language Models. Proceedings of the 12th ACM CIKM Conference, pages 469–475.
Baichuan Li, Jing Liu, Chin-Yew Lin, Irwin King, and Michael R. Lyu. 2013. A Hierarchical Entity-based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers. Proceedings of the EMNLP Conference, pages 1521–1532.
Cynthia Matuszek, John Cabral, Michael J. Witbrock, and John DeOliveira. 2006. An Introduction to the Syntax and Content of CYC. Proceedings of the AAAI Spring Symposium, pages 44–49.
George A. Miller. 1995. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39–41.
Roberto Navigli, Paola Velardi, and Stefano Faralli. 2011. A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch. Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1872–1877.
Yves Petinot, Kathleen McKeown, and Kapil Thadani. 2011. A Hierarchical Model of Web Summaries. Proceedings of the 49th Annual Meeting of the ACL, pages 670–675.
Aurora Pons-Porrata, Rafael Berlanga-Llavori, and Jose Ruiz-Shulcloper. 2007. Topic Discovery Based on Text Mining Techniques. Information Processing & Management, 43(3):752–768.
Mark D. Smucker and Charles L. Clarke. 2012. Time-based Calibration of Effectiveness Measures. Proceedings of the 35th ACM SIGIR Conference, pages 95–104.
Luu A. Tuan, Jung J. Kim, and See K. Ng. 2014. Taxonomy Construction Using Syntactic Contextual Evidence. Proceedings of the EMNLP Conference, pages 810–819.
Luu A. Tuan, Yi Tay, Siu C. Hui, and See K. Ng. 2016. Learning Term Embeddings for Taxonomic Relation Identification Using Dynamic Weighting Neural Network. Proceedings of the EMNLP Conference, pages 403–413.
Chi Wang, Marina Danilevsky, Nihit Desai, Yinan Zhang, Phuong Nguyen, Thrivikrama Taula, and Jiawei Han. 2013. A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy. Proceedings of the 19th ACM SIGKDD Conference, pages 437–445.
Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q. Zhu. 2012. Probase: A Probabilistic Taxonomy for Text Understanding. Proceedings of the ACM SIGMOD Conference, pages 481–492.
Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang, and Tat-Seng Chua. 2011. Domain-assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews. Proceedings of the EMNLP Conference, pages 140–150.
Xingwei Zhu, Zhao Y. Ming, and Tat-Seng Chua. 2013. Topic Hierarchy Construction for the Organization of Multi-source User Generated Contents. Proceedings of the 36th ACM SIGIR Conference, pages 233–242.