title: Network-based Trajectory Analysis of Topic Interaction Map for Text Mining of COVID-19 Biomedical Literature
authors: Jeon, Yeseul; Chung, Dongjun; Park, Jina; Jin, Ick Hoon
date: 2021-06-07

Since the emergence of the worldwide pandemic of COVID-19, relevant research has been published at a dazzling pace, which makes it hard to follow the research in this area without dedicated efforts. It is practically impossible to implement this task manually due to the high volume of the relevant literature. Text mining has been considered to be a powerful approach to address this challenge, especially topic modeling, a well-known unsupervised method that aims to reveal latent topics from the literature. However, in spite of its potential utility, the results generated from this approach are often investigated manually. Hence, its application to the COVID-19 literature is not straightforward, and expert knowledge is needed to make meaningful interpretations. In order to address these challenges, we propose a novel analytical framework for estimating topic interactions and an effective visualization for topic interpretation. Here we assume that the topics constituting a paper can be positioned on an interaction map, which belongs to a high-dimensional Euclidean space. Based on this assumption, after summarizing topics with their topic-word distributions using the biterm topic model, we mapped these latent topics onto networks to visualize relationships among the topics. Moreover, in the proposed approach, we developed a score that is helpful for selecting meaningful words that characterize a topic. We interpret the relationships among topics by tracking their changes using a trajectory plot generated with different levels of word richness. Together, these results provide a deeply mined and intuitive representation of relationships among topics related to a specific research area. The application of the proposed framework to the PubMed literature shows that our approach facilitates understanding of the topics constituting the COVID-19 knowledge.
1. Introduction. The COVID-19 literature covers a wide range of subjects, such as transmission, diagnosis, treatment, and others, as profiled and categorized in LitCovid, a biomedical literature database dedicated to COVID-19 (Chen, Allot and Lu, 2020). In order to facilitate understanding of relevant mechanisms and come up with effective treatment and preventive strategies, it is critical to follow up and digest these publications, which are appearing at a fast pace. However, given the rich volume of the COVID-19 literature and its rapid publication pace, it is practically impossible for biomedical experts to trace all of this literature manually in real time. Statistical and computational approaches, especially text mining, can be a powerful solution for researchers to address this challenge. Since literature is unstructured text data, a document is usually first summarized with a corpus representing word counts. The tf-idf approach is most commonly used to summarize a document with word frequencies (Chowdhury, 2010). In spite of its popularity, this approach is still naive and limited in the sense that each single word is treated independently and interactions among words are ignored. However, such latent relationships can provide invaluable information that leads to effective modeling of literature and, ultimately, deeper understanding of the literature. Therefore, it is of great interest to cluster words effectively and understand the meaning of these clusters. In the text mining area, topic modeling is the most well-known approach for this purpose. It essentially extracts topics based on the distribution of words in each document, and various algorithms have been proposed for topic modeling. Blei, Ng and Jordan (2003) made an important step in this direction and proposed a generative probabilistic model for word sets, called Latent Dirichlet Allocation (LDA). LDA assumes each document is structured in the context of topics while words are allocated to these topics. The topics are regarded as latent variables and are characterized by words with probabilities assigned to each topic. To estimate latent topics, LDA uses Bayesian methods such as the Expectation-Maximization (EM) algorithm and variational inference to construct the posterior distributions of topic distributions and topic-word distributions. The topic distribution models how likely each topic is relevant to a given document. The topic-word distribution represents how likely each given word is associated with a particular topic. Other LDA-based topic models for literature text data include author-document models (McCallum, Corrada-Emmanuel and Wang, 2005) and abstract-reference models (Erosheva, Fienberg and Lafferty, 2004). From the perspective of biological or biomedical literature text mining, Liu et al. (2016) applied topic modeling to biological literature and showed that topic modeling can be a practical solution for bioinformatics research.
Since COVID-19 is a hot topic and its research is active and ongoing, some articles are released without the full text but with only an abstract. Abstracts of the COVID-19 literature are categorized as short texts. Unfortunately, with short texts, conventional topic models including LDA often suffer from poor performance due to the sparsity problem, which leads to inferior topic inference. There have been some attempts to handle this problem by using additional assumptions for short text data. For instance, Zhao et al. (2011) and Lakkaraju, Bhattacharya and Bhattacharyya (2012) trained a topic model on tweet data as a mixture of unigrams (Nigam et al., 2000), where a document is considered as a word set drawn independently from a single topic. Likewise, Gruber, Weiss and Rosen-Zvi (2007) assumed that the words within the same sentence share the same topic. Although these constraints may help alleviate the data sparsity problem caused by short text data, the improvement comes at the cost of requiring additional information to capture multiple topics from each document. Moreover, these assumptions tend to result in peaked posteriors of topics in a document, making the model susceptible to overfitting (Blei, Ng and Jordan, 2003). Yan et al. (2013) made important progress in modeling short text data with the so-called Biterm Topic Model (BTM). Unlike LDA, BTM replaces words with biterms, where a biterm is defined as a pair of two words occurring in the same document. This approach attempts to make up for the lack of words in a short text by pairing words, thereby creating more tokens per document, and it encodes the intuition that if two words are mentioned together, they are more likely to belong to the same topic. BTM has been applied to short text data such as micro-blogs (Li et al., 2016), large-scale short texts, and tweets for clustering and classification (Chen and Kao, 2017).

1.1. Interaction Map of Topic Relationships. It is important to note that real-world data rarely contain topics that are independent of one another; instead, multiple topics are often inter-correlated. Therefore, to tackle this practical issue, there have been attempts to model interactions among topics by modeling the correlated structure within the topic model or by combining the topic model with other statistical models. For instance, Blei and Lafferty (2007) regarded topic correlation as a structure of heterogeneity. To capture the heterogeneity of topics, they proposed the Correlated Topic Model (CTM), which models topic proportions so that they exhibit correlation through a logistic normal distribution within LDA. They validated its performance by applying CTM to articles from the journal Science. On the other hand, Rusch et al. (2013) combined LDA with model trees to interpret topic relationships from the Afghanistan war logs. Their approach classified the topics into tree structures, which helped to understand the different circumstances in the Afghanistan war. One of the statistical methods for handling correlated structure is network analysis. Specifically, network modeling estimates the relationships of interactions among nodes based on their dependency structure. It can provide global and local representations of nodes at the same time. In this context, Mei et al. (2008) tried to identify topical communities by combining a topic model with a social network. Specifically, they estimated topic relationships given a known network structure and tried to separate out topics based on connectivity among them.
However, in this approach, the network information needs to be given to construct and visualize the topic network, which is not a trivial task in practice. In addition, there have been attempts to model the hierarchical structure among topics. Specifically, Airoldi and Bischof (2016) modeled the count for a specific word contributed by each topic using a Poisson distribution. Their model generated a hierarchical topic structure with an estimated rate determined by the length of documents and memberships to the topic. Furthermore, they evaluated the model using a novel score called FREX, which quantifies words' closeness within a topic based on word frequency and exclusivity. For example, FREX_{i,j} reflects how exclusively word j is close to topic i compared to other topics. The FREX score ranks the closeness of words within topic i, taking into account the hierarchical topic structure. Their model captures the hierarchical relationships among topics and validates the model performance as well. However, assuming a hierarchical structure among topics is not trivial without prior information or expert knowledge about the data. Therefore, we focus on general interactions among topics without considering hierarchical directions. To estimate the interactions among topics and visualize their dependency relationships simultaneously, we consider a network model that captures the dependency structure among topics. In addition, we also aim to overcome a static topic network by dynamically representing fundamental associations among topics through tracking the change in their relationships. Our approach consists of four steps to estimate the interaction relationships among topics. In the first step, text mining is conducted to extract nouns and a corpus is constructed for modeling the literature data. In the second step, we use BTM (Yan et al., 2013) to estimate topics and their associated words. In the third step, latent topic positions are estimated based on a distance that measures similarity between a topic and a word. In the last step, we visualize relationships among topics by tracking the transition of the latent positions of topics. This approach represents interactions among topics through latent positions, which facilitates understanding of distinct properties of the topics. Specifically, the more words topic A shares with topic B, the more likely topic A is located closer to topic B, which nicely depicts the topic relationships identified by BTM. This article is composed of three parts. First, we briefly summarize the contributions of this paper. Second, we introduce our methods and describe how we combine two different statistical approaches: (1) topic modeling, specifically the BTM, an alternative to LDA for tackling the short text problem; and (2) the latent space item response model (Jeon et al., 2021), which estimates associations between items based on the latent distance between item and response. Finally, we apply our approach to the COVID-19 literature to evaluate and demonstrate its usefulness. There are four key contributions of this work. First, our paper estimates and visualizes the topic relationships by mapping unobserved interactions among topics onto an interaction map based on the latent distance between topics and words, where the interactions among topics are quantified by their interactions with words.
This embedded derivation of topic relationships reduces the burden on data analysts because it does not require prior knowledge about relationships between topics, e.g., a connectivity matrix. Instead, our approach utilizes the topic-word matrix for calculating the latent positions of each topic, where each cell corresponds to the probability of a word's priority linkage to a topic. Since our approach uses continuous values ranging from 0 to 1 for modeling, it can quantify closeness between topics by reflecting the degree of linkage of words to topics and, hence, using latent positions for topics, we can represent relationships among topics more precisely. Secondly, we provide the score s_{i,j}, which quantifies the closeness of word j to topic i. High-scoring words help to understand the topics more precisely without expert knowledge. In general, topic models return probabilities that measure how frequently words are mentioned within topics. Since the probabilities estimated by topic models do not reflect the exclusivity of words toward topics, they may cause misunderstanding of topics. Moreover, it requires expert knowledge to figure out meaningful words for each topic and to characterize the topic using these words. Our approach makes it plausible to obtain the exclusiveness of word j with respect to topic i without subjective interpretation. By discarding redundant information using the score, it helps to figure out the true meaning of each topic. Moreover, our score s_{i,j} assists in evaluating the performance of our approach through how relevant the top-scored words from each topic are in context. Third, we visualize the topic relationships as a trajectory plot to detect the change of interactions of topics over different sets of words. This feature has two important properties: (1) it can identify the main location of a topic, which is steadily positioned in a similar place in spite of the differing network structure; and (2) we can distinguish popular topics mentioned across articles from recently emerging topics by scanning the latent position of each topic. Specifically, if a particular topic shares most of its words with other topics, it is more likely to be located in the center of the latent interaction map. In contrast, if a specific topic consists mostly of words unique to that topic (e.g., a rare topic or an independent topic containing its own referring words), it is more likely to be located away from the center of the interaction map. For example, in the context of COVID-19, it is more likely that common subjects like 'outbreak' and 'diagnosis' are located in the center while more specific subjects like 'Cytokine Storm' are located further outside on the interaction map. Finally, this approach helps organize a tremendous amount of literature and mine underlying relationships among topics based on the literature. The topic network visualizes the relationships among topics in an intuitive way, which can assist researchers in designing their studies. For instance, if some researchers want to investigate a specific topic, say Topic A, our framework can assist them by providing information answering the three following questions: (1) Which set of words is associated with Topic A? Researchers can obtain this information from the BTM results. In addition, since we extract meaningful words that distinctively represent each topic's meaning, our approach can further support this investigation with a more refined word set.
(2) Are there any other topics related to Topic A that can be used for extending and elaborating research? Researchers can answer this question by intuitively interrogating the final visualization. (3) Is Topic A a common or a specific topic? Since we can trace the change of latent topic positions as a function of the word set, our method provides relevant insights through topic locations on the interaction map. Specifically, researchers can consider Topic A to be common if it is located in the center, and to be specific if it is located away from the center.

3. COVID-19 Biomedical Literature. We applied our workflow to the COVID-19 literature to investigate its latent semantic structure. In our implementation, we first downloaded the COVID-19 articles published between December 1st, 2019 and August 3rd, 2020 from the PubMed database (https://pubmed.ncbi.nlm.nih.gov). Specifically, we collected articles whose titles contain "coronavirus2", "covid-19", or "SARS-CoV-2", which resulted in a total of 35,585 articles. After eliminating articles without abstracts (i.e., with only titles or abstract keywords), our final text data contained a total of 15,015 documents. To construct the corpus, we used abstract keywords that concisely capture the messages delivered by a paper. To achieve a richer corpus, we also used word2vec (Mikolov et al., 2013a) to learn relationships between nouns from the abstracts and the abstract keywords. Specifically, word2vec extracted nouns from the abstracts that were embedded near the abstract keywords, and those selected words were added to the corpus. To train the word network, Mikolov et al. (2013b) suggested the negative-sampling approach, which fits the network of words by training on nearby words and unrelated words; this approach was reported to be efficient in vectorizing the words' relationships. Goldberg and Levy (2014) expanded the negative-sampling strategy to model words and contexts jointly, which makes the problem nonconvex. Before implementing negative sampling, we need to assign the window size that defines how many neighboring words to consider. For example, let's assume that we want to select four neighboring words of 'princess'. According to the word embedding network, there are only a few words located close in the latent Euclidean space, e.g., 'horse', 'money', 'king', 'queen', 'princess', 'prince', 'palace', 'flowers', and so on. Since we want to select four neighboring words, we set the window size to 2 so that two words from each side of 'princess' can be considered. Therefore, we define 'king', 'queen', 'prince', and 'palace' as near words for the word 'princess'. On the other hand, we can sample 20 negative words that are not included in this near-word set. By repeating this process, word2vec trains the word network so that it reflects the context. In this way, we obtained a word embedding network with 256 dimensions. Using the trained word2vec network, we selected the ten words from the abstract nouns that were nearest to each abstract keyword. The corpus construction resulted in 9,643 words from 15,015 documents. We further filtered out noise words including single alphabet letters, numbers, and other words that are not meaningful, e.g., 'p.001', 'p.05', 'n1427', 'l.', and 'ie'. Finally, to obtain more meaningful topics, we removed common words like 'data', 'analysis', 'fact', and 'disease'. The full list of filtered words can be found in the Supplementary Materials.
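To make the corpus-expansion step above concrete, the following is a minimal sketch of how abstract nouns near each abstract keyword could be selected with a word2vec model. The tokenized abstracts, keyword list, and variable names are illustrative assumptions; only the window size, number of negative samples, embedding dimension, and number of neighbors follow the settings described in the text.

```python
# Minimal sketch (not the authors' code): expand the corpus by adding abstract
# nouns that word2vec embeds near each abstract keyword.
from gensim.models import Word2Vec

# Illustrative input: each document is a list of noun tokens from one abstract.
tokenized_abstracts = [
    ["outbreak", "transmission", "wuhan", "pneumonia"],
    ["cytokine", "storm", "inflammation", "interleukin"],
    # ... one token list per abstract
]
abstract_keywords = ["transmission", "cytokine"]  # illustrative keyword list

# Skip-gram with negative sampling, mirroring the settings described in the
# text (window size 2, 20 negative samples, 256-dimensional embeddings).
model = Word2Vec(
    sentences=tokenized_abstracts,
    vector_size=256,
    window=2,
    negative=20,
    sg=1,
    min_count=1,
    seed=1,
)

# For each keyword, take the ten nearest nouns in the embedding space and add
# them to the corpus vocabulary.
expanded_vocabulary = set(abstract_keywords)
for keyword in abstract_keywords:
    if keyword in model.wv:
        neighbors = model.wv.most_similar(keyword, topn=10)
        expanded_vocabulary.update(word for word, _ in neighbors)

print(sorted(expanded_vocabulary))
```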
We developed a novel framework for estimating topic relationships through an interaction map, which positions topics on a Euclidean latent space. This framework consists of the following four steps, as illustrated in Figure 1. In the first step, we implement text mining with natural language processing and construct the word corpus by expanding the set of words using the word2vec model (Mikolov et al., 2013a). In the second step, based on this word corpus, we extract the topic-word distribution, where each topic is characterized by the corresponding topic-specific distribution of words estimated using the BTM. Using this topic-word distribution, we can extract meaningful words that affect topic characteristics. In the third step, the topic relationships are estimated using the latent space item response model (Jeon et al., 2021), which provides latent topic positions. Finally, in the fourth step, we map the relationships among topics with topic-specific traces.

Fig 1: The four steps of the proposed framework. Step 1: Processing of text mining results with natural language processing and construction of the word corpus. Step 2: Extraction of topics, each of which is defined by its own word distribution estimated using the Biterm Topic Model (BTM). Step 3: Estimation of the interactions among topics based on their word distributions as networks, using the latent space item response model (Jeon et al., 2021). Step 4: Visualization of relationships among topics using the trajectory plot showing the transition of topics' coordinates.

4.1. Estimating Latent Topics using the Biterm Topic Model. The text data from biomedical papers are the input for BTM. We extract information from the text using morphological analysis, one type of natural language processing technique. Specifically, it splits each word from its suffix to identify the base unit of a term. Among the basic units of words, we first extract nouns in their most basic forms. This set of nouns is called a corpus. We further expand the corpus by adding relevant words using word2vec, which vectorizes distances among words in the Euclidean latent space according to their similarity in meaning. We extract neighboring words based on these distances and enlarge the word set by collecting them. Following this, it is necessary to understand the overall semantic structure of a given text. Topic modeling summarizes the latent structures, called topics, by identifying topics and estimating their corresponding clusters of words. That is, we can estimate topics and their distributions of words using topic modeling. The abstract is an excellent source for understanding the overall text. Since most abstracts are limited to 200 words, they can be regarded as short texts. Therefore, we use BTM for literature mining. To implement BTM, we first need a word corpus. Then, we extract biterms from each paper, which are the input for BTM. BTM is based on the following assumptions describing how biterms and topics are jointly related to each other:
• Two words are assumed to belong to some hidden clustered set of words.
• It is assumed that there are hidden topics, each of which represents a set of words with similar meanings.
• It is assumed that those hidden clustered sets of words correspond to the hidden topics.
Based on these assumptions, as shown in Figure 2, the likelihood of BTM consists of both the topic-word distribution and the topic distribution. Therefore, we need two sets of parameters: the topic distribution θ and the topic-word distribution φ_z. The whole likelihood of BTM is constructed as follows.
First, the prior distribution for words (φ_z) is set to a Dirichlet distribution with hyper-parameter β, while the prior distribution for topics is set to a Dirichlet distribution with hyper-parameter α. Next, we represent the topic with the latent variable z, which follows a Multinomial distribution with parameter θ. Likewise, each word follows a Multinomial distribution with parameter φ_z so that each word can be generated from a specific topic. Therefore, there are three sets of parameters to estimate: φ_z, θ, and z. The likelihood construction process can be summarized as follows:
Step 1. Draw a whole topic distribution θ ∼ Dirichlet(α).
Step 2. For each biterm b in the biterm set B, draw the latent topic z to which its two words belong: z ∼ Multinomial(θ).
Step 3. Draw a topic-word distribution of topic z from φ_z ∼ Dirichlet(β).
Step 4. Draw the two words from the topic-word distribution corresponding to the selected topic: w_1, w_2 ∼ Multinomial(φ_z).
After the above steps, the joint likelihood of the whole biterm set B is calculated as

P(B) = ∏_{b=(w_1,w_2)∈B} Σ_z θ_z φ_{w_1|z} φ_{w_2|z}.

The conditional posterior for a latent topic z, p(z | z_{−b}, B, α, β), is given as

p(z | z_{−b}, B, α, β) ∝ (n_z + α) (n_{w_1|z} + β)(n_{w_2|z} + β) / (Σ_w n_{w|z} + Mβ)²,

where n_z is the number of times that a biterm is assigned to topic z, z_{−b} refers to the topic assignments excluding biterm b, n_{w|z} is the number of times word w is assigned to topic z, and M is the number of words in the corpus. Because a direct application of Gibbs sampling for p(z | z_{−b}, B, α, β) can sometimes fail to converge due to dependency between variables, we use collapsed Gibbs sampling, which constrains unnecessary parameters by integrating them out (Liu, 1994). In particular, because the prior distributions are Dirichlet distributions, φ_z and θ can be integrated out. After some iterations, we can construct φ_{w|z} and θ_z from the estimated statistics n_{w|z} and n_z, as follows:

(3) φ_{w|z} = (n_{w|z} + β) / (Σ_w n_{w|z} + Mβ), and θ_z = (n_z + α) / (|B| + Kα).

We then obtain the topic-word distribution φ_{w|z} and the topic distribution θ_z. The Gibbs sampling procedure is summarized in Algorithm 1.

Algorithm 1 Gibbs sampler for BTM
Input: the number of topics K; hyper-parameters α, β; biterm set B
Output: distributions φ_{w|z} and θ_z
1: initialize topic assignments randomly for all the biterms
2: for iteration = 1, 2, . . . , N do
3:   for b = 1, 2, . . . , |B| do
4:     sample z_b from p(z | z_{−b}, B, α, β)
5:     update statistics n_{w|z} and n_z
6:     compute the parameters φ_{w|z} and θ_z
7:   end for
8: end for

Fig 2: The likelihood of BTM is obtained as the joint distribution of topic-word relationships and topics. BTM provides the topic-word distribution φ_{w|z} and the topic distribution θ_z. Figure on the right: through the BTM, we can construct the X matrix, of which the dimension is N × P, where each cell is composed of the probability for the degree of association of each word with each topic.

Each topic contains words and their corresponding probabilities, so it is meaningful to compare topics based on their word distributions. The simplest way to distinguish topics is to compare word memberships between topics. Since the output of each topic-word distribution includes all words, we might not be able to determine the characteristics of topics if we use all the words that topics share. Therefore, we select only meaningful words from each topic to estimate the relationships among topics based on their representative words. According to Figure 2, through the BTM, we can construct the matrix X with dimension N × P, where N denotes the number of words, P denotes the number of topics, and each cell represents the probability that each word belongs to each topic.
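As a concrete illustration of the estimation just described, the following is a minimal sketch of biterm extraction and a collapsed Gibbs sampler in the spirit of Algorithm 1. It assumes the commonly used BTM conditional written above and a toy corpus of word indices; it is not the implementation used in the paper.

```python
# Minimal sketch (not the authors' implementation) of BTM-style estimation:
# biterm extraction followed by a collapsed Gibbs sampler as in Algorithm 1.
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of word indices in a vocabulary of size M.
docs = [[0, 1, 2], [1, 2, 3], [3, 4, 5], [4, 5, 0]]
M = 6          # vocabulary size
K = 2          # number of topics
alpha, beta = 3.0, 0.01

# A biterm is an unordered pair of words co-occurring in the same document.
biterms = [pair for doc in docs for pair in itertools.combinations(doc, 2)]

# Count statistics: n_z (biterms per topic) and n_wz (word-topic counts).
z_assign = rng.integers(K, size=len(biterms))
n_z = np.zeros(K)
n_wz = np.zeros((M, K))
for (w1, w2), z in zip(biterms, z_assign):
    n_z[z] += 1
    n_wz[w1, z] += 1
    n_wz[w2, z] += 1

for _ in range(200):                      # Gibbs iterations
    for b, (w1, w2) in enumerate(biterms):
        z_old = z_assign[b]
        n_z[z_old] -= 1                   # remove biterm b from the counts
        n_wz[w1, z_old] -= 1
        n_wz[w2, z_old] -= 1
        # p(z | z_-b, B) ∝ (n_z + α)(n_{w1|z} + β)(n_{w2|z} + β) / (Σ_w n_{w|z} + Mβ)²
        denom = (n_wz.sum(axis=0) + M * beta) ** 2
        p = (n_z + alpha) * (n_wz[w1] + beta) * (n_wz[w2] + beta) / denom
        z_new = rng.choice(K, p=p / p.sum())
        z_assign[b] = z_new
        n_z[z_new] += 1                   # add biterm b back with its new topic
        n_wz[w1, z_new] += 1
        n_wz[w2, z_new] += 1

# Point estimates as in equation (3).
phi = (n_wz + beta) / (n_wz.sum(axis=0) + M * beta)   # topic-word distribution, M x K
theta = (n_z + alpha) / (len(biterms) + K * alpha)    # topic distribution
X = phi                                               # the N x P input matrix for LSIRM
print(theta, X.shape)
```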
We extract sets of meaningful words representing the characteristics of topics using two statistical measurements: the coefficient of variation and the maximum probability computed from each row of the matrix X. These two values represent, respectively, the variation of a word's probabilities across the different topics and the degree of the word's linkage to a particular topic. Since BTM returns topic-word distributions in which each word's probability is measured within each topic, we need to adjust for their scale rather than simply calculating a variance. Therefore, we divide the standard deviation by the mean of a word's probabilities across topics. There are two reasons for choosing the coefficient of variation and the maximum probability from each row of the matrix X as criteria to select words. First, it is expected that important words have large variation in probabilities among topics, because if a word has low variation across topics, that word likely does not represent any topic specifically. Therefore, meaningful words can be selected based on their variation, and using the coefficient of variation further scales the words' dispersion among topics. Second, to be meaningful, a word should also have a high probability in at least one topic. For example, if a word has high variation but only low probabilities across topics, it still cannot differentiate topics. Therefore, we can effectively characterize a topic by selecting words with a high probability in at least one topic and with large variation.

4.2. Latent Space Item Response Model. We estimate interactions among topics and visualize their relationships by mapping them on the interaction map. Hoff, Raftery and Handcock (2002) proposed the latent space model, which expresses the relationship between actors of a network in an unobserved "social space", the so-called latent space. Inspired by Hoff, Raftery and Handcock (2002), Jeon et al. (2021) proposed the latent space item response model (LSIRM), which views item responses as a bipartite network and estimates the interaction between respondents and items using the latent space. LSIRM has two parts: an attribute part (β_i and θ_j in Figure 3) and an interaction part (the distance term ||v_i − u_j|| in Figure 3). In the attribute part, LSIRM estimates how much respondents respond to certain items and how many items are answered by respondents. In the interaction part, LSIRM estimates the interaction between items and respondents, along with the latent position of each item and respondent on the interaction map. Given our goal of estimating the interactions among topics and visualizing their latent positions on the interaction map based on their associated words, we use LSIRM with X as a bipartite network, where items indicate topics and respondents refer to words. However, the original LSIRM proposed by Jeon et al. (2021) cannot be directly applied here because it was designed for binary item response data, where each cell of the item response data takes the value 0 or 1. On the contrary, our input data X contain continuous probabilities indicating how likely each word belongs to each topic. Therefore, in order to apply LSIRM to our input data X, we need to extend the Jeon et al. (2021) model to a Gaussian version, which is described in detail below.
The modified Gaussian version of LSIRM can be written as

x_{j,i} = θ_j + β_i − ||v_i − u_j|| + ε_{j,i}, ε_{j,i} ∼ N(0, σ²),

where x_{j,i} indicates the probability that word j belongs to topic i, for i = 1, · · · , P and j = 1, · · · , N. Because the original LSIRM uses a logit link function to handle binary data, here we instead assume linearity between x_{j,i} and the sum of the attribute part and the interaction part, and we add the error term ε_{j,i} ∼ N(0, σ²) to satisfy the normality assumption. Here ||v_i − u_j|| denotes the Euclidean distance between the latent positions of topic i and word j; a shorter distance between v_i and u_j implies a higher probability that word j is linked to topic i. Therefore, the latent positions of topics can be estimated based on their distances to words. Given the model described above, we use Bayesian inference to estimate the parameters of the Gaussian version of LSIRM. We specify prior distributions for the parameters as follows:

β_i ∼ N(0, τ_β²), θ_j ∼ N(0, σ_θ²), u_j, v_i ∼ MVN_d(0, I_d), σ² ∼ Inv-Gamma(a_σ, b_σ),

where 0 is a d-vector of zeros and I_d is the d × d identity matrix. We fixed τ_β² at a constant value. The posterior distribution of LSIRM is proportional to the product of this likelihood and the priors, and we use Markov chain Monte Carlo (MCMC) to estimate the parameters of LSIRM. In this way, we can obtain the latent positions u_j and v_i on the interaction map R^d.
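To make the Gaussian LSIRM structure concrete, the following is a minimal sketch of how the model mean and log-likelihood could be evaluated for given latent positions. All inputs are simulated placeholders; a full MCMC sampler, as used in the paper, would combine this likelihood with the priors above to update each parameter in turn.

```python
# Minimal sketch (not the authors' sampler): evaluate the Gaussian LSIRM mean
# structure x_{j,i} ≈ theta_j + beta_i - ||u_j - v_i|| and its log-likelihood.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import norm

rng = np.random.default_rng(0)
N, P, d = 100, 20, 2          # words, topics, latent dimension (illustrative)

X = rng.uniform(size=(N, P))  # placeholder topic-word probability matrix
theta = rng.normal(size=N)    # word attribute (rows)
beta = rng.normal(size=P)     # topic attribute (columns)
u = rng.normal(size=(N, d))   # latent word positions
v = rng.normal(size=(P, d))   # latent topic positions
sigma = 0.5                   # error standard deviation

# Mean matrix: theta_j + beta_i minus the Euclidean distance between word j and topic i.
mean = theta[:, None] + beta[None, :] - cdist(u, v)

# Gaussian log-likelihood of the observed matrix under the current parameters.
loglik = norm.logpdf(X, loc=mean, scale=sigma).sum()
print(loglik)
```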
Since we are interested in constructing the topic network, we utilize v_i, i = 1, · · · , P, and collect them into a matrix A ∈ R^{d×P}. Suppose we have K different word sets derived from X; then we can construct the corresponding matrices X_k, k = 1, · · · , K, to estimate and trace the topic interactions based on the different word sets. After fitting the LSIRM model to these matrices X_k, we obtain matrices A_k composed of the coordinates of each topic. In order to further improve the interpretation of relationships among topics, we trace how their latent positions change as a function of the word sets. Specifically, we compare the topics' latent positions A_k from each matrix X_k with the following steps. First, we implement Procrustes matching two times: (1) within the MCMC samples generated from LSIRM, to tackle the invariance property (so-called within-matrix matching); and (2) across the estimated matrices, to locate topics on the same quadrant (so-called between-matrix matching). Second, we take the average of the distances of the topics' latent positions from the origin to measure the degree of the dependency structure; specifically, a longer distance of the latent positions from the origin implies a stronger dependency in the network. To locate each matrix X_k onto the same quadrant, we choose the baseline matrix that maximizes the dependency structure among topics. This nicely shows the change of the topics' latent positions, because the rotated positions A_k from each matrix X_k are aligned to the network most stretched out from the origin. Note that the baseline matrix is denoted by A_max. Finally, we rotate the axes to improve the interpretability of the relationships among topics using an oblique factor rotation (Jennrich, 2002). After implementing these steps (which correspond to (c) and (d) in Figure 3), we visualize relationships among topics by tracing the coordinates of each topic. When fitting the LSIRM model, we collect MCMC samples of each topic's latent position. Note that there exist multiple possible realizations of the latent positions, because the distances between pairs of respondents and items, which enter the likelihood function, are invariant under rotation, translation, and reflection (Hoff, Raftery and Handcock, 2002; Shortreed, Handcock and Hoff, 2006). In order to tackle this invariance property when determining latent positions, we implement within-matrix Procrustes matching (Borg and Groenen, 2005) as post-processing of the MCMC samples. After implementing the within-matrix matching for each matrix A_k, we additionally execute the between-matrix Procrustes matching based on the baseline matrix A_max to compare the transition of the topics' latent positions across the different input matrices A_k. We assign A_max to the matrix that maximizes the dependency of A_k. Finally, we obtain the re-positioned matrices A*_k, which still maintain the dependency structure among topics but are located in the same quadrant. With the oblique rotation, the interpretability of the axes can be further improved and topics can be categorized based on these axes. For this purpose, we apply the oblimin rotation (Jennrich, 2002) to the estimated topic position matrices A*_k, using the R package GPArotation (Bernaards and Jennrich, 2005). We denote the rotated topic position matrix by B_k. To interpret the trajectory plot showing the traces of the topics' latent positions, we extract the rotation matrix R resulting from the oblique rotation of the baseline matrix B_base. Then, we multiply each matrix B_k by the rotation matrix R to plot the topics' latent positions.

4.3. Scoring the Relation of Words to Topics using Information from the Biterm Topic Model and the Latent Space Item Response Model. By combining the idea of FREX_{i,j} (Airoldi and Bischof, 2016) with the interaction information among topics in our problem, we propose the score s_{i,j}, which measures the exclusiveness of word j in topic i. We define the score s_{i,j} as

s_{i,j} = [ w_1 (2 − ECDF_{δ_{·,j}}(δ_{i,j})) + w_2 (2 − ECDF_{δ_{i,·}}(δ_{i,j})) + w_3 ECDF_{γ_{·,j}}(γ_{i,j}) + w_4 ECDF_{γ_{i,·}}(γ_{i,j}) ]^{−1},

where w_1, w_2, w_3, and w_4 are the weights for exclusivity (here, we set w_1 = w_2 = w_3 = w_4 = 0.25) and ECDF is the empirical CDF function. Here, δ_{i,j} denotes the probability of word j belonging to topic i given by BTM, while γ_{i,j} is the distance between the latent positions of word j and topic i estimated by LSIRM. A higher value of δ_{i,j} corresponds to a closer relationship between word j and topic i. On the other hand, a smaller value of γ_{i,j} indicates a shorter distance between word j and topic i. To make the meaning of a shorter distance and a higher probability consistent, so that both contribute to a higher score, we subtract ECDF_{δ_{·,j}}(δ_{i,j}) and ECDF_{δ_{i,·}}(δ_{i,j}) from 2. Note that we use 2 here to avoid a zero in the denominator. For example, if δ_{i,j} has a high probability within topic i and between the other topics and word j, then both 2 − ECDF_{δ_{·,j}}(δ_{i,j}) and 2 − ECDF_{δ_{i,·}}(δ_{i,j}) will have small values. This means that word j is distinctive enough to represent the meaning of topic i. Likewise, if the latent distance between word j and topic i is the shortest within topic i and between the other topics and word j, then ECDF_{γ_{·,j}}(γ_{i,j}) and ECDF_{γ_{i,·}}(γ_{i,j}) have the smallest values, contributing to a high score s_{i,j}. Based on s_{i,j}, we can determine whether word j and topic i are close enough to be mentioned in the same document. By collecting the high-scoring words, we can characterize the topics and name them.
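The following is a minimal sketch of how this score could be computed from the BTM probabilities δ and the LSIRM distances γ. It assumes the inverse weighted-sum form written above (reconstructed from the surrounding description) and equal weights of 0.25; the input matrices and helper function are illustrative placeholders rather than the authors' code.

```python
# Minimal sketch (assumed form of the score, not the authors' code): compute
# s_{i,j} from BTM probabilities (delta) and LSIRM distances (gamma) using
# column-wise and row-wise empirical CDFs.
import numpy as np

rng = np.random.default_rng(0)
P, N = 20, 100                      # topics x words (illustrative sizes)
delta = rng.uniform(size=(P, N))    # delta[i, j]: BTM probability of word j in topic i
gamma = rng.uniform(size=(P, N))    # gamma[i, j]: latent distance between topic i and word j
w1 = w2 = w3 = w4 = 0.25            # equal exclusivity weights, as in the paper

def ecdf_along(values, axis):
    """Empirical CDF of each entry, evaluated within its own column (axis=0) or row (axis=1)."""
    n = values.shape[axis]
    ranks = values.argsort(axis=axis).argsort(axis=axis) + 1
    return ranks / n

# ECDFs of delta and gamma over topics (within each word's column) and over words (within each topic's row).
ecdf_delta_col = ecdf_along(delta, axis=0)   # ECDF_{delta_{.,j}}(delta_{i,j})
ecdf_delta_row = ecdf_along(delta, axis=1)   # ECDF_{delta_{i,.}}(delta_{i,j})
ecdf_gamma_col = ecdf_along(gamma, axis=0)   # ECDF_{gamma_{.,j}}(gamma_{i,j})
ecdf_gamma_row = ecdf_along(gamma, axis=1)   # ECDF_{gamma_{i,.}}(gamma_{i,j})

# Inverse weighted sum: small (2 - ECDF_delta) and small ECDF_gamma terms give a high score.
denominator = (w1 * (2 - ecdf_delta_col) + w2 * (2 - ecdf_delta_row)
               + w3 * ecdf_gamma_col + w4 * ecdf_gamma_row)
score = 1.0 / denominator

# Top-scoring words for topic 0, e.g., to help name the topic.
top_words_topic0 = np.argsort(-score[0])[:20]
print(top_words_topic0)
```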
To implement BTM, we set the number of topics to 20. For the hyper-parameters, we assigned α = 3 and β = 0.01. Since our main goal is to visualize the topic relationships, we empirically searched for and determined the hyper-parameters. The posterior of the topic-word distribution was estimated using the Gibbs sampler, where we generated samples with 50,000 iterations after 20,000 burn-in iterations, and then applied thinning at every 100th iteration. Table 1 shows the structure of the topic-word distributions, indicating the degree of relatedness of words to a specific topic. In each topic-word distribution obtained from BTM, words with high probabilities characterize the topic. Figure 4a shows histograms of log-transformed probabilities for each of Topics 1-4 (histograms for the other topics can be found in the Supplementary Materials). Figure 4a indicates bimodal topic-word distributions. Specifically, the mode on the left corresponds to the words that had low probabilities of belonging to a specific topic, whereas the mode in the center corresponds to the words whose probabilities were high enough to characterize the meaning of the topic. Therefore, to reduce noise, it is desirable to estimate topic relationships using only the words corresponding to the mode in the center, rather than using all the words. Based on these histograms, the minimum cutoff value that defined the tail of a normal distribution ranged between -11 and -12. We calculated the number of words whose log-scaled probabilities were above -11 to -12 to specify the minimum number of words for estimating the topic network, and we found that more than 1,000 words are needed to properly represent topics. Based on this rationale, we decided to use at least 1,000 words to estimate the positions of topics on the latent space based on the positions of words. In order to extract meaningful words that can discriminate the characteristics of topics, we chose words with probabilities varying from topic to topic and with relatively high maximum probabilities. Figure 4b shows a plot of the relationship between the maximum probability and the coefficient of variation. Based on this rationale, we selected words using both the maximum probability and the coefficient of variation. In this study, rather than using a fixed number of words, we investigated relationships among topics identified with different numbers of words, which were determined using the two criteria described above. Specifically, we obtained multiple matrices corresponding to the top 60% to 40% of words determined based on the two criteria. The numbers of words corresponding to the 60th and 40th percentiles were 2,648 and 1,095, respectively. In total, we used 21 sets of matrices, X_k (k = 1, 2, · · · , 21), as the LSIRM input data, with dimensions ranging from 2,648 × 20 to 1,095 × 20. In this way, we obtained the 21 sets of matrices X_k, and we considered 20 items for all the matrix sets.
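The word-filtering step just described can be sketched as follows: compute the coefficient of variation and the maximum probability for each word (row of X), rank the words, and build input matrices covering the top 60% down to the top 40% of words. The percentile grid, the rank-combination rule, and all variable names are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): select words by coefficient of
# variation and maximum probability, then build the X_k input matrices.
import numpy as np

rng = np.random.default_rng(0)
N, P = 9643, 20                            # words x topics (sizes from the paper)
X = rng.dirichlet(np.ones(N), size=P).T    # placeholder N x P topic-word matrix

# Row-wise criteria: coefficient of variation (std / mean) and maximum probability.
cv = X.std(axis=1) / X.mean(axis=1)
max_prob = X.max(axis=1)

# Rank words by combining the two criteria; a simple average of ranks is assumed
# here, since the paper does not spell out the exact combination rule.
rank = np.argsort(np.argsort(-cv)) + np.argsort(np.argsort(-max_prob))
order = np.argsort(rank)                   # best-ranked words first

# Build 21 matrices covering the top 60% down to the top 40% of words.
matrices = {}
for pct in np.linspace(60, 40, 21):
    n_keep = int(N * pct / 100)
    matrices[f"{pct:.0f}%"] = X[order[:n_keep], :]

print(matrices["60%"].shape, matrices["40%"].shape)
```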
To estimate the topics' latent positions V = {v_i}, i = 1, · · · , 20, MCMC was implemented. The MCMC was run for 55,000 iterations and the first 5,000 iterations were discarded as burn-in. Then, from the remaining 50,000 iterations, we collected 10,000 samples using a thinning interval of 5. To visualize relationships among topics, we used a two-dimensional Euclidean space. Additionally, we set the jumping rule to 0.28 for β, 1 for θ, and 0.06 for the latent positions u_j and v_i. Here, we fixed the prior variance of β and let θ follow N(0, 1). We set a_σ = b_σ = 0.001.

Fig 4: (a) Histograms of log-transformed word probabilities for Topics 1, 2, 3, and 4, which show bimodal shapes. The mode on the left corresponds to the words that had low probabilities of association with the topic. The mode on the right corresponds to the words with sufficient information to capture the meaning of the topic. (b) Plot of words over the space of maximum probability and coefficient of variation (CV), where points (words) are colored according to their ranks; the top 5% of words are colored blue based on the coefficient of variation and the maximum probability.

LSIRM takes each matrix X_k as input and provides the matrix A_k as output after within-matrix Procrustes matching. Since we calculate the topics' distances in the two-dimensional Euclidean space, A_k is of dimension 20 × 2. We visualized interactions among topics using the baseline matrix A_max, chosen so that we can compare the topics' latent positions without identifiability issues arising from the invariance property. From A_k, we calculated the distance between the origin and each topic's coordinates. A closer distance of a topic position to the origin indicates a weaker dependency with the other topics. Figure 5a shows that the dependency structure among topics starts to build up from A_47%. Based on this rationale, we chose A_47% as the baseline matrix A_max. With this baseline matrix A_47%, we implemented Procrustes matching to align the directions of the topics' latent positions from each matrix A_k. Using this process, we obtained the matrices A*_k matched to the baseline matrix A_47%. We named the identified topics based on the top-ranking words using the A_47% matrix, because the baseline matrix A_47% has the most substantial dependency structure compared with the other A*_k matrices and contains the words that characterize the topics nicely. We rotated the original latent space so that the axes better encompass the topics, in order to improve interpretation of the identified latent space (e.g., determining the meaning of a topic's transition along the X-axis or Y-axis). We applied the oblimin rotation to the estimated topic position matrix A*_47% using the R package GPArotation (Bernaards and Jennrich, 2005) and obtained the matrix B_47% with the rotation matrix R. In the same way, we rotated the other estimated topic position matrices A*_k, resulting in B_k for k = 40%, · · · , 60%. Figure 5b shows B_47%, representing the topics' latent positions.
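A minimal sketch of the baseline selection and between-matrix alignment described above is given below: the matrix whose topic positions are on average farthest from the origin is taken as the baseline, and every other position matrix is rotated toward it with an orthogonal Procrustes fit. The inputs are simulated stand-ins for the estimated A_k matrices, and the full matching used in the paper (including the within-matrix step and oblimin rotation) is not shown.

```python
# Minimal sketch (not the authors' code): choose the baseline matrix A_max as
# the A_k with the largest mean distance from the origin, then align all A_k
# to it with an orthogonal Procrustes rotation.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
P, d = 20, 2
# Placeholder topic-position matrices A_k (one 20 x 2 matrix per word set).
A = {k: rng.normal(size=(P, d)) for k in ["60%", "55%", "50%", "47%", "45%", "40%"]}

# Mean distance of the topic positions from the origin measures how stretched
# out (i.e., how dependent) the estimated topic network is.
mean_dist = {k: np.linalg.norm(Ak, axis=1).mean() for k, Ak in A.items()}
baseline_key = max(mean_dist, key=mean_dist.get)
A_max = A[baseline_key]

# Between-matrix matching: rotate/reflect each A_k so it best matches A_max.
A_star = {}
for k, Ak in A.items():
    R, _ = orthogonal_procrustes(Ak, A_max)   # solves min ||Ak @ R - A_max||
    A_star[k] = Ak @ R

print(baseline_key, A_star["60%"].shape)
```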
There are two possibilities that can lead to low dependency. First, it is possible that there was only a small number of words that could distinguish the characteristics of a topic from the other topics. Second, it is possible that most of the words were commonly shared with other topics. We calculated the score s_{i,j} for each word j and topic i based on each B_k for k = 40%, · · · , 60%. Figure 6 shows two different scenarios in the plot of s_{i,j} against affinity (exp(−γ_{i,j})) in B_47%. The first scenario is that the latent position of a particular topic is located at the center. The plot of score and affinity in this case shows two clusters: one with a high score and high affinity, and the other with a low score but high affinity. Because this topic has many words in common with other nearby topics, although there are some words close to that topic, they were not close enough to distinguish the characteristics of that topic based on the probability of words belonging to it (Figure 6a). The second scenario is that the latent positions of some topics are located far from the center, indicating that they have distinctive characteristics compared to other topics. There is only one pattern in this case, with both high score and high affinity, because those words are both close to those topics and have a high probability within those topics (Figure 6b).

Fig 6: (a) Plot of a topic in B_47% that shares many common words with other nearby topics; some words are close to that topic but not close enough to distinguish its characteristics given the information from BTM. (b) Plot of Topic 3 in B_47%, which is located away from the center and has more distinctive characteristics compared to other topics.

We extracted the top 20% of words with high scores to name each topic. These words can be found in the Supplementary Materials. We then collected the meaningful words that commonly appeared at the top of the list across every word-set proportion B_k, k = 40%, · · · , 60%. Table 2 shows the name of each topic determined based on the selected words, together with its top-scoring words. To eliminate visualization bias due to the selection of the number of words, we tracked the topics' latent positions by using the different input matrices B_k. We interrogated what kinds of topics have been extensively studied in the biomedical literature on COVID-19. In addition, we also studied how those topics were related to each other based on their closeness in terms of latent positions. We also partially clustered the topics based on their relationships using the quadrants. This allows us to check which studies about COVID-19 are relevant to each other and could be integrated. Figure 7 displays the trajectory plot; it shows how topics were positioned on the latent space and how these topics transition. More specifically, the direction of the arrows refers to how the topics' coordinates changed as a function of the number of words, where each arrow moves from B_60% to B_40% as the number of words decreases.
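A minimal sketch of such a trajectory plot is shown below, drawing an arrow for each topic from its position in B_60% to its position in B_40%. The position matrices are simulated placeholders for the aligned, rotated outputs described above, and the styling choices are illustrative.

```python
# Minimal sketch (not the authors' plotting code): trajectory plot of topic
# positions, with one arrow per topic from its B_60% to its B_40% coordinates.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
P = 20
B_60 = rng.normal(size=(P, 2))                      # placeholder aligned positions (60% word set)
B_40 = B_60 + rng.normal(scale=0.3, size=(P, 2))    # placeholder positions (40% word set)

fig, ax = plt.subplots(figsize=(6, 6))
for i in range(P):
    ax.annotate(
        "",
        xy=B_40[i],          # arrow head: position with fewer words
        xytext=B_60[i],      # arrow tail: position with more words
        arrowprops=dict(arrowstyle="->", color="gray"),
    )
    ax.text(B_60[i, 0], B_60[i, 1], str(i + 1), fontsize=8)   # label each topic by its number

ax.axhline(0, color="lightgray", linewidth=0.5)
ax.axvline(0, color="lightgray", linewidth=0.5)
ax.set_xlabel("latent dimension 1")
ax.set_ylabel("latent dimension 2")
ax.set_title("Topic trajectories across word sets")
plt.show()
```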
According to Figure 7, we observed two distinct groups: one group of topics does not depend on other topics, and the other group shows dependent relationships with other topics. The former group consists of topics that have common words that are shared by other topics as well; the latter group consists of topics that have their own distinct words. In the former group, the topic 'Statistical Modeling' and two others (Topics 14 and 19) were located in the center of the plot. This indicates that no matter how many words were used to estimate the topics' latent positions, these topics remained general topics and share many words with other topics. This makes sense given that many articles have mentioned the outbreak of COVID-19, the testing of COVID-19, and various ways of studying this notorious pandemic through statistical models. For example, among the 137K publications mentioning "COVID-19" in the PubMed database, more than 76,000, 64,782, and 44,602 publications also mentioned "outbreak", "testing", and "prevention", respectively. In the latter group, the topics 'Social Impact (financial and economical)', 'Social Impact', 'Cardiovascular', and 'Cytokine Storm' (Topics 12, 11, 10, and 6, respectively) were located away from the center of the plot, which implies their dependency structure with other topics. These topics usually stay on the boundary of the plot regardless of the number of words because they consist mainly of unique words. Finally, topics like 'Psychological Mental Issues', 'Relevance of Treatment', and 'Compound and Drug' (Topics 15, 4, and 2, respectively) start from the origin, move away from the origin for a while, and then return to the origin. This implies that these topics could not maintain their nature when fewer words were considered, and it is likely that they are either ongoing research or burgeoning topics that have not been studied enough yet. In addition, we interpreted the topics' meaning based on their latent positions. We now make interpretations using subsets of topics divided by directions. Since we implemented the oblique rotation that maximizes each axis's distinct meaning, we can render meaning to a direction. Figure 7 indicates that there are three topic clusters. First, the center cluster, denoted as (A) in Figure 7, is about the outbreak of COVID-19 and its effects on different areas: the outbreak of COVID-19, diagnosis or testing of COVID-19 using statistical models, the effects on general social, financial, or economic problems, and the mental pain of facing the pandemic shock of COVID-19. Second, the topics located on the bottom side of the plot (cluster (B) in Figure 7) are related to the symptoms of COVID-19 and their treatment through related studies. For instance, there are studies of COVID-19 relating to symptoms, such as cardiovascular diseases, with lung scan images. On the other side, there are topics related to treatment based on symptoms relevant to COVID-19 and risk prediction markers that validate the treatment effects. These subjects pertain to 'Literature Review (8)', 'Lung Scan Imaging (1)', 'Symptoms Comorbidity (9)', 'Lung Scan (16)', 'Treatment (17)', 'Relevance of Treatment (4)', 'COVID-19 Risk Prediction Markers (7)', and 'Cardiovascular (10)'. Finally, cluster (C) is related to what happens inside our body in response to COVID-19, e.g., the cytokine storm, the immune system responding to the COVID-19 infection, and the compounding of drugs, which investigates the mechanism of SARS-CoV-2. Note that we can also interpret each axis. For example, the x-axis can reflect a spectrum from macro (more negative) to micro perspectives (more positive). In summary, we identified three main groups of the COVID-19 literature: the outbreak of COVID-19 and its effects on society, the studies of symptoms and treatment of COVID-19, and the COVID-19 effect on the body, including molecular changes caused by COVID-19 infection. We can derive another insight from the locations of the clusters. Specifically, from Cluster (A) to Cluster (C), we can observe a counter-clockwise transition from macro perspectives to micro perspectives. This flow starts with the center cluster (A), related to the occurrence of COVID-19 and its social impact, followed by studies of the symptoms and treatment of COVID-19 (cluster (B)), and then ends with cluster (C), which is related to micro-level events, e.g., how SARS-CoV-2 binding occurs and how the immune system responds upon COVID-19 infection.

6. Discussion. In this manuscript, we proposed a novel analysis framework to estimate and visualize relationships among topics based on text mining results. It enhances our understanding of the COVID-19 knowledge reported in the biomedical literature by evaluating topic networks through latent positions estimated from word-sharing patterns. The proposed approach overcomes limitations of existing approaches, especially discrete and static visualization of relationships among topics. First, because we position topics on a latent space, relationships among topics can be intuitively investigated and are also easy to interpret.
Second, using the score based on the latent relationships, our approach can select meaningful words that describe the topics. Based on the chosen words, we can name the topics without expert knowledge. Third, our method allows a deeper understanding of the network among topics and captures its dynamics with a continuous representation by tracking the change of the topics' relationships. To the authors' best knowledge, this is the first attempt to integrate the biterm topic model and the latent space item response model within a unified framework. The application of our method to the COVID-19 literature indicated that there are three main subjects in the COVID-19 biomedical literature. The proposed framework can still be further improved in several ways, especially by allowing word-level inference, i.e., extraction of meaningful words that characterize each topic. Although the distance between each topic and its relevant words is taken into account in our model when estimating the topics' latent positions, simultaneous representation and visualization of words are still not embedded in the current framework. We believe that adopting a variable selection procedure to determine key words can potentially address this issue, and this will be an interesting future research avenue.

References:
Improving and evaluating topic models and other models of text
Gradient projection algorithms and software for arbitrary rotation criteria in factor analysis
A correlated topic model of science
Latent dirichlet allocation
Modern multidimensional scaling: Theory and applications
Keep up with the latest coronavirus research
Word co-occurrence augmented topic model in short text
Introduction to modern information retrieval
An interactive web-based dashboard to track COVID-19 in real time
Mixed-membership models of scientific publications
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
Finding scientific topics
Hidden topic markov models
Latent space approaches to social network analysis
A simple general method for oblique rotation
Mapping Unobserved Item-Respondent Interactions: A Latent Space Item Response Model with Interaction Map
Dynamic multi-relational Chinese restaurant process for analyzing influences on users in social media
Micro-blog topic detection method based on BTM topic model and K-means clustering algorithm
The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem
An overview of topic modeling and its current applications in bioinformatics
The author-recipient-topic model for topic and role discovery in social networks: Experiments with enron and academic email
Topic modeling with network regularization
Efficient estimation of word representations in vector space
Distributed representations of words and phrases and their compositionality
Text classification from labeled and unlabeled documents using EM
Model trees with topic model preprocessing: An approach for data journalism illustrated with the WikiLeaks Afghanistan war logs
Positional estimation within a latent space model for networks
A biterm topic model for short texts
Comparing twitter and traditional media using topic models