key: cord-0189400-gevximd0 authors: Wang, Lingzhi; Li, Jing; Zeng, Xingshan; Wong, Kam-Fai title: Successful New-entry Prediction for Multi-Party Online Conversations via Latent Topics and Discourse Modeling date: 2021-08-16 journal: nan DOI: nan sha: 670139c64518fb4b44cf58a2b28fb5e024438ef2 doc_id: 189400 cord_uid: gevximd0

With the increasing popularity of social media, online interpersonal communication now plays an essential role in people's everyday information exchange. Whether and how a newcomer can better engage in a community has attracted great interest due to its many applications. Although prior work on early socialization has made notable progress, it focuses on sociological surveys of small groups. To help individuals get through the early socialization period and engage well in online conversations, we study a novel task: foreseeing whether a newcomer's message will be responded to by other participants in a multi-party conversation (henceforth Successful New-entry Prediction). The task is an important part of research on online assistants and social media. To investigate the key factors indicating such engagement success, we employ an unsupervised neural network, the Variational Auto-Encoder (VAE), to examine the topic content and discourse behavior in the newcomer's chatting history and the conversation's ongoing context. Furthermore, two large-scale datasets, from Reddit and Twitter, are collected to support further research on new-entries. Extensive experiments on both datasets show that our model significantly outperforms all the baselines and popular neural models. Additional explainable and visual analyses of new-entry behavior shed light on how to better join others' discussions.
Online conversations are a crucial part of our daily communication: many people now turn to social media to share ideas and exchange information, especially when facing a lockdown caused by an epidemic outbreak (such as COVID-19). Entering an ongoing conversation (both online and offline [1]) requires a socialization process in which newcomers seek feedback, and early socialization experiences have a long-term impact on newcomers [12]. In everyday life, one engages in a wide variety of conversations, ranging from online meetings advancing project collaborations to chitchat forming personal ideologies. However, not everyone is good at socializing [25], and online newcomers face special difficulties as a result of diffuse, decentralized, and anonymous text-based interactions [2, 12]. It is rather challenging for a newcomer to engage in an online multi-party conversation. In light of these concerns, there is a pressing need for a conversation management toolkit that predicts a conversation's future trajectory and improves the quality of interpersonal communication [26, 54]. We therefore focus on a novel task, predicting successful new-entries: whether a newcomer's message will be replied to by others and concretely contribute to the conversation's continuity. This task is inspired by previous work on newcomers' socialization [12, 20, 21], social expression [16], and response prediction [3, 4]. We use "receiving a reply" to represent successful engagement for two reasons: (1) Pethe and Skiena [36] show that replies, likes (which are used to show appreciation for a post [34]), and retweets are highly correlated. (2) We conduct a human evaluation (see Table 6), and the results show that posts that received replies are more successful than posts met with silence on four indicators.
Hence we adopt this definition to simplify the task and relieve the burden of data collection and construction. Overall, this task can help newcomers avoid killing conversations [26] and decrease the risk of withdrawal [2], because silence can be interpreted as rudeness or unfriendliness [18] and affect the newcomers' future participation [2, 6]. More importantly, our research can potentially benefit the development of chatbots or online sales bots that need to understand when, what, and how to speak in a multi-party conversation, since they should speak as few times as possible while keeping the conversation going.

Figure 1: A Reddit conversation on the left ($U_i$: the $i$-th user). $U_3$ made a successful engagement (i.e., received a reply, which we omit here to save space). The right column shows $U_3$'s chatting history, where the topic words are in bold and italic.

To solve the successful new-entry prediction task, we propose a novel framework consisting of two parts: topic and discourse modeling (TDM) and successful new-entry prediction (SNP). We examine both the conversation's ongoing contexts (henceforth conversation contexts) and the newcomer's chatting history (user history), and we hypothesize that both the chatting topics (what is said) and the discourse behavior (how it is said) affect other participants' attitude toward the newcomer. To motivate this hypothesis, Figure 1 shows an example of a Reddit conversation snippet, where both $U_4$ and $U_5$ are newcomers. $U_5$ posted a question about a new point, "NSFL" (not safe for life), and hence drew future participants' attention and feedback, while $U_4$ made a statement echoing the previously discussed "deep-fry butter" and did not receive any responses. To explore the factors behind such successful and failed new-entries, we capture and distinguish topic and discourse factors with an unsupervised neural module based on variational auto-encoders (VAE) [28].
Salient words reflecting topic content (such as "pizza" and "ham" in Figure 1) derived from history conversations, and discourse behavior (e.g., "what" and "?" as question indicators) representing the current trajectory, are identified and serve as part of the inputs to the prediction module (i.e., the SNP module). Our SNP module contains a hierarchical two-layer Bi-GRU [15] that encodes the conversation content for final prediction. To the best of our knowledge, we are the first to study the future trajectory of newcomer-engaged conversations and how topic and discourse factors influence engagement success (or failure). To summarize, the contributions of this work are as follows:
• We are the first to formulate the task of successful new-entry prediction, and we contribute two large-scale datasets, from Twitter and Reddit. The SNP task can benefit the development of online assistants and early socialization strategies.
• We propose a novel framework combining unsupervised and supervised neural networks. VAE-based and RNN-based modules are combined for personalized user engagement prediction by learning latent topic and discourse representations.
• Experimental results on both Twitter and Reddit show that the proposed model significantly outperforms the baselines. For example, we achieve 34.6 F1 on Reddit compared with 32.5 achieved by a BERT-based method [19].
• Extensive analytical experiments show the effectiveness of the modules. We probe into the learned topics and discourse and make the results explainable. Based on the differences between successful and failed new-entries, we suggest some entry strategies for early socialization.

Our work is in line with newcomer socialization, response prediction, and conversation modeling on social media platforms.
Newcomer socialization [12, 20, 21], which analyzes the process by which newcomers make the transition from organizational outsiders to insiders [5], is essential research for supporting, socializing, and integrating members into virtual environments [24]. A range of user behaviors has been investigated, including information process [1], socialization [20], and social expression [16]. Most previous studies are based on sociological surveys of small groups, while our work is based on large-scale datasets. Beyond early socialization, our work also benefits the development of online assistants, e.g., chatbots.

Research on response prediction aims to predict whether given online content will receive the desired responses by analyzing the propagation patterns of social media content, such as the prediction of user responses [3, 7], thread-ending turns [26], and re-entry behavior [46, 54]. Some studies tackle post-level responses [3, 7, 39, 40] and focus on modeling the content of a single post; others predict users' future behavior with the help of context modeling [26, 52, 54, 56] to achieve conversation-level trajectory prediction. In terms of methodology, earlier work depends on handcrafted features related to the objective, e.g., social and sentiment features [3], arrival patterns and timing features [4], and group information [11]. More recent advances come from probabilistic graphical methods [8] and neural models such as RNN-based models [26, 54] and graph neural networks [55], owing to their better ability to leverage abundant context information. However, none of them considers newcomers in a multi-party conversation and how to help them engage better, which is extensively investigated in this work.
We are also inspired by existing methods for conversation modeling, which encode conversation context from rich information such as user interactions [47, 50, 55], temporal orders of turns [14, 26, 44], and latent topics and discourse [45, 51]. Among them, Jiao et al. [26] and Tanaka et al. [44] only consider the temporal order of conversation turns and use RNN-based methods to learn temporal and content information, while Wei et al. [49] and Zeng et al. [55] formulate multi-party conversations as tree structures and apply Graph Convolutional Networks (GCN) [29] to extract context-aware representations. Ritter et al. [37] and Zeng et al. [51] explore unsupervised methods to discover discourse-level information in online conversations; the latter further leverages neural models to jointly learn distinguished topics and discourse. In our model, the way we learn latent topic and discourse representations builds upon the success of neural latent variable models [32] for unsupervised conversation understanding [51]. Compared with the original design, we also consider user chatting history to leverage newcomers' personal interests, and we investigate how topic and discourse factors affect newcomers' involvement in multi-party conversations, which is beyond the capability of Zeng et al. [51].

Here we present our successful new-entry prediction framework; Figure 2 shows the overall architecture. We first introduce the input and output in Section 3.1. Then, in Section 3.2, we discuss how the TDM module works. The SNP module is described in Section 3.3, and the learning objective of the entire framework is given in Section 3.4.

The input for our model consists of two parts: the observed conversation $c$ and the history conversation set $C^h = \{c^h_1, c^h_2, \ldots, c^h_m\}$ of the newcomer $u$, where $m$ is the number of history conversations obtained from the training set. The conversation $c$ is formalized as a sequence of turns (e.g., posts or tweets) $\{t_1, t_2, \ldots, t_{|c|}\}$, and the $|c|$-th turn $t_{|c|}$ is posted by the newcomer $u$ (we predict whether $u$ gets others' responses afterwards). The conversations in the user's history conversation set $C^h$ are organized similarly into sequences of turns. For output, we yield a Bernoulli distribution $p(c, C^h, u)$ indicating the estimated likelihood that $u$ gets responses from other participants (a successful new-entry).

Inspired by Zeng et al. [51], we learn distributional word clusters that reflect the latent topic $z$ of conversation $c$ and the discourse behaviors $d = \langle d_1, d_2, \ldots, d_{|c|} \rangle$ of the turns in $c$. At corpus level, we assume there are $K$ topics, each represented by a word distribution $\phi^T_k$ ($k = 1, 2, \ldots, K$), and $D$ discourse behaviors represented by word distributions $\phi^D_j$ ($j = 1, 2, \ldots, D$). To learn the topic and discourse representations, each turn in a conversation, referred to as a target message $x$, is fed into the TDM sequentially in bag-of-words (BoW) form. For factor modeling, TDM employs an extended variational auto-encoder (VAE) framework [28] to resemble the data generative process in two steps [32, 41]. First, given the target turn $x$ and its conversation context $c$ (i.e., the other turns in the same conversation), TDM converts them into two latent variables: the topic variable $z$ and the discourse variable $d$. Then, we reconstruct the target turn with the intermediate representations captured by $z$ and $d$. In the following, we describe the encoder and then the decoder in detail.

Encode step. We learn the parameters $\mu$, $\sigma$, and $\pi$ from the inputs $c_{bow}$ and $x_{bow}$ (the BoW forms of the conversation $c$ and the target turn $x$) as follows:

$$\mu = f_{\mu}(f_e(c_{bow})), \quad \log \sigma = f_{\sigma}(f_e(c_{bow})), \quad \pi = \mathrm{softmax}(f_{\pi}(x_{bow}))$$

where $f_*(\cdot)$ are neural perceptrons performing linear transformations activated with a ReLU function [33].

Decode step. In general, the decoder learns to reconstruct the words in the target message $x$. The procedure is as follows:
• Draw the latent topic $z \sim \mathcal{N}(\mu, \sigma^2)$.
• Compute the topic mixture $\theta = \mathrm{softmax}(f_{\theta}(z))$.
• Draw the latent discourse $d \sim \mathrm{Multi}(\pi)$.
• For the $n$-th word in the conversation:
  - $\beta_n = \mathrm{softmax}(f_{\phi^T}(\theta) + f_{\phi^D}(d))$.
  - Draw the word $w_n \sim \mathrm{Multi}(\beta_n)$.

In particular, the weight matrix of $f_{\phi^T}(\cdot)$ (after softmax normalization) is considered as the topic-word distribution $\phi^T$. The discourse-word distribution $\phi^D$ is obtained in a similar way.

We use TDM to encode the contexts of both the target conversation and the chatting history. For the target conversation $c$, we denote the topic variable as $z^c$ and the discourse behaviors of its turns as $d = \langle d_1, \ldots, d_{|c|} \rangle$. For the chatting history conversation set $C^h$ of the newcomer $u$, we learn the topic variables of all its conversations and average them into $u$'s representation, denoted as $z^u$. This can be regarded as a kind of user embedding that reflects the preferences and interests learned from the user history.

This section describes how we encode the conversation and predict successful new-entries by leveraging the topic and discourse variables learned by TDM (Section 3.2). The SNP module has three parts: a personalized turn encoder, a discourse-aware conversation encoder, and the final prediction layer.

Personalized Turn Encoder. For each turn $t_i$ in conversation $c$, we learn turn representations that are aware of the personalized topic vector $z^u$. To that end, we first feed each word $w_{i,j}$ in $t_i$ into an embedding layer to get its word representation $e_{i,j}$. Then a bidirectional gated recurrent unit (Bi-GRU) [15] encodes the word vector sequence of turn $t_i$, denoted as $\langle e_{i,1}, e_{i,2}, \ldots, e_{i,n_i} \rangle$, where $n_i$ is the number of words in $t_i$. We divide the observed conversation turns into context turns (the turns before the last turn) and the query turn (the last turn, posted by newcomer $u$). For the query turn, we use $u$'s topic representation $z^u$ (produced by the TDM module in Section 3.2) to initialize the Bi-GRU. For the context turns, the topic representation $z^c$ of conversation $c$ is used for initialization in the same way. Concretely, the initial states for both directions are computed from $z$, where $z$ is $z^c$ or $z^u$.
For all turns, the hidden states of the Bi-GRU are defined as:

$$\overrightarrow{h}_{i,j} = \mathrm{GRU}(e_{i,j}, \overrightarrow{h}_{i,j-1}), \quad \overleftarrow{h}_{i,j} = \mathrm{GRU}(e_{i,j}, \overleftarrow{h}_{i,j+1})$$

The representation of turn $t_i$ is the concatenation of the last hidden states of both directions of the Bi-GRU: $r_i = [\overrightarrow{h}_{i,n_i}; \overleftarrow{h}_{i,1}]$. Finally, we get the turn-level representations of conversation $c$: $\langle r_1, r_2, \ldots, r_{|c|} \rangle$.

Discourse-aware Conversation Encoder. We incorporate latent discourse behavior to model the turn interactions in a conversation, which allows a better understanding of how users interact with each other. We first concatenate each turn-level representation with the discourse variable learned in TDM and use the result as input to the second Bi-GRU layer. This Bi-GRU layer models the conversation structure and is defined as:

$$\overrightarrow{h}^c_i = \mathrm{GRU}(v_i, \overrightarrow{h}^c_{i-1}), \quad \overleftarrow{h}^c_i = \mathrm{GRU}(v_i, \overleftarrow{h}^c_{i+1})$$

where $v_i = [r_i; d_i]$, and the representation of each turn after the GRU is $h^c_i = [\overrightarrow{h}^c_i; \overleftarrow{h}^c_i]$.

Then, we design an attention mechanism (henceforth discourse-aware attention) to identify the discourse behaviors in the contexts that contribute more to signaling successful engagement. Our intuition is that different discourse behaviors serve different functions and should therefore receive different weights when making predictions. For example, a turn raising a question might be more important than a simple agreement response. We therefore assign different attention weights to the turns based on their discourse behaviors:

$$a_i = \frac{\exp\big(f_a(\arg\max(d_i))\big)}{\sum_{j=1}^{|c|} \exp\big(f_a(\arg\max(d_j))\big)} \tag{4}$$

where $\arg\max(d_i)$ is the learned discourse behavior of turn $t_i$, and $f_a(\cdot)$ maps discourse behaviors to weight values. Finally, to produce the representation of the whole conversation, we concatenate the hidden state of the $|c|$-th turn (i.e., the query turn), $h^c_{|c|}$, with the weighted sum of all conversation turns:

$$r^c = \Big[h^c_{|c|}; \sum_{i=1}^{|c|} a_i h^c_i\Big]$$

Prediction Layer. For prediction, we employ a linear projection activated by a sigmoid function to predict how likely the newcomer is to successfully chip in to the conversation:

$$\hat{y} = \sigma(w_p^{\top} r^c + b_p)$$

where $w_p$ and $b_p$ are trainable, and $\sigma(\cdot)$ is the sigmoid activation function.
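As an illustration of the discourse-aware attention and prediction layer described above, the following NumPy sketch turns per-turn discourse behaviors into attention weights and produces a reply probability. It is a minimal sketch under stated assumptions, not the authors' implementation: the softmax normalization over turns, the lookup table `behavior_scores` (standing in for the mapping from discourse behaviors to weight values), and all function and parameter names are hypothetical, and the turn states are taken as given rather than produced by a Bi-GRU.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def discourse_aware_predict(turn_states, discourse_probs, behavior_scores, w, b):
    """Sketch of discourse-aware attention followed by the prediction layer.

    turn_states:     (num_turns, hidden) turn representations after the
                     conversation-level encoder (assumed given here).
    discourse_probs: (num_turns, D) per-turn discourse distributions from TDM.
    behavior_scores: (D,) one learnable score per discourse behavior
                     (hypothetical stand-in for the behavior-to-weight mapping).
    w, b:            projection vector of shape (2 * hidden,) and scalar bias.
    """
    # Most likely discourse behavior of each turn.
    behaviors = discourse_probs.argmax(axis=1)
    # Assumption: behavior scores are normalized over turns with a softmax.
    attention = softmax(behavior_scores[behaviors])
    # Weighted sum of all turn states.
    context = attention @ turn_states
    # Concatenate the query (last) turn state with the weighted sum.
    conversation_repr = np.concatenate([turn_states[-1], context])
    # Linear projection followed by a sigmoid gives the reply probability.
    probability = sigmoid(w @ conversation_repr + b)
    return probability, attention


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(4, 6))
    probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.6, 0.3, 0.1],
                      [0.2, 0.2, 0.6]])
    scores = np.array([0.5, -1.0, 2.0])
    weights = rng.normal(size=12)
    p, attn = discourse_aware_predict(states, probs, scores, weights, 0.0)
    print("reply probability:", p)
    print("attention weights:", attn)
```

Because the weights depend only on the discourse behavior of each turn, two turns assigned the same behavior receive the same weight in this sketch, which mirrors the intuition that, for instance, question-like turns are treated alike.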
For parameter learning, we design the objective function to combine the objectives of the two modules, $\mathcal{L}_{TDM}$ and $\mathcal{L}_{SNP}$, which allows the joint learning of the TDM and SNP modules.

Objective Function of TDM. Following Zeng et al. [51], $\mathcal{L}_{TDM}$ combines four terms: $\mathcal{L}_T$ and $\mathcal{L}_D$ are the objectives for learning topics and discourse, $\mathcal{L}_X$ is the loss for target message reconstruction, and $\mathcal{L}_{MI}$ ensures that topics and discourse learn differently. To learn the latent topics and discourse, TDM employs variational inference [9] to approximate the posterior distributions over the latent topic $z$ and the latent discourse $d$ given all the training data. $\mathcal{L}_T$ and $\mathcal{L}_D$ are defined as follows:

$$\mathcal{L}_T = \mathbb{E}_{q(z|c)}[\log p(c|z)] - D_{KL}\big(q(z|c) \,\|\, p(z)\big)$$
$$\mathcal{L}_D = \mathbb{E}_{q(d|x)}[\log p(x|d)] - D_{KL}\big(q(d|x) \,\|\, p(d)\big)$$

where $q(z|c)$ and $q(d|x)$ are approximated posterior probabilities describing how the latent topic and the latent discourse are generated from the conversations and message turns. $p(c|z)$ and $p(x|d)$ represent the corpus likelihoods conditioned on the latent variables. $p(z)$ follows the standard normal prior $\mathcal{N}(0, I)$ and $p(d)$ is the uniform distribution $\mathrm{Unif}(0, 1)$. $D_{KL}$ refers to the Kullback-Leibler divergence, which ensures that the approximated posteriors are close to the true ones. $\mathcal{L}_X$, which ensures that the learned latent topics and discourse can reconstruct the target turn $x$, is defined as:

$$\mathcal{L}_X = \mathbb{E}_{q(z|c)\,q(d|x)}[\log p(x|z, d)]$$

We also leverage $\mathcal{L}_{MI}$, defined as in Zeng et al. [51], to guide the model to separate the word distributions that represent topics and discourse.

Objective Function of SNP. The objective function of SNP is a binary cross-entropy loss:

$$\mathcal{L}_{SNP} = -\sum_i \big[\lambda\, y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i)\big]$$

where $\hat{y}_i$ denotes the probability estimated from $p(c, C^h, u)$ for the $i$-th instance, and $y_i$ is the corresponding binary ground-truth label (1 for successful entries and 0 otherwise). To take potential data imbalance into account, we adopt a trade-off weight $\lambda$ that gives more weight to the minority class. $\lambda$ is set based on the proportion of positive and negative instances in the training set.

We construct two new conversation datasets from Twitter and Reddit. The raw data for the Twitter dataset is released by Zeng et al.
[53], containing Twitter conversations formed based on the TREC 2011 Microblog Track. The raw data for the Reddit dataset contains posts and comments from January to May 2015 and is obtained from a publicly available Reddit corpus. For both datasets, we follow common practice and form conversations with in-reply-to relations [48, 53], where a post or a reply (comment) is considered a conversation turn. Our work focuses on multi-party conversations that new users join later. To that end, we remove conversations with fewer than 4 turns and those with fewer than 3 participants. Finally, the datasets are randomly divided into 80%, 10%, and 10% for training, validation, and test.

The statistics of the two datasets are shown in Table 1. As can be seen, Twitter users tend to respond to newcomers, while new-entries on Reddit are more likely to fail, probably because Twitter users are more open to public discussions than Reddit users. We can also see that about 60% of newcomers have user chatting history, which means that 60% of the newcomers in the test set are involved in other discussions in the training data. We further study the distribution of newcomers over the number of history conversations in Figure 3. Most newcomers engaged in fewer than 5 conversations before. This sparsity in user history might pose challenges to learning their interests.

For the Twitter dataset, we applied the GloVe tweet preprocessing toolkit [35]. For Reddit, we first utilized the open-source Natural Language Toolkit (NLTK) [31] for word tokenization. We then removed all non-alphabetic tokens and replaced links with the generic tag "URL". For both datasets, a vocabulary was built and maintained with all the remaining tokens, including emoticons and punctuation. For the TDM module of our model, all stopwords were removed for topic modeling following common practice [10]. The parameters in the TDM module are set up following Zeng et al. [51].
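The conversation filtering and random split described above can be sketched in a few lines of Python. This is only an illustration under assumptions: a conversation is represented as a list of `(user_id, text)` turns, and the helper names `is_multi_party` and `split_dataset`, the fixed seed, and the integer rounding of the split sizes are hypothetical choices, not details given in the paper.

```python
import random


def is_multi_party(conversation, min_turns=4, min_participants=3):
    """Keep conversations with at least 4 turns and at least 3 participants.

    `conversation` is assumed to be a list of (user_id, text) turns.
    """
    participants = {user_id for user_id, _ in conversation}
    return len(conversation) >= min_turns and len(participants) >= min_participants


def split_dataset(conversations, seed=0):
    """Filter conversations, shuffle them, and split 80% / 10% / 10%."""
    kept = [conv for conv in conversations if is_multi_party(conv)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    n_train = len(kept) * 8 // 10
    n_valid = len(kept) // 10
    train = kept[:n_train]
    valid = kept[n_train:n_train + n_valid]
    test = kept[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    toy = [[("a", "hi"), ("b", "hello"), ("c", "hey"), ("a", "what's up?")]] * 20
    toy.append([("a", "hi"), ("b", "hello"), ("a", "ok"), ("b", "bye")])  # 2 users
    toy.append([("a", "hi"), ("b", "hello"), ("c", "hey")])               # 3 turns
    train, valid, test = split_dataset(toy)
    print(len(train), len(valid), len(test))
```

Filtering before splitting keeps the three partitions drawn from the same population of multi-party conversations; whether the original pipeline filters before or after splitting is not stated, so that ordering is also an assumption.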
For the parameters in the SNP module, we first initialize the embedding layer with 200-dimensional GloVe embeddings [35]; the Twitter version is used for the Twitter dataset and the Common Crawl version for Reddit. For the Bi-GRU layers, we set the size of the hidden states for each direction to 100 (200 for the final output). The batch size is set to 64. In model training, we employ the Adam optimizer [27] with an initial learning rate selected from {1e-3, 1e-4, 1e-5} and adopt early stopping [13]. Dropout [42] is used to alleviate overfitting. All the hyper-parameters are tuned on the validation set by grid search.

How to jointly train the TDM and SNP modules is critical to the performance of our model. As a pre-training procedure, we first train the TDM module for $N_1$ epochs (e.g., $N_1 = 100$) and the SNP module for $N_2$ epochs (e.g., $N_2 = 5$), each while the other module is fixed. Then we jointly train the whole model alternately. Figure 4 is a state transition diagram showing the process of our joint training method.

Comparisons. We first compare with two simple baselines: RANDOM, which randomly selects label 0 or 1 for prediction, and HISTORY, which predicts based on the ratio of successful entries in the newcomer's history conversations (random prediction is adopted for newcomers without user history). We further compare our model with 5 different models. 1) SVM: an SVM-based binary classifier [17] with features (e.g., TF-IDF, topic distribution, post length, thread length) gathered from Jiao et al. [26], Suh et al. [43], and Hong et al. [23]. 2) BILSTM: a hierarchical BiLSTM that encodes conversation contexts, with one BiLSTM as the turn encoder and another BiLSTM modeling the turn sequence, followed by an MLP layer for prediction (like our model). 3) BERT: a pretrained BERT is adopted to learn the turn representations and a BiLSTM models the whole conversation, followed by an MLP layer for prediction (like our model).
4) CONVERNET: the state-of-the-art model for predicting conversation killers [26], where a few features (sentiment, background, etc.) are ignored because they are unavailable in our datasets. 5) JECUH: the state-of-the-art model for predicting conversation re-entries [54], implemented based on the authors' code.

Evaluation Metrics. To measure performance, we adopt popular evaluation metrics for binary classification: area under the ROC curve (AUC), accuracy, precision, and F1 scores.

We first compare model performance in Section 5.1, including an ablation study. Then, in Section 5.2, we analyze the effects of topics and discourse on successful new-entries. Section 5.3 shows the differences between successful and failed cases and verifies them with a human evaluation. Finally, Section 5.4 discusses our model output further with quantitative and qualitative analyses.

Table 2 reports the main results on the two datasets, where our model significantly outperforms all comparison models. We make the following observations:
• All models perform better on Twitter than on Reddit. This suggests that the models are sensitive to label imbalance: on Reddit, positive samples for learning successful new-entries are much sparser than negative ones (see Table 1).
• Successful New-entry Prediction (SNP) is challenging. Simple baselines such as RANDOM and HISTORY perform poorly, which indicates that SNP cannot be tackled well with simple strategies. We also find that neural models perform better than non-neural ones, probably because they can learn deep semantic features from complex online interactions, which is beyond the capability of manually crafted shallow features.
• Newcomers' chatting history is useful. By leveraging user history, JECUH and our model perform better than both BILSTM and CONVERNET.
It might be attributed to the continuity of user interests, which allows the use of more context from user history to better understand the new-entry and how it relates to the conversation contexts.
• The coupled effects of topics and discourse are helpful to SNP. The joint modeling of topics and discourse helps our model obtain the best performance. A larger performance gain is observed on the Reddit dataset, suggesting that our model can alleviate the overfitting caused by sparse positive samples, possibly because richer features can be learned from latent topic and discourse clusters.

Ablation Study. We also examine the contributions of some components in our framework with an ablation study presented in Table 3. We compare our full model with its variants without using topics for turn encoder initialization (W/O TOPIC INIT), without concatenating

Topic Analysis. To analyze the topics extracted by our TDM module, we adopt $C_V$ scores measured via the open-source Palmetto toolkit to evaluate topic coherence. $C_V$ scores assume that the top $N$ words in a coherent topic (ranked by likelihood) tend to co-occur in the same document, and they have shown evaluation results comparable to human judgments [38]. We present $C_V$ scores for the top 5 and top 10 words of the learned topics in Table 4 and compare them with LDA [22] and NTM [32]. The comparison shows that our model learns a better topic representation. To further analyze the topics learned from user history, we set the topic number $K = 2$, sample three users, and visualize the topic mixtures of their history conversations in Figure 5. Users tend to be drawn to discussions with similar topics, which explains why incorporating chatting history allows a better understanding of the new-entries' topics.

Discourse Analysis. Compared with latent topics, discourse behavior is harder to be understood.
To interpret what is learned for discourse, Table 5 shows 5 example discourse behaviors from Reddit with their top 5 terms by likelihood, where meaningful discourse words are found to represent different user discourse behaviors. To further examine newcomers' discourse behavior, Figure 6(a) shows the distribution of the latent discourse learned for new-entries. The results show that all discourse behaviors are used with similar frequency, while some are relatively more popular, such as D4 and D6. We then compare the distribution of discourse behavior over successful and failed new-entries to Reddit conversations in Figure 6(b) (the top 2 discourse predictions are considered). As can be seen, some discourse behaviors tend to result in successful outcomes (e.g., D3 and D5), while some may easily lead to the opposite (e.g., D4 and D6). This indicates how important it is for newcomers to use the right manner when chipping in to others' discussions.

Differences in Topic Similarity. We carry out another study to examine how topics affect new-entries. We measure the similarity between the newcomers' topic distribution in their chatting history and that of the target conversations' contexts for successful cases (SN) and failed ones (FN). The similarity scores for SN and FN are 38.7 and 42.7 on Twitter, and 20.0 and 22.8 on Reddit, respectively. Interestingly, lower similarity results in more chances for success: newcomers mentioning points that are discussed less in their prior history (probably cutting in with a question) are more likely to be responded to by others.

Figure 7: Y-axis: F1 score. In 7(a), X-axis: number of user history conversations. In 7(b), LEN$i$ on the X-axis: the $i$-th quantile by turn numbers (smaller $i$, shorter length).

Human Evaluation. We conduct a human evaluation to explore the differences between successful and failed new-entries.
Four postgraduate volunteers proficient in English are invited, and 50 conversations (25 successful cases and 25 failed ones) are randomly sampled from each of the Twitter and Reddit datasets. Inspired by Arguello et al. [2] and Burke et al. [12], we propose four evaluation indicators: on-topic (OT), asking questions (AQ), complex language (CL), and controversial statement (CS). Volunteers give 0 (not match) or 1 (match) to each conversation for each of the four indicators. The results in Table 6 show that the posts of successful newcomers are more on-topic, ask questions more often, use simpler language, and are usually more controversial, which is consistent with Arguello et al. [2].

Effect of History Number. As discussed in Sections 5.1 and 5.3, user history is essential for learning topic factors. We further analyze how prediction results change over varying user history lengths. The test set is divided into four subsets by the number of history conversations $n$ involving a newcomer, where $n = 0$, $1$ to $4$, $5$ to $9$, and $> 9$. Our F1 scores on these subsets are displayed in Figure 7(a). Better F1 is achieved on newcomers with longer history (engaged in more conversations), as sparse history provides limited contexts for learning topics, which in turn affects SNP performance.

Effect of Conversation Length. After showing how our model performs over varying degrees of sparsity in user history (Figure 7(a)), we turn to the model's sensitivity to varying lengths of conversation contexts. Figure 7(b) shows the F1 scores over varying turn numbers in conversation contexts on Reddit. Better F1 scores are observed for conversations with more turns, as longer contexts benefit feature learning. Also, a larger performance gain is observed from our model for LEN1 (very short contexts), signaling its ability to cope with data sparsity.

Qualitative Analysis.
To provide more insights, we use the example in Figure 1 to conduct a qualitative analysis. Our model assigns higher attention weights (Eq. 4) to the turn posted before 3, as it concerns the topic "NSFL" that leads to the successful new-entry. We also notice that 3 has a wide range of interests (shown in the user history) and is open to engaging in different types of discussions. This might be another reason why our model predicts a positive outcome, consistent with our earlier findings (Section 5.2).

Advice for Newcomers. As Section 5.3 shows, successful and failed newcomers differ in topic similarity and in the four evaluation indicators. We therefore give two suggestions: 1) Contribute new and interesting information to the community, even if it is controversial. 2) Use simple language and ask more on-topic questions for better communication.

This paper first formulates the task of successful new-entry prediction and collects two large-scale datasets, from Twitter and Reddit. A novel model is proposed to predict successful new-entries by modeling latent topics and discourse in conversation contexts and user chatting history. This paper also explores the roles that topics and discourse play in newcomers' engagement in multi-party conversations. Extensive experiments show that the proposed model performs significantly better than the baselines and that it learns meaningful topic and discourse representations, which further indicate how to make successful new-entries.

Socialization in virtual groups
Talk to me: foundations for successful individual-group interactions in online communities
Predicting responses to microblog posts
Characterizing and curating conversation threads: expansion, focus, volume, re-entry
Newcomer adjustment during organizational socialization: a meta-analytic review of antecedents, outcomes, and methods
Interpreting soap operas and creating community: Inside a computer-mediated fan culture
Modeling a retweet network via an adaptive Bayesian approach (WWW '16, International World Wide Web Conferences Steering Committee)
Variational inference: A review for statisticians
On participation in group chats on Twitter
Membership claims and requests: Conversation-level newcomer socialization strategies in online groups
Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping
A factored neural network model for characterizing online discussions in vector space
Learning phrase representations using RNN encoder-decoder for statistical machine translation
Beyond knowledge sharing: interactions in online discussion communities
Support-vector networks
Attribution in distributed work groups
BERT: Pre-training of deep bidirectional transformers for language understanding
Socializing volunteers in an online community: a field experiment
Am I doing what's expected? New member socialization in virtual groups. In Our virtual world: The transformation of work, play and life via technology
Hierarchical topic models and the nested Chinese restaurant process
Predicting popular messages in Twitter
The Semantic Web-ISWC
The needs and difficulties in socializing the young in contemporary China: Early childhood education experts' perspectives
Find the conversation killers: A predictive study of thread-ending posts
Adam: A method for stochastic optimization
Auto-encoding variational Bayes
Semi-Supervised Classification with Graph Convolutional Networks
Content analysis: An introduction to its methodology
NLTK: The Natural Language Toolkit
Discovering Discrete Latent Topics with Neural Variational Inference
Rectified Linear Units Improve Restricted Boltzmann Machines
The rate of reply and nature of responses to suicide-related posts on Twitter
GloVe: Global vectors for word representation
The Trumpiest Trump? Identifying a Subject's Most Characteristic Tweets
Unsupervised modeling of Twitter conversations
Exploring the space of topic coherence measures
Mining and comparing engagement dynamics across multiple social media platforms
Predicting discussions on the social semantic web
Autoencoding variational inference for topic models
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network
Dialogue-Act Prediction of Future Responses Based on Conversation History
Continuity of Topic, Interaction, and Query: Learning to Quote in Online Conversations
Re-entry Prediction for Online Conversations via Self-Supervised Learning
Quotation Recommendation and Interpretation Based on Transformation from Queries to Quotations
Topic-Aware Neural Keyphrase Generation for Social Media Language
Modeling Conversation Structure and Temporal Dynamics for Jointly Predicting Rumor Stance and Veracity
What You Say and How You Say it: Joint Modeling of Topics and Discourse in Microblog Conversations
What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process
Microblog conversation recommendation via joint modeling of topics and discourse
Joint Effects of Context and User History for Predicting Online Conversation Re-entries
Neural Conversation Recommendation with Online Interaction Modeling
Modeling Global and Local Interactions for Online Conversation Recommendation

The research described in this paper is partially supported by RGC GRF #14204118 and RGC RSFS #3133237. Jing Li is supported by NSFC Young Scientists Fund (62006203).
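As a supplementary illustration of the history-length analysis (the "Effect of History Number" paragraph), the per-subset evaluation could be sketched as below. This is a minimal sketch and not the authors' evaluation code: the function names (`history_bucket`, `f1_score`, `f1_by_history`), the toy data, and the use of binary F1 on the positive ("received a reply") class are all assumptions made for illustration.

```python
from collections import defaultdict


def history_bucket(n_history):
    """Map a newcomer's number of history conversations to one of four subsets."""
    if n_history == 0:
        return "0"
    if n_history <= 4:
        return "1-4"
    if n_history <= 9:
        return "5-9"
    return ">9"


def f1_score(gold, pred):
    """Binary F1 on the positive class (label 1); assumed metric definition."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_history(examples):
    """examples: iterable of (n_history, gold_label, predicted_label) triples."""
    grouped = defaultdict(lambda: ([], []))
    for n_history, gold, pred in examples:
        golds, preds = grouped[history_bucket(n_history)]
        golds.append(gold)
        preds.append(pred)
    return {bucket: f1_score(g, p) for bucket, (g, p) in grouped.items()}


# Hypothetical toy test set: (history size, gold label, model prediction).
toy = [(0, 1, 0), (0, 0, 0), (2, 1, 1), (3, 0, 1),
       (7, 1, 1), (12, 1, 1), (15, 0, 0)]
scores = f1_by_history(toy)
for bucket in ("0", "1-4", "5-9", ">9"):
    print(bucket, round(scores[bucket], 3))
```

Grouping before scoring keeps each subset's F1 independent of the others, which is what makes the comparison across history lengths meaningful. The same pattern would apply to the conversation-length subsets by swapping in a bucket function over turn counts.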