Title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling
Authors: Maitree Leekha, Mononito Goswami, Minni Jain
Date: 2020-03-24
Journal: Advances in Information Retrieval
DOI: 10.1007/978-3-030-45442-5_28

Consumer reviews online may contain suggestions useful for improving commercial products and services. Mining suggestions is challenging due to the absence of large labeled and balanced datasets. Furthermore, most prior studies attempting to mine suggestions have focused on a single domain, such as hotel or travel, only. In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open-domain suggestion mining models in terms of the F-1 measure and AUC.

Consumers often express their opinions towards products and services through online reviews and discussion forums. These reviews may include useful suggestions that can help companies better understand consumer needs and improve their products and services. However, manually mining suggestions amid vast numbers of non-suggestions is cumbersome, akin to finding needles in a haystack. Therefore, designing systems that can automatically mine suggestions is essential. The recent SemEval [6] challenge on suggestion mining saw many researchers using different techniques to tackle the domain-specific task (in-domain suggestion mining). However, open-domain suggestion mining, which obviates the need for developing separate suggestion mining systems for different domains, is still an emerging research problem. We define open-domain suggestion mining as the task of classifying reviews drawn from multiple domains as suggestions or non-suggestions. Building on the work of [5], we design a framework to detect suggestions from multiple domains. We formulate a multi-task classification problem to identify both the domain and the nature (suggestion or non-suggestion) of reviews. Furthermore, we also propose a novel language model-based text over-sampling approach to address the class imbalance problem.

We use the first publicly available annotated dataset for suggestion mining from multiple domains, created by [5]. It comprises reviews from four domains, namely hotel, electronics, travel and software. During pre-processing, we remove all URLs (e.g. https:// ...) and punctuation marks, convert the reviews to lower case and lemmatize them. We also pad the text with start S and end E symbols for over-sampling; a minimal sketch of this pipeline is given below.
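The following is a minimal sketch of the pre-processing steps described above. The paper does not specify its tooling, so spaCy, the URL regular expression and the exact padding tokens are assumptions made here for illustration only.

# Pre-processing sketch: lowercase, strip URLs and punctuation, lemmatize,
# and pad with start/end symbols for over-sampling.
# spaCy and the literal "S"/"E" tokens are assumptions, not the authors' code.
import re
import string
import spacy

nlp = spacy.load("en_core_web_sm")

def preprocess(review: str) -> str:
    review = review.lower()
    review = re.sub(r"https?://\S+", " ", review)  # drop URLs
    review = review.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    lemmas = [tok.lemma_ for tok in nlp(review) if not tok.is_space]
    return "S " + " ".join(lemmas) + " E"  # pad with start/end symbols

print(preprocess("The staff should provide chargers, see https://example.com!"))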
One of the major challenges in mining suggestions is the imbalanced distribution of classes, i.e. the number of non-suggestions greatly outweighs the number of suggestions (refer Table 1). To this end, studies frequently utilize the Synthetic Minority Over-sampling Technique (SMOTE) [1] to over-sample the minority class using text embeddings as features. However, SMOTE works in the Euclidean space and therefore does not allow an intuitive understanding and representation of the over-sampled data, which is essential for qualitative and error analysis of the classification models.

Table 1. Datasets and their sources used in our study [5]. The class ratio column highlights the extent of class imbalance in the datasets. The travel datasets have lower inter-annotator agreement than the rest, indicating that they may contain confusing reviews which are hard to confidently classify as suggestions or non-suggestions. This is also reflected in our classification results.

We introduce a novel over-sampling technique, the Language Model-based Over-sampling Technique (LMOTE), designed exclusively for text data, and observe performance comparable to (and sometimes slightly better than) SMOTE. We use LMOTE to over-sample suggestions before training our classification model. For each domain, LMOTE uses the following procedure:

1. Find top η n-grams: from all reviews labelled as suggestions (positive samples), sample the top η = 100 most frequently occurring n-grams (n = 5). For example, the phrase "nice to be able to" occurred frequently in many domains.
2. Train a BiLSTM language model on the positive samples (suggestions). The model predicts the probability distribution of the next word w_t over the vocabulary augmented with the end symbol (V ∪ {E}) based on the last n = 5 words, i.e., it learns P(w_t | w_{t-5}, ..., w_{t-1}).
3. Generate synthetic suggestions: using the language model and a randomly chosen frequent 5-gram as the seed, we generate text by repeatedly predicting the most probable next word w_t until the end symbol E is predicted.

Table 2 lists the most frequent 5-grams and the corresponding suggestions "sampled" using LMOTE. In our study, we generate synthetic positive reviews until the numbers of suggestion and non-suggestion samples in the training set are equal.

Algorithm 1 summarizes the LMOTE over-sampling methodology; its main loop is:

    while |S| < N do
        seed ← random(n_grams)
        sample ← LMOTEGenerate(language_model, seed)
        S ← S ∪ sample
    end while
    return S

The following is a brief description of the sub-procedure used in the algorithm:

• LMOTEGenerate(language_model, seed): takes as input the trained language model and a randomly chosen n-gram from the set of top η n-grams as the seed, and generates a review until the end tag E is produced. The procedure is repeated until we have a total of N suggestion reviews. A minimal Python sketch of this loop is given at the end of this section.

Multi-task learning (MTL) has been successful in many applications of machine learning, since sharing representations between auxiliary tasks allows models to generalize better on the primary task. Figure 1B illustrates a 3-dimensional UMAP [4] visualization of text embeddings of suggestions, coloured by their domain. These embeddings are outputs of the penultimate layer (the dense layer before the final softmax layer) of the single-task (STL) ensemble baseline. It can be clearly seen that suggestions from different domains may have varying feature representations. We therefore hypothesize that we can identify suggestions better by leveraging domain-specific information using MTL. In the MTL setting, given a review r_i in the dataset D, we aim to identify both the domain of the review and its nature.

We use an ensemble of three architectures (Fig. 2): a CNN [2] to mirror the spatial perspective and preserve n-gram representations; an attention network to learn the most important features automatically; and a BiLSTM-based text RCNN [3] to capture the context of a text sequence. In the MTL setting, the ensemble has two output softmax layers, predicting the domain and the nature of a review. The STL baselines, on the contrary, have only a single softmax layer predicting the nature of the review. We use ELMo [7] word embeddings trained on the dataset as input to the models.
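Below is the minimal Python sketch of the LMOTE loop (Algorithm 1) referenced above. The trained BiLSTM language model is abstracted behind a predict_next_word function, and names such as top_ngrams and lmote_generate are illustrative; only the values n = 5 and η = 100 follow the paper.

# Sketch of the LMOTE loop: seed with a frequent 5-gram from the suggestion
# class and extend it word by word until the end symbol "E" is produced.
# predict_next_word stands in for the trained BiLSTM language model.
import random
from collections import Counter

def top_ngrams(suggestions, n=5, eta=100):
    """Return the eta most frequent n-grams over tokenised suggestion reviews."""
    counts = Counter()
    for review in suggestions:
        tokens = review.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [list(gram) for gram, _ in counts.most_common(eta)]

def lmote_generate(predict_next_word, seed, max_len=60):
    """Extend a seed 5-gram with the most probable next word until E (or max_len)."""
    tokens = list(seed)
    while tokens[-1] != "E" and len(tokens) < max_len:
        tokens.append(predict_next_word(tokens[-5:]))
    return " ".join(tokens)

def lmote(suggestions, predict_next_word, n_required):
    """Generate synthetic suggestions until the training set is balanced."""
    seeds = top_ngrams(suggestions)
    synthetic = []                      # duplicates allowed here for simplicity
    while len(synthetic) < n_required:
        synthetic.append(lmote_generate(predict_next_word, random.choice(seeds)))
    return synthetic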
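The multi-task set-up described above can likewise be sketched as a shared encoder with two softmax heads. This is a simplified stand-in rather than the authors' implementation: the paper uses an ensemble of CNN, attention and RCNN sub-models over ELMo embeddings, whereas the sketch below uses a single BiLSTM encoder over pre-computed embeddings, and all layer sizes (and the 1024-dimensional ELMo input) are assumptions.

# Simplified multi-task sketch: one shared encoder, two softmax heads
# (domain and suggestion/non-suggestion). Layer sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, EMB_DIM, N_DOMAINS = 100, 1024, 4   # 1024 = assumed ELMo embedding size

inputs = layers.Input(shape=(MAX_LEN, EMB_DIM))           # pre-computed ELMo vectors
shared = layers.Bidirectional(layers.LSTM(128))(inputs)   # shared representation
shared = layers.Dense(64, activation="relu")(shared)

domain_out = layers.Dense(N_DOMAINS, activation="softmax", name="domain")(shared)
nature_out = layers.Dense(2, activation="softmax", name="nature")(shared)

model = Model(inputs, [domain_out, nature_out])
model.compile(optimizer="adam",
              loss={"domain": "sparse_categorical_crossentropy",
                    "nature": "sparse_categorical_crossentropy"},
              metrics=["accuracy"])
model.summary()

Training with the domain label as an auxiliary target is what distinguishes this multi-task model from the STL baselines, which keep only the nature head.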
We conducted experiments to assess the impact of over-sampling, the performance of LMOTE, and the multi-task model. We used the train-test split provided with the dataset. All comparisons are made in terms of the F-1 score of the suggestion class, for a fair comparison with prior work on representational learning for open-domain suggestion mining [5] (refer Baseline in Table 3). For a more insightful evaluation, we also compute the area under the Receiver Operating Characteristic (ROC) curve for all models used in this work. Tables 3 and 4 summarize the results.

Over-Sampling Improves Performance. To examine the impact of over-sampling, we compared the performance of our ensemble classifier with and without over-sampling, i.e. we compared results under the STL, STL + SMOTE and STL + LMOTE columns. Our results confirm that, in general, over-sampling suggestions to obtain a balanced dataset improves the performance (F-1 score and AUC) of our classifiers.

We also compared the performance of SMOTE and LMOTE in the single-task setting (STL + SMOTE and STL + LMOTE) and found that LMOTE performs comparably to SMOTE (and even outperforms it in the electronics and software domains). LMOTE has the added advantage of producing intelligible samples, which can be used to qualitatively analyze and troubleshoot deep learning-based systems. For instance, consider the suggestions created by LMOTE in Table 2: while they may not be grammatically correct, their constituent phrases are nevertheless semantically sensible.

Multi-task Learning Outperforms Single-Task Learning. We compared the performance of our classifier in the single-task and multi-task settings (STL + LMOTE and MTL + LMOTE) and found that multi-task learning improves its performance. We qualitatively analysed the single-task and multi-task models and found many instances where, by leveraging domain-specific information, the multi-task model was able to accurately identify suggestions. For instance, consider the following review: "Bring a LAN cable and charger for your laptop because house-keeping doesn't provide it." While the review appears to be an assertion (non-suggestion), by predicting its domain (hotel) the multi-task model was able to correctly classify it as a suggestion.

In this work, we proposed a multi-task learning framework for open-domain suggestion mining, along with LMOTE, a novel language model-based over-sampling technique for text. Our experiments revealed that multi-task learning combined with LMOTE over-sampling outperformed the considered alternatives in terms of both the F-1 score of the suggestion class and AUC.

References:
[1] SMOTE: Synthetic minority over-sampling technique
[2] Convolutional neural networks for sentence classification
[3] Recurrent convolutional neural networks for text classification
[4] UMAP: Uniform manifold approximation and projection for dimension reduction
[5] Suggestion mining from text
[6] SemEval-2019 Task 9: Suggestion mining from online reviews and forums. In: SemEval@NAACL-HLT
[7] Deep contextualized word representations