id	author	title	date	pages	extension	mime	words	sentences	flesch	summary	cache	txt
cord-020916-ds0cf78u	Fard, Mazar Moradi	Seed-Guided Deep Document Clustering	2020-03-17		.txt	text/plain	5079	265	57	The main contributions of this study can be summarized as follows: (a) we introduce the Seed-guided Deep Document Clustering (SD2C) framework, the first attempt, to the best of our knowledge, to constrain clustering with seed words based on a deep clustering approach; and (b) we validate this framework through experiments based on automatically selected seed words on five publicly available text datasets with various sizes and characteristics. The constrained clustering problem we are addressing in fact bears a strong similarity to seed-guided dataless text classification, which consists of categorizing documents based on a small set of seed words describing the classes/clusters. The seed-word constraint can be enforced by giving seed words more influence either on the learned document embeddings, a solution we refer to as SD2C-Doc, or on the cluster representatives, a solution we refer to as SD2C-Rep. Note that the second solution can only be used when the clustering process is based on cluster representatives (i.e., R = {r_k}_{k=1}^K, with K the number of clusters), which is indeed the case for most current deep clustering methods [1].	./cache/cord-020916-ds0cf78u.txt	./txt/cord-020916-ds0cf78u.txt