id: work_35b465fwlnexvbuebingsjly4y
author: Ying Liu
title: Imbalanced text classification: A term weighting approach
date: 2009
pages: 12
extension: .pdf
mime: application/pdf
words: 8401
sentences: 864
flesch: 64
summary: We tackle this problem using a simple probability based term weighting scheme to better distinguish documents in minor categories. [...] weighting schemes over two benchmarking data sets, including Reuters-21578, shows significant improvement for minor categories, while [...] Our approach has suggested a simple and effective solution to boost the performance of text classification over skewed data sets. Keywords: Text classification; Imbalanced data; Term weighting scheme. [...] used in TC are either naturally skewed or artificially imbalanced especially in the binary and so-called "one-against-all" settings, classifiers often perform far less than satisfactorily for minor categories (Lewis, Yang, Rose, & Li, 2004; [...] weighting has long been formulated in a form as term frequency times inverse document frequency, i.e. tfidf (Baeza-Yates & Ribeiro-Neto, 1999; Salton & Buckley, 1988; Salton & McGill, 1983; van Rijsbergen, 1979). Macro-averaged F1-values of TFIDF and probability based term weights. F1 scores of TFIDF and the probability based term weighting scheme tested over Reuters-21578 using both SVM and CompNB.
cache: ./cache/work_35b465fwlnexvbuebingsjly4y.pdf
txt: ./txt/work_35b465fwlnexvbuebingsjly4y.txt