id: work_nif4ctnsxrbwdivnzcjhhmgwlu
author: Jefrey Lijffijt
title: Significance testing of word frequencies in corpora
date: 2014
pages: 52
extension: .pdf
mime: application/pdf
words: 13207
sentences: 1059
flesch: 70
summary: Comparison of word frequencies is among the core methods in corpus linguistics and is ... statistical significance of differences in word frequencies between corpora. ... the case of comparing the frequencies of a given word in two corpora the test statistic is ... tests provide more conservative p-values than those that are provided by bag-of-words-based models (i.e. tests based on the assumption that all words are statistically ... The χ2 and log-likelihood ratio tests are based on the bag-of-words model (illustrated in ... British National Corpus and the expected frequency distribution in the bag-of-words ... The p-value can be obtained by comparing the test statistic to a table of χ2 distributions. ... p-value is computed by comparing the test statistic to a table of χ2 distributions. ... Fig. 4: The results of the uniformity test for all six methods based on random word assignments (rather ... which word-frequency distributions display statistically significant gender differences
cache: ./cache/work_nif4ctnsxrbwdivnzcjhhmgwlu.pdf
txt: ./txt/work_nif4ctnsxrbwdivnzcjhhmgwlu.txt