id: 3797
author: Dattola, R. T.
title: A Fast Algorithm for Automatic Classification
date: 1969-03-01
pages: 18
extension: .pdf
mime: application/pdf
words: 6356
sentences: 775
flesch: 81
summary: In information retrieval applications, the number of elements may approach several hundred thousand or even a million documents, as in the case of a large library. This immediately poses two serious problems: the storage space necessary to store the matrix increases as the square of the number of documents, and the time required to calculate the matrix also increases quadratically. Tables 3 and 4 show the results of scoring the documents in the sample collection against the profiles from Table 1 (cutoff = 10). Clusters Resulting from Document Scoring the new algorithm, documents are assigned to more than one cluster on loose documents to be assigned to clusters after the first iteration; a document to score higher against profiles of smaller clusters. If the number of documents in the cluster is not too large, number of clusters desired; 2) approximate percentage of loose documents
cache: ./cache/3797.pdf
txt: ./txt/3797.txt