mv: ‘./input-file.zip’ and ‘./input-file.zip’ are the same file
Creating study carrel named subject-separatedPeople-gutenberg
Initializing database
Unzipping
Archive: input-file.zip
creating: ./tmp/input/input-file/
inflating: ./tmp/input/input-file/4469.txt
inflating: ./tmp/input/input-file/4467.txt
inflating: ./tmp/input/input-file/4468.txt
inflating: ./tmp/input/input-file/4466.txt
inflating: ./tmp/input/input-file/4465.txt
inflating: ./tmp/input/input-file/541.txt
inflating: ./tmp/input/input-file/metadata.csv
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY: ./tmp/input/input-file
=== metadata file: ./tmp/input/input-file/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-separatedPeople-gutenberg
FILE: cache/4468.txt OUTPUT: txt/4468.txt
FILE: cache/4466.txt OUTPUT: txt/4466.txt
FILE: cache/4467.txt OUTPUT: txt/4467.txt
FILE: cache/4469.txt OUTPUT: txt/4469.txt
FILE: cache/4465.txt OUTPUT: txt/4465.txt
FILE: cache/541.txt OUTPUT: txt/541.txt
=== file2bib.sh ===
id: 4466
author: Meredith, George
title: Diana of the Crossways — Volume 2
date:
pages:
extension: .txt
txt: ./txt/4466.txt
cache: ./cache/4466.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 1
resourceName b'4466.txt'
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/file2bib.py", line 107, in <module>
    text = textacy.preprocessing.normalize.normalize_quotation_marks( text )
  File "/data-disk/python/lib/python3.8/site-packages/textacy/preprocessing/normalize.py", line 32, in normalize_quotation_marks
    return text.translate(QUOTE_TRANSLATION_TABLE)
AttributeError: 'NoneType' object has no attribute 'translate'
4469 txt/../pos/4469.pos
=== file2bib.sh ===
id: 4465
author: Meredith, George
title: Diana of the Crossways — Volume 1
date:
pages:
extension: .txt
txt: ./txt/4465.txt
cache: ./cache/4465.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 2
resourceName b'4465.txt'
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/file2bib.py", line 107, in <module>
    text = textacy.preprocessing.normalize.normalize_quotation_marks( text )
  File "/data-disk/python/lib/python3.8/site-packages/textacy/preprocessing/normalize.py", line 32, in normalize_quotation_marks
    return text.translate(QUOTE_TRANSLATION_TABLE)
AttributeError: 'NoneType' object has no attribute 'translate'
4467 txt/../wrd/4467.wrd
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/txt2keywords.py", line 54, in <module>
    for keyword, score in ( yake( doc, ngrams=NGRAMS, topn=TOPN ) ) :
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 96, in yake
    word_scores = _compute_word_scores(doc, word_occ_vals, word_freqs, stop_words)
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 205, in _compute_word_scores
    freq_baseline = statistics.mean(freqs_nsw) + statistics.stdev(freqs_nsw)
  File "/data-disk/python/lib/python3.8/statistics.py", line 315, in mean
    raise StatisticsError('mean requires at least one data point')
statistics.StatisticsError: mean requires at least one data point
=== file2bib.sh ===
id: 4467
author: Meredith, George
title: Diana of the Crossways — Volume 3
date:
pages:
extension: .txt
txt: ./txt/4467.txt
cache: ./cache/4467.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 1
resourceName b'4467.txt'
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/file2bib.py", line 107, in <module>
    text = textacy.preprocessing.normalize.normalize_quotation_marks( text )
  File "/data-disk/python/lib/python3.8/site-packages/textacy/preprocessing/normalize.py", line 32, in normalize_quotation_marks
    return text.translate(QUOTE_TRANSLATION_TABLE)
AttributeError: 'NoneType' object has no attribute 'translate'
4465 txt/../ent/4465.ent
4469 txt/../wrd/4469.wrd
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/txt2keywords.py", line 54, in <module>
    for keyword, score in ( yake( doc, ngrams=NGRAMS, topn=TOPN ) ) :
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 96, in yake
    word_scores = _compute_word_scores(doc, word_occ_vals, word_freqs, stop_words)
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 205, in _compute_word_scores
    freq_baseline = statistics.mean(freqs_nsw) + statistics.stdev(freqs_nsw)
  File "/data-disk/python/lib/python3.8/statistics.py", line 315, in mean
    raise StatisticsError('mean requires at least one data point')
statistics.StatisticsError: mean requires at least one data point
=== file2bib.sh ===
id: 4469
author: Meredith, George
title: Diana of the Crossways — Volume 5
date:
pages:
extension: .txt
txt: ./txt/4469.txt
cache: ./cache/4469.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 1
resourceName b'4469.txt'
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/file2bib.py", line 107, in <module>
    text = textacy.preprocessing.normalize.normalize_quotation_marks( text )
  File "/data-disk/python/lib/python3.8/site-packages/textacy/preprocessing/normalize.py", line 32, in normalize_quotation_marks
    return text.translate(QUOTE_TRANSLATION_TABLE)
AttributeError: 'NoneType' object has no attribute 'translate'
4468 txt/../ent/4468.ent
=== file2bib.sh ===
id: 4468
author: Meredith, George
title: Diana of the Crossways — Volume 4
date:
pages:
extension: .txt
txt: ./txt/4468.txt
cache: ./cache/4468.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 2
resourceName b'4468.txt'
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/file2bib.py", line 107, in <module>
    text = textacy.preprocessing.normalize.normalize_quotation_marks( text )
  File "/data-disk/python/lib/python3.8/site-packages/textacy/preprocessing/normalize.py", line 32, in normalize_quotation_marks
    return text.translate(QUOTE_TRANSLATION_TABLE)
AttributeError: 'NoneType' object has no attribute 'translate'
4465 txt/../pos/4465.pos
4467 txt/../pos/4467.pos
4466 txt/../pos/4466.pos
4469 txt/../ent/4469.ent
4468 txt/../pos/4468.pos
4466 txt/../wrd/4466.wrd
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/txt2keywords.py", line 54, in <module>
    for keyword, score in ( yake( doc, ngrams=NGRAMS, topn=TOPN ) ) :
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 96, in yake
    word_scores = _compute_word_scores(doc, word_occ_vals, word_freqs, stop_words)
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 205, in _compute_word_scores
    freq_baseline = statistics.mean(freqs_nsw) + statistics.stdev(freqs_nsw)
  File "/data-disk/python/lib/python3.8/statistics.py", line 315, in mean
    raise StatisticsError('mean requires at least one data point')
statistics.StatisticsError: mean requires at least one data point
4466 txt/../ent/4466.ent
4467 txt/../ent/4467.ent
4465 txt/../wrd/4465.wrd
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/txt2keywords.py", line 54, in <module>
    for keyword, score in ( yake( doc, ngrams=NGRAMS, topn=TOPN ) ) :
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 96, in yake
    word_scores = _compute_word_scores(doc, word_occ_vals, word_freqs, stop_words)
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 205, in _compute_word_scores
    freq_baseline = statistics.mean(freqs_nsw) + statistics.stdev(freqs_nsw)
  File "/data-disk/python/lib/python3.8/statistics.py", line 315, in mean
    raise StatisticsError('mean requires at least one data point')
statistics.StatisticsError: mean requires at least one data point
4468 txt/../wrd/4468.wrd
Traceback (most recent call last):
  File "/data-disk/reader-compute/reader-classic/bin/txt2keywords.py", line 54, in <module>
    for keyword, score in ( yake( doc, ngrams=NGRAMS, topn=TOPN ) ) :
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 96, in yake
    word_scores = _compute_word_scores(doc, word_occ_vals, word_freqs, stop_words)
  File "/data-disk/python/lib/python3.8/site-packages/textacy/ke/yake.py", line 205, in _compute_word_scores
    freq_baseline = statistics.mean(freqs_nsw) + statistics.stdev(freqs_nsw)
  File "/data-disk/python/lib/python3.8/statistics.py", line 315, in mean
    raise StatisticsError('mean requires at least one data point')
statistics.StatisticsError: mean requires at least one data point
541 txt/../wrd/541.wrd
541 txt/../pos/541.pos
541 txt/../ent/541.ent
=== file2bib.sh ===
id: 541
author: Wharton, Edith
title: The Age of Innocence
date:
pages:
extension: .txt
txt: ./txt/541.txt
cache: ./cache/541.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 5
resourceName b'541.txt'
Done mapping.
Reducing subject-separatedPeople-gutenberg
=== reduce.pl bib ===
=== reduce.pl bib ===
=== reduce.pl bib ===
=== reduce.pl bib ===
=== reduce.pl bib ===
=== reduce.pl bib ===
id = 541
author = Wharton, Edith
title = The Age of Innocence
date =
pages =
extension = .txt
mime = text/plain
words = 104804
sentences = 5962
flesch = 80
summary = looks a little bare to old-fashioned eyes," Mrs. Welland had explained, "Good-bye; come and see me some day," she said, still looking at Archer. No one alluded to Ellen Olenska; but Archer knew that Mrs. Welland was "It's a pity the Beauforts asked her," Mrs. Archer said gently. "Oh, necessarily; Beaufort is a vulgar man," said Mrs. Archer. that Olenska woman's comings and goings I don't see," Mrs. Archer "Janey!" said her mother; and Miss Archer blushed and tried to look Mrs. Archer and her son and daughter, like every one else in New York, "It's just my old-fashioned feeling; dear May is my ideal," said Mrs. Archer. Archer had left St. Augustine charged with many messages for old Mrs. Mingott; and a day or two after his return to town he called on her. As Mrs. Archer said, it made "another thing of London" to know Mrs. Carfry and Miss Harle; and by the time that Newland became engaged the
cache = ./cache/541.txt
txt = ./txt/541.txt
Building ./etc/reader.txt
541 4469 4468 541 4469 4468
number of items: 6
sum of words: 104,804
average size in words: 104,804
average readability score: 80
nouns: man; eyes; room; people; house; one; time; family; face; things; way; wife; hand; husband; moment; day; mother; life; door; head; nothing; evening; voice; side; woman; fact; thing; world; lady; men; dinner; carriage; something; hands; course; drawing; cousin; kind; table; society; place; smile; night; box; words; tone; marriage; law; case; years
verbs: had; was; be; were; have; been; said; ''s; do; is; did; see; know; made; go; looked; say; ''ve; come; felt; knew; are; seemed; saw; thought; think; went; came; going; turned; ''m; has; asked; ''re; stood; tell; sat; want; being; looking; continued; take; put; having; told; met; make; heard; look; get
adjectives: old; young; other; little; own; long; such; good; same; first; last; new; more; poor; dear; few; great; white; small; many; sure; pale; only; next; usual; right; much; least; silent; large; black; afraid; faint; private; different; aware; full; whole; real; yellow; best; very; social; low; certain; alone; happy; foreign; dark; blue
adverbs: not; n''t; so; up; out; then; back; always; as; only; never; down; away; here; still; too; again; even; now; just; more; off; there; all; on; ever; very; suddenly; once; much; together; almost; over; most; rather; long; in; after; perhaps; really; already; yet; well; far; else; of; less; simply; instead; home
pronouns: her; he; she; his; it; i; you; him; they; their; me; my; them; we; himself; your; its; herself; our; one; us; themselves; itself; myself; yourself; hers; yours; i''m; ourselves; mine; ''s; ''em; theirs; you''re; you''ll; yes--; thought--; settled--; ours; oh"--she; je; evadee--
proper nouns: archer; mrs.; olenska; mr.; may; madame; new; beaufort; york; newland; welland; van; mingott; ellen; luyden; der; countess; jackson; manson; letterblair; janey; sillerton; miss; dallas; lefferts; winsett; medora; m.; duke; riviere; granny; lovell; paris; avenue; catherine; opera; luydens; washington; struthers; lawrence; mingotts; dr.; fifth; london; beauforts; st.; skuytercliff; louisa; regina; olenski
keywords: york; winsett; welland; sillerton; riviere; paris; olenska; newland; new; mrs.; mr.; miss; mingott; medora; manson; madame; luyden; letterblair; lefferts; janey; jackson; granny; ellen; duke; countess; beaufort; avenue; archer
one topic; one dimension: archer
file(s):
titles(s): Diana of the Crossways — Volume 5
three topics; one dimension: archer; zest; zest
file(s): ./cache/541.txt, ,
titles(s): The Age of Innocence | Diana of the Crossways — Volume 5 | Diana of the Crossways — Volume 5
five topics; three dimensions: archer mrs said; zest immovably impartial; zest immovably impartial; zest immovably impartial; zest immovably impartial
file(s): ./cache/541.txt, , , ,
titles(s): The Age of Innocence | Diana of the Crossways — Volume 5 | Diana of the Crossways — Volume 5 | Diana of the Crossways — Volume 5 | Diana of the Crossways — Volume 5
Type: gutenberg
title: subject-separatedPeople-gutenberg
date: 2021-06-09
time: 23:06
username: emorgan
patron: Eric Morgan
email: emorgan@nd.edu
input: facet_subject:"Separated people"
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named enities
==== making bibliographics
id: 4469
author: Meredith, George
title: Diana of the Crossways — Volume 5
date:
words: nan
sentences: nan
pages:
flesch: nan
cache:
txt:
summary:
id: 4467
author: Meredith, George
title: Diana of the Crossways — Volume 3
date:
words: nan
sentences: nan
pages:
flesch: nan
cache:
txt:
summary:
id: 4468
author: Meredith, George
title: Diana of the Crossways — Volume 4
date:
words: nan
sentences: nan
pages:
flesch: nan
cache:
txt:
summary:
id: 4466
author: Meredith, George
title: Diana of the Crossways — Volume 2
date:
words: nan
sentences: nan
pages:
flesch: nan
cache:
txt:
summary:
id: 4465
author: Meredith, George
title: Diana of the Crossways — Volume 1
date:
words: nan
sentences: nan
pages:
flesch: nan
cache:
txt:
summary:
id: 541
author: Wharton, Edith
title: The Age of Innocence
date:
words: 104804.0
sentences: 5962.0
pages:
flesch: 80.0
cache: ./cache/541.txt
txt: ./txt/541.txt
summary: looks a little bare to old-fashioned eyes," Mrs. Welland had explained, "Good-bye; come and see me some day," she said, still looking at Archer. No one alluded to Ellen Olenska; but Archer knew that Mrs. Welland was "It''s a pity the Beauforts asked her," Mrs. Archer said gently. "Oh, necessarily; Beaufort is a vulgar man," said Mrs. Archer. that Olenska woman''s comings and goings I don''t see," Mrs. Archer "Janey!" said her mother; and Miss Archer blushed and tried to look Mrs. Archer and her son and daughter, like every one else in New York, "It''s just my old-fashioned feeling; dear May is my ideal," said Mrs. Archer. Archer had left St. Augustine charged with many messages for old Mrs. Mingott; and a day or two after his return to town he called on her. As Mrs. Archer said, it made "another thing of London" to know Mrs. Carfry and Miss Harle; and by the time that Newland became engaged the
==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel
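
Note on the tracebacks in this log: the five Meredith volumes fail in file2bib.py because the text passed to normalize_quotation_marks is None, and in txt2keywords.py because statistics.mean receives an empty sequence. Those same five items later report words: nan. The sketch below shows one possible way to guard both calls; it is not the Distant Reader's actual code. The helper names (normalize_or_empty, keywords_or_empty, straighten_quotes, failing_extractor) are hypothetical, and straighten_quotes is only a stand-in for the textacy function named in the traceback.

```python
import statistics
from typing import Callable, Iterable, List, Optional, Tuple


def normalize_or_empty(text: Optional[str], normalize: Callable[[str], str]) -> str:
    """Call `normalize` only when text extraction produced a string."""
    if text is None:
        # Mirrors the failing case in the log: there is nothing to normalize.
        return ""
    return normalize(text)


def keywords_or_empty(
    extract: Callable[[], Iterable[Tuple[str, float]]]
) -> List[Tuple[str, float]]:
    """Run a keyword extractor, returning [] if it has no data to score."""
    try:
        return list(extract())
    except statistics.StatisticsError:
        # Raised by statistics.mean() on an empty sequence, as in the log.
        return []


def straighten_quotes(text: str) -> str:
    """Hypothetical stand-in for the quote normalizer seen in the traceback."""
    table = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
    return text.translate(table)


def failing_extractor() -> List[Tuple[str, float]]:
    """Hypothetical extractor that reproduces the empty-input failure."""
    statistics.mean([])  # raises StatisticsError
    return []


if __name__ == "__main__":
    print(repr(normalize_or_empty(None, straighten_quotes)))
    print(keywords_or_empty(failing_extractor))
```

Guards like these would only silence the tracebacks. Why the extracted text is empty for those five files is not visible in this log and would still need to be investigated.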