mv: 'input-file.zip' and './input-file.zip' are the same file
Creating study carrel named subject-sex-freebo
Initializing database
Unzipping
Archive: input-file.zip
inflating: ./tmp/input/A38586.xml
inflating: ./tmp/input/A55529.xml
inflating: ./tmp/input/xml2htm.xsl
inflating: ./tmp/input/metadata.csv
inflating: ./tmp/input/A39351.xml
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY:
=== metadata file: ./tmp/input/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-sex-freebo
May 25, 2021 12:04:49 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
May 25, 2021 12:04:49 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless you've excluded the TesseractOCRParser from the default parser. Tesseract may dramatically slow down content extraction (TIKA-2359). As of Tika 1.15 (and prior versions), Tesseract is automatically called. In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
May 25, 2021 12:04:49 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.24.1 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @3107ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git: a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 1.8.0_281-b09
INFO Started ServerConnector@3e74829{HTTP/1.1, (http/1.1)}{localhost:9998}
INFO Started @3211ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@62010f5c{/,null,AVAILABLE}
INFO Started Apache Tika server at http://localhost:9998/
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
FILE: cache/A39351.xml
OUTPUT: txt/A39351.txt
FILE: cache/A38586.xml
OUTPUT: txt/A38586.txt
FILE: cache/A55529.xml
OUTPUT: txt/A55529.txt
=== file2bib.sh ===
INFO Detecting media type for Filename: b'A39351.xml'
INFO Detecting media type for Filename: b'A38586.xml'
INFO Detecting media type for Filename: b'A55529.xml'
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
A39351 txt/../pos/A39351.pos
A39351 txt/../ent/A39351.ent
A39351 txt/../wrd/A39351.wrd
=== file2bib.sh ===
id: A39351
author: Elys, Edmund, ca. 1634-ca. 1707.
title: An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley's lascivious and prophane verses / by a dutiful son of the Church of England.
date: 1670
pages:
extension: .xml
txt: ./txt/A39351.txt
cache: ./cache/A39351.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 30
resourceName b'A39351.xml'
A38586 txt/../pos/A38586.pos
A55529 txt/../pos/A55529.pos
A38586 txt/../ent/A38586.ent
A55529 txt/../ent/A55529.ent
A38586 txt/../wrd/A38586.wrd
A55529 txt/../wrd/A55529.wrd
=== file2bib.sh ===
id: A55529
author: A. L.
title: The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L.
date: 1677
pages:
extension: .xml
txt: ./txt/A55529.txt
cache: ./cache/A55529.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 100
resourceName b'A55529.xml'
=== file2bib.sh ===
id: A38586
author: Cotton, Charles, 1630-1687.
title: Erōtopolis, the present state of Betty-land
date: 1684
pages:
extension: .xml
txt: ./txt/A38586.txt
cache: ./cache/A38586.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 88
resourceName b'A38586.xml'
Done mapping.
Reducing subject-sex-freebo
=== reduce.pl bib ===
id = A38586
author = Cotton, Charles, 1630-1687.
title = Erōtopolis, the present state of Betty-land
date = 1684
pages =
extension = .xml
mime = application/xml
words = 27500
sentences = 8225
flesch = 96
summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership.
EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). The general aim of EEBO-TCP is to encode one copy (usually the first edition) of every monographic English-language title published between 1473 and 1700 available in EEBO. EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). Selection was intended to range over a wide variety of subject areas, to reflect the true nature of the print record of the period.
cache = ./cache/A38586.xml
txt = ./txt/A38586.txt
=== reduce.pl bib ===
id = A39351
author = Elys, Edmund, ca. 1634-ca. 1707.
title = An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley's lascivious and prophane verses / by a dutiful son of the Church of England.
date = 1670
pages =
extension = .xml
mime = application/xml
words = 4219
sentences = 1158
flesch = 87
summary = An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley's lascivious and prophane verses / by a dutiful son of the Church of England. An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley's lascivious and prophane verses / by a dutiful son of the Church of England.
EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
cache = ./cache/A39351.xml
txt = ./txt/A39351.txt
=== reduce.pl bib ===
id = A55529
author = A. L.
title = The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L.
date = 1677
pages =
extension = .xml
mime = application/xml
words = 36429
sentences = 10398
flesch = 91
summary = The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L. The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). Selection was intended to range over a wide variety of subject areas, to reflect the true nature of the print record of the period.
cache = ./cache/A55529.xml
txt = ./txt/A55529.txt
Building ./etc/reader.txt
A55529 A38586 A39351 A55529 A39351 A38586
number of items: 3
sum of words: 68,148
average size in words: 22,716
average readability score: 91
nouns: men; land; women; nothing; others; time; things; reason; thing; self; man; part; one; manner; mind; body; nature; vvomen; sex; world; way; times; t; knowledge; country; hand; use; parts; end; none; place; people; selves; work; woman; truth; sort; kind; words; night; day; text; pleasure; hath; author; ▪; works; trouble; tho; shepherdesses
verbs: is; are; have; be; had; was; were; been; being; make; do; find; made; say; see; know; take; said; give; having; think; let; come; put; according; speak; makes; quoth; called; learned; did; taken; believe; has; does; consider; go; found; seeing; render; look; learn; thought; set; observe; done; known; hear; considered; appear
adjectives: great; other; many; more; such; same; good; much; own; little; capable; true; poor; most; different; whole; greatest; certain; natural; first; necessary; greater; common; several; young; old; ordinary; least; general; better; very; proper; large; less; strange; particular; few; fit; able; peculiar; new; best; equal; perfect; small; short; present; full; fair; contrary
adverbs: not; so; more; as; most; very; only; then; never; up; well; thereof; now; out; less; here; there; much; therefore; often; no; far; too; together; therein; even; also; still; sometimes; all; ever; easily; yet; enough; at; in; almost; again; indeed; else; altogether; over; long; thus; rather; likewise; thereby; first; down; quite
pronouns: they; it; their; them; we; his; he; i; her; themselves; she; him; us; you; our; my; me; himself; its; your; thy; ours; one; ye; thee; theirs; herself; whosoever; l; ve; mine; hers; em; ''em
proper nouns: 〉; ◊; 〈; betty; men; syrens; shepherd; ●; women; vertue; world; country; sciences; eumolpus; eucolpius; tcp; love; god; husbandmen; nature; law; hath; vvomen; man;
sexes; english; farmer; syren; natural; sun; shepherdesses; persons; society; husbandman; spirit; shepherds; custom; soyl; learning; discourse; mr.; lord; church; authority; justice; wit; truth; text; shepherdess; farm
keywords: tcp; love; world; women; vertue; syrens; sun; soyl; shepherdess; shepherd; sexes; sex; sciences; reason; persons; notions; nature; natural; mind; men; man; law; land; knowledge; husbandmen; great; god; farms; farmer; eumolpus; eucolpius; cowley; country; children; body; betty
one topic; one dimension: men
file(s): ./cache/A55529.xml
titles(s): The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L.
three topics; one dimension: men; land; shall
file(s): ./cache/A55529.xml, ./cache/A38586.xml, ./cache/A39351.xml
titles(s): The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L. | Erōtopolis, the present state of Betty-land | An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England.
five topics; three dimensions: men great land; 01 intreat lumbum; 01 intreat lumbum; 01 intreat lumbum; 01 intreat lumbum
file(s): ./cache/A55529.xml, ./cache/A39351.xml, ./cache/A39351.xml, ./cache/A39351.xml, ./cache/A39351.xml
titles(s): The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L. | An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England. | An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr.
Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England. | An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England. | An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England.
Type: zip2carrel
title: subject-sex-freebo
date: 2021-05-25
time: 12:04
username: emorgan
patron: Eric Morgan
email: emorgan@nd.edu
input: input-file.zip
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named enities
==== making bibliographics
id: A55529
author: A. L.
title: The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L.
date: 1677
words: 36429
sentences: 10398
pages:
flesch: 91
cache: ./cache/A55529.xml
txt: ./txt/A55529.txt
summary: The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L. The woman as good as the man, or, The equallity of both sexes written originally in French and translated into English by A.L. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
Selection was intended to range over a wide variety of subject areas, to reflect the true nature of the print record of the period.
id: A38586
author: Cotton, Charles, 1630-1687.
title: Erōtopolis, the present state of Betty-land
date: 1684
words: 27500
sentences: 8225
pages:
flesch: 96
cache: ./cache/A38586.xml
txt: ./txt/A38586.txt
summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). The general aim of EEBO-TCP is to encode one copy (usually the first edition) of every monographic English-language title published between 1473 and 1700 available in EEBO. EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). Selection was intended to range over a wide variety of subject areas, to reflect the true nature of the print record of the period.
id: A39351
author: Elys, Edmund, ca. 1634-ca. 1707.
title: An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England.
date: 1670
words: 4219
sentences: 1158
pages:
flesch: 87
cache: ./cache/A39351.xml
txt: ./txt/A39351.txt
summary: An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr.
Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England. An exclamation to all those that love the Lord Jesus in sincerity against an apology written by an ingenious person, for Mr. Cowley''s lascivious and prophane verses / by a dutiful son of the Church of England. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel
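
The corpus totals written to ./etc/reader.txt (number of items, sum of words, average size in words, average readability score) can be cross-checked against the per-item `words` and `flesch` values in the `reduce.pl bib` records. The sketch below is not part of the carrel toolchain: the `summarize` helper and the `items` dictionary are hypothetical names, and it assumes the averages are plain arithmetic means rounded to the nearest integer, which the log does not state.

```python
# Per-item figures copied from the "reduce.pl bib" records in the log.
items = {
    "A38586": {"words": 27500, "sentences": 8225, "flesch": 96},
    "A39351": {"words": 4219, "sentences": 1158, "flesch": 87},
    "A55529": {"words": 36429, "sentences": 10398, "flesch": 91},
}


def summarize(records):
    """Recompute corpus-level totals from per-item records.

    Assumption: averages are arithmetic means rounded to the nearest
    integer; the actual method used to build reader.txt is not shown
    in the log.
    """
    count = len(records)
    total_words = sum(r["words"] for r in records.values())
    total_flesch = sum(r["flesch"] for r in records.values())
    return {
        "number of items": count,
        "sum of words": total_words,
        "average size in words": round(total_words / count),
        "average readability score": round(total_flesch / count),
    }


summary = summarize(items)
for label, value in summary.items():
    # The thousands separator mirrors the "68,148" style used in the log.
    print(f"{label}: {value:,}")
```

Under those assumptions the recomputed figures agree with the four values reported in the log, which suggests none of the three items was dropped between the map and reduce steps.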