mv: 'input-file.zip' and './input-file.zip' are the same file
Creating study carrel named subject-labor-freebo
Initializing database
Unzipping
Archive: input-file.zip
  inflating: ./tmp/input/A44144.xml
  inflating: ./tmp/input/xml2htm.xsl
  inflating: ./tmp/input/metadata.csv
  inflating: ./tmp/input/A41337.xml
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY:
=== metadata file: ./tmp/input/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-labor-freebo
May 24, 2021 7:26:09 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
May 24, 2021 7:26:09 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless you've excluded the TesseractOCRParser from the default parser. Tesseract may dramatically slow down content extraction (TIKA-2359). As of Tika 1.15 (and prior versions), Tesseract is automatically called. In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
May 24, 2021 7:26:09 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.24.1 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @1819ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git: a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 1.8.0_281-b09
INFO Started ServerConnector@3e74829{HTTP/1.1, (http/1.1)}{localhost:9998}
INFO Started @1897ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@51fadaff{/,null,AVAILABLE}
INFO Started Apache Tika server at http://localhost:9998/
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
FILE: cache/A44144.xml
OUTPUT: txt/A44144.txt
FILE: cache/A41337.xml
OUTPUT: txt/A41337.txt
=== file2bib.sh ===
INFO Detecting media type for Filename: b'A44144.xml'
INFO rmeta/text (autodetecting type)
INFO Detecting media type for Filename: b'A41337.xml'
INFO rmeta/text (autodetecting type)
A44144 txt/../pos/A44144.pos
A44144 txt/../wrd/A44144.wrd
A44144 txt/../ent/A44144.ent
A41337 txt/../pos/A41337.pos
A41337 txt/../ent/A41337.ent
=== file2bib.sh ===
id: A44144
author: Hale, Matthew, Sir, 1609-1676.
title: A discourse touching provision for the poor written by Sir Matthew Hale ...
date: 1683
pages:
extension: .xml
txt: ./txt/A44144.txt
cache: ./cache/A44144.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 23
resourceName b'A44144.xml'
A41337 txt/../wrd/A41337.wrd
=== file2bib.sh ===
id: A41337
author: Firmin, Thomas, 1632-1697.
title: Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F.
date: 1681
pages:
extension: .xml
txt: ./txt/A41337.txt
cache: ./cache/A41337.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 33
resourceName b'A41337.xml'
Done mapping.
Reducing subject-labor-freebo
=== reduce.pl bib ===
id = A44144
author = Hale, Matthew, Sir, 1609-1676.
title = A discourse touching provision for the poor written by Sir Matthew Hale ...
date = 1683
pages =
extension = .xml
mime = application/xml
words = 9734
sentences = 2600
flesch = 92
summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. A discourse touching provision for the poor written by Sir Matthew Hale ... A discourse touching provision for the poor written by Sir Matthew Hale ... EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). After proofreading, the encoding was enhanced and/or corrected and characters marked as illegible were corrected where possible up to a limit of 100 instances per text.
cache = ./cache/A44144.xml
txt = ./txt/A44144.txt
=== reduce.pl bib ===
id = A41337
author = Firmin, Thomas, 1632-1697.
title = Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F.
date = 1681
pages =
extension = .xml
mime = application/xml
words = 18851
sentences = 4988
flesch = 100
summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F. Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F. EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
cache = ./cache/A41337.xml
txt = ./txt/A41337.txt
Building ./etc/reader.txt
A41337 A44144 A44144 A41337
number of items: 2
sum of words: 28,585
average size in words: 14,292
average readability score: 96
nouns: work; people; time; persons; way; times; pence; children; years; places; others; men; person; want; day; means; nothing; course; t; money; thing; things; reason; man; l.; charity; charge; year; self; text; relief; trade; shillings; penny; kind; provision; pounds; loss; hath; advantage; sort; place; number; hands; end; viz; supply; something; price; power
verbs: be; is; are; have; do; were; been; work; being; make; give; was; had; made; take; set; live; said; prevent; get; come; am; put; go; spin; taken; let; brought; found; begging; say; done; pay; known; know; keep; imployed; find; see; means; bring; given; taught; leave; having; employed; employ; answer; receive; provide
adjectives: poor; such; many; other; great; able; more; good; own; much; little; better; least; several; idle; same; first; necessary; honest; greater; true; old; few; common; like; due; sure; late; small; reasonable; most; less; fit; convenient; yearly; industrious; whole; general; early; young; short; private; full; english; double; worse; respective; considerable; charitable; best
adverbs: not; so; very; more; much; up; as; well; also; out; now; yet; then; only; in; here; rather; indeed; therefore; otherwise; never; first; soon; about; long; even; else; away; too; most; there; again; wholly; once; better; thereof; sometimes; no; greatly; off; enough; already; thus; over; home; hardly; hard; especially; still; possibly
pronouns: it; they; i; their; them; our; he; we; you; his; themselves; him; my; us; me; her; your; she; its; himself; one; itself; ye; theirs; thee; ours; hitherto
proper nouns: poor; parish; trade; stock; flax; employment; england; cloth; persons; hath; children; tcp; imployment; idleness; kingdom; hemp; city; manufacture; house; linnen; work; master; law; parishes; cloath; woollen; london; laws; god; begging; year; pounds; provision; overseers; age; statute; relief; manufactures; labour; english; compulsary; charity; c.; text; tei; prison; places; peace; loss; justices
keywords: trade; poor; stock; persons; people; parish; kingdom; employment; cloth; children
one topic; one dimension: poor
file(s): ./cache/A41337.xml
titles(s): Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F.
three topics; one dimension: poor; work; iv
file(s): ./cache/A41337.xml, ./cache/A44144.xml, ./cache/A44144.xml
titles(s): Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F. | A discourse touching provision for the poor written by Sir Matthew Hale ... | A discourse touching provision for the poor written by Sir Matthew Hale ...
five topics; three dimensions: poor people work; work poor stock; bread case thousand; bread case thousand; bread case thousand
file(s): ./cache/A41337.xml, ./cache/A44144.xml, ./cache/A44144.xml, ./cache/A44144.xml, ./cache/A44144.xml
titles(s): Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F. | A discourse touching provision for the poor written by Sir Matthew Hale ... | A discourse touching provision for the poor written by Sir Matthew Hale ... | A discourse touching provision for the poor written by Sir Matthew Hale ... | A discourse touching provision for the poor written by Sir Matthew Hale ...
Type: zip2carrel
title: subject-labor-freebo
date: 2021-05-24
time: 19:12
username: emorgan
patron: Eric Morgan
email: emorgan@nd.edu
input: input-file.zip
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named enities
==== making bibliographics
id: A41337
author: Firmin, Thomas, 1632-1697.
title: Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F.
date: 1681
words: 18851
sentences: 4988
pages:
flesch: 100
cache: ./cache/A41337.xml
txt: ./txt/A41337.txt
summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F. Some proposals for the imployment of the poor, and for the prevention of idleness and the consequence thereof, begging a practice so dishonourable to the nation, and to the Christian religion : in a letter to a friend / by T.F. EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
id: A44144
author: Hale, Matthew, Sir, 1609-1676.
title: A discourse touching provision for the poor written by Sir Matthew Hale ...
date: 1683
words: 9734
sentences: 2600
pages:
flesch: 92
cache: ./cache/A44144.xml
txt: ./txt/A44144.txt
summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. A discourse touching provision for the poor written by Sir Matthew Hale ... A discourse touching provision for the poor written by Sir Matthew Hale ... EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). After proofreading, the encoding was enhanced and/or corrected and characters marked as illegible were corrected where possible up to a limit of 100 instances per text.
==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel