mv: 'input-file.zip' and './input-file.zip' are the same file
Creating study carrel named subject-patentMedicines-freebo
Initializing database
Unzipping
Archive: input-file.zip
inflating: ./tmp/input/A04775.xml
inflating: ./tmp/input/B00564.xml
inflating: ./tmp/input/A84859.xml
inflating: ./tmp/input/xml2htm.xsl
inflating: ./tmp/input/metadata.csv
inflating: ./tmp/input/A93444.xml
inflating: ./tmp/input/A08786.xml
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY:
=== metadata file: ./tmp/input/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-patentMedicines-freebo
May 24, 2021 8:07:44 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
May 24, 2021 8:07:45 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless you've excluded the TesseractOCRParser from the default parser. Tesseract may dramatically slow down content extraction (TIKA-2359). As of Tika 1.15 (and prior versions), Tesseract is automatically called. In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
May 24, 2021 8:07:45 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.24.1 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @3900ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git: a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 1.8.0_281-b09
INFO Started ServerConnector@3e74829{HTTP/1.1, (http/1.1)}{localhost:9998}
INFO Started @4020ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@51fadaff{/,null,AVAILABLE}
INFO Started Apache Tika server at http://localhost:9998/
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
FILE: cache/A08786.xml OUTPUT: txt/A08786.txt
FILE: cache/A93444.xml OUTPUT: txt/A93444.txt
FILE: cache/A84859.xml OUTPUT: txt/A84859.txt
FILE: cache/A04775.xml OUTPUT: txt/A04775.txt
FILE: cache/B00564.xml OUTPUT: txt/B00564.txt
=== file2bib.sh ===
INFO Detecting media type for Filename: b'A84859.xml'
INFO Detecting media type for Filename: b'A08786.xml'
INFO Detecting media type for Filename: b'A04775.xml'
INFO Detecting media type for Filename: b'B00564.xml'
INFO Detecting media type for Filename: b'A93444.xml'
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
A84859 txt/../pos/A84859.pos
A84859 txt/../wrd/A84859.wrd
A04775 txt/../pos/A04775.pos
A93444 txt/../wrd/A93444.wrd
A08786 txt/../pos/A08786.pos
A08786 txt/../wrd/A08786.wrd
B00564 txt/../pos/B00564.pos
A04775 txt/../wrd/A04775.wrd
A93444 txt/../ent/A93444.ent
A93444 txt/../pos/A93444.pos
A08786 txt/../ent/A08786.ent
A84859 txt/../ent/A84859.ent
A04775 txt/../ent/A04775.ent
B00564 txt/../ent/B00564.ent
=== file2bib.sh ===
id: B00564
author: Plat, Hugh, Sir, 1552-1611?
title: Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. date: 1607 pages: extension: .xml txt: ./txt/B00564.txt cache: ./cache/B00564.xml Content-Type application/xml X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 21 resourceName b'B00564.xml' B00564 txt/../wrd/B00564.wrd === file2bib.sh === id: A84859 author: Francesse, Peter. title: All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... date: 1656 pages: extension: .xml txt: ./txt/A84859.txt cache: ./cache/A84859.xml Content-Type application/xml X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 5 resourceName b'A84859.xml' === file2bib.sh === id: A93444 author: Snead, Richard, d. 1711. title: Dear Friends all unto whom this may come; date: 1681 pages: extension: .xml txt: ./txt/A93444.txt cache: ./cache/A93444.xml Content-Type application/xml X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 18 resourceName b'A93444.xml' === file2bib.sh === id: A04775 author: Kellicke, Richard. title: Soli deo gloria know all men by these present, that I, Richard Kellicke, professor of physicke and chyrurgery, borne in England, and am now lately come from beyond the seas ... 
date: 1625 pages: extension: .xml txt: ./txt/A04775.txt cache: ./cache/A04775.xml Content-Type application/xml X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 12 resourceName b'A04775.xml' === file2bib.sh === id: A08786 author: N. P., Master of Arts, and minister of Gods word. title: The vertue and operation of this balsame date: 1615 pages: extension: .xml txt: ./txt/A08786.txt cache: ./cache/A08786.xml Content-Type application/xml X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 9 resourceName b'A08786.xml' Done mapping. Reducing subject-patentMedicines-freebo === reduce.pl bib === id = A84859 author = Francesse, Peter. title = All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... date = 1656 pages = extension = .xml mime = application/xml words = 680 sentences = 110 flesch = 87 summary = All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... An advertisement of a cure for gout and sciatica offered by Peter Francesse.--Thomason catalogue. 
civilwar no All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that Francesse, Peter. cache = ./cache/A84859.xml txt = ./txt/A84859.txt === reduce.pl bib === id = A93444 author = Snead, Richard, d. 1711. title = Dear Friends all unto whom this may come; date = 1681 pages = extension = .xml mime = application/xml words = 1556 sentences = 266 flesch = 80 summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. Dear Friends all unto whom this may come; Dear Friends all unto whom this may come; EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). The texts were encoded and linked to page images in accordance with level 4 of the TEI in Libraries guidelines. cache = ./cache/A93444.xml txt = ./txt/A93444.txt === reduce.pl bib === id = A08786 author = N. P., Master of Arts, and minister of Gods word. title = The vertue and operation of this balsame date = 1615 pages = extension = .xml mime = application/xml words = 1806 sentences = 326 flesch = 88 summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. "This Balsam, made by N.P. 
Master of Arts, and Minister of Gods Word, is to be sold in Maiden Lane, at the signe of the Crowne ouer against Goldsmiths Hall, where it hath beene sold, and the premises approued these fourescore yeares. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). cache = ./cache/A08786.xml txt = ./txt/A08786.txt === reduce.pl bib === id = A04775 author = Kellicke, Richard. title = Soli deo gloria know all men by these present, that I, Richard Kellicke, professor of physicke and chyrurgery, borne in England, and am now lately come from beyond the seas ... date = 1625 pages = extension = .xml mime = application/xml words = 1871 sentences = 354 flesch = 90 summary = Soli deo gloria know all men by these present, that I, Richard Kellicke, professor of physicke and chyrurgery, borne in England, and am now lately come from beyond the seas ... Soli deo gloria know all men by these present, that I, Richard Kellicke, professor of physicke and chyrurgery, borne in England, and am now lately come from beyond the seas ... EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). 
EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). cache = ./cache/A04775.xml txt = ./txt/A04775.txt === reduce.pl bib === id = B00564 author = Plat, Hugh, Sir, 1552-1611? title = Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. date = 1607 pages = extension = .xml mime = application/xml words = 2191 sentences = 450 flesch = 86 summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). 
cache = ./cache/B00564.xml
txt = ./txt/B00564.txt
Building ./etc/reader.txt
B00564 A93444 A08786 B00564 A93444 A08786
number of items: 5
sum of words: 8,104
average size in words: 1,620
average readability score: 86
nouns: text; texts; characters; works; xml; image; books; time; page; images; work; keying; elements; eebo; edition; space; project; nature; medicines; encoding; data; sea; men; day; body; author; users; title; purposes; place; others; markup; wine; sets; selection; schema; instances; hath; guidelines; editions; morning; gout; euery; drops; diseases; variety; transcription; times; sheet; parts
verbs: is; be; was; are; have; were; encoded; been; cureth; made; take; come; being; based; taken; let; had; -; represented; published; marked; make; hath; do; created; create; corrected; according; said; performed; known; helpeth; haue; am; scanned; reviewed; remain; recommend; providing; owned; modified; edited; distributed; described; copied; coded; co; assigned; asking; using
adjectives: early; good; english; many; first; available; other; such; short; general; same; original; illegible; second; long; cold; textual; great; sore; own; little; large; keyboarded; financial; commercial; wide; usual; true; syntactic; sufficient; subject; structural; readable; quality; public; proofread; possible; pleased; overall; necessary; much; most; monographic; markup; lossless; light; later; greater; external; eligible
adverbs: also; not; so; then; very; online; therefore; now; sometimes; more; early; well; above; out; never; in; even; vpon; variously; usually; respectfully; over; notably; mainly; further; commonly; away; accurately; together; thereof; therein; onely; long; lately; first; beere; as; often; most; likewise; here; fully; exceedingly; especially; yet; wonderfull; without; whatsoever; vanisheth; ureth
pronouns: it; their; them; i; they; his; my; we; he; him; you; our; me; her; your; us; themselves; she
proper nouns: tcp; text; tei; eebo; english; oxford; balsam; england; proquest; phase; partnership; creation; richard; london; god; transcribed; online; utf-8; unicode; peter; p5; ncbel; michigan; medicines; friends; p.; kellicke; francesse; therewith; stc; sixe; master; haue; gods; books; balsum; balsame; universal; tiff; thomason; sir; sampled; qc; n.; keyed; iv; iohn; estc; eng; creative
keywords: tcp; tei; richard; peter; medicines; early; balsam
one topic; one dimension: text
file(s): ./cache/A84859.xml
titles(s): All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ...
three topics; one dimension: tcp; text; fully
file(s): ./cache/B00564.xml, ./cache/A84859.xml, ./cache/A84859.xml
titles(s): Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. | All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... | All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ...
five topics; three dimensions: tcp text cureth; tcp text eebo; text francesse peter; library end doth; library end doth file(s): ./cache/B00564.xml, ./cache/A08786.xml, ./cache/A84859.xml, ./cache/A84859.xml, ./cache/A84859.xml titles(s): Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. | The vertue and operation of this balsame | All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... | All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... | All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... Type: zip2carrel title: subject-patentMedicines-freebo date: 2021-05-24 time: 19:56 username: emorgan patron: Eric Morgan email: emorgan@nd.edu input: input-file.zip ==== make-pages.sh htm files ==== make-pages.sh complex files ==== make-pages.sh named enities ==== making bibliographics id: A84859 author: Francesse, Peter. 
title: All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... date: 1656 words: 680 sentences: 110 pages: flesch: 87 cache: ./cache/A84859.xml txt: ./txt/A84859.txt summary: All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that hath brought with him out of the kingdome of Persia, perfect remedy for the gout, the sciatica, the running gout, and all aches in the limbs, ... An advertisement of a cure for gout and sciatica offered by Peter Francesse.--Thomason catalogue. civilwar no All gentlemen and others, may be pleased to take notice, that there is a stranger come into these parts, whose name is Peter Francesse that Francesse, Peter. id: A04775 author: Kellicke, Richard. title: Soli deo gloria know all men by these present, that I, Richard Kellicke, professor of physicke and chyrurgery, borne in England, and am now lately come from beyond the seas ... date: 1625 words: 1871 sentences: 354 pages: flesch: 90 cache: ./cache/A04775.xml txt: ./txt/A04775.txt summary: Soli deo gloria know all men by these present, that I, Richard Kellicke, professor of physicke and chyrurgery, borne in England, and am now lately come from beyond the seas ... Soli deo gloria know all men by these present, that I, Richard Kellicke, professor of physicke and chyrurgery, borne in England, and am now lately come from beyond the seas ... 
EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). id: A08786 author: N. P., Master of Arts, and minister of Gods word. title: The vertue and operation of this balsame date: 1615 words: 1806 sentences: 326 pages: flesch: 88 cache: ./cache/A08786.xml txt: ./txt/A08786.txt summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. "This Balsam, made by N.P. Master of Arts, and Minister of Gods Word, is to be sold in Maiden Lane, at the signe of the Crowne ouer against Goldsmiths Hall, where it hath beene sold, and the premises approued these fourescore yeares. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). id: B00564 author: Plat, Hugh, Sir, 1552-1611? 
title: Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. date: 1607 words: 2191 sentences: 450 pages: flesch: 86 cache: ./cache/B00564.xml txt: ./txt/B00564.txt summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. Certaine philosophical preparations of foode and beverage for sea-men, in their long voyages: with some necessary, approoued, and hermeticall medicines and antidotes, fit to be had in readinesse at sea, for preuention or cure of diuers diseases. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). id: A93444 author: Snead, Richard, d. 1711. title: Dear Friends all unto whom this may come; date: 1681 words: 1556 sentences: 266 pages: flesch: 80 cache: ./cache/A93444.xml txt: ./txt/A93444.txt summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. 
Dear Friends all unto whom this may come; Dear Friends all unto whom this may come; EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org). The texts were encoded and linked to page images in accordance with level 4 of the TEI in Libraries guidelines. ==== make-pages.sh questions ==== make-pages.sh search ==== make-pages.sh topic modeling corpus Zipping study carrel
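
The corpus totals reported in this log (number of items, sum of words, average size in words, average readability score) can be cross-checked against the per-item "words" and "flesch" values in the reduce.pl bib records. The following is a minimal sketch of that check, not the tool's own code: the `items` dictionary and the `summarize` helper are hypothetical names introduced here, and the use of integer truncation for the averages is an assumption chosen because it reproduces the logged figures.

```python
# Per-item values copied by hand from the "reduce.pl bib" records in this log.
# The dictionary layout and function name are illustrative, not part of the tool.
items = {
    "A84859": {"words": 680, "flesch": 87},
    "A93444": {"words": 1556, "flesch": 80},
    "A08786": {"words": 1806, "flesch": 88},
    "A04775": {"words": 1871, "flesch": 90},
    "B00564": {"words": 2191, "flesch": 86},
}


def summarize(records):
    """Return (item count, total words, average words, average Flesch score)."""
    count = len(records)
    total_words = sum(r["words"] for r in records.values())
    total_flesch = sum(r["flesch"] for r in records.values())
    # Assumption: averages are truncated to whole numbers, which matches the
    # logged "1,620" (plain rounding of 1620.8 would give 1,621 instead).
    return count, total_words, total_words // count, total_flesch // count


count, total_words, avg_words, avg_flesch = summarize(items)
print(f"number of items: {count}")
print(f"sum of words: {total_words:,}")
print(f"average size in words: {avg_words:,}")
print(f"average readability score: {avg_flesch}")
```

Under these assumptions the printed values agree with the totals in the log, which suggests the per-item records and the summary section describe the same five files.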