mv: 'input-file.zip' and './input-file.zip' are the same file
Creating study carrel named subject-orthodoxEasternChurch-freebo
Initializing database
Unzipping
Archive: input-file.zip
  inflating: ./tmp/input/A60569.xml
  inflating: ./tmp/input/A58002.xml
  inflating: ./tmp/input/xml2htm.xsl
  inflating: ./tmp/input/metadata.csv
  inflating: ./tmp/input/A42632.xml
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY:
=== metadata file: ./tmp/input/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-orthodoxEasternChurch-freebo
May 24, 2021 8:05:18 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
May 24, 2021 8:05:18 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless you've excluded the TesseractOCRParser from the default parser. Tesseract may dramatically slow down content extraction (TIKA-2359). As of Tika 1.15 (and prior versions), Tesseract is automatically called. In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
May 24, 2021 8:05:18 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.24.1 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @3104ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git: a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 1.8.0_281-b09
INFO Started ServerConnector@3e74829{HTTP/1.1, (http/1.1)}{localhost:9998}
INFO Started @3213ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@1fa1cab1{/,null,AVAILABLE}
INFO Started Apache Tika server at http://localhost:9998/
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
FILE: cache/A42632.xml OUTPUT: txt/A42632.txt
FILE: cache/A60569.xml OUTPUT: txt/A60569.txt
FILE: cache/A58002.xml OUTPUT: txt/A58002.txt
=== file2bib.sh ===
INFO Detecting media type for Filename: b'A42632.xml'
INFO rmeta/text (autodetecting type)
INFO Detecting media type for Filename: b'A58002.xml'
INFO Detecting media type for Filename: b'A60569.xml'
INFO rmeta/text (autodetecting type)
INFO rmeta/text (autodetecting type)
A42632 txt/../ent/A42632.ent
A42632 txt/../pos/A42632.pos
A42632 txt/../wrd/A42632.wrd
=== file2bib.sh ===
id: A42632
author: Geōrgarinēs, Iōsēph, 17th cent.
title: From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds.
date: 1682
pages:
extension: .xml
txt: ./txt/A42632.txt
cache: ./cache/A42632.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 7
resourceName b'A42632.xml'
A58002 txt/../pos/A58002.pos
A60569 txt/../pos/A60569.pos
A58002 txt/../ent/A58002.ent
A58002 txt/../wrd/A58002.wrd
A60569 txt/../wrd/A60569.wrd
A60569 txt/../ent/A60569.ent
=== file2bib.sh ===
id: A58002
author: Rycaut, Paul, Sir, 1628-1700.
title: The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut.
date: 1679
pages:
extension: .xml
txt: ./txt/A58002.txt
cache: ./cache/A58002.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 144
resourceName b'A58002.xml'
=== file2bib.sh ===
id: A60569
author: Smith, Thomas, 1638-1710.
title: An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith.
date: 1680
pages:
extension: .xml
txt: ./txt/A60569.txt
cache: ./cache/A60569.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 152
resourceName b'A60569.xml'
Done mapping.
Reducing subject-orthodoxEasternChurch-freebo
=== reduce.pl bib ===
id = A60569
author = Smith, Thomas, 1638-1710.
title = An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith.
date = 1680
pages =
extension = .xml
mime = application/xml
words = 68030
sentences = 22255
flesch = 88
summary = An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith. An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith. EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
cache = ./cache/A60569.xml
txt = ./txt/A60569.txt
=== reduce.pl bib ===
id = A42632
author = Geōrgarinēs, Iōsēph, 17th cent.
title = From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds.
date = 1682
pages =
extension = .xml
mime = application/xml
words = 1603
sentences = 274
flesch = 83
summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds. From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com).
cache = ./cache/A42632.xml
txt = ./txt/A42632.txt
=== reduce.pl bib ===
id = A58002
author = Rycaut, Paul, Sir, 1628-1700.
title = The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut.
date = 1679
pages =
extension = .xml
mime = application/xml
words = 68582
sentences = 19998
flesch = 88
summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut. The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
cache = ./cache/A58002.xml
txt = ./txt/A58002.txt
Building ./etc/reader.txt
A58002 A60569 A42632 A60569 A58002 A42632
number of items: 3
sum of words: 138,215
average size in words: 46,071
average readability score: 86
nouns: time; day; people; place; year; times; life; part; manner; days; years; name; order; way; others; men; words; use; persons; person; p.; man; parts; churches; body; religion; honour; reason; power; things; death; church; form; self; saints; number; hand; account; nothing; hands; rest; side; hath; state; priests; history; devotion; country; world; bread
verbs: is; are; be; was; have; being; were; had; called; been; made; having; make; according; do; call; used; said; take; give; say; taken; has; found; observed; believe; put; see; receive; following; done; 〈; sent; read; given; did; received; eat; set; let; consecrated; performed; written; find; seem; mentioned; come; taking; seems; says
adjectives: other; great; same; such; several; first; holy; many; ancient; little; own; good; more; most; present; much; new; whole; second; poor; particular; necessary; true; least; certain; old; third; like; large; religious; last; greater; common; few; dead; late; chief; general; full; fit; better; publick; usual; high; glorious; small; able; turkish; sick; different
adverbs: not; so; then; more; up; most; now; very; also; as; only; onely; well; first; much; there; here; therefore; afterwards; out; far; thus; rather; yet; still; thereof; never; together; ever; especially; always; about; wholly; sometimes; over; again; that; once; is; too; usually; soon; indeed; before; formerly; immediately; down; anciently; long; away
pronouns: they; their; it; his; he; them; i; him; our; we; my; us; themselves; himself; its; me; her; thy; you; your; thee; she; one; theirs; ours; itself; f; b
proper nouns: 〉; ◊; 〈; church; s.; holy; god; greek; patriarch; greeks; turks; constantinople; christ; priest; bishop; c.; city; christian; lord; sacrament; rome; b; christians; saviour; spirit; faith; john; pag; bishops; virgin; bread; churches; father; d.; son; roman; oyl; confession; cross; doctrine; ●; emperour; world; c; divine; religion; monasteries; de; l.; council
keywords: church; turks; religion; priest; patriarch; holy; god; father; confession; city; churches; christian; world; virgin; tcp; sunday; spirit; son; saviour; saints; sacrament; roman; prayers; oyl; mountain; monastery; monasteries; lord; greeks; greek; government; festival; faith; emperour; doctrine; cross; country; council; constantinople; communion; chap; bread; body; bishops; bishop; authority; armenian; altar
one topic; one dimension: church
file(s): ./cache/A58002.xml
titles(s): The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut.
three topics; one dimension: church; church; consented
file(s): ./cache/A60569.xml, ./cache/A58002.xml, ./cache/A42632.xml
titles(s): An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith. | The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut. | From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds.
five topics; three dimensions: church holy greek; church holy great; sheet denying desisted; sheet denying desisted; sheet denying desisted
file(s): ./cache/A58002.xml, ./cache/A60569.xml, ./cache/A42632.xml, ./cache/A42632.xml, ./cache/A42632.xml
titles(s): The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut. | An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith. | From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds. | From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds. | From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds.
Type: zip2carrel
title: subject-orthodoxEasternChurch-freebo
date: 2021-05-24
time: 19:53
username: emorgan
patron: Eric Morgan
email: emorgan@nd.edu
input: input-file.zip
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named enities
==== making bibliographics
id: A42632
author: Geōrgarinēs, Iōsēph, 17th cent.
title: From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds.
date: 1682
words: 1603
sentences: 274
pages:
flesch: 83
cache: ./cache/A42632.xml
txt: ./txt/A42632.txt
summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds. From the Arch-Bishop of the Isle of Samos in Greece An account of his building the Grecian church in So-hoe Feilds, and the disposal thereof by the masters of the parish of St. Martins in the Feilds. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com).
id: A58002
author: Rycaut, Paul, Sir, 1628-1700.
title: The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut.
date: 1679
words: 68582
sentences: 19998
pages:
flesch: 88
cache: ./cache/A58002.xml
txt: ./txt/A58002.txt
summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut. The present state of the Greek and Armenian churches, anno Christi 1678 written at the command of His Majesty by Paul Ricaut. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
id: A60569
author: Smith, Thomas, 1638-1710.
title: An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith.
date: 1680
words: 68030
sentences: 22255
pages:
flesch: 88
cache: ./cache/A60569.xml
txt: ./txt/A60569.txt
summary: An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith. An account of the Greek church as to its doctrine and rites of worship with several historicall remarks interspersed, relating thereunto : to which is added an account of the state of the Greek church under Cyrillus Lucaris, Patriarch of Constantinople, with a relation of his sufferings and death / by Tho. Smith. EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel