item: #1 of 28 id: ardanuy-dataset-2022 author: ardanuy title: ardanuy-dataset-2022 date: 2022 words: 3334 flesch: 38 summary: Our dataset differs from others in its emphasis on the geographical aspect of newspaper data. Establishing benchmark datasets like this provides a foundation for others to assess the performance of methods related to the identification and location of places in historical newspapers. keywords: ardanuy; dataset; doi; london; newspapers; toponym cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/ardanuy-dataset-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/ardanuy-dataset-2022.txt item: #2 of 28 id: aronson-oregon-2022 author: aronson title: aronson-oregon-2022 date: 2022 words: 3189 flesch: 48 summary: However, we include several data columns that reference these files to create more contextual information. Students conduct original research in primary sources to compile data and to compose short narratives about Oregon movie theaters during the period of study (1894–1929). keywords: cinema; data; humanities; oregon; project; theater cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/aronson-oregon-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/aronson-oregon-2022.txt item: #3 of 28 id: bagga-hathi-2022 author: bagga title: bagga-hathi-2022 date: 2022 words: 5327 flesch: 51 summary: The distribution of four features from our Enriched Feature set – average sentence length, Tuldava score, NRC positive score, and VADER positive score – across our dataset of fiction pages (red) and non-fiction pages (blue) sampled from 1800 to 1999. Studying long time scales necessarily requires large data collections as each time unit (year/decade) becomes sparser the less data one has. 
keywords: data; doi; fiction; historical; non; page; work cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/bagga-hathi-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/bagga-hathi-2022.txt item: #4 of 28 id: chen-china-2022 author: chen title: chen-china-2022 date: 2022 words: 3200 flesch: 37 summary: A growing number of articles are published every year that use CBDB data to explore topics ranging from career trajectory, regional composition, and family connections of civil officials to intellectual and social networks of Neo-Confucian moral philosophers, antiquities collectors, and members of political factions. For a full list of publications that use CBDB data, see https://projects.iq.harvard.edu/cbdb/publications-use-cbdb-data. keywords: biographical; cbdb; china; data; database; university cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/chen-china-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/chen-china-2022.txt item: #5 of 28 id: erlin-transcomp-2022 author: erlin title: erlin-transcomp-2022 date: 2022 words: 2763 flesch: 46 summary: Given that the set of original language works was larger than the set of translations, we also randomly downsampled each year of our original publications to match the number of translations. Following the precedent established by Toury’s (1980) and Baker’s (1993) pioneering work on translation universals, our aim has been to create two independent corpora that enable researchers to evaluate translated texts as they relate to target language texts in general, rather than to compile a corpus of translations and their corresponding source texts. 
keywords: data; doi; literary; translations cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/erlin-transcomp-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/erlin-transcomp-2022.txt item: #6 of 28 id: faghihi-teaching-2022 author: faghihi title: faghihi-teaching-2022 date: 2022 words: 9801 flesch: 45 summary: Oversight is provided by a board whose remit includes advice and training on the creation of TEI descriptions. Training was delivered in a series of structured workshops where the creation of TEI descriptions, with a particular focus on use of the authority files (lists of standard forms for certain entities in the data such as names and works), was embedded in a complete workflow involving collaborative working with GitHub. keywords: context; data; encoding; humanities; learning; manuscript; teaching; tei; text; text encoding cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/faghihi-teaching-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/faghihi-teaching-2022.txt item: #7 of 28 id: fekete-accessing-2022 author: fekete title: fekete-accessing-2022 date: 2022 words: 1895 flesch: 41 summary: Second, adult environmental education can profit from further analysis by examining the level of environmental awareness about wood and trees in adults. Specifically, new aspects of environmental pedagogy, environmental education, sustainable development, climate protection, sylviculture, environmental awareness of families, adult environmental education, and education policies can also be investigated from the perspective of environmental awareness. 
keywords: data; variables; wood cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/fekete-accessing-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/fekete-accessing-2022.txt item: #8 of 28 id: felbur-crosslingusitic-2022 author: felbur title: felbur-crosslingusitic-2022 date: 2022 words: 8223 flesch: 53 summary: While much effort is currently being invested in attempts to develop tools that will segment Chinese texts into words (some of them specifically designed to segment Buddhist materials, e.g. Wang, 2020), these tools remain unusable to us, since the underlying models themselves are often not openly released, and the training data used to create them is often not available. We then define Tibetan texts parallel to the Chinese sūtras as the ‘target.’ keywords: alignment; buddhist; chinese; doi; embeddings; results; similarity; text; tibetan; word cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/felbur-crosslingusitic-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/felbur-crosslingusitic-2022.txt item: #9 of 28 id: gaber-forming-2022 author: gaber title: gaber-forming-2022 date: 2022 words: 2319 flesch: 32 summary: Goran Gaber École des Hautes Études en Sciences Sociales (LIER-FYT), Paris, France; Maison Française d’Oxford, Oxford, UK goran.gaber@ehess.fr KEYWORDS: critique; title pages; union catalogues; dataset; book history; history of concepts TO CITE A complementary and interconnected “data package” was deposited on Zenodo, comprising: (1) a classical text-based bibliography, supplemented by (2) a CSV dataset of information contained therein, (3) the images of title pages not readily available online, and (4) a comprehensive BibTeX dataset. 
keywords: critique; dataset; pages; title cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/gaber-forming-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/gaber-forming-2022.txt item: #10 of 28 id: gerardi-kahd-2022 author: gerardi title: gerardi-kahd-2022 date: 2022 words: 5444 flesch: 50 summary: Such databases, beside elucidating the internal classification of language families, play a role in the understanding of displacement and linguistic contact, for example, through borrowing. Apart from its value for (computational) historical linguistics mentioned in the previous section, the KAHD database also serves as language documentation and preservation effort for Amazonian language families since, as shown in Section 1.1, the number of speakers for some of the languages is diminishing at a fast rate (see e.g. D’Ávila 2019). keywords: arawan; data; database; doi; language; list cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/gerardi-kahd-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/gerardi-kahd-2022.txt item: #11 of 28 id: hagedorn-bearing-2022 author: hagedorn title: hagedorn-bearing-2022 date: 2022 words: 5578 flesch: 50 summary: MEASURE VALUE Number of tales 1518 Number of tale types 182 Mean tokens per tale 979.1 Median tokens per tale 642 Minimum tokens per tale 10 Maximum tokens per tale 12,406 Mean sentences per tale 45.7 Median sentences per tale 31 ATU ID TALE NAME N OF TALES 275 The tales compiled in the aft data are annotated by ATU tale type, and represent 182 distinct types. 
keywords: darányi; data; dataset; doi; journal; open; research; tale cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/hagedorn-bearing-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/hagedorn-bearing-2022.txt item: #12 of 28 id: han-reddit-2022 author: han title: han-reddit-2022 date: 2022 words: 2184 flesch: 47 summary: Reddit’s data structure and limited restrictions on posting content provide opportunities to study online language use, communication processes, public opinions, online culture, online communities, and online social movements. Thus, this dataset will help study online social movements and its relationship with online culture. keywords: dataset; reddit; sentiment; stock cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/han-reddit-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/han-reddit-2022.txt item: #13 of 28 id: jauhiainen-social-2022 author: jauhiainen title: jauhiainen-social-2022 date: 2022 words: 3687 flesch: 52 summary: Entries for document names could not be identified in the structure of the PDF file, and the identification and extraction of documents is thus based on concordance lists and document names attested in other PNA volumes. The earlier PNA volumes (1/I–3/I) were available to us as plain text files that were used to typeset the printed publications. 
keywords: assyrian; data; neo; network; pna cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/jauhiainen-social-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/jauhiainen-social-2022.txt item: #14 of 28 id: kelleher-place-2022 author: kelleher title: kelleher-place-2022 date: 2022 words: 6508 flesch: 45 summary: The Nakala data set includes full data management documentation, full ethics documentation in English and French, concept notes in English and French and participant data files that include .csv metadata sheets, .wav audio recordings of interviews, .jpeg photographs of the place of the interview and open ELAN (MPI, 2021) Places data is opened on the Nakala data repository that is overseen by the Digital Humanities Very Large Research Infrastructure (Sciences Humaines Numériques Très Grande Infrastructure de Recherche – TGIR Huma-Num) (CNRS, 2022). keywords: data; doi; march; nakala; open; places; project; research; science cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/kelleher-place-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/kelleher-place-2022.txt item: #15 of 28 id: kuys-representing-2022 author: kuys title: kuys-representing-2022 date: 2022 words: 6880 flesch: 52 summary: The principal source in this project, A.J. van der Aa’s Geographical Dictionary, has plenty of event descriptions. Data underpinning any private interpretations by van der Aa (or by others) should be confined to an RDF graph or namespace of their own. 
keywords: data; der; events; model; time; van cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/kuys-representing-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/kuys-representing-2022.txt item: #16 of 28 id: maignant-drama-2022 author: maignant title: maignant-drama-2022 date: 2022 words: 6689 flesch: 51 summary: It also enables us to contribute to the field of English literature by proposing the first reusable dataset to offer numerous theatre reviews on journalistic and digital criticism. Creating this corpus based on digital reviews was less time-consuming than the first one because the reviews were already in a textual format. keywords: blog; corpus; data; digital; humanities; july; open; reviews; theatre cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/maignant-drama-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/maignant-drama-2022.txt item: #17 of 28 id: marongiu-static-2022 author: marongiu title: marongiu-static-2022 date: 2022 words: 7624 flesch: 51 summary: We focus on the case of modal meanings in the Latin language and we showcase how we transposed the gathered data from a discursive to a visual form. Our set of modal maps features some impersonal verbs or constructions, e.g., respectively decet, licet, oportet and aequus est, necesse est, meum est among others. 
keywords: dell’oro; diachronic; latin; maps; meanings; modal; modality; semantic cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/marongiu-static-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/marongiu-static-2022.txt item: #18 of 28 id: melanie-oupoco-2022 author: melanie title: melanie-oupoco-2022 date: 2022 words: 1678 flesch: 52 summary: Its contribution is marginal as only seven sonnets come from this database (that covers other kinds of French poems, most of them not being sonnets). The sonnets come from different sources from the Internet, or not: we especially want to thank the Bibliothèque nationale de France (BnF) (French National Library) that gave us access to a large corpus, from which we were able to extract an invaluable number of French poems. keywords: data; french; sonnets cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/melanie-oupoco-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/melanie-oupoco-2022.txt item: #19 of 28 id: nurmikko-teaching-2022 author: nurmikko title: nurmikko-teaching-2022 date: 2022 words: 6965 flesch: 44 summary: edu.au KEYWORDS: Linked Open Data; bibliographic metadata; pedagogy; participant evaluations TO CITE In recognition of the role of collaboration and co-authoring in digital humanities (DH) research (Needham & Haas, 2019), workshop participants are encouraged to work together and communicate openly as a group. 
keywords: data; digital; fuller; humanities; information; ld4dh; open; participants; workshop cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/nurmikko-teaching-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/nurmikko-teaching-2022.txt item: #20 of 28 id: oneill-text-2022 author: oneill title: oneill-text-2022 date: 2022 words: 2212 flesch: 46 summary: This paper introduces the state of the field in Newar literature, Newar manuscripts, and HTR engines. Deep learning neural networks have made it possible to build HTR models based on images of handwritten text linked with corresponding transcriptions (called “ground truth”). keywords: data; manuscripts; model; newar cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/oneill-text-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/oneill-text-2022.txt item: #21 of 28 id: pala-tracing-2022 author: pala title: pala-tracing-2022 date: 2022 words: 6331 flesch: 54 summary: Benefits of this approach lie in its ability to quantify change, to study complex 3D material, and to analyse large datasets of objects, opening the possibility of constructing new large-scale studies of object shape across time and geographical regions. The method can be scaled to large datasets of 3D objects scans where changes can be computed automatically, without the need for human intervention. 
keywords: approach; distance; objects; points; shape; study; vessel cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/pala-tracing-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/pala-tracing-2022.txt item: #22 of 28 id: pan-networking-2022 author: pan title: pan-networking-2022 date: 2022 words: 7225 flesch: 41 summary: In relational databases, edges usually only convey directions and at most labels (categories), but they can carry easily expandable and modifiable properties in graph databases. This means that for long term projects such as this one (which, because of the current incompleteness of the source data, calls for continuing addition of data), graph database allows for more possibilities in terms of efficient and versatile querying and expansion. keywords: database; graph; graph database; japanese; lawsuits; movement; network; reparation cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/pan-networking-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/pan-networking-2022.txt item: #23 of 28 id: piper-conlit-2022 author: piper title: piper-conlit-2022 date: 2022 words: 2462 flesch: 44 summary: As we show with the overview of our data (Table 1), our institutional frameworks can include bestseller lists, prize committee shortlists, book review lists, user-generated “choice awards”, or corporate forms of categorization. We define “popular” through multiple criteria that include user-generated awards or lists, elite prize committee lists or book reviews, or bestseller tags on platforms like Amazon or the New York Times. 
keywords: books; data; fiction; genre cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/piper-conlit-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/piper-conlit-2022.txt item: #24 of 28 id: pitts-corpus-2022 author: pitts title: pitts-corpus-2022 date: 2022 words: 1822 flesch: 46 summary: These advantages hold true in fragmentary languages such as Venetic or Messapic as much as in large corpus languages such as Classical Latin or Greek. This database was created in the context of a PhD project on language contact in Ancient Italy, entitled The interplay between language contact and language change in a fragmentary linguistic area: the Italic peninsula in the first millennium BCE. keywords: corpus; data; languages; linguistic cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/pitts-corpus-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/pitts-corpus-2022.txt item: #25 of 28 id: turenne-mining-2022 author: turenne title: turenne-mining-2022 date: 2022 words: 6903 flesch: 43 summary: The choice of the pair Chinese–English has several motivations: firstly, the data is more easily available; secondly, there is a demand for English and Chinese tools and datasets, as English is already the lingua franca in many areas (political, economical, cultural, and scientific), and we also see an increasing interest in Chinese, which is now being taught at schools in western countries. This paper is divided into the following sections: we discuss the dataset and its sub-datasets, describe the state-of-the-art research based on bilingual corpora, machine learning, and natural language processing, and then present the results of our experiments. 
keywords: chinese; corpus; dataset; doi; domain; english; finance; language; parallel; proceedings cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/turenne-mining-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/turenne-mining-2022.txt item: #26 of 28 id: vauth-event-2022 author: vauth title: vauth-event-2022 date: 2022 words: 1962 flesch: 49 summary: These annotations were used for the automation of narratological event annotations (Vauth, Hatzel, Gius, & Biemann, 2021), a reflection of inter annotator agreements in literary studies (Gius & Vauth, 2022) and the development of an event based plot model (Gius & Vauth, accepted). Inter Annotator Agreement (Krippendorff’s α) for event types. keywords: event; gius cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/vauth-event-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/vauth-event-2022.txt item: #27 of 28 id: verbruggen-social-2022 author: verbruggen title: verbruggen-social-2022 date: 2022 words: 7049 flesch: 36 summary: By collecting and enriching a dataset of international organizations and congresses associated with social reform, TIC sought to map cooperation across national lines and across thematic categories. 
Social Reform International Congresses and Organizations (1846–1914): From Sources to Data RESEARCH PAPER CORRESPONDING AUTHOR: Christophe Verbruggen Department of History – GhentCDH, Ghent University, Ghent, BE christophe.verbruggen@ugent.be KEYWORDS: social reform; transnational history; network analysis; social internationalism; collective action TO CITE keywords: congresses; data; doi; ghent; international; open; organizations; reform; social; university; van cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/verbruggen-social-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/verbruggen-social-2022.txt item: #28 of 28 id: yi-accessibility-2022 author: yi title: yi-accessibility-2022 date: 2022 words: 10694 flesch: 42 summary: Accessibility, Discoverability, and Functionality: An Audit of and Recommendations for Digital Language Archives RESEARCH PAPER CORRESPONDING AUTHOR: Irene Yi Linguistics Department, Yale University, New Haven, CT, US irene.yi@yale.edu KEYWORDS: language archives; documentation; accessibility; discoverability; functionality; linguistics; endangered languages; metadata TO CITE Language archives utilize a number of different content management systems and do not provide uniform functionality (Aznar & Seifart 2020). keywords: access; archives; collections; data; digital; doi; files; information; language; language archives; materials; users cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/yi-accessibility-2022.pdf plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/yi-accessibility-2022.txt