item: #1 of 28
          id: ardanuy-dataset-2022
      author: ardanuy
       title: ardanuy-dataset-2022
        date: 2022
       words: 3334
      flesch: 38
     summary: Our dataset differs from others in its emphasis on the geographical aspect of newspaper data. Establishing benchmark datasets like this provides a foundation for others to assess the performance of methods related to the identification and location of places in historical newspapers.
    keywords: ardanuy; dataset; doi; london; newspapers; toponym
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/ardanuy-dataset-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/ardanuy-dataset-2022.txt

        item: #2 of 28
          id: aronson-oregon-2022
      author: aronson
       title: aronson-oregon-2022
        date: 2022
       words: 3189
      flesch: 48
     summary: However, we include several data columns that reference these files to create more contextual information. Students conduct original research in primary sources to compile data and to compose short narratives about Oregon movie theaters during the period of study (1894–1929).
    keywords: cinema; data; humanities; oregon; project; theater
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/aronson-oregon-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/aronson-oregon-2022.txt

        item: #3 of 28
          id: bagga-hathi-2022
      author: bagga
       title: bagga-hathi-2022
        date: 2022
       words: 5327
      flesch: 51
     summary: The distribution of four features from our Enriched Feature set – average sentence length, Tuldava score, NRC positive score, and VADER positive score – across our dataset of fiction pages (red) and non-fiction pages (blue) sampled from 1800 to 1999. Studying long time scales necessarily requires large data collections as each time unit (year/decade) becomes sparser the less data one has.
    keywords: data; doi; fiction; historical; non; page; work
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/bagga-hathi-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/bagga-hathi-2022.txt

        item: #4 of 28
          id: chen-china-2022
      author: chen
       title: chen-china-2022
        date: 2022
       words: 3200
      flesch: 37
     summary: A growing number of articles are published every year that use CBDB data to explore topics ranging from career trajectory, regional composition, and family connections of civil officials to intellectual and social networks of Neo-Confucian moral philosophers, antiquities collectors, and members of political factions. For a full list of publications that use CBDB data, see https://projects.iq.harvard.edu/cbdb/publications-use-cbdb-data.
    keywords: biographical; cbdb; china; data; database; university
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/chen-china-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/chen-china-2022.txt

        item: #5 of 28
          id: erlin-transcomp-2022
      author: erlin
       title: erlin-transcomp-2022
        date: 2022
       words: 2763
      flesch: 46
     summary: Given that the set of original language works was larger than the set of translations, we also randomly downsampled each year of our original publications to match the number of translations. Following the precedent established by Toury’s (1980) and Baker’s (1993) pioneering work on translation universals, our aim has been to create two independent corpora that enable researchers to evaluate translated texts as they relate to target language texts in general, rather than to compile a corpus of translations and their corresponding source texts.
    keywords: data; doi; literary; translations
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/erlin-transcomp-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/erlin-transcomp-2022.txt

        item: #6 of 28
          id: faghihi-teaching-2022
      author: faghihi
       title: faghihi-teaching-2022
        date: 2022
       words: 9801
      flesch: 45
     summary: Oversight is provided by a board whose remit includes advice and training on the creation of TEI descriptions. Training was delivered in a series of structured workshops where the creation of TEI descriptions, with a particular focus on use of the authority files (lists of standard forms for certain entities in the data such as names and works), was embedded in a complete workflow involving collaborative working with GitHub.
    keywords: context; data; encoding; humanities; learning; manuscript; teaching; tei; text; text encoding
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/faghihi-teaching-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/faghihi-teaching-2022.txt

        item: #7 of 28
          id: fekete-accessing-2022
      author: fekete
       title: fekete-accessing-2022
        date: 2022
       words: 1895
      flesch: 41
     summary: Second, adult environmental education can profit from further analysis by examining the level of environmental awareness about wood and trees in adults. Specifically, new aspects of environmental pedagogy, environmental education, sustainable development, climate protection, sylviculture, environmental awareness of families, adult environmental education, and education policies can also be investigated from the perspective of environmental awareness.
    keywords: data; variables; wood
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/fekete-accessing-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/fekete-accessing-2022.txt

        item: #8 of 28
          id: felbur-crosslingusitic-2022
      author: felbur
       title: felbur-crosslingusitic-2022
        date: 2022
       words: 8223
      flesch: 53
     summary: While much effort is currently being invested in attempts to develop tools that will segment Chinese texts into words (some of them specifically designed to segment Buddhist materials, e.g. Wang, 2020), these tools remain unusable to us, since the underlying models themselves are often not openly released, and the training data used to create them is often not available. We then define Tibetan texts parallel to the Chinese sūtras as the ‘target.’
    keywords: alignment; buddhist; chinese; doi; embeddings; results; similarity; text; tibetan; word
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/felbur-crosslingusitic-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/felbur-crosslingusitic-2022.txt

        item: #9 of 28
          id: gaber-forming-2022
      author: gaber
       title: gaber-forming-2022
        date: 2022
       words: 2319
      flesch: 32
     summary: Author: Goran Gaber, École des Hautes Études en Sciences Sociales (LIER-FYT), Paris, France; Maison Française d’Oxford, Oxford, UK (goran.gaber@ehess.fr). Keywords: critique; title pages; union catalogues; dataset; book history; history of concepts. A complementary and interconnected “data package” was deposited on Zenodo, comprising: (1) a classical text-based bibliography, supplemented by (2) a CSV dataset of information contained therein, (3) the images of title pages not readily available online, and (4) a comprehensive BibTeX dataset.
    keywords: critique; dataset; pages; title
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/gaber-forming-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/gaber-forming-2022.txt

        item: #10 of 28
          id: gerardi-kahd-2022
      author: gerardi
       title: gerardi-kahd-2022
        date: 2022
       words: 5444
      flesch: 50
     summary: Such databases, besides elucidating the internal classification of language families, play a role in the understanding of displacement and linguistic contact, for example, through borrowing. Apart from its value for (computational) historical linguistics mentioned in the previous section, the KAHD database also serves as a language documentation and preservation effort for Amazonian language families since, as shown in Section 1.1, the number of speakers for some of the languages is diminishing at a fast rate (see e.g. D’Ávila 2019).
    keywords: arawan; data; database; doi; language; list
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/gerardi-kahd-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/gerardi-kahd-2022.txt

        item: #11 of 28
          id: hagedorn-bearing-2022
      author: hagedorn
       title: hagedorn-bearing-2022
        date: 2022
       words: 5578
      flesch: 50
     summary: Corpus statistics (measure: value): number of tales: 1518; number of tale types: 182; mean tokens per tale: 979.1; median tokens per tale: 642; minimum tokens per tale: 10; maximum tokens per tale: 12,406; mean sentences per tale: 45.7; median sentences per tale: 31. [Table: ATU ID, tale name, number of tales] The tales compiled in the aft data are annotated by ATU tale type, and represent 182 distinct types.
    keywords: darányi; data; dataset; doi; journal; open; research; tale
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/hagedorn-bearing-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/hagedorn-bearing-2022.txt

        item: #12 of 28
          id: han-reddit-2022
      author: han
       title: han-reddit-2022
        date: 2022
       words: 2184
      flesch: 47
     summary: Reddit’s data structure and limited restrictions on posting content provide opportunities to study online language use, communication processes, public opinions, online culture, online communities, and online social movements. Thus, this dataset will help study online social movements and their relationship with online culture.
    keywords: dataset; reddit; sentiment; stock
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/han-reddit-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/han-reddit-2022.txt

        item: #13 of 28
          id: jauhiainen-social-2022
      author: jauhiainen
       title: jauhiainen-social-2022
        date: 2022
       words: 3687
      flesch: 52
     summary: Entries for document names could not be identified in the structure of the PDF file, and the identification and extraction of documents is thus based on concordance lists and document names attested in other PNA volumes. The earlier PNA volumes (1/I–3/I) were available to us as plain text files that were used to typeset the printed publications.
    keywords: assyrian; data; neo; network; pna
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/jauhiainen-social-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/jauhiainen-social-2022.txt

        item: #14 of 28
          id: kelleher-place-2022
      author: kelleher
       title: kelleher-place-2022
        date: 2022
       words: 6508
      flesch: 45
     summary: The Nakala data set includes full data management documentation, full ethics documentation in English and French, concept notes in English and French, and participant data files that include .csv metadata sheets, .wav audio recordings of interviews, .jpeg photographs of the place of the interview and open ELAN (MPI, 2021) [...] Places data is opened on the Nakala data repository that is overseen by the Digital Humanities Very Large Research Infrastructure (Sciences Humaines Numériques Très Grande Infrastructure de Recherche – TGIR Huma-Num) (CNRS, 2022).
    keywords: data; doi; march; nakala; open; places; project; research; science
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/kelleher-place-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/kelleher-place-2022.txt

        item: #15 of 28
          id: kuys-representing-2022
      author: kuys
       title: kuys-representing-2022
        date: 2022
       words: 6880
      flesch: 52
     summary: The principal source in this project, A.J. van der Aa’s Geographical Dictionary, has plenty of event descriptions. Data underpinning any private interpretations by van der Aa (or by others) should be confined to an RDF graph or namespace of their own.
    keywords: data; der; events; model; time; van
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/kuys-representing-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/kuys-representing-2022.txt

        item: #16 of 28
          id: maignant-drama-2022
      author: maignant
       title: maignant-drama-2022
        date: 2022
       words: 6689
      flesch: 51
     summary: It also enables us to contribute to the field of English literature by proposing the first reusable dataset to offer numerous theatre reviews on journalistic and digital criticism. Creating this corpus based on digital reviews was less time-consuming than the first one because the reviews were already in a textual format.
    keywords: blog; corpus; data; digital; humanities; july; open; reviews; theatre
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/maignant-drama-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/maignant-drama-2022.txt

        item: #17 of 28
          id: marongiu-static-2022
      author: marongiu
       title: marongiu-static-2022
        date: 2022
       words: 7624
      flesch: 51
     summary: We focus on the case of modal meanings in the Latin language and we showcase how we transposed the gathered data from a discursive to a visual form. Our set of modal maps features some impersonal verbs or constructions, e.g., respectively decet, licet, oportet and aequus est, necesse est, meum est among others.
    keywords: dell’oro; diachronic; latin; maps; meanings; modal; modality; semantic
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/marongiu-static-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/marongiu-static-2022.txt

        item: #18 of 28
          id: melanie-oupoco-2022
      author: melanie
       title: melanie-oupoco-2022
        date: 2022
       words: 1678
      flesch: 52
     summary: Its contribution is marginal as only seven sonnets come from this database (that covers other kinds of French poems, most of them not being sonnets). The sonnets come from different sources from the Internet, or not: we especially want to thank the Bibliothèque nationale de France (BnF) (French National Library) that gave us access to a large corpus, from which we were able to extract an invaluable number of French poems.
    keywords: data; french; sonnets
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/melanie-oupoco-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/melanie-oupoco-2022.txt

        item: #19 of 28
          id: nurmikko-teaching-2022
      author: nurmikko
       title: nurmikko-teaching-2022
        date: 2022
       words: 6965
      flesch: 44
     summary: Keywords: Linked Open Data; bibliographic metadata; pedagogy; participant evaluations. In recognition of the role of collaboration and co-authoring in digital humanities (DH) research (Needham & Haas, 2019), workshop participants are encouraged to work together and communicate openly as a group.
    keywords: data; digital; fuller; humanities; information; ld4dh; open; participants; workshop
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/nurmikko-teaching-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/nurmikko-teaching-2022.txt

        item: #20 of 28
          id: oneill-text-2022
      author: oneill
       title: oneill-text-2022
        date: 2022
       words: 2212
      flesch: 46
     summary: This paper introduces the state of the field in Newar literature, Newar manuscripts, and HTR engines. Deep learning neural networks have made it possible to build HTR models based on images of handwritten text linked with corresponding transcriptions (called “ground truth”).
    keywords: data; manuscripts; model; newar
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/oneill-text-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/oneill-text-2022.txt

        item: #21 of 28
          id: pala-tracing-2022
      author: pala
       title: pala-tracing-2022
        date: 2022
       words: 6331
      flesch: 54
     summary: Benefits of this approach lie in its ability to quantify change, to study complex 3D material, and to analyse large datasets of objects, opening the possibility of constructing new large-scale studies of object shape across time and geographical regions. The method can be scaled to large datasets of 3D object scans where changes can be computed automatically, without the need for human intervention.
    keywords: approach; distance; objects; points; shape; study; vessel
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/pala-tracing-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/pala-tracing-2022.txt

        item: #22 of 28
          id: pan-networking-2022
      author: pan
       title: pan-networking-2022
        date: 2022
       words: 7225
      flesch: 41
     summary: In relational databases, edges usually only convey directions and at most labels (categories), but they can carry easily expandable and modifiable properties in graph databases. This means that for long-term projects such as this one (which, because of the current incompleteness of the source data, calls for continuing addition of data), a graph database allows for more possibilities in terms of efficient and versatile querying and expansion.
    keywords: database; graph; graph database; japanese; lawsuits; movement; network; reparation
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/pan-networking-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/pan-networking-2022.txt

        item: #23 of 28
          id: piper-conlit-2022
      author: piper
       title: piper-conlit-2022
        date: 2022
       words: 2462
      flesch: 44
     summary: As we show with the overview of our data (Table 1), our institutional frameworks can include bestseller lists, prize committee shortlists, book review lists, user-generated “choice awards”, or corporate forms of categorization. We define “popular” through multiple criteria that include user-generated awards or lists, elite prize committee lists or book reviews, or bestseller tags on platforms like Amazon or the New York Times.
    keywords: books; data; fiction; genre
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/piper-conlit-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/piper-conlit-2022.txt

        item: #24 of 28
          id: pitts-corpus-2022
      author: pitts
       title: pitts-corpus-2022
        date: 2022
       words: 1822
      flesch: 46
     summary: These advantages hold true in fragmentary languages such as Venetic or Messapic as much as in large corpus languages such as Classical Latin or Greek. This database was created in the context of a PhD project on language contact in Ancient Italy, entitled The interplay between language contact and language change in a fragmentary linguistic area: the Italic peninsula in the first millennium BCE.
    keywords: corpus; data; languages; linguistic
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/pitts-corpus-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/pitts-corpus-2022.txt

        item: #25 of 28
          id: turenne-mining-2022
      author: turenne
       title: turenne-mining-2022
        date: 2022
       words: 6903
      flesch: 43
     summary: The choice of the pair Chinese–English has several motivations: firstly, the data is more easily available; secondly, there is a demand for English and Chinese tools and datasets, as English is already the lingua franca in many areas (political, economical, cultural, and scientific), and we also see an increasing interest in Chinese, which is now being taught at schools in western countries. This paper is divided into the following sections: we discuss the dataset and its sub-datasets, describe the state-of-the-art research based on bilingual corpora, machine learning, and natural language processing, and then present the results of our experiments.
    keywords: chinese; corpus; dataset; doi; domain; english; finance; language; parallel; proceedings
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/turenne-mining-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/turenne-mining-2022.txt

        item: #26 of 28
          id: vauth-event-2022
      author: vauth
       title: vauth-event-2022
        date: 2022
       words: 1962
      flesch: 49
     summary: These annotations were used for the automation of narratological event annotations (Vauth, Hatzel, Gius, & Biemann, 2021), a reflection of inter annotator agreements in literary studies (Gius & Vauth, 2022) and the development of an event based plot model (Gius & Vauth, accepted). Inter Annotator Agreement (Krippendorff’s α) for event types.
    keywords: event; gius
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/vauth-event-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/vauth-event-2022.txt

        item: #27 of 28
          id: verbruggen-social-2022
      author: verbruggen
       title: verbruggen-social-2022
        date: 2022
       words: 7049
      flesch: 36
     summary: By collecting and enriching a dataset of international organizations and congresses associated with social reform, TIC sought to map cooperation across national lines and across thematic categories. Title: Social Reform International Congresses and Organizations (1846–1914): From Sources to Data (research paper). Corresponding author: Christophe Verbruggen, Department of History – GhentCDH, Ghent University, Ghent, BE (christophe.verbruggen@ugent.be). Keywords: social reform; transnational history; network analysis; social internationalism; collective action.
    keywords: congresses; data; doi; ghent; international; open; organizations; reform; social; university; van
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/verbruggen-social-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/verbruggen-social-2022.txt

        item: #28 of 28
          id: yi-accessibility-2022
      author: yi
       title: yi-accessibility-2022
        date: 2022
       words: 10694
      flesch: 42
     summary: Title: Accessibility, Discoverability, and Functionality: An Audit of and Recommendations for Digital Language Archives (research paper). Corresponding author: Irene Yi, Linguistics Department, Yale University, New Haven, CT, US (irene.yi@yale.edu). Keywords: language archives; documentation; accessibility; discoverability; functionality; linguistics; endangered languages; metadata. Language archives utilize a number of different content management systems and do not provide uniform functionality (Aznar & Seifart 2020).
    keywords: access; archives; collections; data; digital; doi; files; information; language; language archives; materials; users
       cache: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/cache/yi-accessibility-2022.pdf
  plain text: /Users/eric/Library/CloudStorage/Box-Box/shared-folder/reader-library/johd/txt/yi-accessibility-2022.txt