Linking, publishing and evaluating Linked Open Data for language resources

Francesco Mambrini
francesco.mambrini@unicatt.it
SCS Annual Meeting | Washington, DC | January 3, 2020

Table of Contents
- Introduction
- Treebanks and Linguistic annotation
- Linked Open Data
- LOD for language resources
- The L-LOD network
- LiLa: Linking Latin
- Conclusion

Full name & Full name | Università Cattolica del Sacro Cuore, CIRCSE

Treebanks!
[Figure labels: Morphology, TEXT, Syntax]
Figure: Morphosyntactic information stored in an XML file of the Ancient Greek and Latin Dependency Treebank.

Treebanks: a success story?
Figure: A comment posted on Facebook about the workshop of the PapyGreek Project, Helsinki 2018.

Open problems
- sparseness: there is a multitude of projects involving linguistic annotation;
- standardization: projects jealously hold on to their own guidelines and tagsets and refuse to consider any form of standardization;
- interoperability: there is no way to make morphosyntactic annotation interact with other levels of information (e.g. lexical resources);
- usability: there is a lack of general-purpose tools for annotating, manipulating and querying the data.

Use-evaluation-correction
A virtuous circle

LOD and Semantic Web in the Classics
Figure: Linked Open Data: the recommendations.

Pelagios: a LOD network of annotations
- annotate place references using gazetteer URIs from Pleiades
- publish annotations using the OAC vocabulary
How? | Don’t Unify the Model – Annotate!
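
The two steps named on the Pelagios slide (point at a gazetteer URI, publish the statement as an annotation) can be sketched in a few lines. This is only an illustration, not the Pelagios tooling: the OAC namespace and the property names `hasTarget` and `hasBody` are written from memory and should be checked against the vocabulary documentation, and every URI below (including the `PLACE_ID` placeholder) is hypothetical.

```python
# Assumed namespace of the OAC vocabulary: verify before real use.
OAC = "http://www.openannotation.org/ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotate_place(annotation_uri, target_uri, gazetteer_uri):
    """Return the three triples that make up one place annotation."""
    return [
        (annotation_uri, RDF_TYPE, OAC + "Annotation"),
        # The target is the passage in our own resource.
        (annotation_uri, OAC + "hasTarget", target_uri),
        # The body is the place, identified by its gazetteer URI.
        (annotation_uri, OAC + "hasBody", gazetteer_uri),
    ]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical URIs, for illustration only.
triples = annotate_place(
    "http://example.org/annotations/1",
    "http://example.org/texts/my-edition#passage-1",
    "https://pleiades.stoa.org/places/PLACE_ID",
)
print(to_ntriples(triples))
```

Nothing in the source resource has to be remodelled here: the annotation sits next to the data and only adds a link, which is the point of "Don’t Unify the Model – Annotate!".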

The Pelagios model
Strengths and weaknesses
- decentralization: Pelagios only links data from many different projects;
- a simple model: based on one minimal vocabulary (no effort of conversion/mapping);
- community effect: Pelagios is nowadays more than a successful platform; it is a well-connected and motivated community of people;
- de facto standard: Pelagios has achieved the critical mass to be a de facto standard.

The L-LOD network
[Figure: The Linguistic Linked Open Data (LLOD) Cloud, from lod-cloud.net. Legend: Corpora; Lexicons and Dictionaries; Terminologies, Thesauri and Knowledge Bases; Linguistic Resource Metadata; Linguistic Data Categories; Typological Databases; Other.]

Why LOD?
- sparseness: all independent and self-standing projects can live and prosper across the web;
- small size/marginality: newcomers can be adequately represented along with the “big players”;
- lack of interoperability: as many layers of annotation as needed can be added to encode information about any level of linguistic analysis (syntax, morphology, semantics, pragmatics...);
- lack of standardization: the adoption of common vocabularies is crucial for any LOD enterprise;
- usability issues: interoperable and standardized data are ready to be reused; data integrated in a LOD network are easier to discover and thus to reuse.

The LiLa project
https://lila-erc.eu/
- funded under the ERC program (principal investigator: Marco Passarotti)
- aims to connect linguistic resources (lexica, corpora, NLP tools) for Latin
- uses the lemma as the linking element (much as Pelagios uses the gazetteer ID)
- provides URIs for Latin lemmas, using an ontology based on Ontolex
- the collection of lemmas (and the first resources linked) can be:
  - visualized at: https://lila-erc.eu/lodlive/
  - queried at: https://lila-erc.eu/sparql/

LiLa: link via the lemma
“causa” in Thomas Aq. SCG 1.2.1

Summing up
With LOD we can produce data that are:
1. more connected
2. more discoverable
3. more standardized
4. easier to reuse
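
As a rough illustration of the "link via the lemma" idea from the LiLa slides, the sketch below groups items from two resources under a shared lemma URI. It is a toy model under stated assumptions, not the LiLa implementation: the lemma URI, the resource names and the item identifiers are all hypothetical, and the real lemma URIs are the ones provided by the project.

```python
from collections import defaultdict

# Hypothetical lemma bank: lemma string -> lemma URI (not real LiLa URIs).
LEMMA_URIS = {"causa": "http://example.org/lemma/causa"}


def link_by_lemma(resources, lemma_uris):
    """Group (resource, item) pairs under the URI of their lemma.

    Items whose lemma has no URI in the lemma bank are left unlinked.
    """
    index = defaultdict(list)
    for resource_name, items in resources.items():
        for item_id, lemma in items:
            uri = lemma_uris.get(lemma)
            if uri is not None:
                index[uri].append((resource_name, item_id))
    return dict(index)


# Hypothetical resources: a treebank token and a dictionary entry
# that share the lemma "causa".
resources = {
    "treebank": [("token-1", "causa"), ("token-2", "sum")],
    "lexicon": [("entry-42", "causa")],
}
links = link_by_lemma(resources, LEMMA_URIS)
print(links)
```

Neither resource needs to know about the other: each one only points at the lemma URI, in the same way that Pelagios annotations only point at a gazetteer ID.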