key: cord-0227876-b11dpw37
authors: Broekman, Andrew; Marshall, Linda
title: Linguistic Inspired Graph Analysis
date: 2021-05-13
journal: nan
DOI: nan
sha: ccdc309be2393c644c69ea0d64c4a8dc26074cbd
doc_id: 227876
cord_uid: b11dpw37

Isomorphisms allow human cognition to transcribe a potentially unsolvable problem from one domain to a different domain where the problem might be more easily addressed. Current approaches focus only on transcribing structural information from the source to the target structure, ignoring semantic and pragmatic information. Functional Language Theory presents five subconstructs for the classification and understanding of languages. By deriving a mapping between the metamodels in linguistics and graph theory, it will be shown that no constructs currently exist in canonical graphs for the representation of semantic and pragmatic information. It is found that further work is needed to understand how graphs can be enriched to allow isomorphisms to capture semantic and pragmatic information. Capturing this additional information could lead to a deeper understanding of the source structure and to enhanced manipulation and interrogation of the contained relationships. Current mathematical graph structures, in their general definition, do not allow for the expression of higher levels of information of a source.

This report examines whether using graph isomorphisms to create a target structure from a source structure might exclude certain information, such as syntactic, semantic and pragmatic information. To facilitate the identification process, this report considers the domain of linguistics, more specifically Functional Language Theory. An investigation is conducted to determine whether an overlap exists between linguistics and graph theory. A corresponding taxonomy and mapping will be derived between the two domains; a mapping allows the overlap to be studied by examining which constructs can be mapped. In the derivation of the associated mappings, visual aids and tables will serve as auxiliary proof [Thomas, 2006].

Section 2 provides a brief overview of linguistics used as a basis for the derivation of the mapping later in the text. Section 3 provides the fundamental definitions for the graph structures, which will be investigated for expansion. The mapping between linguistics and graph theory is presented in Section 4. Future work is discussed in Section 5, followed by the conclusion in Section 6. 

Linguistics can be understood to be the study of human language [Widdowson, 1996] . Chomsky defines a language "to be a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements" [Chomsky, 1957] . The definition of Chomsky is provided formally in Definition 2.1.

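Definition 2.1 (Chomsky Formal Definition). Given a finite set $E$ of elements, a subset $S \subseteq E$ is called a sentence. A language is defined by $L = \{S_1, S_2, \ldots\}$, a (finite or infinite) set of sentences, where each $S_i \subseteq E$.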
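To make the relationship between $E$, sentences and $L$ concrete, the sketch below checks whether a finite collection of sentences forms a language over a given element set. It is a minimal illustration under stated assumptions: the vocabulary `E` and the sentences are hypothetical, and, following Chomsky's prose definition quoted above, a sentence is modelled as a finite tuple of elements so that word order is kept. Only finite languages can be enumerated this way; an infinite language would need a generating grammar instead.

```python
# Minimal sketch: a language as a set of finite sentences over a finite element set E.
# The vocabulary and sentences are hypothetical examples, not taken from the report.
from typing import AbstractSet, Iterable, Tuple

Sentence = Tuple[str, ...]

# Hypothetical finite set of elements E.
E: AbstractSet[str] = frozenset({"the", "cat", "dog", "sat", "ran"})


def is_sentence(s: Sentence, elements: AbstractSet[str]) -> bool:
    """A sentence is finite (a tuple) and built only from elements of E."""
    return all(token in elements for token in s)


def is_language(sentences: Iterable[Sentence], elements: AbstractSet[str]) -> bool:
    """A finite language: every member of the collection is a sentence over E."""
    return all(is_sentence(s, elements) for s in sentences)


L = {("the", "cat", "sat"), ("the", "dog", "ran")}
print(is_language(L, E))                       # True
print(is_sentence(("the", "bird", "sat"), E))  # False: "bird" is not in E
```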

Modern linguistics is based on Functional Language Theory, which divides language into five components: (a) phonology: the smallest sound unit in a language, i.e. the phoneme; (b) morphology: the smallest unit in a language that has meaning, i.e. the morpheme; (c) syntax: the set of rules on how morphemes can be combined into larger expressions; (d) semantics: the meaning conveyed by syntax; (e) pragmatics: the context surrounding the use of the language [Widdowson, 1996, Hoque, 2015, Hickey, 2005]. Figure 1 presents a visual representation of Functional Language Theory. This section provides a brief overview of each component and the function it serves within a given language.

During the discussion of linguistics, two metamodels are present, namely: (a) linguistic constructs: a mental metamodel of abstract constructs used to model linguistic concepts; (b) realised linguistic constructs: a metamodel of the realisation of various abstract constructs, typically by the machine in question.

The field of phonetics studies the set of all human sounds. Phonetics focuses on the emission, transmission and reception of sound, termed articulatory, acoustic and auditive phonetics respectively [Hardcastle et al., 2010, Hammerström, 2014, Hickey, 2005]. Articulatory phonetics studies the sounds produced by the sender of information (the speaker). Auditive phonetics studies how the receiver of information (the hearer) receives and interprets the sounds. Acoustic phonetics studies the medium (the channel) used to transfer the sounds [Hammerström, 2014]. The transference of information needs to occur in a physical medium; examples include sound waves travelling through particles, ink marks on paper or markings on a digital screen.

In contrast, phonology is concerned with a subset of human sounds and the classification of those sounds within the language under study. Phonology also studies the relationships between phonemes [Widdowson, 1996, Hoque, 2015, Hickey, 2005]. Phonemes are defined later in this section.

A phoneme is the smallest abstract unit of sound in a language that can distinguish meaning. The phonemes of the word cool are /k/, /u:/, /l/, where forward slashes are used as delimiters. Phonemes can differ between languages; in other words, a phoneme in one language might not be a phoneme in another. In English, the different r sounds are not separate phonemes: the r sound variations, i.e. single, flap and rolled, do not differentiate the meaning of words but are merely a consequence of the letter's location in a word. In other languages, such as Spanish, the r sound can differentiate between words such as [perro] and [pero], meaning dog and but respectively [Widdowson, 1996, Hickey, 2005]. An allophone is the phonetic realisation of a phoneme. Various factors contribute to the different realisations of a phoneme, including syllable position, surrounding sounds and the distinction between short and long vowels [Widdowson, 1996, Hickey, 2005].
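The Spanish example rests on two words that differ in a single sound. The sketch below tests for such a minimal pair (a standard linguistic notion, not named in the report) over segment lists. The transcriptions are illustrative assumptions: `r` stands for the rolled (trilled) r of perro and `ɾ` for the single-tap r of pero.

```python
# Minimal sketch: detect a minimal pair, i.e. two transcriptions that differ
# in exactly one segment. Transcriptions are illustrative, hand-written lists.
from typing import List, Optional, Tuple


def minimal_pair(a: List[str], b: List[str]) -> Optional[Tuple[int, str, str]]:
    """Return (position, segment_in_a, segment_in_b) if a and b differ in exactly one segment."""
    if len(a) != len(b):
        return None
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


perro = ["p", "e", "r", "o"]  # "dog": rolled (trilled) r
pero = ["p", "e", "ɾ", "o"]   # "but": single tap r
print(minimal_pair(perro, pero))  # (2, 'r', 'ɾ')
```

Because the two words differ in meaning, the contrast between the two r sounds is distinctive in Spanish, which is what makes them separate phonemes there; per the text above, English r variants produce no such meaning contrast.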

Hickey defines a phone as "the smallest unit of human sound which is recognisable but not classified". The phones in the word peat can be represented as [p], [i:], [t]; note that square brackets are used as the delimiters for phones [Widdowson, 1996, Hickey, 2005].

Morphology is the study of words within a language, including their internal structure and relationships to one another. Morphology can be subdivided into two fields, namely: (a) word formation: the changes a word undergoes when altered to form a new word; (b) grammatical inflection: the changes a word undergoes when assuming a different role in a sentence [Widdowson, 1996, Hickey, 2005].

A word is a linguistic element characterised by internal stability and external mobility [Lyons, 1968, Hickey, 2005]. Internal stability refers to the fact that a word cannot be broken down into a set of two or more independent linguistic elements. An example of a word with internal stability is numb: it cannot be broken down into further elements such as n + umb, nu + mb or num + b. The word itself is mobile, as it can be used in various syntactic constructions, such as "My arm was numb after the operation" and "Being caught out in the snow numbed my fingers".

The smallest unit in a language that has meaning is known as a morpheme. Every word is constructed out of one or more morphemes, but morphemes are not necessarily words. The word cats is constructed from two morphemes, namely: (a) cat, a root morpheme; (b) -s, a morpheme indicating plurality. The morpheme cat is also a word, since it can stand alone. Analogous to the allophone in phonology, an allomorph is a realisation of a morpheme. The morpheme "ed" is realised as various phonological allomorphs, such as /Id/ in "partˆed", /d/ in "pullˆed" and /t/ in "pushˆed".
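The choice among /Id/, /d/ and /t/ follows the final sound of the stem. The sketch below encodes the usual textbook rule for the English regular past tense: /Id/ after /t/ or /d/, /t/ after other voiceless sounds, and /d/ otherwise. It is simplified under stated assumptions: stems are supplied directly as their final phoneme, and `VOICELESS` is an abbreviated, hypothetical inventory rather than a complete account of English phonology.

```python
# Minimal sketch of allomorph selection for the English past-tense morpheme "ed".
# Input is the stem's final phoneme (hand-supplied); the voiceless set is abbreviated.
VOICELESS = {"p", "k", "f", "θ", "s", "ʃ", "tʃ"}  # /t/ is handled separately below


def past_tense_allomorph(final_phoneme: str) -> str:
    """Return the allomorph of "ed" selected by the stem's final phoneme."""
    if final_phoneme in {"t", "d"}:
        return "/Id/"  # e.g. part -> parted
    if final_phoneme in VOICELESS:
        return "/t/"   # e.g. push -> pushed
    return "/d/"       # voiced sounds, e.g. pull -> pulled


for stem, final in [("part", "t"), ("pull", "l"), ("push", "ʃ")]:
    print(stem, past_tense_allomorph(final))  # part /Id/, pull /d/, push /t/
```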

Syntax is the rule set of the language under study, concerned with how words and morphemes can be combined into larger units of expression, often referred to as sentences. The only field in linguistics with a comparable body of research and analysis is phonology [Hickey, 2005]. Autonomy of syntax refers to the view that the rules of a language are not affected by context outside the language itself, that is to say, grammars of languages are autonomous [Croft, 1995, Doetjes et al., 1998, Hickey, 2005]. The analysis of syntax focuses on a few critical areas of sentences. The first is the ordering of the elements within the sentence structure, which concerns word classes such as nouns, verbs, pronouns, prepositions and adjectives. The second is the examination of sentence structure to explain surface ambiguities and how they arise. The third is the analysis of the relatedness of sentences to one another.

Two structures are studied in syntax analysis: (a) deep structure: the structure not visible in sentences; (b) surface structure: the structure of a sentence in its spoken or written form [Hockett, 1958, Chomsky, 2019, Hickey, 2005]. The syntactic relations between elements of a sentence, such as the subject, object and predicate, are considered deep structures. Deep structures can differentiate sentences even when no surface-structure differences are present. The surface structure of a sentence concerns superficial characteristics, such as the order of elements.

Linguists make use of a variety of tools to study the syntax of sentences. One such tool is to visually represent the underlying structure using a tree diagram, depicted in Figure 2. Tree diagrams are often used in the analysis of sentences and can be seen as one possible way of encoding sentences in a particular language. Whether these diagrams accurately reflect mental models is debatable. The use of tree diagrams highlights the abstract properties of: (a) temporal precedence: which elements precede other elements; (b) dominance relations: the relations that various units have with regard to one another [Cornell and Rogers, 2014, Hickey, 2005].
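Both properties can be read directly off a tree. The sketch below encodes a hypothetical tree diagram for "the cat sat" (the structure of Figure 2 is not reproduced here) and derives dominance (a node dominates everything beneath it) and temporal precedence (a node precedes another if all of its words come earlier and neither node dominates the other). It assumes node labels are unique, which real parse trees often violate.

```python
# Minimal sketch: dominance and temporal precedence in a syntax tree.
# Hypothetical tree for "the cat sat"; node labels are assumed unique.
from typing import Dict, List

TREE: Dict[str, List[str]] = {
    "S": ["NP", "VP"],
    "NP": ["Det", "N"],
    "VP": ["V"],
    "Det": ["the"],
    "N": ["cat"],
    "V": ["sat"],
}


def descendants(node: str) -> List[str]:
    out: List[str] = []
    for child in TREE.get(node, []):
        out.append(child)
        out.extend(descendants(child))
    return out


def dominates(x: str, y: str) -> bool:
    """x dominates y if y lies strictly below x in the tree."""
    return y in descendants(x)


def leaves(node: str) -> List[str]:
    """Words (leaf nodes) under a node, in left-to-right order."""
    children = TREE.get(node, [])
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


POSITION = {word: i for i, word in enumerate(leaves("S"))}


def precedes(x: str, y: str) -> bool:
    """x precedes y if neither dominates the other and all of x's words come before y's."""
    if x == y or dominates(x, y) or dominates(y, x):
        return False
    return max(POSITION[w] for w in leaves(x)) < min(POSITION[w] for w in leaves(y))


print(dominates("S", "cat"), dominates("NP", "sat"))  # True False
print(precedes("NP", "VP"), precedes("VP", "NP"))     # True False
```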

Morphemes, the structural units of a language, express ideas and serve a function in the language. The formal definition of a language, Definition 2.1, ignores this fact [Widdowson, 1996]. This omission raises the need to study words not only as structural units but also as carriers of meaning, which together with syntax convey what is referred to as semantics. Semantics is concerned with the study of the meaning encoded in a sentence and can be thought of as a functional grammar [Widdowson, 1996, Feng, 2013, Hickey, 2005].

Given a piece of text, say T, various questions can be asked about T. Who wrote the text? Why was the text written? What information does T convey? Who is the intended receiver of the information? Pragmatics studies the meaning of information conveyed between a sender and a receiver within a specified context [Widdowson, 1996, Hickey, 2005]. Examples of context include day-to-day, emergency and workplace contexts, to name a few.

Graph structures are mathematical structures used to represent relationships between two or more entities. These structures provide various disciplines with a concrete mechanism to express, manipulate, and interrogate relationships.

Graphs have found a place in social network analysis [Wasserman et al., 1994], web page indexing [Page et al., 1999], metabolic analysis [Jeong et al., 2000] and knowledge representation [Sowa, 2014]. More recent applications include the epidemiological study and visualisation of infectious disease, especially related to Coronavirus Disease 2019 (COVID-19) [Verspoor et al., 2020, Bras et al., 2020, Cernile et al., 2021].

A graph structure consists of two sets: a set of vertices V and a set of edges E. The canonical form is presented first, also known as an undirected graph or simply a graph. Following this, the concept of directional edges is introduced, which defines the directed graph or digraph. This report presents a brief overview of graph structures. The interested reader is directed towards Graph Theory with Applications by Bondy and Murty (1976) and Graph Theory by Diestel (2017) for a more in-depth discussion on graph theory.

The representations of the graphs given in Figure 3 are best understood to be mental metamodel constructions as represented by a machine¹, in contrast to the realisations of these same graphs by various machines provided in Figure 5.

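Various authors provide different definitions of an undirected graph structure. The definition of Bondy and Murty uses an incidence function to define the edges completely; it is reproduced in Definition 3.1 [Bondy and Murty, 1976].

Definition 3.1 (Undirected graph). A graph G is an ordered triple (V(G), E(G), φ_G), where:
• V(G) is a non-empty set of vertices,
• E(G) is a set, disjoint from V(G), of edges, and
• φ_G is an incidence function that associates with each edge of G an unordered pair of (not necessarily distinct) vertices of G, i.e. $\varphi_G : E(G) \to \{U \subseteq V(G) \mid 1 \le |U| \le 2\}$.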

Specification 3.1 is obtained by applying Definition 3.1 to Figure 3a.
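As a hedged illustration of Definition 3.1, the sketch below stores a small undirected graph exactly as the triple (V, E, φ): the incidence function `phi` is a dictionary from edge names to one- or two-element vertex sets. The graph is hypothetical (Figure 3a is not reproduced here); it deliberately includes a pair of parallel edges and a loop, both of which the incidence-function definition permits. Degree counts a loop twice, following the usual convention.

```python
# Minimal sketch of Definition 3.1: a graph as the triple (V, E, phi).
# Hypothetical example graph; not the graph of Figure 3a.
from typing import Dict, FrozenSet

V = {"a", "b", "c"}
E = {"e1", "e2", "e3", "e4"}
phi: Dict[str, FrozenSet[str]] = {
    "e1": frozenset({"a", "b"}),
    "e2": frozenset({"b", "c"}),
    "e3": frozenset({"a", "b"}),  # parallel to e1: phi need not be injective
    "e4": frozenset({"c"}),       # a loop: a one-element set of vertices
}


def degree(v: str) -> int:
    """Number of edge ends at v; a loop contributes two ends."""
    return sum(2 if len(ends) == 1 else 1 for ends in phi.values() if v in ends)


print({v: degree(v) for v in sorted(V)})        # {'a': 2, 'b': 3, 'c': 3}
print(sum(degree(v) for v in V) == 2 * len(E))  # True: each edge has two ends
```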

When directionality is introduced onto the edges of a graph G, a directed graph or digraph structure emerges. In a directed graph, the edge from vertex a to vertex b differs from the edge from vertex b to vertex a, for all distinct a, b ∈ V(G). To allow for directionality, the incidence function is redefined to map each edge to an ordered pair of vertices. The incidence function in Definition 3.1 can thus be modified to define a directed graph, as per Definition 3.2.

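Definition 3.2 (Directed graph). A directed graph G is an ordered triple (V(G), E(G), φ_G), where:
• V(G) is a non-empty set of vertices,
• E(G) is a set, disjoint from V(G), of edges, and
• φ_G is an incidence function that associates with each edge of G an ordered pair of vertices of G, i.e. $\varphi_G : E(G) \to V(G) \times V(G)$.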

¹ In this context, a machine refers to any entity that can compute; therefore either a human or a computer can be considered.

Specification 3.2 is obtained by applying Definition 3.2 to Figure 3b.
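Continuing the same hedged illustration for Definition 3.2, the sketch below changes only the codomain of the incidence function: each edge now maps to an ordered pair (a tuple). The vertices and edges are again hypothetical. It shows that the edges from a to b and from b to a are distinct, although they collapse to the same unordered pair once direction is discarded.

```python
# Minimal sketch of Definition 3.2: the incidence function maps each edge to an ordered pair.
# Hypothetical example; not the graph of Figure 3b.
from typing import Dict, Tuple

V = {"a", "b", "c"}
phi: Dict[str, Tuple[str, str]] = {
    "e1": ("a", "b"),
    "e2": ("b", "a"),  # same endpoints as e1, opposite direction
    "e3": ("b", "c"),
}


def out_degree(v: str) -> int:
    return sum(1 for tail, _ in phi.values() if tail == v)


def in_degree(v: str) -> int:
    return sum(1 for _, head in phi.values() if head == v)


print(phi["e1"] == phi["e2"])                        # False: direction matters
print(frozenset(phi["e1"]) == frozenset(phi["e2"]))  # True once direction is dropped
print(out_degree("b"), in_degree("b"))               # 2 1
```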

Section 2 presented various concepts from linguistics, while Section 3 presented a brief overview of graph theory. This section investigates the relationships between various linguistic components and graphs, and derives the associated mappings between the two domains. The visual reader can refer to Figure 4 for an overview of the purpose of this report. The functional linguistics representation from Figure 1 has been coloured to highlight that each subconstruct of Functional Language Theory is composed of two metamodels: the mental and realised metamodels, indicated by red and teal transparent half-circles respectively. The mathematical graph domain is presented as two individual red and teal circles, one for each metamodel. The realisation mappings, represented by the green dashed line, exist implicitly in both domains.

This report proceeds to investigate and derive mappings between the metamodels, represented by the dashed purple line. The first mapping is between the mental metamodel of the five language subconstructs and the mental metamodel of graphs; the second is between the realised metamodel of the five language subconstructs and the realised metamodel of graphs. In deriving the mappings, the macroscopic characteristics of linguistics and graphs are first considered and mapped at the syntactic level. Thereafter the mapping is refined by moving to more abstract concepts in linguistics, that is, from the syntax subconstruct towards the phonology subconstruct.

The mapping of syntactic concepts to graphs is studied first, after which the mapping of morphology and phonology is considered. Section 2 identifies the sentence as a core construct of linguistic syntax, composed of smaller units, namely morphemes. From Definition 3.1, a graph structure is composed of two smaller units, namely vertices and edges. The first natural mapping is therefore that vertices and edges are to graph theory what morphemes are to linguistics: a graph is constructed from two building blocks, similar to how sentences are constructed from morphological building blocks. The second mapping arises from observing that a graph fulfils a function similar to that of a sentence in linguistic syntax, since both are larger units for expressing meaning.

The incidence function, as defined in Definitions 3.1 and 3.2, associates with each edge in the graph either an unordered pair (a one- or two-element set) or an ordered pair of vertices, depending on whether the graph is undirected or directed. It is given by $\varphi_G : E \to \{U \subseteq V \mid 1 \le |U| \le 2\}$ for undirected graphs and by $\varphi_G : E \to V \times V$ for directed graphs. The incidence function defines stringent constraints on how the units of a graph can be arranged: an edge is only allowed to connect vertices, which means an edge cannot connect to another edge. Hence the incidence function fulfils a role similar to that of linguistic syntax, which defines the rules of a language.
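The "grammar" that the incidence function imposes can be checked mechanically. The sketch below is an assumed validator, not taken from the report, that accepts a triple (V, E, φ) only if V is non-empty, E is disjoint from V, φ is defined on exactly the edges, and every edge end is a vertex; an edge that tries to attach to another edge is therefore rejected. The hypothetical `max_ends` parameter relaxes the two-vertex limit to mimic hypergraphs, which Table 1 mentions.

```python
# Minimal sketch: check that (V, E, phi) obeys the incidence-function rules.
# `max_ends` is a hypothetical parameter: 2 for ordinary graphs, None for hypergraphs.
from typing import AbstractSet, Dict, FrozenSet, Optional


def is_well_formed(
    V: AbstractSet[str],
    E: AbstractSet[str],
    phi: Dict[str, FrozenSet[str]],
    max_ends: Optional[int] = 2,
) -> bool:
    if not V:                   # V(G) must be non-empty
        return False
    if set(V) & set(E):         # E(G) must be disjoint from V(G)
        return False
    if set(phi) != set(E):      # phi must be defined on exactly the edges
        return False
    for ends in phi.values():
        if len(ends) < 1 or (max_ends is not None and len(ends) > max_ends):
            return False
        if not ends <= V:       # an edge may only connect vertices
            return False
    return True


V = {"a", "b", "c"}
ok = {"e1": frozenset({"a", "b"}), "e2": frozenset({"b", "c"})}
edge_to_edge = {"e1": frozenset({"a", "b"}), "e2": frozenset({"e1", "c"})}
hyperedge = {"e1": frozenset({"a", "b", "c"})}

print(is_well_formed(V, {"e1", "e2"}, ok))                   # True
print(is_well_formed(V, {"e1", "e2"}, edge_to_edge))         # False: e2 touches edge e1
print(is_well_formed(V, {"e1"}, hyperedge))                  # False for an ordinary graph
print(is_well_formed(V, {"e1"}, hyperedge, max_ends=None))   # True for a hypergraph
```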

If one considers a graph in terms of set theory, it is natural to define a total or partial ordering on its elements. When an ordering is defined over a graph G, an ordered graph is obtained. This is similar to the way syntax imposes an ordering of word types onto a language: the ordering of elements maps naturally onto the concept of an order relation. Furthermore, set theory allows an infimum and a supremum to be considered for the ordered sets of vertices and edges; for a finite, non-empty set under a total ordering, these are simply its minimum and maximum elements. In Section 2.3, the temporal precedence and dominance relations were seen to hold close ties to the ordering of linguistic elements. Similarly, consider a tree as a specialised graph structure. In a vertical arrangement, where each non-root node has a single parent and possibly several child nodes, it is natural to define dominance relations vertically, while the relation between sibling nodes maps naturally onto the temporal precedence of a language. The physical or mental model of a graph G can be naturally mapped onto the surface structure of a sentence. To determine how two graphs differ, a difference operator can be defined; the difference between graphs would also indicate their relatedness, similar to the relatedness of sentences in syntax.
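One simple way to realise such a difference operator is sketched below. It is an assumed, set-based construction, not necessarily the operator proposed by Marshall (2014): for two simple undirected graphs, it reports the vertices and edges present in one graph but not the other, so that an empty result means the two graphs are identical as sets. The example graphs and the helper `graph` are hypothetical.

```python
# Minimal sketch: a set-based difference between two simple undirected graphs.
# This is an illustrative operator, not necessarily Marshall's (2014) definition.
from typing import Dict, FrozenSet, NamedTuple, Set


class Graph(NamedTuple):
    vertices: FrozenSet[str]
    edges: FrozenSet[FrozenSet[str]]  # each edge is an unordered pair of vertices


def graph(vertices: Set[str], *pairs: tuple) -> Graph:
    """Hypothetical helper: build a Graph from a vertex set and vertex pairs."""
    return Graph(frozenset(vertices), frozenset(frozenset(p) for p in pairs))


def difference(g: Graph, h: Graph) -> Dict[str, set]:
    return {
        "vertices_only_in_first": set(g.vertices - h.vertices),
        "vertices_only_in_second": set(h.vertices - g.vertices),
        "edges_only_in_first": set(g.edges - h.edges),
        "edges_only_in_second": set(h.edges - g.edges),
    }


G = graph({"a", "b", "c"}, ("a", "b"), ("b", "c"))
H = graph({"a", "b", "d"}, ("a", "b"), ("b", "d"))
d = difference(G, H)
print(d["vertices_only_in_first"], d["vertices_only_in_second"])  # {'c'} {'d'}
print(all(not s for s in difference(G, G).values()))             # True: no difference
```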

A vertex and an edge are atomic units of a graph structure, exhibiting internal stability similar to that of words in linguistics. Internal stability of a vertex or edge refers to the fact that these units cannot be subdivided into smaller units.

External mobility refers to the ability of a word to be placed in various positions in a sentence, and moving a word changes the structure of the sentence. Similarly, a vertex or an edge exhibits external mobility, since it can be moved within the graph as long as the incidence function is obeyed. The structural change induced by moving a vertex or edge in a graph is consistent with linguistic behaviour.

Recall that the allomorph is the realisation of the morpheme. A vertex or edge realised by markings on a medium is similar to the allomorph of a morpheme. The morpheme maps naturally to the mental metamodel a machine holds of a vertex as a unit of containment and of an edge as the connection between two concepts.

Linguistic phonology is concerned with sound, whereas graph structures are conveyed visually. This section therefore considers an additional metamapping between sound and visual communication.

Linguistic concept | Description | Graph counterpart
Syntax | "The constituent structure of sentences" [Widdowson, 1996] | Incidence function: provides the rules as to how edges may connect, i.e. only one or two vertices in a normal graph, or multiple vertices in hypergraphs
Ordering of elements | General description of a phone | Ordered graph: a graph structure with a total or partial ordering
Empty categories | A linguistic category assumed to exist in a sentence but without any manifestation [Hickey, 2005] | Ordered graph: a graph structure with a total or partial ordering
Ambiguity | Ambiguity arising from the speaker [Hickey, 2005] | -
Relatedness of sentences | Indicates the relatedness of sentences by using movement rules as notational means [Hickey, 2005] | Graph difference, as proposed by Marshall (2014)
Surface structure | Actual form of a spoken or written sentence [Hickey, 2005] | Graph G: a graph given by G, the visual or mental realisation of the graph
Deep structure | Level of representation where the meaning of the sentence structure is unambiguous [Hickey, 2005] | -
Temporal precedence | Reflects the precedence relations of the elements in the sentence [Hickey, 2005] | Total or partial ordering (horizontal): a total or partial ordering defined on the graph G
Dominance relations | Relation of elements to each other in sentences [Hickey, 2005] | Total or partial ordering (vertical): a total or partial ordering defined on the graph G
Encoding | Various forms of encoding sentences are possible, e.g. tree diagrams [Hickey, 2005] | The method in which the graph is encoded, e.g. adjacency list, adjacency matrix, incidence matrix or object-oriented representation [Drozdek, 2013]

Table 1: Mapping from linguistic syntax concepts to corresponding concepts in graphs
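Table 1 lists several ways in which a graph can be encoded. As a hedged sketch (the three-vertex graph is hypothetical, and only simple undirected graphs without loops or parallel edges are handled), the code below derives an adjacency list, an adjacency matrix and a vertex-by-edge incidence matrix from the same edge list, showing that they record the same structure in different forms.

```python
# Minimal sketch: three encodings of one simple undirected graph.
# Hypothetical graph; loops and parallel edges are not handled here.
from typing import Dict, List, Tuple

vertices: List[str] = ["a", "b", "c"]
edges: List[Tuple[str, str]] = [("a", "b"), ("b", "c")]
index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)

# Adjacency list: each vertex maps to its neighbours.
adjacency_list: Dict[str, List[str]] = {v: [] for v in vertices}
for u, v in edges:
    adjacency_list[u].append(v)
    adjacency_list[v].append(u)

# Adjacency matrix: entry [i][j] is 1 if vertices i and j are joined by an edge.
adjacency_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adjacency_matrix[index[u]][index[v]] = 1
    adjacency_matrix[index[v]][index[u]] = 1

# Incidence matrix: rows are vertices, columns are edges.
incidence_matrix = [[0] * len(edges) for _ in range(n)]
for k, (u, v) in enumerate(edges):
    incidence_matrix[index[u]][k] = 1
    incidence_matrix[index[v]][k] = 1

print(adjacency_list)    # {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(adjacency_matrix)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(incidence_matrix)  # [[1, 0], [1, 1], [0, 1]]
```

The adjacency list is compact for sparse graphs, the adjacency matrix gives constant-time adjacency tests, and the incidence matrix mirrors the incidence function of Definition 3.1 most directly.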

Linguistic concept | Description | Graph counterpart
Morpheme | "An abstract element of meaning" [Widdowson, 1996] | Mental metamodel representing the concept of a vertex as a unit of containment and an edge as a unit of connection between concepts
Allomorph | "The version of a morpheme as actually realised in speech or writing" [Widdowson, 1996] | Vertex or edge: realisation of some marking on a medium representing a vertex or edge
Internal stability | Word which cannot be broken down into further elements [Hickey, 2005] | Atomic vertex or edge: a vertex or edge cannot be subdivided into smaller representations
External mobility | Words can occupy various positions in a sentence [Hickey, 2005] | A vertex can be moved around the structure as long as the incidence function is obeyed

Table 2: Mapping from linguistic morphology concepts to corresponding concepts in graphs

Linguistic concept | Description | Graph counterpart
Phonetic | "Concerning the actual pronunciation of speech sounds" [Widdowson, 1996] | Mental metamodel of an abstract concept or entity as represented by set theory
Phoneme | "The abstract element of sound, identified as being distinctive in a particular language" [Widdowson, 1996] | Abstract marking concept serving to separate the meaning between an entity and some relationship
Phone | "Smallest unit of human sound which is recognisable but not classified" [Hickey, 2005] | Realised expression of an entity or relationship
Allophone | "The version of the phoneme as actually realized phonetically in speech" [Widdowson, 1996] | Realised difference in marking to express the concept of an entity or relationship

Table 3: Mapping from linguistic phonology concepts to corresponding concepts in graphs

Following the sender, channel and receiver model of communication [Shannon and Weaver, 1949], a metamapping between audio and visual communication can be defined as follows: (a) the machine producing visual markings to transfer information maps to the concept of articulatory phonetics; (b) the medium on which the visual markings are made corresponds to acoustic phonetics; (c) the machine processing the visual markings with the goal of receiving information maps to auditive phonetics.

A phoneme refers to the mental metamodel of a sound representation. The visual equivalent is the mental metamodel of markings on some medium, that is, the mental metamodel of shapes and lines. A phone, the realised sound a human produces, maps naturally onto the realised drawings produced by a human. The realised markings can be any visual marking on a medium, including drawings on a digital screen. The allophone captures the different ways in which a phoneme can be realised. Humans produce different markings owing to various factors, including pressure, stroke type, medium characteristics and mental state, to name a few; the difference in these markings maps onto the allophone. Figure 3b presents a mental model of a graph which is realised by the respective markings in Figure 5. The realisations of the graph were made by three different machines, using a variety of media.

Table 4 summarises the inter- and intra-mappings between the two metamodels of linguistics and graphs. Figure 6 represents a taxonomy of both metamodels and the mapping between the two domains. Figure 6a shows the mental (abstract) constructs contained by a machine, whereas Figure 6b shows a concrete (literal) scenario in each domain. The bottom of the taxonomy in Figure 6b represents concrete ideas in small units; moving up the respective taxonomies shows how these smaller units can be combined to form more complex abstract representations. Lastly, Figure 7 visually represents which subcomponents of Functional Language Theory could be mapped onto corresponding concepts in graph theory. Figure 7a represents the presence of mappings between the subconstructs of Functional Language Theory and graphs, while Figure 7b represents the absence of such mappings.
That is, the mental and realised metamodels of syntax, morphology and phonology could be mapped onto corresponding concepts in graphs. In contrast, no mappings from the linguistic mental and realised metamodels of semantics and pragmatics onto graphs could be found.

Table 4: Table representation of the intra-mapping between the metamodels within each domain as well as the inter-mapping between the corresponding metamodels across domains.

Graph structures exhibit many features similar to those found in human language. Future work includes investigating whether deep structures exist in graph structures and, if they do, whether they can be mapped to the corresponding deep structures in linguistics. No mapping to semantic and pragmatic concepts could be found using Definition 3.1. Further investigation into the introduction of semantic and pragmatic concepts into graphs should be undertaken, including whether such concepts can be mapped onto the corresponding subconstructs of linguistics. Additional analysis should determine whether graphs could be classified as a linguistic theory, that is to say, whether graphs fulfil the criteria of economy, simplicity, generality and falsifiability [Hickey, 2005]. In addition, any proposed linguistic theory should satisfy observational, descriptive and explanatory adequacy [Green, 2006, Rizzi and Rizzi, 2016, Hickey, 2005].

The investigation undertaken shows a large overlap between concepts in linguistics and graphs. The subconstructs that appear to be missing from graph structures are those representing semantic and pragmatic information. These missing subconstructs were identified by deriving mappings between the two metamodels of both Functional Language Theory and mathematical graph structures, namely the mental and realised metamodels of each domain.

This report highlights the missing concepts of semantics and pragmatics that arise when using mathematical graphs. Graphs lacking semantics and pragmatics could be viewed as weak from the perspective of linguistics, that is, as lacking meaning. This paper therefore refers to mathematical structures void of semantics and pragmatics as etiolated structures. In biology, etiolation occurs in flowering plants that grow in partial or complete darkness; the process leads to elongated stems and leaves, longer internodes and chlorosis [Armarego-Marriott et al., 2020]. In an abstract context, etiolation can be viewed as the removal of substance from the entity in question, hence the name for mathematical structures that are void of semantics and pragmatics. An etiolated structure contains only the syntactic information of the source structure and is often derived from another structure or representation, such as a social network or the relationships between cited authors. The original structure or representation is referred to as the source structure, while the etiolated structure is referred to as the target structure.

Formally, the target structure is said to be isomorphic to the source structure; in other words, it preserves the structural information. For an in-depth discussion of isomorphisms, the reader is directed towards Graph Theory by Reinhard Diestel [Diestel, 2017]. The omission of semantic and pragmatic information eases the mathematical manipulation of the target structure: the target, acting as an isomorphic proxy for the source structure, is manipulated rather than the source structure itself. Isomorphisms allow human cognition to transcribe a problem and its solution between two or more independent domains, and transcribing the problem into a different domain might make it easier to solve. Transferring the solution back to the original domain may not be possible and needs to be researched further [Greer and Harel, 1998, Uptegrove and Maher, 2004].
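The loss of information in an etiolated target can be illustrated with a small, hypothetical example. The sketch below keeps semantic labels on the edges of a source structure, strips them to obtain a bare target, and then uses a brute-force isomorphism search (adequate only for tiny graphs, and not an algorithm from the report) to confirm that the structure survives while the labels do not. All names and labels are illustrative assumptions.

```python
# Minimal sketch: an etiolated target keeps structure but drops edge semantics.
# Names and labels are hypothetical; brute force is only practical for tiny graphs.
from itertools import permutations
from typing import Dict, FrozenSet, Iterable, Optional, Set


def find_isomorphism(
    v1: Iterable, e1: Set[FrozenSet], v2: Iterable, e2: Set[FrozenSet]
) -> Optional[Dict]:
    """Return a bijection f: v1 -> v2 with {x, y} in e1 iff {f(x), f(y)} in e2, else None."""
    v1, v2 = list(v1), list(v2)
    if len(v1) != len(v2) or len(e1) != len(e2):
        return None
    for image in permutations(v2):
        f = dict(zip(v1, image))
        if {frozenset(f[x] for x in edge) for edge in e1} == e2:
            return f
    return None


# Source structure: a small social network whose edges carry meaning.
source = {("Ann", "Ben"): "friend of", ("Ben", "Cat"): "colleague of"}
source_vertices = {"Ann", "Ben", "Cat"}
source_edges = {frozenset(pair) for pair in source}  # the labels are discarded here

# Etiolated target: bare integers and unlabelled edges.
target_vertices = {1, 2, 3}
target_edges = {frozenset({1, 2}), frozenset({2, 3})}

f = find_isomorphism(source_vertices, source_edges, target_vertices, target_edges)
print(f is not None, f["Ben"])  # True 2: Ben is the only vertex of degree 2
# Nothing in target_edges records "friend of" or "colleague of".
```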

Armarego-Marriott et al., 2020. Beyond the darkness: Recent lessons from etiolation and de-etiolation studies.

Bondy and Murty, 1976. Graph Theory with Applications.

Cernile et al., 2021. Network graph representation of COVID-19 scientific publications to aid knowledge discovery.

Chomsky, 1957. Syntactic Structures. Mouton de Gruyter.

Chomsky, 2019. Deep Structure, Surface Structure and Semantic Interpretation.

Cornell and Rogers, 2014. Model Theoretic Syntax. In The First Glot International State-of-the-Article Book.

Diestel, 2017. Graph Theory.

Doetjes et al., 1998. Degree Expressions and the Autonomy of Syntax.

Drozdek, 2013. Data Structures and Algorithms in Java. Cengage Learning.

Feng, 2013. Functional grammar and its implications for English teaching and learning.

Green, 2006. Levels of Adequacy, Observational, Descriptive, Explanatory.

Greer and Harel, 1998. The role of isomorphisms in mathematical cognition.

Hammerström, G., 2014. Articulatory, acoustic or auditory description? STUF - Language Typology and Universals.

Hardcastle et al., 2010. The Handbook of Phonetic Sciences: Second Edition.

Hickey, 2005. Levels of language.

Hockett, 1958. A Course in Modern Linguistics.

Hoque, M. E., 2015. Components of Language.

Jeong et al., 2000. The large-scale organization of metabolic networks.

Lyons, 1968. Introduction to Theoretical Linguistics.

Page et al., 1999. The PageRank Citation Ranking: Bringing Order to the Web.

Rizzi and Rizzi, 2016. The Concept of Explanatory Adequacy. In The Oxford Handbook of Universal Grammar.

Shannon and Weaver, 1949. The mathematical theory of communication.

Sowa, 2014. Principles of semantic networks: Explorations in the representation of knowledge.

Uptegrove and Maher, 2004. Developing versatility in mathematical thinking.

Verspoor et al., 2020. COVID-SEE: Scientific Evidence Explorer for COVID-19 Related Research.

Wasserman et al., 1994. Social network analysis: Methods and applications.

Widdowson, 1996. Linguistics.