key: cord-0047261-qmxecx07 authors: Greiner-Petter, André; Schubotz, Moritz; Aizawa, Akiko; Gipp, Bela title: Making Presentation Math Computable: Proposing a Context Sensitive Approach for Translating LaTeX to Computer Algebra Systems date: 2020-06-06 journal: Mathematical Software - ICMS 2020 DOI: 10.1007/978-3-030-52200-1_33 sha: 06432db8f21a1161c15915f9aecc44600e8a49e6 doc_id: 47261 cord_uid: qmxecx07
Scientists increasingly rely on computer algebra systems and digital mathematical libraries to compute, validate, or experiment with mathematical formulae. However, digital mathematical libraries and scientific documents often focus on presenting the formulae accurately rather than on providing uniform access to their semantic information. Presentational math formats do not give direct access to the underlying semantic meanings; one has to derive the semantic information from the context. As a consequence, the workflow of experimenting and publishing in the sciences often includes time-consuming, error-prone manual conversions between presentational and computational math formats. As a contribution to improving this workflow, we propose a context-sensitive approach that extracts semantic information from a given context, embeds the information into the given input, and converts the semantically enhanced expressions to computer algebra systems.
The document preparation system LaTeX has become a de facto standard for writing scientific papers in STEM disciplines over the last 30 years [1]. Numerous other editors, such as the editor for Wikipedia articles or Microsoft Word [11], entirely or partially support LaTeX expressions. LaTeX provides a syntax for printing mathematical formulae that is similar to the way a person would write the math by hand. Thus, LaTeX focuses on the presentation of formulae but does not explicitly carry their semantic information.
For a human reader, LaTeX's focus on the presentation of formulae is typically not a problem, since readers can deduce the semantics of a formula from the surrounding context and their own prior knowledge. Consider the Euler-Mascheroni constant, represented by the Greek letter γ. Without further information, γ is just a Greek letter: it often denotes this mathematical constant, but it can also represent a curve parametrization, among other things. Based on the context, a human reader can interpret γ correctly and connect the letter with its semantic background. Computational systems, however, have difficulty identifying the correct semantics of a formula if the formula itself does not provide enough context. For example, in LaTeX, γ is represented as \gamma regardless of its meaning.
Explicitly given semantic information in mathematical expressions is becoming increasingly relevant in computational mathematics. Nowadays, many scientists also compute the formulae from their papers [2, 3]. They evaluate specific values, create diagrams, and search for or calculate practical solutions. Computer Algebra Systems (CAS) are software tools that allow for such computations and visualizations of mathematical expressions. Each CAS defines its own representation (hereafter referred to as CAS input) with the intent of providing an input syntax that is intuitive and easy to type. CAS input must be unambiguous to the CAS; otherwise, the CAS is unable to perform computations and visualizations. CAS input is not standardized; instead, each CAS provider has created its own syntax that differs from that of other systems [10]. The workflow of writing a paper, therefore, leads to the problem of continually transforming mathematical expressions from LaTeX to CAS input and back. Since LaTeX does not carry the semantic information explicitly, a CAS is unable to parse complex input directly. Thus, the author must perform the transformation manually, which is time-consuming and error-prone.
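The γ example above can be sketched in a few lines of Python. This is a minimal illustration, not the proposed system: the cue phrases, the lookup table, and the function name `translate_gamma` are assumptions made for this example. Only the CAS names for the Euler-Mascheroni constant (`gamma` in Maple, `EulerGamma` in Mathematica) are taken from the systems themselves.

```python
import re

# CAS names for the Euler-Mascheroni constant.
EULER_MASCHERONI = {"Maple": "gamma", "Mathematica": "EulerGamma"}

# Hypothetical cue phrases that signal the constant in the surrounding text.
# A real system would rely on a proper definition-extraction step instead.
CONSTANT_CUES = ("euler-mascheroni", "euler's constant")


def translate_gamma(latex, context, cas):
    """Replace \\gamma in `latex` with the CAS constant if `context` supports it."""
    text = context.lower()
    if not any(cue in text for cue in CONSTANT_CUES):
        # Without a cue, \gamma could be a curve, an angle, ...; do not guess.
        raise ValueError("context does not identify the meaning of \\gamma")
    name = EULER_MASCHERONI[cas]
    # Match the macro \gamma only, not longer macros that merely start with it.
    return re.sub(r"\\gamma(?![A-Za-z])", lambda _match: name, latex)


sentence = "Here \\gamma denotes the Euler-Mascheroni constant."
print(translate_gamma(r"\gamma + 1", sentence, "Mathematica"))  # EulerGamma + 1
print(translate_gamma(r"\gamma + 1", sentence, "Maple"))        # gamma + 1
```

The sketch refuses to translate when the context gives no cue, which mirrors the point made above: the same token \gamma cannot be mapped to CAS input without knowing what it stands for.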
Transformations between CAS input and LaTeX are not straightforward and require substantial knowledge of the internal processes of the CAS [10]. Table 1 illustrates the differences in representations, using a Jacobi polynomial as an example [5]. The expression in generic LaTeX, i.e., general LaTeX without custom macros, differs sharply from the semantically unique terms in CAS inputs. To overcome the issue of missing explicit semantic information in LaTeX expressions, the National Institute of Standards and Technology (NIST) has developed a unique set of semantic LaTeX macros. NIST uses these macros for the Digital Library of Mathematical Functions (DLMF) [13] and the Digital Repository of Mathematical Formulae (DRMF) [4]. Both DLMF and DRMF macros enhance the search capabilities on the DLMF and DRMF websites and establish info boxes that provide short descriptions of the symbols, links to their definitions, and further literature. Table 1 shows that the semantically enhanced LaTeX is closer to the syntax supported by a CAS. In the following, we refer to semantically enhanced LaTeX as semantic LaTeX and to general LaTeX expressions as generic LaTeX.
We propose a context-sensitive approach to convert generic LaTeX expressions to CAS input. The approach takes advantage of existing tools and datasets. To the best of our knowledge, neither a system nor a theoretical concept exists yet that translates LaTeX expressions to CAS input while taking the context of the expression into account.
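As a rough illustration of how far these kinds of representation diverge, the following Python sketch rewrites one generic LaTeX pattern for a Jacobi polynomial into a semantic macro call and into CAS input. The regular expression, the helper names, and the macro spelling `\JacobiP{...}{...}{...}@{...}` (modelled on the DLMF macro style) are assumptions made for this example. `JacobiP` with the argument order (n, α, β, x) is the function name in both Maple and Mathematica. The sketch handles only single-symbol arguments and presupposes that the context has already established that P denotes a Jacobi polynomial.

```python
import re

# Generic LaTeX pattern P_n^{(\alpha,\beta)}(x), single-token arguments only.
JACOBI = re.compile(
    r"P_\{?(?P<n>\w+)\}?\^\{\((?P<a>\\?\w+),\s*(?P<b>\\?\w+)\)\}\((?P<x>\\?\w+)\)"
)

# Mathematica spellings of the Greek letters used here; Maple uses bare names.
MATHEMATICA_GREEK = {"alpha": r"\[Alpha]", "beta": r"\[Beta]"}


def parse_jacobi(latex):
    """Return the tokens (n, alpha, beta, x) of a supported Jacobi expression."""
    match = JACOBI.fullmatch(latex.strip())
    if match is None:
        raise ValueError(f"not a supported Jacobi pattern: {latex!r}")
    return match.group("n"), match.group("a"), match.group("b"), match.group("x")


def to_semantic(latex):
    """Assumed semantic macro form: parameters first, then degree, then argument."""
    n, a, b, x = parse_jacobi(latex)
    return rf"\JacobiP{{{a}}}{{{b}}}{{{n}}}@{{{x}}}"


def to_maple(latex):
    n, a, b, x = (token.lstrip("\\") for token in parse_jacobi(latex))
    return f"JacobiP({n}, {a}, {b}, {x})"


def _mathematica_symbol(token):
    # Only LaTeX macros (leading backslash) are mapped to named characters.
    if token.startswith("\\"):
        return MATHEMATICA_GREEK.get(token[1:], token[1:])
    return token


def to_mathematica(latex):
    n, a, b, x = (_mathematica_symbol(token) for token in parse_jacobi(latex))
    return f"JacobiP[{n}, {a}, {b}, {x}]"


generic = r"P_n^{(\alpha,\beta)}(x)"
print(to_semantic(generic))     # \JacobiP{\alpha}{\beta}{n}@{x}
print(to_maple(generic))        # JacobiP(n, alpha, beta, x)
print(to_mathematica(generic))  # JacobiP[n, \[Alpha], \[Beta], x]
```

The pattern match succeeds only because the meaning of P is fixed in advance. The same generic LaTeX could just as well denote a Legendre-type function or a user-defined P, which is why a purely syntactic rewrite of this kind does not generalize.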
Existing tools, such as the built-in import/export functions of CAS, ignore context information and are therefore limited to simple, unambiguous cases (e.g., \frac{1}{2} or
[1] Do open source developers respond to competition?: the (La)TeX case study
[2] Special issue on the use of computer algebra systems for computer aided control system design
[3] Modern Computer Algebra
[4] Growing the digital repository of mathematical formulae with generic LaTeX sources
[5] Semantic preserving bijective mappings of mathematical formulae between document preparation systems and computer algebra systems
[6] Evaluating and improving the extraction of mathematical identifier definitions
[7] Part-of-math tagging and applications
[8] Automated symbolic and numerical testing of DLMF formulae using computer algebra systems
[9] Improving the representation and conversion of mathematical formulae by considering their textual context
[10] Semantic preserving bijective mappings for expressions involving special functions in computer algebra systems and document preparation systems
[11] Craft beautiful equations in Word with LaTeX
[12] Discovering mathematical objects of interest - a study of mathematical notations
[13] NIST Digital Library of Mathematical Functions
Acknowledgments. This work was supported by the German Research Foundation (DFG grant GI-1259-1).