Music Encoding Conference Proceedings 2020

MEI and Verovio for MIR: A Minimal Computing Approach

Mark Saccomano, Columbia University, m.saccomano@columbia.edu
Natalia Ermolaev, Princeton University, nataliae@princeton.edu

Abstract

While the increase in digital editions, online corpora, and browsable databases of encoded music presents an extraordinary resource for contemporary music scholarship, using these databases for computational research remains a complex endeavor. Although norms and standards have begun to emerge, and interoperability among different formats is often possible, researchers must devote considerable time to discovering, learning, and maintaining the skill sets necessary to make use of these resources. This talk will discuss our work with the Serge Prokofiev Archive and the creation of a prototype to browse, display, and play notated music from Prokofiev’s notebooks via a web browser. The project is an example of how using the principles of minimal computing can reduce the burden of technological expertise required to both disseminate and access encoded music.

The archive

The Serge Prokofiev Archive,16 housed at Columbia University, contains more than 17,500 diverse items: music manuscripts, letters, scores, financial documents, notebooks, photographs, and recordings. Originally a personal collection amassed by Prokofiev’s widow Lina, the materials were first established as an archive in 1994 at Goldsmiths College in London. As the archive grew, a complex, intricate, item-level descriptive apparatus evolved alongside it. By the time the collection came to Columbia, the archival items were accompanied by hundreds of metadata files in a wide variety of formats, including Word documents, spreadsheets, text files, PDFs, EndNote databases, Access databases, MARC records, and various XML encodings.
Typically, archival collections are accessed through an online finding aid, which users often find difficult to use and whose underlying structure and interface can obscure the richness of a collection. The blocks of narrative and long lists of items found in a finding aid, especially for a collection of this scope, are a barrier to true discovery. We sought to improve the experience of navigating a large archival collection by affording users the opportunity to make new, spontaneous discoveries.

Our Serge Prokofiev Archive as Data17 project was guided by two important conceptual shifts in the library and archives profession. The first is the “Collections as Data” movement, which encourages reframing the digital object itself as data [1].18 The second is Kate Theimer’s notion of “archives as platform,” a move away from locating value exclusively in the objects of a collection and toward the impact collections have on people and communities [2]. In Theimer’s view, the notion of an archive includes the tools and technologies that help users interact with it in creative ways that add value to their lives and experiences.

Accessible technology and minimal computing

Because we were looking for solutions that could be adapted for researchers with varying skill sets and with different computing needs, we tested a variety of freely available software to store, structure, clean, analyze, and display our data. We also had no budget: necessity dictated that we seek out non-proprietary tools. Thus, we placed ourselves in the position of many researchers (independent and graduate student researchers in particular) looking for ways to disseminate their work to a wider audience.
Following this path, we were soon introduced to the principles of minimal computing and discovered their applicability to our own project’s goals.

16 https://findingaids.library.columbia.edu/ead/nnc-rb/ldpd_10815449
17 https://mss2221.github.io/spademo/
18 See also https://collectionsasdata.github.io/statement/

Minimal computing19 is a design philosophy that seeks to maximize access to digital materials by reducing reliance on specific hardware and software requirements [3]. Organizing it around the question “What do we need?”, Alex Gil describes minimal computing as a conscious effort to “harness the new media in smart, ethical and sustainable ways.” In addition to reducing reliance on multiple and opaque processes, minimal computing also implies “learning how to produce, disseminate and preserve digital scholarship ourselves, without the help we can’t get” [4]. This DIY approach helps minimize dependence on institutional resources and funding, as well as on proprietary tools (which, in addition to their cost, often require a high level of expertise).

One of the first steps we took was to avail ourselves of systems and workflows with ample documentation. We were also aware that scholars are well advised to publish digital materials in a versatile format that requires little or no maintenance and can easily be ported to other systems. This way, digital materials remain accessible even as technology develops in ways that are impossible to foresee today. We soon created a repository for the Serge Prokofiev Archive as Data project on GitHub and built a static website for display on GitHub Pages using a Jekyll template. Because a static site does not require knowledge of server operations or database design, it simplifies the task of disseminating work for individual researchers.
For the musical component of our project, we wanted to create not only an attractive front end and simple user interface, but a simple back end as well. The idea was to provide a repository of encoded music that could not only be seen but heard, a difference that could make such a repertory valuable beyond specialized scholars in computing or musicology. Aficionados and researchers in other fields who may not be able to read code or read music could nonetheless hear the music in Prokofiev’s manuscripts, and could hear for themselves the jagged rhythms and unexpected chromatic alterations that are hallmarks of his style. We also developed a simple workflow for creating the encoded files (one very similar to the process now detailed in the tutorial “Introduction to the Music Encoding Initiative”20 by Anna Kijas and Raffaele Viglianti). To publish the encoded materials, we used our GitHub website and Jekyll template.

The notebooks

One of the highlights of the collection is Prokofiev’s notebooks. Here, in an interview transcript from the archive, Prokofiev’s widow Lina describes how he used the notebooks in his creative process.

SP never stopped creating…. At the most unexpected moment, at the most unusual circumstances—during a conversation or while walking—he would make a note of a new theme in a special notebook he kept in his pocket or on any scrap of paper or on his cuff—on paper napkins in a restaurant. Then on returning home he would copy the themes into a more permanent notebook.

The sketches we display on the site are from these “more permanent” notebooks Lina mentions. We began by simply browsing through the notebooks and taking some pictures. Displayed in a web exhibit, these images would be interesting on their own. But we also knew that by adding the sounded music represented by these scores, we would greatly increase the usefulness of these notebooks to scholars, as well as to the general public.
Not all musicologists and music theorists have sufficient musicianship skills to fluently imagine the sound of notated music. For archival materials such as unlabeled sketches, hearing the music can aid in identifying fragments, suggesting how and where they might have been used in published scores.

MEI was chosen as the encoding format, not only because of its adaptability and increasingly common use in digital musicological projects, but also because of the availability of Verovio, an engraving library that can be used to display and play MEI files in a web browser. As these were short, handwritten passages of only a few measures each, they were entered manually into the music notation program Sibelius. (Because they were written by hand, an OCR program would likely not have been the most efficient method of encoding.) Next, the files were exported to MusicXML using the export function of Sibelius. To convert the MusicXML files to MEI, we used the automated converter21 available on the Verovio website. This worked extremely well and yielded excellent results. The light editing that remained to be done was mostly for aesthetic purposes of display.

19 https://go-dh.github.io/mincomp/
20 https://dlfteach.pubpub.org/pub/intro-mei/release/1

The editing was done in Atom, using the MEI-Tools-Atom22 package, which renders MEI in a separate pane within the application. Once the MEI files were checked and polished, they were uploaded to the GitHub repository. The challenge at this stage was to create page templates that would incorporate Verovio.
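The check that precedes upload can be partly scripted. The following is a minimal sketch, not the workflow used for the project: it assumes Python's standard library only, and the sample MEI fragment and the function name summarize_mei are hypothetical. It confirms that a file is well-formed XML with an MEI root element and reports how many measures and notes it contains, which is a quick way to catch a truncated or mis-converted file.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"

# Hypothetical two-measure fragment standing in for a converted notebook sketch.
SAMPLE_MEI = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <music><body><mdiv><score><section>
    <measure n="1"><staff n="1"><layer n="1">
      <note pname="c" oct="4" dur="4"/>
      <note pname="e" oct="4" dur="4"/>
    </layer></staff></measure>
    <measure n="2"><staff n="1"><layer n="1">
      <note pname="g" oct="4" dur="2"/>
    </layer></staff></measure>
  </section></score></mdiv></body></music>
</mei>"""


def summarize_mei(xml_text):
    """Parse MEI text and return basic counts.

    Raises ET.ParseError if the XML is malformed and ValueError if the
    root element is not <mei> in the MEI namespace.
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{MEI_NS}}}mei":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {
        "measures": len(root.findall(f".//{{{MEI_NS}}}measure")),
        "notes": len(root.findall(f".//{{{MEI_NS}}}note")),
    }


summary = summarize_mei(SAMPLE_MEI)
print(summary)  # {'measures': 2, 'notes': 3}
```

A mismatch between these counts and the manuscript page is a cue to reopen the file in the editor. The sketch does not validate against the MEI schema.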
Although it took many attempts to pull everything together, the results were encouraging, and a prototype was developed that could display an engraved version of the score derived from a digitally encoded version of the manuscript, as well as play the score in a browser using a simple interface:
https://mss2221.github.io/spademo/sketches/

Implementation

Development challenges proved to be formidable. While finding appropriate tools for coding, display, and playback of manuscripts was reasonably easy, getting them to work together was exceedingly complex. Documentation, though rich, can be dense; the largest impediment to timely progress is the lack of access to a consultant who can assist in troubleshooting. Without such help, the plethora of manuals and tutorials becomes an obstacle to learning, creation, and design. (Think of the myriad articles, tips, and guidelines many of us received back in March of this year on how to migrate our courses online; such a wealth of material can be overwhelming.) Even with access to university assistance, this site took nearly a year to assemble. However, the skills to use GitHub and Jekyll are within reach via ground-up tutorials available from such sites as The Programming Historian.23

Difficulties still remain, specifically those arising from technical solutions that push the limits of common browser capabilities. For example, problems with audio playback, such as web MIDI players clipping notes (possibly due to buffering or threading issues), have driven developers on some projects to insert an extra musical object into their encoded scores.24 We also encountered this clipping problem, and were only able to come up with a temporary workaround through a laborious trial-and-error process. To ensure the MEI would play properly in the browser, an element serving as a “dummy” event, with @visible=“false”, had to be inserted before the first and final notes in order for them to be heard.
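The workaround described above can be sketched as follows. This is an illustration under stated assumptions rather than the encoding used in the archive: the choice of a rest as the hidden element, its duration, the comment text, and the function name add_hidden_events are all hypothetical, and whether a given element accepts @visible should be checked against the MEI schema in use. The sketch inserts an invisible event, preceded by an XML comment marking it as a playback fix, before the first and the final note of a layer.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"
ET.register_namespace("", MEI_NS)  # serialize without an ns0: prefix

# Hypothetical three-note layer standing in for an encoded sketch.
LAYER = """<layer xmlns="http://www.music-encoding.org/ns/mei" n="1">
  <note pname="c" oct="4" dur="4"/>
  <note pname="d" oct="4" dur="4"/>
  <note pname="e" oct="4" dur="2"/>
</layer>"""


def add_hidden_events(layer, tag="rest", dur="16"):
    """Insert a commented, invisible event before the first and last notes.

    The element name (`tag`) and duration are assumptions made for this
    sketch; the source does not specify them.
    """
    notes = [child for child in layer if child.tag == f"{{{MEI_NS}}}note"]
    if not notes:
        return layer
    targets = [notes[0]] if len(notes) == 1 else [notes[0], notes[-1]]
    for note in targets:
        # Recompute the index each time, since earlier inserts shift positions.
        position = list(layer).index(note)
        dummy = ET.Element(f"{{{MEI_NS}}}{tag}", {"dur": dur, "visible": "false"})
        layer.insert(position, dummy)
        layer.insert(position, ET.Comment(" playback workaround: hidden event "))
    return layer


layer = add_hidden_events(ET.fromstring(LAYER))
print(ET.tostring(layer, encoding="unicode"))
```

Note that the added events carry a duration of their own, so they alter the encoded content of the measure as well as its playback.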
Such inelegant solutions are highly undesirable for an archival representation of a manuscript. Presumably, improvements in how browsers and system players handle MIDI will soon make such workarounds unnecessary. In the meantime, these ad hoc solutions need to be specially commented in the MEI files.

Extensibility and Future Directions

Sample sites using MEI, Verovio, and the Ed template:
El corrido mexicano: https://mss2221.github.io/corridosEd/
Serbian hymns: https://mss2221.github.io/zagreb/

In order to test the extensibility of this project, we tried it out with texted music in a special Jekyll template for minimal literary editions, “Ed,”25 developed by Alex Gil and associates. The resulting sites showed the flexibility of the Ed theme to handle some of the more complex requirements of Verovio and web MIDI, while still remaining projects that could be managed by a single researcher. They also demonstrate the utility of our chosen suite of open source tools for musicologists, music theorists, and music archivists.

21 https://www.verovio.org/musicxml.html
22 https://atom.io/packages/mei-tools-atom
23 https://programminghistorian.org/ We are particularly indebted to Amanda Visconti for her Jekyll tutorial: https://programminghistorian.org/en/lessons/building-static-sites-with-jekyll-github-pages
24 https://github.com/cuthbertLab/music21/issues/332
25 https://elotroalex.github.io/ed/

In the future, we hope to incorporate search tools for specific series of notes and an analytical component that could be used to identify the stylistic traits of a corpus.
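A search for a specific series of notes could start from something as small as the following sketch. It is not part of the prototype: it assumes Python's standard library, a hypothetical six-note fragment, and a hypothetical function find_pitch_sequence, and it matches on note names alone, ignoring octave, accidentals, duration, and chords.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"

# Hypothetical fragment: the figure c-d-e stated in two octaves.
SKETCH = """<layer xmlns="http://www.music-encoding.org/ns/mei">
  <note pname="c" oct="4" dur="8"/>
  <note pname="d" oct="4" dur="8"/>
  <note pname="e" oct="4" dur="8"/>
  <note pname="c" oct="5" dur="8"/>
  <note pname="d" oct="5" dur="8"/>
  <note pname="e" oct="5" dur="4"/>
</layer>"""


def find_pitch_sequence(root, query):
    """Return the start indices where the note names in `query` occur consecutively."""
    if not query:
        return []
    # Collect note names in document order.
    names = [note.get("pname") for note in root.iter(f"{{{MEI_NS}}}note")]
    width = len(query)
    return [
        start
        for start in range(len(names) - width + 1)
        if names[start:start + width] == list(query)
    ]


hits = find_pitch_sequence(ET.fromstring(SKETCH), ["c", "d", "e"])
print(hits)  # [0, 3]
```

Matching on intervals rather than note names would let the same query find transposed statements of a theme, at the cost of computing pitch numbers from @pname, @oct, and any accidentals.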
A note about program evaluation: one aspect of design that is often overlooked in digital musicology projects is user testing. As noted by David Weigl in his study of the academic use of digitized online resources [5]: “the needs and behaviours of musicologists in particular remain relatively underexplored”. This is not just an issue in musicology. The statement made by Warwick, et al. [6] in 2008 (cited by Murray and Wiercinski [7]) rings true today in 2020: “User testing, like disseminating information, is a skill that most humanities scholars have not acquired”. However, as Murray and Wiercinski point out, a strictly user-centered development might restrict a project’s ability to make full use of nascent technology. For them, the ideal interface would “provide the more familiar and comfortable features that facilitate the types of activities that scholars know,” while affording new opportunities for discovery and experimentation “of which they are currently unaware” [7]. Until more research like Weigl’s is conducted on users in music studies, we can only note that all development is an iterative process: an attempt to anticipate needs, get feedback, address shortcomings, and get more feedback. In the meantime, having robust models that can be easily adapted for use by others is a positive step toward increasing access to archival materials.

Conclusions

While the raw data of much notated music may be ready to be downloaded for analysis, the high-level computing skills required to retrieve and analyze that data mean that it remains out of reach for many. In order to make collections such as these more accessible, both the resources and the training for encoding, retrieval, analysis, and display of encoded music need to be made available to researchers.
We would like our prototype to be a resource to scholars in music studies, as an example of open data and code that will lessen the demand for technical expertise for both the researcher and the user, while demonstrating the functionality that can be added to a single site accessed through an ordinary web browser. As music OCR technology continues to become more successful at first-pass recognition, we will want to be prepared to make repositories available to more than just the technologically savvy few.

With encoded music, the difference between a mode of access that involves scrolling through a list of text files and one that features an interactive display of scores and sound is analogous to the difference between retrieving library materials through an institution with closed stacks and one with open stacks. Refining interests and homing in on relevant and interesting material are often the result of seeing a book on a shelf, opening it up, and thumbing through it: reading a few sentences, checking out the table of contents, skipping to the index, looking at the color plates in the middle. We don’t always need or want to engage with materials in this manner, but having the option to do so is invaluable.

Works cited

[1] Padilla, Thomas. “On a Collections as Data Imperative”. Conference Report. Collections as Data: Stewardship and Use Models to Enhance Access, Library of Congress, Washington, DC, September 27, 2016. http://digitalpreservation.gov/meetings/dcs16/tpadilla_OnaCollectionsasDataImperative_final.pdf
[2] Theimer, Kate. “The Future of Archives Is Participatory: Archives as Platform; or, A New Mission for Archives” presented at the Offene Archive 2.1 Conference, Stuttgart, Germany, April 3-4, 2014.
[3] Sayers, Jentery. “Minimal Definitions”. 2016. https://go-dh.github.io/mincomp/thoughts/2016/10/02/minimal-definitions/
[4] Gil, Alex. “The User, the Learner, and the Machines We Make”. 2015.
https://go-dh.github.io/mincomp/thoughts/2015/05/21/user-vs-learner/
[5] Weigl, David, et al. “On Providing Semantic Alignment and Unified Access to Music Library Metadata” International Journal on Digital Libraries 20, no. 2 (2019), 25-47.
[6] Warwick, Claire, et al. “The Master Builders: LAIRAH Research on Good Practice in the Construction of Digital Humanities Projects” Literary & Linguistic Computing 23, no. 1 (2008), 383-96.
[7] Murray, Annie, and Jared Wiercinski. “A Design Methodology for Web-Based Sound Archives” Digital Humanities Quarterly 8, no. 2 (2014). https://www.digitalhumanities.org/dhqdev/vol/8/2/000173/000173.html