Music Encoding Conference Proceedings 2020 47

MEI and Verovio for MIR: A Minimal  
Computing Approach
Mark Saccomano, Columbia University, m.saccomano@columbia.edu
Natalia Ermolaev, Princeton University, nataliae@princeton.edu

Abstract
While the increase in digital editions, online corpora, and browsable databases of encoded music presents
an extraordinary resource for contemporary music scholarship, using these databases for computational
research remains a complex endeavor. Although norms and standards have begun to emerge, and interoperability
among different formats is often possible, researchers must devote considerable time to discovering, learning,
and maintaining the skill sets necessary to make use of these resources. This talk will discuss our work with the
Serge Prokofiev Archive and the creation of a prototype to browse, display, and play notated music from
Prokofiev’s notebooks via a web browser. The project is an example of how using the principles of minimal
computing can reduce the burden of technological expertise required to both disseminate and access encoded
music.

The archive
The Serge Prokofiev Archive,16 housed at Columbia University, contains more than 17,500 diverse items: music 
manuscripts, letters, scores, financial documents, notebooks, photographs, and recordings. Originally a personal
collection amassed by Prokofiev’s widow Lina, the materials were first established as an archive in 1994
at Goldsmiths College in London. As the archive grew, a complex, intricate, item-level descriptive apparatus
evolved alongside it. By the time the collection came to Columbia, the archival items were accompanied by
hundreds of metadata files in a wide variety of formats, including Word documents, spreadsheets, text files, 
PDFs, Endnote databases, Access databases, MARC records, and various XML encodings.

Typically, archival collections are accessed through an online finding aid, which users often find difficult
to use and whose underlying structure and interface can obscure the richness of a collection. The
blocks of narrative and long lists of items found in a finding aid, especially for a collection of this scope, are
a barrier to true discovery. We sought to improve the experience of navigating a large archival collection by
affording users the opportunity to make new, spontaneous discoveries.

Our Serge Prokofiev Archive as Data17 project was guided by two important conceptual shifts in the library and
archives profession. The first is the “Collections as Data” movement, which encourages reframing the digital object
itself as data [1].18 The second is Kate Theimer’s notion of “archives as platform,” a move away from locating
value exclusively in the objects of a collection and toward the impact collections have on people and communities [2]. In
Theimer’s view, the notion of an archive includes the tools and technologies that help users interact with it in
creative ways that add value to their lives and experiences.

Accessible technology and minimal computing
Because we were looking for solutions that could be adapted for researchers with varying skill sets and with 
different computing needs, we tested out a variety of freely available software to store, structure, clean, ana-
lyze and display our data. Also, we had no budget: necessity dictated that we seek out non-proprietary tools. 
Thus, we placed ourselves in the position of many researchers (independent and graduate student researchers
in particular) looking for ways to disseminate their work to a wider audience. Following this path, we were soon
introduced to the principles of minimal computing and discovered their applicability to our own project’s
goals.

16  https://findingaids.library.columbia.edu/ead/nnc-rb/ldpd_10815449
17  https://mss2221.github.io/spademo/
18  See also https://collectionsasdata.github.io/statement/

Minimal computing19 is a design philosophy that seeks to maximize access to digital materials by reducing
reliance on specific hardware and software requirements [3]. Alex Gil describes minimal computing, which is
organized around the question “What do we need?”, as a conscious effort to “harness the new media in smart, ethical
and sustainable ways.” In addition to reducing reliance on multiple, opaque processes, minimal computing
also implies “learning how to produce, disseminate and preserve digital scholarship ourselves, without the
help we can’t get” [4]. This DIY approach helps minimize dependence on institutional resources and funding,
as well as on proprietary tools (which, in addition to their cost, often require a high level of expertise).

One of the first steps we took was to avail ourselves of systems and workflows with ample documentation.
We also recognized that scholars are well advised to publish digital materials in a versatile format that
requires little or no maintenance and can easily be ported to other systems. This way, digital materials remain
accessible even as technology develops in ways that are impossible to foresee today. We created a
repository for the Serge Prokofiev Archive as Data project on GitHub and built a static website, displayed on
GitHub Pages, using a Jekyll template. Because a static site does not require knowledge of server operations or
database design, it simplifies the task of disseminating work for individual researchers.

For the musical component of our project, we wanted to create not only an attractive front end and simple
user interface, but a simple back end as well. The idea was to provide a repository of encoded music that
could be not only seen but also heard, a difference that could make such a repertory valuable beyond the
circle of specialized scholars in computing or musicology. Aficionados and researchers in other fields who may
not be able to read code or read music could nonetheless hear the music in Prokofiev’s manuscripts, and hear for
themselves the jagged rhythms and unexpected chromatic alterations that are hallmarks of his style. We also
developed a simple workflow for creating the encoded files (one very similar to the process now detailed in
the tutorial “Introduction to the Music Encoding Initiative”20 by Anna Kijas and Raffaele Viglianti). To publish
the encoded materials, we used our GitHub website and Jekyll template.

The notebooks
Among the highlights of the collection are Prokofiev’s notebooks. In an interview transcript from the
archive, Prokofiev’s widow Lina described how he used the notebooks in his creative process:

SP never stopped creating…. At the most unexpected moment, at the most unusual circumstances—during 
a conversation or while walking—he would make a note of a new theme in a special notebook he kept in 
his pocket or on any scrap of paper or on his cuff—on paper napkins in a restaurant. Then on returning 
home he would copy the themes into a more permanent notebook.

The sketches we display on the site are from these “more permanent” notebooks Lina mentions. We began by
simply browsing through the notebooks and taking some pictures. Displayed in a web exhibit, these images
would be interesting on their own. But we also knew that by adding the sounded music represented by these
scores, we would greatly increase the usefulness of these notebooks to scholars, as well as to the general
public. Not all musicologists and music theorists have sufficient musicianship skills to fluently imagine the sound
of notated music. For archival materials such as unlabeled sketches, hearing the music can aid in identifying
fragments and suggest how and where they might have been used in published scores.

19  https://go-dh.github.io/mincomp/
20  https://dlfteach.pubpub.org/pub/intro-mei/release/1

MEI was chosen as the encoding format, not only because of its adaptability and increasingly common use
in digital musicological projects, but also because of the availability of Verovio, an engraving library that can be
used to display and play MEI files in a web browser. As these were short, handwritten passages of only a few
measures each, they were entered manually into the music notation program Sibelius. (Because they were
written by hand, an OCR program would likely not have been the most efficient method of encoding.) Next, the
files were exported to MusicXML using the export function of Sibelius. To convert the MusicXML files to MEI,
we used the automated converter21 available on the Verovio website. The conversion yielded
excellent results. The light editing that remained to be done was mostly for aesthetic purposes of display. The
editing was done in Atom, using the MEI-Tools-Atom22 package, which renders MEI in a separate pane within
the application.
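
Before hand-editing a converted file, a quick structural check can catch gross conversion problems. The following Python sketch is illustrative only and is not part of the workflow described above: the function name `summarize_mei` is hypothetical, and the only assumption about the files is that they use the standard MEI namespace.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"


def summarize_mei(mei_text):
    """Parse MEI text and count a few basic elements.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed
    XML, and ValueError if the root element is not <mei> in the MEI namespace.
    """
    root = ET.fromstring(mei_text)
    if root.tag != f"{{{MEI_NS}}}mei":
        raise ValueError(f"unexpected root element: {root.tag}")
    counts = {}
    for name in ("measure", "note", "rest"):
        # root.iter() walks the whole tree, so nesting depth does not matter.
        counts[name + "s"] = sum(1 for _ in root.iter(f"{{{MEI_NS}}}{name}"))
    return counts
```

Comparing the counts against the manuscript (a sketch of four measures should report four measures) is a cheap first test; it says nothing about whether pitches, rhythms, or layout came through correctly.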

Once the MEI files were checked and polished, they were uploaded to the GitHub repository. The challenge
at this stage was to create page templates that would incorporate Verovio. Although it took many
attempts to pull everything together, the results were encouraging, and a prototype was developed that could
display an engraved version of the score derived from a digitally encoded version of the manuscript, as well
as play the score in a browser using a simple interface:

https://mss2221.github.io/spademo/sketches/
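
Jekyll pages carry their metadata in YAML front matter, so one low-maintenance way to connect each encoded sketch to a shared page template is to generate a stub page per MEI file. The Python sketch below is illustrative and does not describe the prototype's actual templates: the layout name `sketch`, the `mei` front-matter key, and the directory layout are all hypothetical.

```python
from pathlib import Path


def front_matter(title, mei_path, layout="sketch"):
    """Return a Jekyll page that consists only of YAML front matter."""
    return (
        "---\n"
        f"layout: {layout}\n"
        f'title: "{title}"\n'
        f"mei: {mei_path}\n"
        "---\n"
    )


def write_sketch_pages(mei_dir, pages_dir):
    """Write one stub page per .mei file and return the paths written."""
    mei_dir, pages_dir = Path(mei_dir), Path(pages_dir)
    pages_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for mei_file in sorted(mei_dir.glob("*.mei")):
        # Derive a readable title from the file name (hypothetical convention).
        title = mei_file.stem.replace("_", " ").replace("-", " ").title()
        page = pages_dir / f"{mei_file.stem}.md"
        # The site-relative path assumes the MEI folder sits at the site root.
        page.write_text(
            front_matter(title, f"/{mei_dir.name}/{mei_file.name}"),
            encoding="utf-8",
        )
        written.append(page)
    return written
```

In Jekyll, front-matter keys are available to a layout as `page.<key>`, so a layout could read `page.mei` and pass that path to Verovio in the browser. Adding a new sketch would then mean adding one MEI file and rerunning the script.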

Implementation
Development challenges proved to be formidable. While finding appropriate tools for coding, display and
playback of manuscripts was reasonably easy, getting them to work together was exceedingly complex.
Documentation, though rich, can be dense; the largest impediment to timely progress is the lack of access to a
consultant who can assist in troubleshooting. Without such help, the plethora of manuals and tutorials becomes an
obstacle to learning, creation, and design. (Think of the myriad articles, tips, and guidelines many of us received
back in March of this year on how to migrate our courses online; such a wealth of material can be overwhelming.)
Even with access to university assistance, this site took nearly a year to assemble. However, the skills to use
GitHub and Jekyll are within reach via ground-up tutorials available from such sites as The Programming Historian.23

Difficulties remain, specifically those arising from technical solutions that push the limits of common
browser capabilities. For example, problems with audio playback, such as web MIDI players clipping notes
(possibly due to buffering or threading issues), have driven developers on some projects to insert an extra
musical object into their encoded scores.24 We also encountered this clipping problem, and were only able
to come up with a temporary workaround through a laborious trial-and-error process. To ensure the MEI
would play properly in the browser, a <space> element or “dummy” <note> event with @visible="false" had to
be inserted before the first and final notes in order for them to be heard. Such inelegant solutions are highly
undesirable for an archival representation of a manuscript. Presumably, improvements in how browsers and
system players handle MIDI will soon make such workarounds unnecessary. In the meantime, these ad hoc
solutions need to be specially commented in the MEI files.
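
The workaround and its accompanying comment could in principle be applied mechanically instead of by hand. The Python sketch below is one possible approach under stated assumptions, not the procedure used for the prototype: the function names, the comment wording, and the padding duration (`dur="16"`) are hypothetical, and the code assumes that the first and final notes sit directly inside a container that may hold a <space> (a <layer> or <beam>, but not a <chord>). The padding also adds to the duration of the measure it lands in.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"
ET.register_namespace("", MEI_NS)  # write MEI back out as the default namespace

# Illustrative wording; any comment that flags the addition would do.
WORKAROUND_COMMENT = (
    " playback workaround: padding added so web MIDI does not clip"
    " the next note; not in the manuscript "
)


def pad_before(parent, note, dur):
    """Insert a comment and a <space> immediately before `note` in `parent`."""
    index = list(parent).index(note)
    parent.insert(index, ET.Element(f"{{{MEI_NS}}}space", {"dur": dur}))
    parent.insert(index, ET.Comment(WORKAROUND_COMMENT))


def pad_first_and_final_notes(mei_text, dur="16"):
    """Return MEI text with commented padding before the first and final notes."""
    root = ET.fromstring(mei_text)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    notes = list(root.iter(f"{{{MEI_NS}}}note"))
    if not notes:
        return mei_text
    # A single-note sketch gets one pad, not two.
    targets = notes[:1] if len(notes) == 1 else [notes[0], notes[-1]]
    for note in targets:
        pad_before(parent_of[note], note, dur)
    return ET.tostring(root, encoding="unicode")
```

Because every inserted <space> is preceded by the same comment, the additions are easy to find and strip once browser playback improves. Note that the standard-library parser discards any comments already present in the input, so this sketch suits freshly converted files better than hand-annotated ones.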

Extensibility and Future Directions

Sample sites using MEI, Verovio and Ed template

El corrido mexicano  https://mss2221.github.io/corridosEd/
Serbian hymns  https://mss2221.github.io/zagreb/

In order to test the extensibility of this project, we tried it out with texted music in a special Jekyll template for
minimal literary editions, “Ed,”25 developed by Alex Gil and associates. The resulting sites showed the flexibility
of the Ed theme to handle some of the more complex requirements of Verovio and web MIDI, while still
remaining a project that could be managed by a single researcher. They also demonstrate the utility of our
chosen suite of open source tools for musicologists, music theorists, and music archivists. In the future, we
hope to incorporate search tools for specific series of notes and an analytical component that could be used
to identify the stylistic traits of a corpus.

21  https://www.verovio.org/musicxml.html
22  https://atom.io/packages/mei-tools-atom
23  https://programminghistorian.org/ We are particularly indebted to Amanda Visconti for her Jekyll tutorial: https://programminghistorian.org/en/lessons/building-static-sites-with-jekyll-github-pages
24  https://github.com/cuthbertLab/music21/issues/332
25  https://elotroalex.github.io/ed/

A note about program evaluation: one aspect of design that is often overlooked in digital musicology
projects is user testing. As noted by David Weigl in his study of the academic use of digitized online resources
[5], “the needs and behaviours of musicologists in particular remain relatively underexplored”. This is not just
an issue in musicology. The statement made by Warwick et al. [6] in 2008 (cited by Murray and Wiercinski [7])
rings true today in 2020: “User testing, like disseminating information, is a skill that most humanities scholars
have not acquired”. However, as Murray and Wiercinski point out, a strictly user-centered development might
restrict a project’s ability to make full use of nascent technology. For them, the ideal interface would “provide
the more familiar and comfortable features that facilitate the types of activities that scholars know,” while
affording new opportunities for discovery and experimentation “of which they are currently unaware” [7]. Until
more research like Weigl’s is conducted on users in music studies, we can only note that all development is an
iterative process: an attempt to anticipate needs, get feedback, address shortcomings, and get more feedback.
In the meantime, having robust models that can be easily adapted for use by others is a positive step toward
increasing access to archival materials.

Conclusions
While the raw data of much notated music may be ready to be downloaded for analysis, the high-level
computing skills required to retrieve and analyze that data mean that it remains out of reach for many. In order
to make collections such as these more accessible, both the resources and the training for encoding, retrieval,
analysis, and display of encoded music need to be made available to researchers. We would like our prototype
to be a resource for scholars in music studies: an example of open data and code that will lessen the demand
for technical expertise for both the researcher and the user, while demonstrating the functionality that can be
added to a single site accessed through an ordinary web browser.

As music OCR technology continues to become more successful at first-pass recognition, we will want to be
prepared to make repositories available to more than just the technologically savvy few. With encoded music,
the difference between a mode of access that involves scrolling through a list of text files and one that features
an interactive display of scores and sound is analogous to the difference between retrieving library materials
through an institution with closed stacks and one with open stacks. Refining interests and homing in on
relevant and interesting material are often the result of seeing a book on a shelf, opening it up and thumbing
through it: reading a few sentences, checking the table of contents, skipping to the index, looking at the color
plates in the middle. We don’t always need or want to engage with materials in this manner, but having the option
to do so is invaluable.

Works cited
[1] Padilla, Thomas. “On a Collections as Data Imperative”. Conference Report. Collections as Data: Stewardship and Use Models to Enhance Access, Library of Congress, Washington, DC, September 27, 2016. http://digitalpreservation.gov/meetings/dcs16/tpadilla_OnaCollectionsasDataImperative_final.pdf
[2] Theimer, Kate. “The Future of Archives Is Participatory: Archives as Platform; or, A New Mission for Archives”. Presented at the Offene Archive 2.1 Conference, Stuttgart, Germany, April 3-4, 2014.
[3] Sayers, Jentery. “Minimal Definitions”. 2016. https://go-dh.github.io/mincomp/thoughts/2016/10/02/minimal-definitions/
[4] Gil, Alex. “The User, the Learner, and the Machines We Make”. 2015. https://go-dh.github.io/mincomp/thoughts/2015/05/21/user-vs-learner/
[5] Weigl, David, et al. “On Providing Semantic Alignment and Unified Access to Music Library Metadata”. International Journal on Digital Libraries 20, no. 2 (2019), 25-47.
[6] Warwick, Claire, et al. “The Master Builders: LAIRAH Research on Good Practice in the Construction of Digital Humanities Projects”. Literary & Linguistic Computing 23, no. 1 (2008), 383-96.
[7] Murray, Annie, and Jared Wiercinski. “A Design Methodology for Web-Based Sound Archives”. Digital Humanities Quarterly 8, no. 2 (2014). https://www.digitalhumanities.org/dhqdev/vol/8/2/000173/000173.html
