Bulgarian Dialectology as Living Tradition: A Labor of Love



Quinn Dombrowski 

Division of Literatures, Cultures, and Languages & Stanford University Libraries, 

Stanford University, Palo Alto, CA, USA; ORCID: 0000-0001-5802-6623 

Ronelle Alexander 

Department of Slavic Languages and Literatures, UC Berkeley, Berkeley, CA, USA 

Vladimir Zhobov 

Department of Slavonic Philology, Sofia University "St. Kliment Ohridski", Sofia, 

Bulgaria 



Bulgarian Dialectology as Living Tradition (BDLT) has been one of the longest-

running Slavic digital humanities projects in the United States. Initially 

conceived in 2008 as a series of printed volumes, the digital project was built 

upon the foundation of a long-term international collaboration dating to the 

1970s. As BDLT nears completion in 2019, this paper reflects on the trajectory 

of its development and its sustainability as an unfunded digital humanities 

project, and the ways it can serve as both a model and cautionary tale for others 

who seek to undertake similar work. 

Keywords: digital humanities; digital preservation; content management systems; 

dialectology; Bulgarian language 

1. Introduction 

Bulgarian Dialectology as Living Tradition (BDLT) has been one of the longest-running 

Slavic digital humanities projects in the United States. Initially conceived in 2008 as a 

series of printed volumes, the digital project was built upon the foundation of a long-

term international collaboration between Ronelle Alexander (UC Berkeley) and various 

Bulgarian scholars dating to the mid-1970s. A serendipitous conversation in 2011 

between Alexander and Quinn Dombrowski, a digital humanist with a background in 

Slavic linguistics and in library and information science, transformed the focus of the 

project from preparing Word documents for eventual publication to preparing data for 

entry into a database. Soon after that, Vladimir Zhobov of Sofia University became the 

Bulgarian research director of the new digital project. In 2016, Zhobov and Alexander 

decided to open the previously password-protected website to the public, even as basic 

data entry was still in progress, and in 2019, the project officially launched after many 

years “in beta”. While data entry is not yet finished for all aspects of the project, it is 

rapidly nearing completion.  


At eight years old, BDLT hardly ranks among the most longstanding digital 

projects with a focus on Slavic materials (cf. various national corpora-building efforts; 

manuscript markup and display environments such as http://manuscripts.ru/; the 

database of Russian birchbark letters at http://gramoty.ru; and dialectological databases, 

such as those on Russian dialects found at http://www.parasol.corpus.org/Pushkino and 

http://www.rureg.hs-bochu.de, a site on Bulgarian diaspora dialects at 

http://www.corpusbdr.info, and a comprehensive site on Polish dialects at 

http://www.dialektologia.uw.edu.pl/index.php?11=start, etc.). Nonetheless, the 

institutional and financial circumstances for US-based Slavists undertaking digital 

projects are vastly different from those of colleagues situated in countries where such 

projects can be framed and funded as valuable efforts to bolster the national language in 

a digital environment where English predominates. As more US-based Slavists engage 

with digital tools and methodologies that are transformative in their application to 

Slavic studies, but may not be perceived as sufficiently “innovative” on the technical 

level to successfully compete for national-level grant funding (e.g. from the NEH or 

ACRL), BDLT can serve as a replicable model for project development that 

depends much more on time than money. 

2. Overview 

Bulgarian Dialectology as Living Tradition (BDLT, http://bulgariandialectology.org/) is 

a searchable database of oral speech representing the full range of Bulgarian dialects. It 

comprises 184 excerpts (henceforth called “texts”), drawn from a large corpus of 

material recorded in Bulgarian villages over the period 1986-2013. BDLT is the digital 

embodiment of a scholarly project with two goals: first, to make both the discipline and 

the material of Bulgarian dialectology available to a broader, international audience; 

and second, to bring the focus of dialectology back to the natural, spontaneous speech 


which constitutes the basic data for dialectological research.  

2.1 Background and source material 

BDLT emerged out of a multi-year collaboration between the American Slavist and 

dialectologist Ronelle Alexander and several Bulgarian dialectologist colleagues. As 

early as 1975, Alexander began to discuss the desirability of joint fieldwork with 

Bulgarian colleagues Todor Bojadžiev and Maksim Mladenov; in 1986, two additional 

Bulgarian colleagues, Georgi Kolev and Vladimir Zhobov, joined this conversation. 

Such work was not possible during the socialist period, but as soon as the government 

changes occurred, various members of this group took short field trips: Alexander and 

Kolev recorded material in the Razlog region (one village) in 1990, and Alexander and 

Mladenov recorded material in the Ihtiman, Panagjurište, and Velingrad regions (five 

villages total) in 1992. These trips were followed by longer expeditions in 1993 and 

1996, which visited many more locations and gathered the bulk of the material 

underlying BDLT. These ventures were supported by the International Research and 

Exchanges Board (IREX), and the field expeditions were directed jointly by Alexander, 

Kolev and Zhobov. When Bulgarian Dialectology as Living Tradition arose as a project 

and publication, the material from these field trips was augmented with similar work 

done by members of the research team, their colleagues, students and associates, in 

order to increase the geographic coverage and obtain a more representative set of 

transcripts from Bulgaria as a whole. 

In order to place primary emphasis on natural spontaneous speech, audio clips 

from the actual field recordings have been made available along with each text, and 

both they and the transcriptions are presented in as “natural” a frame as possible. The 

audio files have undergone very little sound editing; only certain loud and distracting 

noises have been removed. In the transcription, every utterance has been included 


(including those by bystanders when relevant to the conversation) as well as any non-

linguistic sounds when there was even the slightest possibility that they may have 

influenced the flow of conversation, e.g. by distracting the speaker. In addition, 

overlapping speech by several informants has been transcribed. Such transcription of 

“natural speech”, therefore, makes the material available for linguistic analysis at 

several different levels beyond the word itself (the focus of nearly all the maps in dialect 

atlases). Topics which are rarely, if ever, addressed in dialectological research, such as 

word order, functional sentence perspective, conversational analysis, narrative 

structures, and intonation, can now be studied on the basis of this material. 

2.2 “Revitalizing Bulgarian Dialectology” 

One of the conditions of the IREX grant supporting the 1996 field expedition was the 

publication of a volume summarizing results of the expedition. This volume, 

entitled Revitalizing Bulgarian Dialectology¹, was published in 2004 under the 

editorship of Alexander and Zhobov, in association with the University of California 

Press, as an open-access PDF manuscript available through the California Digital 

Library’s eScholarship platform. The goal of the expedition had been to “revitalize” 

Bulgarian dialectology both in Bulgaria and the West by putting Bulgarian 

and American students together in the field and creating situations where they could 

learn not only from their teachers but also from each other. The resulting volume 

included not only articles by the teachers (Alexander, Kolev and Zhobov) but also 

research papers by each participant, student and teacher alike, based on dialect material 

recorded during the expedition. The volume was published in California to underscore 

the importance of making Bulgarian dialectal data available at the international level, 

and it was published electronically and open-access to maximize availability, especially 

in Eastern Europe. 

1 Ronelle Alexander and Vladimir Zhobov, eds. Revitalizing Bulgarian Dialectology. 2004. 

University of California Press. http://escholarship.org/uc/item/9hc6x8hp 

2.3 BDLT as audio-based chrestomathy 

Although Revitalizing Bulgarian Dialectology had made public some outcomes of the 

most recent expedition, the ultimate goal of the research team was to devise a way to 

make available the actual field material gathered on this and previous expeditions, and 

to do so in such a way as to make this material more accessible to outsiders. Realizing 

that it would not be possible to transcribe all of the recorded material (over 

200 hours), they decided to choose representative excerpts and create an audio-based 

chrestomathy; in order to make the chrestomathy more fully representative of the broad 

scope of variation throughout Bulgaria, they also decided to include material from 

field trips undertaken by Zhobov and Kolev prior to their collaboration with 

Alexander. The plan was not only to transcribe each excerpt but also to provide it with 

interlinear glosses and an English translation; each excerpt would also be accompanied 

by a streaming audio file, a clip from the actual field recordings. The goal of the 

resulting publication was to make actual field data maximally available (including in 

audio form) at the international level. Furthermore, since the excerpts were chosen not 

only for linguistic value but also for content, the volume would give a representative 

picture of both linguistic variation and traditional cultural phenomena throughout 

Bulgaria. 

3. BDLT as digital humanities project 

At AATSEEL 2011, Alexander discussed the audio-based chrestomathy with Quinn 

Dombrowski. After receiving an MA in Slavic linguistics as well as an MLIS, 


Dombrowski had found employment as IT staff in the Academic Technologies unit of 

the University of Chicago’s central IT organization. Dombrowski had experience with 

developing digital humanities projects across a number of fields, and at the time was on 

the program staff of Project Bamboo, a Mellon-funded digital humanities 

cyberinfrastructure initiative. Having previously attempted an XML markup project² to 

capture dialectal variation in subsets of the data published in the Bŭlgarski dialekten 

atlas³, Dombrowski had a personal interest in working with a different kind of 

Bulgarian dialectology material, and making it as accessible and reusable as possible. 

Alexander shared early drafts of BDLT with Dombrowski in the form of 

Microsoft Word documents, where each line of the text was transcribed and translated 

into English, and each token was annotated with linguistic metadata. Dombrowski noted 

that the high degree of structure in these Word files was more reminiscent of a database 

than a traditional scholarly monograph. Moreover, the process required to generate the 

Word files involved significant duplication of work, as each token would need to be 

glossed and annotated anew every time it occurred. Not only was this inefficient, it also 

increased the risk of inconsistencies. Dombrowski felt that the rigidity of a print-

oriented PDF end product would also limit its audience. The transcripts touch on a wide 

range of topics, from folklore and traditions, to agricultural practices, to personal stories 

                                                
depicting daily life in rural Bulgaria. These narratives could be valuable in a wide 

variety of contexts, within and beyond the academy, but the formatting of the Word 

documents -- where the narrative was visually interrupted every few words by a block 

of linguistically-oriented data -- significantly impeded the narrative’s readability and 

accessibility. This could be remedied by the production of another set of Word 

documents that presented the narratives as continuous text, but there, too, choices would 

have to be made about whether to include the original Bulgarian (and how: inline, in 

parallel columns, or separately), and whether to include a transliteration along with the 

Cyrillic (and, again, how). 

2 Andrew Dombrowski and Quinn Dombrowski. “An XML-Based Approach to Dialectological 

Data: The Development of Syllabic Liquids in Bulgarian.” Presented at the 17th Balkan 

and South Slavic Conference at The University of Ohio. 2010. 

http://quinndombrowski.com/blog/2010/04/13/bulgarian-dialect-atlas-at-the-17th-balkan-and-south-slavic-conference 

3 Stojko Stojkov et al., ed. 1964-1975. Institut za bǎlgarski ezik. Bǎlgarski dialekten atlas I-IV. 

Sofia: Izdatelstvo na Bălgarska akademia na naukite, 1964-1981. 

Converting the project’s structure to a database would eliminate these issues. 

Tokens could be entered, glossed, and annotated once, and these token entries would 

then be referenced in each text where they appear. Rather than committing to a single 

display format, database queries could enable any number of displays, in order to 

accommodate various audiences’ needs and interests. A linguistics-oriented view could 

display all the tokens and their metadata (much like the original Word files); multiple 

narrative-oriented views could display the text without interruption, and in any 

combination of writing systems. A database would allow users to not only view the 

linguistic metadata on tokens but also to use it as a means of querying the transcripts: 

e.g. pulling up all lines that include a lexeme of interest, or all lines that include a 

particular verb form. A database would also facilitate augmenting the transcripts with 

additional metadata to support discoverability and analysis -- for instance, individual 

lines could be tagged with thematic content, and tokens could be grouped into phrases 

that show noteworthy linguistic features. In short, moving from a print-oriented 

workflow to a database would vastly increase the research potential of the corpus, in 


addition to making the content more accessible to the broadest possible international 

audience. 
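
The "annotate once, reference everywhere" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, tag labels, and sample words are hypothetical and are not taken from the BDLT database or its Drupal configuration. Each token is stored once with its gloss and tags, each line stores only token identifiers, and the different displays and queries are computed from the same stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A dialectal word form, glossed and annotated exactly once."""
    form: str      # transcribed form (illustrative)
    lexeme: str    # standard-language lemma the form is linked to
    gloss: str     # English gloss
    tags: tuple    # grammatical tags (hypothetical labels)


@dataclass(frozen=True)
class Line:
    """One numbered line of a text; it stores token ids, not copies of annotations."""
    text: str
    number: int
    token_ids: tuple
    translation: str


# Illustrative entries only; these are not drawn from the BDLT corpus.
TOKENS = {
    "t1": Token("žena", "жена", "woman", ("N", "F", "SG")),
    "t2": Token("dojde", "дойда", "came", ("V", "AOR", "3SG")),
}

LINES = [
    Line("Sample A", 1, ("t1", "t2"), "A woman came."),
    Line("Sample B", 4, ("t2",), "(She) came."),  # reuses t2 without re-annotating it
]


def glossed_view(line):
    """Linguistics-oriented display: every token with its gloss and tags."""
    return [
        (TOKENS[t].form, TOKENS[t].gloss, ".".join(TOKENS[t].tags))
        for t in line.token_ids
    ]


def line_view(line):
    """Narrative-oriented display: uninterrupted text plus translation."""
    words = " ".join(TOKENS[t].form for t in line.token_ids)
    return f"{words} | {line.translation}"


def lines_with_lexeme(lexeme):
    """Query: every (text, line number) containing a token of the given lexeme."""
    return [
        (ln.text, ln.number)
        for ln in LINES
        if any(TOKENS[t].lexeme == lexeme for t in ln.token_ids)
    ]
```

Because both sample lines point at the same `t2` entry, correcting its gloss or tags in one place would change every display that uses it, which is the consistency benefit described above.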

Dombrowski offered to create a prototype of BDLT as an online database. She 

had previously built web-based digital humanities projects using the open-source 

content management system Drupal, and saw it as equally well suited to this project. 

At the time, Drupal had a large, international developer community creating and 

maintaining modules (pieces of add-on functionality for the core Drupal platform) that 

could fulfill the project’s technical requirements of storing and querying structured 

metadata, storing and presenting audio files, and importing and exporting text. This 

would allow Dombrowski to quickly develop a complex web application that would be 

highly customized to the specific data model of BDLT, without writing any code. (See 

“Drupal and Other Content Management Systems”⁴ for further discussion of Drupal and 

other content management systems.) Drupal 7 was released shortly before Dombrowski 

began to develop the pilot version of BDLT, and the Drupal development philosophy 

supports API-breaking changes between major versions⁵. As a consequence, there is 

always a delay between the release of a new major version of Drupal core, and the point 

when it becomes usable for complex projects, as module developers need time to 

refactor their code if they intend to continue supporting their modules. For that reason, 

Dombrowski chose to build BDLT in Drupal 6 -- a decision that had long-term 

consequences, even as it was unavoidable at the time. 

                                                
4 Quinn Dombrowski. “Drupal and Other Content Management Systems” in Doing Digital 

Humanities: Practice, Training and Research, ed. C. Crompton, R.J. Lane, and R. 

Siemens. 2016. Routledge. 
5 Dries Buytaert. “Backwards Compatibility”. May 17, 2006. https://dri.es/backward-compatibility 


It took approximately 12 hours of work to develop the initial prototype of 

BDLT, which included an interface for entering and editing texts and annotating tokens, 

a text display equivalent to the original Word files, a map display of locations, as well 

as data structures for linking tokens to lexemes, annotating thematic content, and 

browsing all tokens, organized by lexeme. For the sake of expediency, Dombrowski 

manually entered the data for a few example texts, but anticipated that the existing 

Word files could be imported into the system without much difficulty. In February 

2011, Dombrowski demoed the prototype for Alexander. After conferring with Zhobov, 

Alexander decided to move ahead with implementing BDLT as a database, with 

Dombrowski acting as the project’s technical staff. 

4. Technical implementation 

The pilot version of the site that Dombrowski developed in 2011 remains largely 

unchanged to this day, though a few additional displays and features have accrued over 

the course of the site’s development. “Digital Humanities Development without 

Developers: Bulgarian Dialectology as Living Tradition”⁶ provides a detailed 

description of the technical underpinnings and data model for BDLT as of 2014; it 

covers all of the site’s major features, with the exception of the more recent 

“phrases” content types and displays.  

6 Ronelle Alexander and Quinn Dombrowski. "Digital Humanities Development without 

Developers: Bulgarian Dialectology as Living Tradition". 2014. Proceedings of DH-CASE 

II (DocEng workshop). doi: 10.1145/2657480.2657481. 

4.1 Structural overview 

In brief, there are seven content types (Location, Contents, Text, Token, Line, Lexeme, 

Phrase) and five search functions (Wordform, Lexeme, Linguistic Trait, Thematic 

Content, Phrase). The results from any search query can be exported as a CSV or 

Microsoft Word file, and the site provides a map display as one output from linguistic 

trait, wordform, or phrase search queries. 
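
To show what a CSV export of search results might look like to someone reusing the data, here is a minimal Python sketch using the standard `csv` module. The column names and the sample rows are assumptions made for illustration; they do not reproduce the actual columns of a BDLT export.

```python
import csv
import io

# Hypothetical search hits; the field names are assumptions, not BDLT's export schema.
hits = [
    {"text": "Sample A", "line": 1, "token": "žena", "location": "Village X"},
    {"text": "Sample B", "line": 4, "token": "dojde", "location": "Village Y"},
]


def to_csv(rows):
    """Serialize search hits as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["text", "line", "token", "location"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(hits)

# Reading the export back in, as a downstream user might do in their own analysis.
reloaded = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that values read back from CSV are strings, so a line number such as `1` returns as `"1"` and would need converting before numeric use.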

• Locations. Each village visited is located on a map on the home page and is 

represented by a page of its own, accessible either by a link from a list on the 

home page or a tab on the map. Each Location page gives basic metadata about 

the village (administrative region, dialect group, date visited), and provides a 

lengthy prose description of the relevant dialect subgroup. Salient traits of the 

group are illustrated by examples taken from the site itself. Links to the text(s) 

representing this village are also available on this page. 

• Contents. This page displays basic information about each of the 184 excerpts, 

or texts, which the site contains: text name, dialect group, duration of audio file, 

number of lines of text, number of tokens of informant speech, and a brief 

synopsis of thematic content. Data entry status for content not yet completely 

entered is also noted. Data can be sorted on all columns except the audio length, 

and texts can be accessed from the text link on this page. 

• Texts. Each text has its own page: it contains a sidebar with a small map locating 

the village, a photo image of the village, and metadata (date of recording, word 

count, physical context of recording, name(s) of investigator(s), and synopsis of 

thematic content). Each text is broken into lines for ease of data retrieval; each 

line is numbered, coded to identify the speaker, and provided with a timecode to 

facilitate location of the transcribed portion within the accompanying audio file. 

Each text is presented in three different views: Glossed view gives a translated 

text with interlinear tags, comprising grammatical and lexical tags placed 

underneath each token; Line view gives simply the transcribed text with English 


translation; and Cyrillic line view simply gives the text in Cyrillic transcription 

(it is assumed that Bulgarian users need neither translation nor interlinear 

glosses). The audio link is available in all three views, and it follows the text as 

the user scrolls down the page. 

• Tokens. Each token has its own page, which lists all the tags assigned to that 

token, and all the lines throughout the database where that token occurs, with 

each line identified by text name and line number. 

• Lines. Each line has its own page, which lists all its tokens, any thematic content 

tags assigned to that line, and any identified phrases associated with that line. 

• Lexemes. Each lexeme has its own page, with links to all the tokens associated 

with that lexeme. Note: a “lexeme” is the lemma in standard Bulgarian 

associated with the dialectal token; if no such lemma exists, then one is created 

and tagged as a “dialectal lexeme”. Lexemes are also tagged for etymological 

and other information. 

• Phrases. A unique feature of this site design is the ability to isolate 

grammatically significant groups of words, or phrases. Each phrase has its own 

page, listing all the tags assigned to it, as well as the line of its occurrence and 

any other lines in which it occurs. 

• Wordform Search. This search page allows users to select any combination of 

grammatical/pragmatic tags and/or the English translation and/or the Bulgarian 

lexeme, and see all the lines on the site which display the tokens so identified; 

the geographical distribution of the selected tokens is displayed on a map. Each 

selected token is displayed within the line of its occurrence; users may then 

follow a link to the text with the token in question to see the larger context and 

hear the audio. 


• Lexeme Search. This search page allows the user to see all the phonetic 

representations throughout the site of any one lexical item. Users can also isolate 

words with particular prefixes or suffixes (using “Begins with…”, “Ends 

with…” buttons). Users can also search for lexemes within categories of special 

interest (such as dialectal lexeme, loanword source), and for instances of lexical 

variation (the occurrence of more than one dialectal term for a particular item or 

action). 

• Linguistic Trait Search. This page allows the user to search for any one of a 

very large number of linguistically significant traits, which are tagged on 

tokens at a much more complex level than that marked by the 

interlinear tags which form the basis of the Wordform search. Here, the user 

makes hierarchically embedded choices to isolate the trait in question; this 

allows very complex searches at both the synchronic and diachronic level. Each 

selected token is listed in the context of its line, and the geographical 

distribution of selections is displayed on a map. 

• Thematic Content Search. This search page allows users to find chunks of text 

(identified by text and line number) where the recorded conversation concerns a 

particular topic. The search page allows one to locate the desired topic either 

through a thematically ordered ethnographic list with many subdivisions in each 

category, or by an alphabetical list of every single tag regardless of its place in 

the hierarchical listing. 

• Phrase Search. This search page allows the user to find instances of 

grammatically significant groups of words at a number of levels. This is 

particularly useful for scholars of Bulgarian and Balkan linguistics, since many 

of the traits characterizing the Balkan Sprachbund must be defined in phrasal 


terms. Because there was no way to mark these traits at the token level, this 

additional content type was devised specifically for this site. As in other 

searches, results give the context of the full line, and display the geographical 

distribution on a map. 
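
The search functions listed above can be approximated with simple filters over tagged data. The Python sketch below is a simplified illustration, not the site's Drupal implementation: the record layout, tag labels, place names, and theme paths are all hypothetical, and the rule that a search on a parent topic also matches its subdivisions is an assumption about how a hierarchical tag list could behave.

```python
from collections import Counter

# Hypothetical token occurrences; none of this is BDLT data.
OCCURRENCES = [
    {"form": "žena", "lexeme": "жена", "tags": {"N", "F", "SG"},
     "text": "Sample A", "line": 1, "location": "Village X"},
    {"form": "ženata", "lexeme": "жена", "tags": {"N", "F", "SG", "DEF"},
     "text": "Sample B", "line": 4, "location": "Village Y"},
    {"form": "dojde", "lexeme": "дойда", "tags": {"V", "AOR", "3SG"},
     "text": "Sample B", "line": 4, "location": "Village Y"},
]

# Thematic tags stored as paths in a hierarchy, keyed by (text, line number).
THEMES = {
    ("Sample A", 1): ["family/marriage"],
    ("Sample B", 4): ["family/marriage/weddings", "food"],
}


def wordform_search(tags=(), lexeme=None):
    """Occurrences carrying all requested tags and, optionally, a given lexeme."""
    wanted = set(tags)
    return [
        o for o in OCCURRENCES
        if wanted <= o["tags"] and (lexeme is None or o["lexeme"] == lexeme)
    ]


def distribution(hits):
    """Hits per location: the counts a map display would plot."""
    return Counter(o["location"] for o in hits)


def lexeme_search(begins_with="", ends_with=""):
    """Lexemes matching a 'Begins with' and/or 'Ends with' filter."""
    return sorted({
        o["lexeme"] for o in OCCURRENCES
        if o["lexeme"].startswith(begins_with) and o["lexeme"].endswith(ends_with)
    })


def thematic_search(topic):
    """Lines tagged with a topic or with any of its subdivisions (assumed behavior)."""
    return sorted(
        key for key, paths in THEMES.items()
        if any(p == topic or p.startswith(topic + "/") for p in paths)
    )
```

For example, `distribution(wordform_search(tags={"N", "F"}))` yields one hit in each of the two hypothetical villages, which is the kind of per-location summary that a map output needs.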

4.2 Hosting 

Hosting is a perennial challenge for web-based digital humanities projects. Digital 

humanities thought leader Miriam Posner has characterized obtaining server space as 

“the most hilariously awful problem in doing DH at a university, and almost nobody has 

got this figured out. I know people who are secretly running servers under their desks, 

buying their own server space, or running projects off Google Drive.”⁷ Universities 

continue to struggle with questions of what campus organization, if any, should be 

responsible for providing web hosting for digital projects. Many central IT 

organizations, including those at UC Berkeley, follow a model of offering inflexible, 

standardized services in order to reduce support costs when those services are made 

available to the campus as a whole. As a result, they are a poor fit for digital scholarly 

projects, which are unlikely to resemble standard templates for departmental websites, 

faculty profiles, etc. At some institutions, the library has stepped in to fill the need for 

hosting for scholarly projects,⁸ but when web hosting is seen as an indefinite 

commitment, the ongoing costs of server hardware and -- more significantly -- the staff 

                                                
time necessary for patching and updating software can become a drain on library 

resources. As a result, some organizations that take supporting digital scholarship as 

their mandate have retreated to a position of offering advice on commercial hosting 

options, with the costs (financial and technical upkeep) to be borne by the scholar⁹. 

Over the course of its development, BDLT has navigated three of the most common 

hosting scenarios for digital projects. 

7 Miriam Posner. “Here and There: Creating DH Community”. September 18, 2014. 

http://miriamposner.com/blog/here-and-there-creating-dh-community/ 

8 Jennifer Vinopal & Monica McCormick. “Supporting Digital Scholarship in Research 

Libraries: Scalability and Sustainability”. Journal of Library Administration: Vol. 53, 

2013, Issue 1. p. 27-42. https://doi.org/10.1080/01930826.2013.756689 

Dombrowski built BDLT using a general-purpose shared web hosting account 

already purchased for use with multiple different projects. Within four years, the site 

needed to be migrated to a different environment after the hosting service threatened to 

shut it down due to an excessive number of tables in the MySQL database. The large 

number of tables that Drupal generates as part of creating its content types (data 

structures) is a frequent criticism of the system¹⁰, and it became a technical barrier for 

hosting the site using low-cost, general-purpose hosting. By 2015, when the site was 

threatened with eviction from its hosting environment, Dombrowski had moved to the 

Research IT organization at UC Berkeley (Alexander’s institution), and was overseeing 

the hosting services offered by that campus’s digital humanities program. Dombrowski 

initially arranged for BDLT to move to the Drupal-specific commercial hosting service 

that had partnered with UC Berkeley’s IT organization to provide hosting for Drupal 

sites. This move was ultimately short-lived: recognizing that hosting would disappear 

when the digital humanities program’s funding ran out, Dombrowski and Alexander 
                                                

took advantage of the Berkeley Language Center’s offer to move the site to their server, 

with system-level support from that unit’s sysadmin. Under this model, hosting for the 

site would be guaranteed for at least as long as Alexander was an active or emerita 

faculty member at UC Berkeley. 

9 Sarah Kalikman Lippincott. “Digital Scholarship at Harvard: Current Practices, Opportunities, 

and Ways Forward”. June 27, 2017. 

https://projects.iq.harvard.edu/files/dsi/files/harvard_ds_final-report_20170627_v2.pdf 

10 “Drupal Schema – Why this methodology?” January 21, 2011. Drupal forums. 

https://www.drupal.org/forum/general/general-discussion/2011-01-21/drupal-schema-why-this-methodology 

4.3 Technical staffing 

Technical development of digital humanities projects can quickly become costly, even 

when reusing existing code as part of a configurable open source content management 

system such as Drupal. Professional technical expertise commands a premium. For that 

reason, self-funded projects such as BDLT particularly benefit from having a core team 

of personally committed collaborators that includes at least one individual with the 

technical expertise to implement the project. While restricting the technical scope to 

what a core member of the team can personally accomplish may limit the project’s 

scholarly ambitions, the alternative involves waiting for a significant influx of funding 

that may never materialize, particularly if the project involves applying established 

methodologies in a new domain. Some projects attempt to overcome this hurdle by 

hiring professional technical staff to work on the project piecemeal as smaller amounts 

of funding (e.g. university-internal microgrants) become available, but this 

approach becomes more expensive overall as the project pays for the start-up costs of 

professional developers re-familiarizing themselves with the project at the beginning of 

each new phase of work. It also risks the project being left half-completed if the scholar 

is unable to secure further grants. 

Dombrowski has served as the primary technical developer on this site from its 

inception to the present day. Like Alexander, Dombrowski has never been paid for work 

on the project, instead contributing out of personal interest and commitment. However, 

particularly because BDLT represents volunteer work, Dombrowski’s ability to 

devote time to the project has fluctuated, and changes in institution, job, job scope, and 

life circumstances (including the birth of three children over four years) have all had an 

impact. Alexander has used her research funds to pay graduate students with technical 

knowledge of Drupal (including some trained by Dombrowski) to implement site 

configuration changes during times when Dombrowski has been unavailable. However, 

those graduate students themselves have taken on this work as one among many 

competing priorities, including finishing their dissertations, leading to periods where 

they have been incommunicado for weeks or months at a time. In August 2017, 

Alexander took a weeklong workshop on Drupal offered by Dombrowski at the Digital 

Humanities at Berkeley Summer Institute, with the goal of developing sufficient 

technical proficiency to serve as her own technical backstop for the project, and reduce 

the turnaround time needed to make minor configuration changes on the website. 

4.4 Migrations and code changes 

One disadvantage of building a project using a content management system is that it 

closely ties the project’s lifecycle to the support lifecycle for that version of the content 

management system. A major version upgrade is a non-trivial undertaking on any such 

platform, but Drupal’s API-breaking design philosophy further exacerbates these 

challenges. Building BDLT in early 2011 necessitated the use of Drupal 6, but this 

choice guaranteed that the site would have to be migrated to a new version of Drupal 

within the medium term, when the Drupal open source project stopped providing 

security updates for that version.  

In Alexander and Dombrowski 2014, the authors anticipated a migration directly 

from Drupal 6 to Drupal 8, with the expectation that version 8 -- which had not yet been 

given a release date -- would provide more robust technical underpinnings for the site in 

the long term. Instead, the Drupal project’s decision to jettison much of Drupal’s own 


architecture and replace it with the enterprise PHP framework Symfony11 had the effect 

of alienating many smaller-scale developers, including those who typically work on 

digital humanities projects. The resulting lag in module availability has been 

tremendous, and many modules with significant adoption in digital projects across a 

wide variety of disciplines (e.g. Biblio, which provides a data structure for importing, 

exporting, storing, and displaying bibliographic references) have still seen no significant 

movement towards a Drupal 8 port as of 2019 (note 12). 

By summer 2015 the release of Drupal 8 was imminent, and Drupal 6 would 

only be given a three-month grace period after that release before security updates were 

no longer provided. Discussions in the developer forums suggested that many general-

purpose modules would not be available concurrently with Drupal 8’s release, to say 

nothing of scholarly-oriented modules. In light of this, Dombrowski advised Cammeron 

Girvin, a graduate student working with Alexander, on a site upgrade to Drupal 7. 

While Girvin had served as a project manager for BDLT for some years, the upgrade 

was his first experience interfacing directly with the technical underpinnings of the site 

(i.e. the filesystem and MySQL database). The upgrade was difficult, requiring multiple 

attempts and a downtime spanning the entire summer before the site was again available 

online; furthermore, it took nearly an additional six months to resolve all the bugs 

related to the upgrade.  

By 2015, Drupal 7 had seen significant uptake among digital humanists, and all 

of the widely used Drupal 6 modules were available for Drupal 7 at that point, or 

                                                
11 Dries Buytaert. “Why the big architectural changes in Drupal 8?” September 9, 2013. 

https://dri.es/why-the-big-architectural-changes-in-drupal-8 
12 Bibliography module – Issues – Drupal 8 port. March 20, 2015. Drupal module issue queue. 

https://www.drupal.org/project/biblio/issues/2456591 


replaced with improved alternatives. Unfortunately, early in the development of BDLT, 

Dombrowski had selected a niche module called Editview as the primary interface for 

data entry. That module had been abandoned by its developer after Drupal 6, despite 

user requests for a Drupal 7 version starting in 2010 (note 13). Rather than completely 

reconceptualizing data entry for the site, Alexander contracted with Agile Humanities 

Agency (http://agilehumanities.ca/), a digital humanities-oriented development firm 

created by former English professor Dean Irvine, to write a Drupal 7 version of 

Editview. This piece of technical development work represented a significant financial 

investment for BDLT, but it also has served as a locus of broader impact for the project 

within the digital humanities community. The Drupal 7 Editview module has 

subsequently been adopted by other projects with similar tabular data entry needs, 

including the George Washington Financial Papers Project 

(http://financial.gwpapers.org/). 

5. Data entry and labor 

For BDLT and similar projects, the amount of time dedicated to developing the 

technical infrastructure is dwarfed by the enormity of the task of data entry. 

Dombrowski’s expectation that data could be parsed from the 2011 Word files and 

imported into Drupal to seed the database was quickly shown to be overly optimistic. 

Developer colleagues at the University of Chicago offered elaborate examples of using 

regular expression syntax to capture some of the words and linguistic annotation, but 

this approach ultimately proved too error-prone to use, and all the lines, tokens, 

and annotations had to be manually entered into Drupal.  

                                                
13 Editview module – Issues – D7 port of Editview. April 7, 2010. Drupal module issue queue. 

https://www.drupal.org/project/editview/issues/764882 


In some respects, entering data into Drupal was not dissimilar from work 

Alexander and Zhobov already anticipated undertaking in Microsoft Word as part of 

their audio chrestomathy. It may have been easier, as there was no need to fuss over 

table spacing and formatting in Word for the linguistic annotations. The work was, 

nonetheless, slow, and became slower as the database grew. The growing number of 

annotated tokens led to an increasing lag in the site’s autocomplete functionality, which 

was necessary to ensure that new texts were able to reference existing tokens, rather 

than creating new database entries. The possibility of additional metadata beyond what 

the Word documents would have supported represented another data entry task, and the 

ways that the database throttled the speed of data entry (e.g. through waiting for the 

autocomplete) represented a significant increase in the overall time needed to put the 

material in its final format.  

The challenge of data entry at scale is endemic to digital humanities projects. 

The need for large-scale, low-cost data entry has led projects to adopt practices vis-à-vis 

undergraduate labor that have drawn critique from others in the field14. A survey described 

in “Student Labour and Training in Digital Humanities”15 shows that the vast majority 

of digital humanities projects are funded by federal and/or institution-internal grants, 

which are necessary to offset the many costs of developing these projects, not least 

among them the cost of paying student workers. Only three of the 40 projects surveyed 

indicated that they received no funding.  
                                                

14 Spencer Keralis. “Disrupting Labor in Digital Humanities; or, The Classroom Is Not Your 

Crowd”. In Disrupting the Digital Humanities. Dorothy Kim and Jesse Stommel, eds. 

2018, Punctum Books. 
15 Katrina Anderson, Lindsey Bannister, Janey Dodd, Deanna Fong, Michelle Levy, and 

Lindsey Seatter. “Student Labour and Training in Digital Humanities”. Digital Humanities 

Quarterly, 2016, vol. 10, no. 1. 

http://www.digitalhumanities.org/dhq/vol/10/1/000233/000233.html 


The first point of the Student Collaborators’ Bill of Rights16 states that “As a 

general principle, a student must be paid for his or her time if he or she is not 

empowered to make critical decisions about the intellectual design of a project or a 

portion of a project (and credited accordingly). Students should not perform mechanical 

labor, such as data-entry or scanning, without pay.” For BDLT, the lack of project 

funding combined with the City of Berkeley’s steep increases in minimum wage over 

the course of the project ($15/hour as of 2018, up from $11/hour in 2015) made hiring 

undergraduates for data entry unfeasible. Instead, Alexander has worked with cohorts of 

students through a longstanding UC Berkeley program, URAP, which connects 

undergraduates with faculty members doing research. While it may not be the ideal 

solution, Alexander has devoted significant thought and energy towards collaborating 

with those students in ways that align with the Collaborators’ Bill of Rights.  

5.1 URAP program 

Since 1991, UC Berkeley has offered the Undergraduate Research Apprenticeship 

Program (URAP) as an institutional framework “to assist faculty in reconciling their 

commitments to research with their responsibilities for undergraduate education. By 

promoting faculty-student research collaboration, URAP works to invigorate 

undergraduate education and to contribute to the sense of intellectual community on 

campus.”17 Faculty who wish to participate in URAP submit a project description to an 

online portal, and students can submit a statement of interest to up to three different 

                                                
16 Haley Di Pressi, Stephanie Gorman, Miriam Posner, Raphael Sasayama, and Tori Schmitt. “A 

Student Collaborator’s Bill of Rights”. June 8, 2015. UCLA HumTech. 

https://humtech.ucla.edu/news/a-student-collaborators-bill-of-rights/ 
17 “What is the Undergraduate Research Apprenticeship Program?” URAP website. 

http://urap.berkeley.edu/program-intro 


projects. Faculty members interview the students and can select any number of them to 

collaborate on the project. 

The BDLT project was ideally suited for this framework. The project description 

Alexander submitted to the portal outlined the nature of the project, stressing both its 

linguistic and ethnographic aspects, and stated that knowledge of basic linguistic structure 

was highly desirable but not required and that knowledge of Bulgarian, or indeed any 

Slavic language, was not necessary. During interviews with interested students, 

Alexander gave students an overview of the site and explained how data entry was 

done. Both the instructor, and the students who decided to choose this project, then 

completed a “learning contract”: the instructor committed to providing a research 

experience for the students and the students committed to a minimum number of work 

hours per week throughout the semester, for which they could receive course credit (one 

credit per three hours of work a week, up to four credits per semester).  

The project was first listed with URAP in January 2013, and has been listed 

every semester since then (except for fall 2017 when Alexander was doing research 

abroad for the entire semester). The largest number of students to join the project in any one semester 

was eight, and the smallest was two. Because of the enormous amount of data to be 

entered, there was never a lack of work for students. Students did data entry on their 

own time, keeping track of their work hours, and then participated in regular group 

meetings for discussion of research goals. 

The Student Collaborators’ Bill of Rights states that “Course credit is generally 

not sufficient ‘payment’ for students’ time, since courses are designed to provide 

students with learning experiences.” URAP is one of a few programs at UC Berkeley 

that provide course credit for non-traditional work; another, DeCal, grants course 

credit for student-run courses on topics ranging from “Decode Silicon Valley Startup 


Success” and “Sign Language in Healthcare” to “Cal Pokémon Academy” and a master 

course in the board game “Settlers of Catan”18. Providing course credit specifically in 

exchange for work on faculty research makes the nature of the exchange clear to the 

student upfront, in contrast to traditional departmental courses that incorporate student 

labor as a class assignment. Data entry, for all its tedium, is a very authentic research 

experience, and one that the project directors also engaged with as part of data 

preparation. In order to make the project available to as many students as possible, 

regardless of their knowledge of Bulgarian, the project directors provided the data to the 

students in plain text files with all the tags for coding -- not unlike the original Word 

files of the audio chrestomathy.  

In order to make the project as meaningful as possible, Alexander met twice 

monthly with student apprentices as a group. In addition to discussing any problems 

with data entry, these meetings were an opportunity for students to learn more about the 

history and development of the project, and its importance for Bulgarian dialectology as 

a research field. Since one of the goals of the very minimal BDLT research budget has 

been to bring the Bulgarian project director (Zhobov) to Berkeley once a year, some of 

the student apprentices have been able to meet with him as well and to learn first-hand 

about the Bulgarian aspects of the collaborative project. This aligns with the Student 

Collaborators’ Bill of Rights principle that “At a minimum, internships for course credit 

should be offered as learning experiences, with a high level of mentorship.” 

Students have also been able to participate in project development: their input 

has been sought on certain aspects of project design, and there was more than one 

occasion when a student volunteered an idea that led to a particular breakthrough. 

Whether such contributions amount to students’ being “empowered to make critical 
                                                

18 Decal Courses. Spring 2019. https://decal.berkeley.edu/courses 


decisions about the intellectual design of a project or a portion of a project” is arguable, 

but the project directors’ willingness and enthusiasm for reworking the site in response 

to solicited and unsolicited student input has given these students more agency in the 

project than simply doing data entry. 

5.3 Impact on students 

To date, thirty-one undergraduate students have worked on the project through URAP. 

Their importance to the project is inestimable, a fact of which they are reminded at the 

celebratory dinner at the end of each semester. In more lasting terms, their contributions 

are acknowledged on the site’s Project Team page 

(http://bulgariandialectology.org/project-team), which gives a small list of the names of 

“active apprentices” and an ever-growing list of the names of “alumni apprentices”. 

This is in alignment with the Student Collaborators’ Bill of Rights point #4, which 

states that “If students have made substantive (i.e., non-mechanical) contributions to the 

project, their names should appear on the project as collaborators”. 

Many of the apprentices keep in touch after graduation. Two have gone on to 

graduate work in linguistics, listing participation in this project as a major deciding 

factor in their career choices. Of the graduate students who have worked on the project, 

two were specialists in Bulgarian, and project directors created shorter field expeditions 

in Bulgaria with them in mind, so that each was able to get first-hand experience of the 

process through which field data are acquired. In addition, the graduate student who was 

most involved in project design was able to cite his work on this project as an important 

qualification for his current alternative-academic career path.  

The students who have been least satisfied working on the project are those in 

their final year as computer science majors. It is understandable that they wish to be 

doing cutting-edge technical work, rather than staying at the level of using (and not 


even modifying the code for) a PHP/MySQL content management system. Their dismay 

and frustration, while somewhat disorienting for other students, have been instructive as a 

concrete illustration of the ways in which those in the humanities do research with very 

limited financial resources. 

5.4 International collaboration 

The BDLT project has been international from the outset, since it grew out of 

collaborative work between one American scholar and a group of Bulgarian scholars, 

and has been maintained over the last decade through collaboration between the two 

project directors, one American (Alexander) and one Bulgarian (Zhobov). The two are 

in constant electronic contact, consulting on data preparation, data entry, and 

site design (especially the most recent design additions, concerning 

“phrases”). They visit each other’s universities frequently for purposes of on-site 

collaboration; Zhobov’s visits to Berkeley are especially useful for student apprentices 

to learn more about international aspects of the project. They have also presented joint 

research papers about the project at various venues in Europe and Russia. 

Most data entry takes place in Berkeley, because of faster computer speed and 

more modern equipment. Some types of data entry need Zhobov’s specialized 

knowledge, however, and must be done in Bulgaria, even though the work proceeds 

more slowly there because of network delays and computers with less memory and older 

browsers, which often perform poorly with the site’s AJAX-based data entry interface. 

6. Research outcomes 

Scholars worldwide have become aware of the rich data resource which BDLT 

provides, and several research projects are currently utilizing data from the BDLT site. 

In particular, both Zhobov and Alexander have recently produced major research papers 


drawing on material in the BDLT site.19 Although Zhobov’s work on dialectal vocalism 

could have been prepared directly from the original field tapes, the choice to focus his 

analyses on texts from the site, and to cite only examples which could then be consulted 

directly on the site, increases the value of his work to other researchers. Alexander’s 

work, by contrast, on accentuation in certain word groups, derives directly from the data 

organization in the “phrases” section of the site. Indeed, it is anticipated that this part of 

the site will be especially valuable to Balkan linguists once the full set of data is 

entered: they will be able to access dialect data about word order sequences, pronoun 

reduplication, instances of “evidential” usage, and similar topics. Before the availability 

of the BDLT site, scholars could only collate data on these topics by laboriously 

combing through whatever dialectal “texts” had been included as supplementary 

material to published dialect descriptions; now they will be able to search for 

such material easily through the site’s search interface. 

From a certain angle, it is difficult to argue that the project itself is research. 

While the investigation of any one research question would necessitate a focused subset 

of the data preparation involved in creating BDLT, those research questions require 

scholars to limit the scope of their curation and annotation, and move on to analysis and 

write-up. In the name of developing a resource of value beyond any one inquiry, or even 

any one person’s research agenda, the project directors have spent the majority of the 

last eight years focusing on data curation and annotation, which has unavoidably come 

at a cost with regard to their scholarly output vis-à-vis the kind of dialectological work 
                                                

19 Zhobov, Vladimir. New Approaches to Bulgarian Dialectal Vocalism; Alexander, Ronelle. 

Bulgarian Dialectal Accent, A New Approach. Both articles are slated to be published in the 

forthcoming monograph: Alexander & Zhobov, Bulgarian Dialects, Living Speech in the 

Digital Age. 


that is the focus of their pre-2011 scholarship -- work that can now resume fully as 

BDLT nears completion (nevertheless, the completion of the two major research 

ventures noted above proceeded alongside work on BDLT). For scholars who are 

principally interested in disciplinary research, but who are drawn to the promise of what 

a resource equivalent to BDLT in their field could provide, the reality is that they will 

get substantially more research done if they engage in the traditional research practices 

of focusing their data collection and curation on those materials that contribute directly 

to a specific inquiry. Building a site such as BDLT is an act of hope, and of generosity 

for the future scholars who will receive all of the benefit without sacrificing years of 

their professional lives to data preparation. It is not an undertaking for early-career 

scholars who can ill afford the impact on their publication rate. At the same time, late-

career scholars who are giving thought to the nature and impact of their legacy may find 

that a project like BDLT, which generates richly annotated data that can jumpstart 

research for subsequent generations of scholars, is a meaningful gift to the future that 

reaches far beyond any new monograph. 

7. Sustainability 

For the project’s potential to be realized, the materials need to remain available. While 

the primary value is in the texts and linguistic annotations, the interface has been 

specifically designed to facilitate access to the data, reducing the effort necessary to 

extract the subsets of the corpus relevant for particular research questions. The recent 

change in licensing terms for the Google Maps API, which broke the map-based 

navigation that the site had used since its inception and necessitated a complete rebuild 

of the site’s geospatial functionality, was a stark reminder of how BDLT is vulnerable 

to decisions made by large corporations whose interests and priorities diverge from 

those of the project team. Collaborating with students has added an ethical dimension to 


the question of sustainability; per the Student Collaborators’ Bill of Rights: “Senior 

scholars should recognize that projects on which students have collaborated represent 

important components of students’ scholarly portfolios. Senior scholars should thus 

make every reasonable effort to either sustain a “live” project or, failing this, either 

transfer its ownership to student collaborators or distribute to students an archived 

version or snapshot of the project.” With the data entry phase of BDLT winding down, 

and the full picture of the contents of the project becoming clear, the project team is 

taking a multi-pronged approach to sustainability. 

7.1 Website 

The Drupal project has announced end-of-life for Drupal 7 in November 2021 (note 20); 

ironically, this will also be the end-of-life for Drupal 8, aligning with the end-of-life for 

the version of the Symfony PHP framework that replaced Drupal’s previous technical 

underpinnings. Moving directly from Drupal 6 to Drupal 8, as BDLT initially planned, 

would not have bought the project any additional time between migrations. 

The digital humanities community’s response to Drupal 8 has not been 

enthusiastic. In addition to the increased difficulty for coders whose skill set does not 

align with “enterprise PHP development” to build Drupal modules and themes, the 

server requirements for Drupal 8 to perform adequately outstrip what is typically 

available in shared low-cost commercial hosting environments. As a result, many digital 

humanities projects built in Drupal 7 will face a decision point in the next few years, 

and Drupal 9 is not an obvious choice. One compelling alternative that has emerged in 

the wake of Drupal 8 is Backdrop CMS (https://backdropcms.org/), a fork of Drupal 7 

aimed at nonprofits and small businesses that prioritizes stability and a positive user 
                                                

20 Dries Buytaert. “Drupal 7, 8 and 9”. September 12, 2018. https://dri.es/drupal-7-8-and-9 


experience for non-programmers who build such sites, over technical advancements in 

the core APIs. The skill set of Backdrop’s target audience, and the project’s overall 

priorities, align well with the needs of digital humanities projects. Backdrop has 

incorporated many Drupal modules that previously had to be maintained and updated 

separately into its core code, and Backdrop core includes an option to enable automatic 

updates in order to reduce ongoing support costs and minimize the risk of the site being 

hacked as a result of delayed installation of security updates. While additional work is 

needed on Backdrop ports of some of BDLT’s modules, Dombrowski has been working 

with the Backdrop developer community to ensure this functionality is in place in time 

to migrate BDLT to Backdrop before Drupal 7’s end-of-life. 

The Berkeley Language Center sees it as within its purview to provide access to 

BDLT indefinitely through its server and sysadmin, but the scope and priorities of 

organizations such as a language center are subject to change, particularly in a context 

of public disinvestment in higher education. While it is not a substitute for the full 

functionality of a live database that can support any combination of queries, 

Dombrowski and Alexander are working with digital preservation specialists at UC 

Berkeley to capture and preserve the site with web recording software once data entry is 

complete. This will generate a moderately interactive surrogate of the site that can be 

used in perpetuity for some kinds of information retrieval needs, even if the site itself 

ceases to exist online. 

7.2 Data 

The BDLT website includes functionality for exporting any search query as a CSV. 

With an eye towards capturing the full extent of the data for potential use in 

computational research, and/or in other interfaces if the website is no longer available, 

Dombrowski has generated a set of CSV files that include all fields from all content 


types, and will update these files with new versions once data entry is complete. Drupal 

automatically generates a unique ID for each node (instantiation of a content type), and 

stores references between nodes (e.g. a pointer from a token to a lexeme, or a line to a 

token) using that unique ID. Including this ID in all exports will make it possible to 

reconstruct the network of relationships between the various content types in future data 

analyses and interfaces. 
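
As a rough illustration of how such ID-based references might be resolved outside Drupal, the following minimal Python sketch rejoins tokens to their lexemes. It is a sketch under stated assumptions, not a description of the actual export files: the column names (`nid`, `lemma`, `form`, `lexeme_nid`) and the placeholder values are hypothetical, and the real CSV files may be organized differently.

```python
import csv
import io

# Hypothetical miniature exports. The column names and values are
# placeholders for illustration; the project's real files may differ.
LEXEMES_CSV = """nid,lemma
101,lemma_a
102,lemma_b
"""

TOKENS_CSV = """nid,form,lexeme_nid
201,form_a1,101
202,form_a2,101
203,form_b1,102
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def index_by_nid(rows):
    """Build a lookup table from node ID to the full row."""
    return {row["nid"]: row for row in rows}


def tokens_by_lexeme(tokens, lexemes):
    """Group token forms under the lemma that each token's reference points to."""
    lexeme_index = index_by_nid(lexemes)
    grouped = {}
    for token in tokens:
        lexeme = lexeme_index.get(token["lexeme_nid"])
        if lexeme is None:
            # Skip references to node IDs that are absent from the export.
            continue
        grouped.setdefault(lexeme["lemma"], []).append(token["form"])
    return grouped


result = tokens_by_lexeme(read_rows(TOKENS_CSV), read_rows(LEXEMES_CSV))
print(result)
```

The same lookup pattern would extend to other reference chains (for example, from a line to its tokens), provided each export carries both the node's own ID and the IDs it points to.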

While UC Berkeley does not have a track record of providing an institutional 

repository for data, the project team anticipates depositing the data sets in such an 

environment if it is established. The Tromsø Repository of Languages and Linguistics 

(https://site.uit.no/trolling), which is affiliated with the CLARIN European Research 

Infrastructure, is appealing as a disciplinary repository. In addition to formally 

accessioning the data to these data repositories, Dombrowski has followed the common 

digital humanities practice of putting the CSV files, along with some example analysis 

code, on the code repository platform GitHub (https://github.com/quinnanya/bdlt-data).  

7.3 Print 

The “texts” on the site are valuable pieces of data, even in the absence of the metadata 

that enables the various search options. To make sure that these texts are preserved at 

multiple levels, print copies will be produced of all three versions of each text: that with 

the grammatical and lexical glosses, that in Latin transcription with English translation, 

and that in Cyrillic transcription. It is particularly important to have both of the latter, 

since Bulgarian dialectology uses a different set of transcription symbols than those 

currently used in the West.  

8. Conclusion 

Over the course of eight years, the Bulgarian Dialectology as Living Tradition project 


has navigated the full digital humanities project lifecycle, from idea to archiving, 

without support from external funding. While others may disagree with any of the 

decisions made during the course of the project’s implementation, this paper has served 

to explicate the motivations, context, and constraints that informed those decisions, to 

serve as a point of discussion for the development of future digital humanities projects 

within the broad field of Slavic studies, and beyond. As BDLT transitions from a 

project to a scholarly resource, the directors hope that this undertaking can lay the 

foundation for the emergence of a richer understanding of Bulgarian language and 

culture, and that scholars working in other areas may be inspired to undertake similar 

endeavors to make materials available digitally -- but only at the right time and place in 

their careers where such work becomes feasible. 

9. Acknowledgements 

The authors would like to express their deep appreciation for all their collaborators on 

this project as of April 2019: senior associate research team member Georgi Kolev; 

associate research team members Roslyn Burns, Cammeron Girvin, Snejana Iovtcheva, 

Kea Johnston, Eric Prendergast, Vesela Simeonova, Traci Speed, and John Sylak; 

apprentice research team members Jessica Adams, Richa Bhandal, Zuhra Bholat, 

Gabrielle Bozmarova, Nina Chang, Jessica Chapman, Lana Cosic, Katie Crowe, 

Stephanie DeLeon, Naomi Francisco, Austin Frenes, Dimiana Georgieva, Emmanuella 

Hristova, Siyana Hristova, Andrew Kuznetsov, Kathleen Lamont, McKayla Major, 

Kelsey Mota, Grace Newsom, Jerry Nikolaev, Nadia Nizetich, Siyao [Logan] Peng, 

Stella Petkova, Charles Rosencrans, Elizabeth Sawyer, John Sockolov, Jeffrey Stock, 

Aleksandrina Stoyanova, Vanessa Taylor, Milena Tintcheva, and Emma Wilcox; 

fieldwork core team members Georgi Kolev and the late Maksim Mladenov; fieldwork 

contributors Elena Uzeneva and Georgi Mitrinov; fieldwork assistants Krasimir 


Mirchev, Radko Shopov, and Ivan Vankov; fieldwork student apprentices Cammeron 

Girvin, Marieta Nikolova, Traci Speed Lindsey, Matthew Baerman, Jonathan Barnes, 

Tanya Delcheva, Elisabeth Elliott, Kamen Petrov, and Petŭr Shishkov. For a complete 

and current list of project collaborators, see http://bulgariandialectology.org/project-team. 

This paper is dedicated to the late Maksim Mladenov, who has been the 

project’s guiding light and guardian angel.