id: work_yc4aamvkrbeefncwitj4tn7xca
author: D M Carter
title: Presenting the Bangor Autoglosser and the Bangor Automated Clause-Splitter
date: 2017
pages: 8
extension: .pdf
mime: application/pdf
words: 4389
sentences: 369
flesch: 65
cache: ./cache/work_yc4aamvkrbeefncwitj4tn7xca.pdf
txt: ./txt/work_yc4aamvkrbeefncwitj4tn7xca.txt
summary (extracted fragments; truncated at "..."): Centre for Language Studies, Radboud University, Nijmegen, The ... Until recently, corpus studies of natural bilingual speech and, more specifically, codeswitching in bilingual speech have used a manual method of glossing, part-of-speech tagging, and clause-splitting to prepare the data for analysis. ... article, we present innovative tools developed for the first large-scale corpus study ... scripting language, we were able to successfully analyze almost 450,000 words in 65,000 clauses. ... crucially, the autoglossing and clause-splitting processes, and final data preparation. ... application of Constraint Grammar to mixed-language texts. ... our study, including the development of the automated clause-splitter we describe next. ... Given that the Siarad corpus was not originally transcribed in simple clauses and no Welsh parser ... As mentioned in the introduction, previous studies that involved manual clause-splitting took several weeks and many researchers to divide only a ... cognate, and the language of the clause (Welsh, ... included the total number of words, clauses, cognates, and codeswitches in each conversation and ...