key: cord-0319149-1a2jow8r
authors: Giroud, Jérémy; Lerousseau, Jacques Pesnot; Pellegrino, François; Morillon, Benjamin
title: The channel capacity of multilevel linguistic features constrains speech comprehension
date: 2021-12-10
journal: bioRxiv
DOI: 10.1101/2021.12.08.471750
sha: c311e6762500e601a1d0be1b06aaf712f3dcd5e1
doc_id: 319149
cord_uid: 1a2jow8r

Humans are expert at processing speech, but how this feat is accomplished remains a major question in cognitive neuroscience. Capitalizing on the concept of channel capacity, we developed a unified measurement framework to investigate the respective influence of seven acoustic and linguistic features on speech comprehension, encompassing acoustic, sub-lexical, lexical and supra-lexical levels of description. We show that comprehension is independently impacted by all these features, but at varying degrees and with a clear dominance of the syllabic rate. Comparing comprehension of French words and sentences further reveals that when supra-lexical contextual information is present, the impact of all other features is dramatically reduced. Finally, we estimated the channel capacity associated with each linguistic feature and compared them with their generic distribution in natural speech. Our data point towards supra-lexical contextual information as the feature limiting the flow of natural speech. Overall, this study reveals how multilevel linguistic features constrain speech comprehension.

Humans are remarkably successful at quickly and effortlessly extracting meaning from spoken language. The classical method to study this ability and identify its processing steps is to reveal the constraints that limit speech comprehension. For example, the fact that speech comprehension drops when more than ~12 syllables per second are presented has been interpreted as evidence that at least one processing step concerns syllable extraction (Ghitza, 2013; Giraud & Poeppel, 2012; Versfeld & Dreschler, 2002). As language processing involves distinct representational and temporal scales, it is usually decomposed into co-existing levels of information, estimated with distinct linguistic features, from acoustic to supra-lexical. Most studies, however, only investigate a single linguistic feature and, as a consequence, a complete picture of which processes underlie speech comprehension is still lacking. This is because there exists no common theoretical framework and no unique experimental paradigm to compare multiple linguistic features at the same time. Among the existing experimental paradigms, artificially increasing the speaking rate to generate adverse and challenging comprehension situations is a common approach. However, when speech is artificially time-compressed (Dupoux & Green, 1997; Foulke & Sticht, 1969; Garvey, 1953), all linguistic features are impacted by the modification, making it impossible to disentangle their unique impact on behavioral performance. It thus remains unknown whether the syllabic rate actually constrains comprehension, whether it is the phonemic rate or any other rate, or whether bottlenecks are present at different levels of processing.

To solve this problem, we propose to rely on a concept inherited from information theory (Shannon, 1948), channel capacity, and to carefully orthogonalize multiple linguistic features to reveal their unique contribution to speech comprehension. The processing of each linguistic feature can be modeled as a transfer of information through a dedicated channel, and channel capacity is defined as the maximum rate at which information can be transmitted through that channel.
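For reference, the standard information-theoretic definition (Shannon, 1948) can be written as follows; this formulation is supplied here for context and is not reproduced from the paper itself:

```latex
% Channel capacity (Shannon, 1948): the maximum mutual information between
% channel input X and output Y, over all admissible input distributions.
C = \max_{p(x)} I(X;Y), \qquad
I(X;Y) = \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
\quad \text{(bits per symbol)}
```

Expressed per unit time (multiplying by the symbol rate), capacity becomes a maximum information flow in bits/s, the unit used below for the information-rate features; for purely temporal features it reduces to a maximum rate in Hz.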
Thanks to this approach, we identified and compared in a unique paradigm the potential impact of acoustic, sub-lexical, lexical and supra-lexical linguistic features on speech comprehension.

First, speech is an acoustic signal characterized by a prominent peak in its envelope modulation spectrum, around 4-5 Hz, a feature shared across languages (Ding et al., 2017; Varnet, Ortiz-Barajas, Erra, Gervain, & Lorenzi, 2017). This acoustic modulation rate approximates the syllabic rate of the speech stream (Poeppel & Assaneo, 2020), which happens at around 2 […]

Compressed speech gating paradigm.

We collected behavioral data from three independent experiments in which participants were required to understand successive time-compressed versions of either spoken monosyllabic words (experiment 1) or sentences (experiments 2 and 3) (Fig. 1). Stimuli were selected through a dedicated orthogonalisation procedure, ensuring that the linguistic features of interest were decorrelated across stimuli (see Methods). This is a crucial condition to be able to determine their respective impact on speech comprehension performance. Finally, by investigating each feature in a similar measurement framework, we were able to directly compare their respective impact on speech comprehension.

Across the different compression rates, comprehension shifted from not understood (mean performance accuracy of 0.03 % and 0.1 % for experiments 1 and 2, respectively) to perfectly understood (96.3 % and 99 %), with a characteristic sigmoid function, indicating that the range of compression rates selected was well suited to investigate speech comprehension at its limits (Fig. 2). A mean performance accuracy of 50 % was observed for a compression rate of 3.5 in both experiments. At a compression rate of 5 or above, comprehension was essentially residual (< 10 %). In experiment 1, participants were presented with the same audio stimuli (words) at ten different compression rates.

We used generalized linear mixed-effects models (GLMMs) to evaluate the extent to which multiple linguistic features were predictive of behavioral performance (word or sentence comprehension). The GLMM approach enables a fine-grained characterization of the independent contributions of the different features (see Methods).

In experiment 1, a GLMM with a logit link function was conducted to model spoken word comprehension. The model included participants and compression rates as random effects, and five linguistic features (the acoustic modulation rate, the phonemic and syllabic rates, the phonemic information rate, and static lexical surprise) as fixed effects (Fig. 3). The full model accounted for 74 % of the variance of the data. The model revealed a significant effect of the acoustic modulation rate (β = -0.7 ± 0.06, p < 0.001), the phonemic rate (β = -0.25 ± 0.07, p = 0.001) and the syllabic rate (β = -1.07 ± 0.08, p < 0.001), indicating that they independently and additively impact comprehension. The model's coefficients read as follows: a β of -1.07 means that the odds of giving a correct response are multiplied by exp(-1.07), i.e., are divided by about 3, for an increase of one standard deviation in syllabic rate, demonstrating the adverse impact of increased syllabic rate on speech comprehension. Phonemic information rate did not significantly contribute to the model (β = -0.03 ± 0.03, p = 0.258).
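To make this coefficient interpretation explicit: under the logit link used in these GLMMs, each fixed effect acts multiplicatively on the odds of a correct response. In standard GLMM notation (random effects omitted for clarity; not reproduced from the paper):

```latex
% Logit link: log-odds of a correct response as a linear function of the
% z-scored linguistic features.
\log \frac{p}{1-p} = \beta_0 + \sum_i \beta_i x_i
\quad\Rightarrow\quad
\frac{p}{1-p} \text{ is multiplied by } e^{\beta_i}
\text{ per one-SD increase in } x_i .
```

For the syllabic rate, $e^{-1.07} \approx 0.34$: one standard deviation of additional syllabic rate divides the odds of a correct response by roughly 3.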
Finally, static lexical surprise was significantly associated with listeners' speech comprehension (β = -0.91 ± 0.07, p < 0.001), indicating that words' unexpectedness worsens participants' comprehension.

Holm-corrected post-hoc comparisons were performed to identify differences among the selected features in modulating spoken word comprehension. Features were ordered from the most to the least influential and compared between neighbours. The analysis revealed no significant difference between the two most influential features, syllabic rate and static lexical surprise (β = -0.16, z = -1.58, p = 0.12). In contrast, all other pairwise comparisons were significantly different (all p < 0.05).

In experiment 2, a GLMM with a logit link function was also used to model spoken sentence comprehension. The model included seven linguistic features as fixed effects (Fig. 3). Holm-corrected post-hoc comparisons were conducted to assess differences between the relative influence of each linguistic feature on sentence comprehension. The analysis showed that the syllabic rate has the largest impact on performance, with significantly more influence than contextual lexical surprise (β = -0.14, z = -5.22, p < 0.001). Conversely, the contrast between contextual and static lexical surprise did not reach significance (β = -0.01, z = -0.34, p > 0.05). While the modulatory effects of static lexical surprise and of the acoustic modulation rate on comprehension did not differ significantly (β = -0.10, z = -2.07, p > 0.05), the latter altered speech comprehension significantly more than the syllabic information rate (β = -0.18, z = -4.87, p < 0.001). Finally, the modulations of performance induced by the syllabic information rate, phonemic information rate and phonemic rate did not significantly differ (all p > 0.41).

Comparing experiments 1 and 2, we first observed a similar profile of response weights, with a larger impact of syllabic rate and static lexical surprise, a medium influence of the acoustic modulation rate, and lower weights for the other linguistic features (Fig. 3). We then assessed, for each linguistic feature, potential significant differences between experiments 1 and 2. This analysis (Fig. Supp. 3) reveals that the weights associated with the four features of interest (the acoustic modulation, phonemic and syllabic rates, and static lexical surprise) are significantly larger in experiment 1 than in experiment 2 (all p < 0.05, Holm-corrected for multiple comparisons). This difference corresponds to a reduction of around 50 % or more in experiment 2 compared to experiment 1, suggesting that adding contextual lexical information (the main difference between experiments 1 and 2) reduces the impact of all other features on comprehension. Of note, a fifth feature investigated in this comparison (phonemic information) was associated with a non-significant weight in experiment 1 and a significant but marginal weight in experiment 2; these weights do not differ significantly across experiments, which confirms the marginal impact of this linguistic feature on comprehension.
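The cross-experiment comparison of weights relies on a z-test for the equality of regression coefficients followed by Holm correction (described in the Methods). A minimal sketch, assuming standardized estimates and standard errors extracted from the two fitted models; only the experiment-1 estimates below are taken from the paper, the experiment-2 values are placeholders:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

def coef_diff_test(b1, se1, b2, se2):
    """Z-test for the equality of two regression coefficients
    (Paternoster et al., 1998): z = (b1 - b2) / sqrt(se1^2 + se2^2)."""
    z = (b1 - b2) / np.sqrt(se1**2 + se2**2)
    return z, 2 * stats.norm.sf(abs(z))  # two-tailed p-value

# (beta, SE) pairs per feature for experiments 1 and 2. Experiment-1 values
# (-1.07 +/- 0.08, -0.91 +/- 0.07) are from the paper; experiment-2 values
# are hypothetical placeholders.
features = {
    "syllabic_rate":   ((-1.07, 0.08), (-0.55, 0.05)),
    "static_surprise": ((-0.91, 0.07), (-0.45, 0.05)),
}

pvals = [coef_diff_test(b1, s1, b2, s2)[1]
         for (b1, s1), (b2, s2) in features.values()]

# Holm correction across all features tested
reject, p_holm, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(dict(zip(features, zip(reject, p_holm))))
```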
Following the main GLMM analysis, we aimed at characterizing the relationship between the value of each linguistic feature at original speed (×1), which reflects the intrinsic linguistic properties of the stimulus sets, and the comprehension point, i.e., the compression rate at which participants' comprehension reaches 75 % accuracy (see Methods). This analysis was designed to confirm the individual propensity of each linguistic feature to modulate the comprehension point. In experiment 1, a linear mixed-model analysis fully reproduced the results from the main GLMM analysis. These results converge with the GLMM and directly show that the linguistic properties of the non-compressed stimuli predict the maximal compression rate at which comprehension can be maintained.

The syllabic rate is the strongest determinant of speech comprehension.

To more directly visualise the data from both experiments, a complementary approach was adopted. For each compression rate, performance was first binned as a function of the syllabic rate (see Methods), as this feature had the strongest impact on performance in the two experiments (Fig. Supp. 3 and Fig. Supp. 4a). This visualisation highlights the major influence of the syllabic rate on behavioral outcome independently of the compression rate, in both experiments. Second, data were also binned as a function of the other features, after having been stratified as a function of the syllabic rate (Fig. Supp. 3 and Fig. Supp. 4). This highlights their additional impact over the major influence of the syllabic rate. This visualisation provides a better grasp of the relative influence of each linguistic feature on comprehension and graphically confirms the results obtained with the more fine-grained GLMM and LMM approaches.

Stimulus repetition has no effect on comprehension performance.

The compressed speech gating paradigm requires that the same speech stimulus be repeated immediately at a lower compression rate. Such a procedure could bias the comprehension point in favour of earlier comprehension, as participants might understand a little more with each repetition of the stimulus. Although this paradigm specificity is unlikely to have an impact on the main results (e.g., GLMM/LMM analyses, Fig. 3), it is possible that the comprehension point would occur later if the stimuli were not repeated immediately.

To address this concern, we ran a control experiment (experiment 3). We recruited a new pool of twenty participants online, who performed a shorter version of experiment 2. Participants were presented with the same stimuli as in experiment 2, but at only one compression rate (×3.5), the rate leading to approximately 50 % comprehension in experiment 2 (the inflection point of the sigmoid comprehension curve). Importantly, in experiment 2 this compression rate corresponded to gate n°4, i.e., the fourth repetition of the same sentence in a row, whereas in the new experiment it corresponds to the first and unique presentation (gate n°1). It is hence appropriate for investigating the potential impact of stimulus repetition on comprehension. As in experiment 2, participants were asked to repeat the sentence after each single presentation. Data were scored exactly as in experiment 2.
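A minimal sketch of how a comprehension point can be derived from the sigmoid performance curves described above: fit a logistic function of compression rate, then solve for the 75 % crossing. The accuracy values and the exact psychometric function below are hypothetical; the paper's own fitting procedure is described in its Methods.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    """Decreasing logistic psychometric function of compression rate x:
    comprehension drops as compression increases."""
    return 1.0 / (1.0 + np.exp(k * (x - x0)))

# Hypothetical mean accuracies at ten compression rates (experiment 1 used
# ten rates; these particular values are placeholders).
rates = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5])
acc   = np.array([1.0, 1.0, 0.98, 0.90, 0.70, 0.50, 0.25, 0.10, 0.04, 0.02])

(x0, k), _ = curve_fit(sigmoid, rates, acc, p0=[3.5, 2.0])

# Comprehension point: compression rate at which the fitted curve crosses
# 75 % accuracy, i.e. solve sigmoid(cp) = 0.75 for cp.
cp = x0 - np.log(3) / k
print(f"comprehension point: x{cp:.2f}")
```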
We assessed whether stimulus repetition was biasing the comprehension point and our estimation of the channel capacities associated with each linguistic feature. An independent t-test was performed to assess the difference in performance between experiments 2 and 3. The statistical procedure revealed no significant difference between the two samples (t(39) = -1.8, p > 0.05; Fig. 4), which indicates that stimulus repetition neither facilitates comprehension compared to a unique presentation nor biases the comprehension point towards earlier understanding, and hence does not bias our estimation of the channel capacities associated with each linguistic feature. To summarize, the repeated presentation paradigm (experiment 2) and the unique presentation paradigm (experiment 3) yield converging estimations in terms of linguistic feature importance and channel capacity.

Estimation of the channel capacity associated with each linguistic feature.

Thanks to the compressed speech gating paradigm, we were able to derive for each feature the distribution of its values (in rate) at the comprehension point, which provided an estimation of its channel capacity (see Methods). This estimation corresponds to the value (a rate, in Hz or bits/s) at which comprehension consistently emerges. This threshold thus reflects a successful transmission of linguistic information but also determines the highest rate of information flow. As such, stimuli containing linguistic feature values above this threshold will exceed channel capacity, leading to a drop in comprehension performance. Overall, we found that the channel capacities associated with each linguistic feature investigated were of the same order of magnitude in both experiments (Fig. 5). Specifically, the estimated maximum acoustic modulation and syllabic rates were both centred around 10-15 Hz, while the phonemic rate's channel capacity was centred around 35 Hz.

We finally estimated whether any linguistic feature was close to its channel capacity in the non-compressed stimulus sets. For each linguistic feature, we thus compared its value at the comprehension point (i.e., its channel capacity) and at original speed (i.e., its intrinsic statistics) and estimated a percentage of overlap across distributions. In experiment 2, for each feature, the percentage of overlap between the two distributions was below 1 %, with the exception of contextual lexical surprise, which reached ~18 % overlap (a value significantly higher than the others; repeated-measures ANOVA).

In particular, speech has been described as an inherently rhythmic phenomenon, in which linguistic information is pseudo-rhythmically transmitted in "packets" (Ghitza, 2014).

We also addressed whether, in natural speech at normal speed, the intrinsic statistics associated with each linguistic feature are already close to their channel capacity. Apart from contextual information, all other features' generic statistics are below their respective channel capacity. Based on these results, we propose that contextual lexical surprise is an important constraint on the rate at which natural speech unfolds. Accordingly, speech production and perception can be envisioned as a dynamical information processing cycle, in which the speaker and the listener are two elements in interaction within one closed-loop converging system (Ahissar & Assa, 2016).
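The percentage of overlap between a feature's distribution at original speed and at the comprehension point was, judging from the reference list, presumably computed with the R package overlapping (Pastore, 2018). A Python equivalent based on kernel density estimates might look as follows; the sample data are hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde

def overlap_percentage(a, b, n_grid=2048):
    """Percentage overlap of two empirical distributions, estimated as the
    integral of the pointwise minimum of their kernel density estimates."""
    grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), n_grid)
    ka, kb = gaussian_kde(a)(grid), gaussian_kde(b)(grid)
    return 100 * np.trapz(np.minimum(ka, kb), grid)

# Hypothetical values of one feature (e.g., syllabic rate, Hz) measured at
# original speed (x1) vs. at the comprehension point (channel capacity).
rng = np.random.default_rng(0)
at_original = rng.normal(5.0, 0.8, 100)
at_capacity = rng.normal(12.5, 2.0, 100)
print(f"{overlap_percentage(at_original, at_capacity):.1f} % overlap")
```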
While in this study we approach the question from the perception side, delimiting the highest rate at which linguistic inputs can be processed, it would be of great interest to look at the same phenomenon from the production side and determine whether the constraints imposed on speech comprehension have equivalents in speech production. Relatedly, investigating whether and which channel capacities can be extended by training could be a powerful way to optimise rehabilitation strategies in patients suffering from speech impairments.

Artificially compressing speech can lead to a degradation of the quality of the linguistic information. This can cause comprehension to drop, as linguistic features may be most efficiently represented at their natural rates. One possibility is that the complexity of an integration operation defines its channel capacity. Our data are in accordance with this idea, as we showed that multilevel linguistic features predict accelerated speech comprehension performance. One question we cannot answer is whether this is the result of a serial chain of processes or of competing parallel processes, or both. Further work using time-resolved measurements of comprehension could adjudicate between these concurrent hypotheses.

Finally, while we used meaningful sentences and words derived from large databases, due to experimental constraints we artificially accelerated the spoken material to carefully control for speed variations. This controlled experimental task may seem somewhat unnatural, but we show that the compressed speech gating paradigm is sensitive to linguistic features that have been shown to influence language processing in more classical experimental settings. Importantly, this paradigm allows the comparison, within a generic framework, of linguistic features drawn from previously distinct subfields of the language domain. While the model-comparison approach used in this work only affords relative conclusions, it undoubtedly paves the way for more thorough investigations of the effects of multilevel linguistic features on speech comprehension. Thanks to an innovative paradigm and stimulus selection procedure, our approach unifies a diverse literature under the unique concept of channel capacity. Our findings highlight the relevance of using both natural speech material (despite being more methodologically constraining) and a normative measurement framework to study speech comprehension. We hope that this work will lay the groundwork for further explorations of speech comprehension mechanisms at the interface of multiple linguistic research fields.

Stimuli. All stimuli were synthesized with a text-to-speech system (voice "fr-FR-Wavenet-C"). Using text-to-speech technology as opposed to naturally-produced speech has the critical advantage of controlling for the relevant linguistic features. Indeed, naturally produced speech displays variability across utterances in multiple linguistic characteristics (prosody, quality of phonetic pronunciation, phonemic duration, coarticulation, local speech rate, etc.) (Miller, Grosjean, & Lomanto, 1984). By contrast, synthetic speech remains highly consistent across utterances, the same sentence always being pronounced the same way. This point is highly important when assessing channel capacity, as the different words (experiment 1) or sentences (experiment 2) must be pronounced similarly to be able to estimate the impact of linguistic features on comprehension across stimuli.
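The voice name "fr-FR-Wavenet-C" corresponds to a Google Cloud Text-to-Speech Wavenet voice. A minimal synthesis sketch using that service's Python client is shown below; this is an assumption about tooling, since the excerpt names only the voice, not the authors' exact pipeline, and the example sentence is hypothetical.

```python
# Minimal sketch using the Google Cloud Text-to-Speech client library.
# Assumes valid application credentials are configured.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

def synthesize(text, path):
    """Synthesize `text` with the voice named in the paper and write a WAV
    file (LINEAR16 responses include a WAV header)."""
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="fr-FR", name="fr-FR-Wavenet-C"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16),
    )
    with open(path, "wb") as f:
        f.write(response.audio_content)

synthesize("Le chat dort sur le canapé.", "stimulus_001.wav")  # hypothetical stimulus
```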
Linguistic features. The following features were computed for each stimulus.

Acoustic modulation rate: for each stimulus, the temporal envelope was extracted and resampled at 1000 Hz. Then, we used Welch's method (Virtanen et al., 2020) to estimate the power spectral density of the envelope, resulting in a modulation spectrum between 1 and 215 Hz with a 0.1 Hz resolution. Finally, the centre frequency of each spectrum was extracted by taking the global maximum of the modulation spectrum. The acoustic modulation rate was expressed in Hz.

Phonemic rate: the number of phonemes presented per second. It was computed by dividing the number of phonemes (retrieved from the canonical pronunciation provided in the Lexique database (New, Pallier, Brysbaert, & Ferrand, 2004)) by the duration of the stimulus. The phonemic rate was expressed in Hz.

Syllabic rate: same as the phonemic rate but for syllables. It was also expressed in Hz.

Phonemic information rate: it measures how much information, as defined by Shannon's theory of communication, is carried by each phoneme (n = 38). In order to approach this level from a perspective different from the lexical level described below, we adopted a methodology based on the contrastive role of phonemes in keeping words distinct in the French lexicon. For each distinct phoneme, its contrastive role was computed as its relative functional load (Oh, Coupé, Marsico, & Pellegrino, 2015). The functional load quantifies the relative importance of a phoneme for a given language, i.e., its importance in avoiding homophony and keeping words distinct in the lexicon, given their frequency of usage. The phonemic information rate is consequently defined for each stimulus as the sum of its phonemic functional loads divided by its duration. This feature was estimated from written data derived from the Lexique database. The phonemic information rate was expressed in bits per second.

Syllabic information rate: same as the phonemic information rate but for syllables (n = 3660). It was also expressed in bits per second.
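A minimal sketch of the acoustic modulation rate computation described above (envelope resampled at 1000 Hz, Welch PSD, global maximum). The envelope extraction step (here, Hilbert magnitude) and the soundfile I/O are assumptions not specified in the excerpt.

```python
import numpy as np
import soundfile as sf
from scipy.signal import hilbert, resample_poly, welch

def acoustic_modulation_rate(path):
    """Peak frequency (Hz) of the envelope modulation spectrum, following
    the Methods: envelope resampled at 1000 Hz, PSD via Welch's method,
    global maximum of the spectrum."""
    x, fs = sf.read(path)                 # assumes a mono stimulus
    env = np.abs(hilbert(x))              # broadband temporal envelope (assumption)
    env = resample_poly(env, 1000, fs)    # resample to 1000 Hz
    # 10 000 samples at 1000 Hz gives the 0.1 Hz resolution stated in the
    # Methods; shorter stimuli fall back to their full length.
    f, psd = welch(env, fs=1000, nperseg=min(len(env), 10_000))
    band = (f >= 1) & (f <= 215)          # frequency range given in the Methods
    return f[band][np.argmax(psd[band])]

print(acoustic_modulation_rate("stimulus_001.wav"))  # hypothetical file
```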
This resulted in 2510 audio stimuli (251 words x 10 compression rates) in experiment 1, 700 674 audio stimuli (100 sentences x 7 compression rates) in experiment 2 and 100 audio stimuli (100 675 sentences x 1 compression rate) in experiment 3. A manual check was performed to ensure that the 676 compression procedure did not insert salient quirks. 677 One necessary prerequisite of our experiment is that across presentation rates all the investigated 678 acoustic and linguistic factors are uniformly modified (i.e., that time-compression does not impact a 679 particular feature more than the others). Previous experimental work has shown that artificially time-680 compressed speech and natural fast speech are qualitatively different. Indeed, in the first case, the 681 spectral content is exactly similar but the duration of the utterance is reduced. This results in an 682 uniform modification of all spectral and temporal details. In the second case, due to restrictions on 683 articulation, the signal is affected non-uniformly (Guiraud et al., 2018; Janse, 2004) . In addition, the 684 idea of using the modified gating paradigm was to present to the participants at each compression 685 rate exactly the same overall quantity of information, albeit delivered at different speed/rate, so that 686 the channel capacity of each factor can be estimated. Hence it was crucial that the material was 687 exactly similar across compression rates, except for the time dimension. 688 Paradigm. All three behavioral experiments consisted in a modified version of the gating paradigm 689 (Grosjean, 1980) using time-compressed speech stimuli. 690 In experiment 1, participants were presented with 10 time-compressed versions of isolated words. Each trial consisted in the successive presentation of different time compressed versions of the same 692 audio stimulus, in an incremental fashion, starting with the most compressed version of the stimulus 693 (gate n°1) and ending with the least compressed version (either gate n°10). After each audio 694 presentation, participants were asked to type on the keyboard what they heard and then to press 695 enter to continue to the next gate. 696 Experiment 2, was similar to experiment 1, apart from the fact that participants were presented with 697 7 time-compressed versions of seven-word sentences. Each trial thus ends at gate n°7, following 698 the presentation of the least compressed version of the sentence. In experiment 2, participants were 699 required to repeat in the microphone at each gate what they heard and then to press enter to continue 700 to the next gate. 701 Experiment 3 was similar to experiment two except that only one time compressed version (x 3.5) of 702 each sentence was presented per trial. 703 In all experiments, participants were instructed that each auditory stimulus was meaningful and 704 difficult to understand at the highest compression rates. In order to get familiarized with the task, 705 participants completed three practice trials before the experiment. The procedures were the same except that participants were instructed to record their answers with 726 a microphone (instead of typing them) to optimize the duration of the experiment. 727 Data scoring. Speech comprehension was scored 1 if the response was correct (grammatical errors 730 were allowed) and 0 if the response was incorrect or if no answer was given. 
Paradigm. All three behavioral experiments consisted of a modified version of the gating paradigm (Grosjean, 1980) using time-compressed speech stimuli.

In experiment 1, participants were presented with 10 time-compressed versions of isolated words. Each trial consisted of the successive presentation of different time-compressed versions of the same audio stimulus, in an incremental fashion, starting with the most compressed version of the stimulus (gate n°1) and ending with the least compressed version (gate n°10). After each audio presentation, participants were asked to type on the keyboard what they heard and then to press enter to continue to the next gate.

Experiment 2 was similar to experiment 1, except that participants were presented with 7 time-compressed versions of seven-word sentences. Each trial thus ended at gate n°7, following the presentation of the least compressed version of the sentence. In experiment 2, participants were required to repeat into the microphone at each gate what they heard and then to press enter to continue to the next gate.

Experiment 3 was similar to experiment 2, except that only one time-compressed version (×3.5) of each sentence was presented per trial.

In all experiments, participants were instructed that each auditory stimulus was meaningful and difficult to understand at the highest compression rates. To familiarize themselves with the task, participants completed three practice trials before the experiment.

The procedures were otherwise the same, except that participants were instructed to record their answers with a microphone (instead of typing them) to optimize the duration of the experiment.

Data scoring. Speech comprehension was scored 1 if the response was correct (grammatical errors were allowed) and 0 if the response was incorrect or if no answer was given. In experiments 2 and 3, participants' audio responses were first transcribed using Google Cloud Speech-to-Text (Google, Mountain View, CA, 2018) and checked manually for mistakes or inconsistencies.

We statistically assessed the significance of the difference between the multiple regressors across experiments 1 and 2 in an unbiased way, using their standardized estimates and standard errors. Moreover, after transforming the resulting Z-scores (standard normal distribution) into p-values, we additionally applied a Holm correction for multiple comparisons. From the resulting statistics, we assessed, for each linguistic feature, potential significant differences between experiments 1 and 2.

The percentage of overlap between feature distributions reveals which feature is already near the upper limit of speech comprehension at normal speed, potentially limiting our ability to cope with higher-speed speech.

Model validation. All models were fitted in R (version 3.5.1 (R Core Team, 2020)) and implemented in RStudio (Racine, 2012) using the lme4 package (Bates et al., 2015). Fixed effects were z-transformed to obtain comparable estimates (Schielzeth, 2010). Visual inspection of residual plots was systematically performed to assess deviations from normality or homoscedasticity. Variance inflation factors (VIFs) were also checked to ensure that collinearity between fixed effects was absent. Overall, VIF values were generally close to one, and no deviations from model assumptions were detected. We tested the significance of the respective full models against the null models using likelihood ratio tests (R function anova). Goodness of fit of the models was evaluated and reported using both the marginal and conditional R².
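A minimal sketch of the VIF check described under Model validation, using statsmodels on a hypothetical design matrix of z-scored features (the actual check was presumably run in R alongside lme4):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical design matrix: one column per z-scored linguistic feature,
# one row per stimulus (251 words, as in experiment 1).
rng = np.random.default_rng(1)
X = pd.DataFrame(
    rng.normal(size=(251, 5)),
    columns=["acoustic_mod_rate", "phonemic_rate", "syllabic_rate",
             "phonemic_info_rate", "static_lexical_surprise"])

Xc = sm.add_constant(X)  # include an intercept, as in the fitted models
vif = {col: variance_inflation_factor(Xc.values, i)
       for i, col in enumerate(Xc.columns) if col != "const"}
print(vif)  # values close to 1 indicate negligible collinearity
```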
Data availability. Numerical data supporting this study will be available on GitHub.

References.

Pastore, M. (2018). Overlapping: a R package for estimating overlapping in empirical distributions. Journal of Open Source Software.
Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998). Using the correct statistical test for the equality of regression coefficients. Criminology, 36(4), 859-866.
Comparing and deconstructing speech rhythm across Romance languages.
Peelle, J. E., & Davis, M. H. (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology, 3, 320.
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex, 23(6), 1378-1387.
Pefkou, M., Arnal, L. H., Fontolan, L., & Giraud, A.-L. (2017). θ-Band and β-band neural activity reflects independent syllable tracking and comprehension of time-compressed speech. Journal of Neuroscience.
Pickering, M. J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11(3), 105-110.
Poeppel, D., & Assaneo, M. F. (2020). Speech rhythms and their neural foundations. Nature Reviews Neuroscience, 21, 322-334.
Poeppel, D. (2003). The analysis of speech in different temporal integration windows: cerebral lateralization as 'asymmetric sampling in time'. Speech Communication, 41, 245-255.
Quené, H., & van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59(4), 413-425.
Racine, J. S. (2012). RStudio: a platform-independent IDE for R and Sweave. Journal of Applied Econometrics, 27(1), 167-172.
Speech timing and linguistic rhythm: on the acoustic bases of rhythm typologies.
Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: an attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18(3), 849-860.
Reed, C. M., & Durlach, N. I. (1998). Note on information transfer rates in human communication. Presence: Teleoperators and Virtual Environments, 7(5), 509-518.
Rosen, S. (1992). Temporal information in speech: acoustic, auditory and linguistic aspects. Philosophical Transactions of the Royal Society of London. Series B, 336, 367-373.
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1(2), 103-113.
Neural speech tracking shifts from the syllabic to the modulation rate of speech as intelligibility decreases.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423, 623-656.
Sigman, M., & Dehaene, S. (2008). Brain mechanisms of serial and parallel processing during dual-task performance. Journal of Neuroscience, 28(30), 7585-7598.
Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416, 87-90.
Sohoglu, E., Peelle, J. E., Carlyon, R. P., & Davis, M. H. (2012). Predictive top-down integration of prior knowledge during speech perception. Journal of Neuroscience, 32(25), 8443-8453.
Stevens, K. N. (2002). Toward a model for lexical access based on acoustic landmarks and distinctive features. The Journal of the Acoustical Society of America, 111(4), 1872-1891.
The syllable in the light of motor skills and neural oscillations. Language, Cognition and Neuroscience.
Vagharchakian, L., Dehaene-Lambertz, G., Pallier, C., & Dehaene, S. (2012). A temporal bottleneck in the language comprehension network. Journal of Neuroscience, 32(26), 9089-9102.
Varnet, L., Ortiz-Barajas, M. C., Erra, R. G., Gervain, J., & Lorenzi, C. (2017). A cross-linguistic study of speech modulation spectra. The Journal of the Acoustical Society of America, 142(4), 1976-1989.
Versfeld, N. J., & Dreschler, W. A. (2002). The relationship between the intelligibility of time-compressed speech and speech in noise in young and elderly listeners. The Journal of the Acoustical Society of America, 111(1), 401-408.
Virtanen, P., Gommers, R., Oliphant, T. E., et al. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261-272.
How stable are acoustic metrics of contrastive speech rhythm? The Journal of the Acoustical Society of America.
Wolf, T., et al. (2020). Transformers: state-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations.