key: cord-0741153-wd0s6m4w
authors: Huth, Markus E.; Boschung, Regula L.; Caversaccio, Marco D.; Wimmer, Wilhelm; Georgios, Mantokoudis
title: The effect of internet telephony and a cochlear implant accessory on mobile phone speech comprehension in cochlear implant users
date: 2022-04-24
journal: Eur Arch Otorhinolaryngol
DOI: 10.1007/s00405-022-07383-x
sha: bdf72052f66415e4ec24cd20134c8bc6743201f9
doc_id: 741153
cord_uid: wd0s6m4w
PURPOSE: In individuals with severe hearing loss, mobile phone communication is limited despite treatment with a cochlear implant (CI). The goal of this study is to identify the best communication practice for CI users by comparing speech comprehension of conventional mobile phone (GSM) calls, Voice over Internet Protocol (VoIP) calls, and the application of a wireless phone clip (WPC) accessory.
METHODS: This study included 13 individuals (mean age 47.1 ± 17.3 years) with at least one CI. Frequency response and objective voice quality were tested for each device, transmission mode and the WPC. We measured speech comprehension using a smartphone for a GSM call with and without WPC as well as VoIP-calls with and without WPC at different levels of white background noise.
RESULTS: Frequency responses of the WPC were limited (< 4 kHz); however, speech comprehension in a noisy environment was significantly improved compared to GSM. Speech comprehension was improved by 9–27% utilizing VoIP or WPC compared to GSM. WPC was superior in noisy environments (80 dB SPL broadband noise) compared to GSM. At lower background noise levels (50, 60, 70 dB SPL broadband noise), VoIP resulted in improved speech comprehension with and without WPC. Speech comprehension scores did not correlate with objective voice quality measurements.
CONCLUSION: Speech comprehension was best with VoIP alone; however, accessories such as a WPC provide additional improvement in the presence of background noise.
Mobile phone calls utilizing VoIP technology, with or without a WPC accessory, result in superior speech comprehension compared to GSM. With the advent of mobile phones, mobile communication and availability for private and professional reasons have become ubiquitous. In hearing-impaired individuals, however, having mobile phone conversations is challenging despite treatment with hearing aids or cochlear implants (CI) [1] . Telephone speech comprehension in cochlear implant users (CIUs) is often compromised by reduced speech signal quality [2] [3] [4] [5] . During the COVID-19 (coronavirus disease 2019) pandemic and the subsequent requisite social distancing, the importance of telephone speech comprehension during both private and professional telephone conversations and conferences has become paramount. Telephone speech comprehension is often compromised due to a limited frequency bandwidth of 300-3400 Hz and digital signal compression [6, 7] . Voice over Internet Protocol (VoIP), however, transmits a wider frequency bandwidth (200-12,000 Hz), which results in improved speech comprehension independent of additional visual cues [6] [7] [8] [9] . Mantokoudis et al. measured speech comprehension during a conventional telephone call with background noise at different signal-to-noise ratios (SNR) [10] . During quiet conditions, telephone speech comprehension in CIUs was 73%. Speech comprehension with background noise (SNR 10 dB), however, decreased significantly to only 12% in CIUs compared to 97% in normal-hearing individuals (NHI) [10, 11] . Using VoIP, speech comprehension with background noise (SNR 10 dB) improved to 48% in CIUs; this resulted from higher speech signal quality due to transmission of a wider frequency spectrum and application of noise reduction algorithms in VoIP compared to conventional telephony [10, 11] . 
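To put the two transmission bands quoted above side by side, the following Python sketch computes their width in hertz and in octaves. Only the band edges (300-3400 Hz and 200-12,000 Hz) come from the text; the function name `band_span` is illustrative, and expressing bandwidth in octaves is an assumption made here for comparison, not a measure used in the cited studies.

```python
import math


def band_span(low_hz, high_hz):
    """Return (width in Hz, width in octaves) for a pass band."""
    if not 0 < low_hz < high_hz:
        raise ValueError("need 0 < low_hz < high_hz")
    return high_hz - low_hz, math.log2(high_hz / low_hz)


# Band edges as quoted in the text.
BANDS = {
    "conventional telephony": (300.0, 3400.0),
    "VoIP": (200.0, 12000.0),
}

if __name__ == "__main__":
    for name, (low, high) in BANDS.items():
        width_hz, octaves = band_span(low, high)
        print(f"{name}: {width_hz:.0f} Hz wide, {octaves:.2f} octaves")
```

On these figures the VoIP band spans roughly 5.9 octaves, against roughly 3.5 octaves for conventional telephony.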
Using different mobile phone models, speech comprehension in CIUs improved by 13-15% for VoIP compared to Global System for Mobile Communication (GSM) transmission [12] . The ability to lip-read using the video signal during VoIP conversations further contributes to improved speech comprehension [13] . Furthermore, placing the phone receiver or the mobile phone next to the ear, as commonly held during telephone conversations, can produce interfering noise from physical contact of the phone receiver with the microphone of the speech processor [14, 15] . On the other hand, using the speakerphone can reduce the speech signal quality due to interference from ambient noise. Notably, environmental background noise during mobile phone calls is an omnipresent predicament that comes with permanent availability [10] as surrounding traffic and people require a certain awareness. This predicament has already been recognized by the hearing aid industry. While the newest CIs can link directly to a smartphone, previous CI generations require a coupling device. Currently, all three major CI manufacturers offer such coupling devices. MED-EL (MED-EL Elektromedizinische Geräte Gesellschaft m.b.H., Fürstenweg 77a, 6020 Innsbruck, Austria) offers the AudioLink. In addition to smartphone call transmission to the CI, the AudioLink can also transmit music from the smartphone and link to a TV. Advanced Bionics (Advanced Bionics LLC 28515 Westinghouse Place, Valencia, CA 91355, USA) offers the Naída CI Connect, which is directly attached to the CI speech processor. Cochlear (Cochlear Ltd., 1 University Avenue, Macquarie University, NSW, 2109, Australia) offers the Wireless Phone Clip (WPC). The WPC allows for direct, wireless routing of the speech signal from a smartphone to the hearing aid or cochlear implant, thereby reducing or optionally eliminating surrounding background noise. 
The WPC has a built-in microphone which enables optional acoustic transmission of environmental sounds with the phone signal, to increase awareness of the surroundings. Furthermore, the WPC has a button that allows for answering and terminating phone calls directly, effectively rendering the WPC, with its built-in microphone, a hands-free speakerphone. Multiple studies performed by Cochlear Inc. demonstrate improved speech comprehension with WPC for conventional mobile (GSM) telephony in cochlear implant users [16] [17] [18]. The effect of the WPC on telephone speech comprehension in CIUs utilizing VoIP telephony, however, remains unclear. This is of particular interest, as VoIP telephony has become an integral part of the increasing amount of distant communication throughout the COVID-19 pandemic. Therefore, this study investigates CIU telephone speech comprehension for GSM and VoIP telephony both with and without the WPC.
The study protocol was approved by the local ethical review board prior to the start. The study was conducted in full accordance with the study protocol as approved and with the ethical standards as stated in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Written informed consent including consent for publication was obtained from all participants.
Based on the two different transmission modes (GSM or VoIP) and the two coupling modes (acoustically with a mobile phone on the ear [A] or via WPC with a Bluetooth connection to the speech processor [B]), four measurement scenarios resulted: GSM telephony with acoustic signal transmission (A-GSM); GSM telephony with Bluetooth signal transmission (B-GSM); VoIP telephony with acoustic signal transmission (A-VoIP); VoIP telephony with Bluetooth signal transmission (B-VoIP). GSM phone calls were routed through the mobile service provider. A mobile VoIP App was used for receiving VoIP calls.
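The two-by-two design above can be written down compactly. The Python sketch below enumerates the four scenario labels and draws a reproducible shuffled order for each participant. The text states only that the sequence was randomized; the per-participant seeding scheme and the names `scenarios` and `randomized_order` are assumptions for illustration, not the procedure used in the study.

```python
import itertools
import random

TRANSMISSION = ("GSM", "VoIP")  # how the call is routed
COUPLING = ("A", "B")           # A = acoustic, B = Bluetooth via WPC


def scenarios():
    """All coupling/transmission combinations, labeled as in the text."""
    return [f"{c}-{t}" for t, c in itertools.product(TRANSMISSION, COUPLING)]


def randomized_order(participant_id, seed=0):
    """Shuffled scenario order, reproducible per participant (assumed scheme)."""
    rng = random.Random(f"{seed}-{participant_id}")
    order = scenarios()
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    print(scenarios())
    for pid in range(1, 4):
        print(pid, randomized_order(pid))
```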
To prevent learning effects in participants due to repeated measurements in this study, the sequence of measurement scenarios was randomized.
Before the participants were tested, the characteristics of the study equipment were measured. An integrating sound pressure level (SPL) meter ([...] 23-29 Rives de Clausen, L-2165 Luxemburg) to the mobile phone in the audio booth either as a mobile GSM phone call or as a VoIP phone call to the mobile Skype software application (Fig. 1). The SPL meter measured sound levels at 1 cm distance to the mobile phone loudspeaker and the phone volume was set to 65 dB SPL. A CI speech processor type Nucleus 7 was mounted and connected to a Head & Torso Simulator (Type 4128, Brüel & Kjaer A/S, Skodsborgvej 307, DK-2850 Naerum, Denmark) in the audio booth. The mobile phone was mounted to the ear of the Head & Torso simulator at 1 cm distance to simulate a typical phone call position. The HSM test sentences were either played acoustically from the mobile phone receiver or transmitted directly to the CI speech processor via Bluetooth by the WPC. Every possible combination was subsequently measured: A-GSM, B-GSM, A-VoIP, and B-VoIP.
The frequency responses of the GSM or VoIP transmission were measured with an audio analyzer (UPV Audio Analyzer, Rohde & Schwarz GmbH & Co. KG, Mühldorfstrasse 15, 81671 Munich, Germany) as described in a previous publication from our group [12]. The frequency responses were measured either acoustically (A, for the mobile phone receiver) or electronically (B, for the WPC) from the CI speech processor mounted on the Head & Torso simulator. Reproducible measurements within a 1% margin were considered valid. To quantify the speech signal quality of the chosen speech comprehension test, a Perceptual Evaluation of Speech Quality (PESQ) [20] was performed.
The audio analyzer utilized the PESQ test, which is a validated objective measure of speech quality in telecommunication [21, 22] as well as in CIs [23]. The PESQ score ranges from 1 to 5, in which 1 is the lowest and 5 the highest speech signal quality. The Head & Torso simulator recorded the signal from the CI speech processor for the four measured scenarios.
To test speech comprehension, participating individuals were required to be native German speakers. Participants were recruited from our institutional CI patient database. Inclusion criteria were age ≥ 18 years, CI use for ≥ 3 months and a CI speech processor compatible with WPC. Exclusion criteria were age < 18 years and mental or physical inability to participate. All participants utilized their CI speech processor in their everyday user settings or a commonly used setting for phoning. The four measurement scenarios (A-GSM, B-GSM, A-VoIP, B-VoIP) were measured in a randomized order. To minimize confounding effects of residual hearing of the contralateral, non-implanted ear, a foam earplug was inserted in the respective ear canal.
Speech comprehension was quantified using the HSM sentence test [19] (Fig. 1). The standardized HSM test sentences were played from a CD on a laptop computer using VLC media player software. The speech signal was directly transmitted via VoIP software to the mobile phone (iPhone SE, iOS 10.2, Apple Inc., Cupertino, CA, USA) either as a GSM or VoIP call. The mobile phone and the transmission volume of the HSM sentences were set at 65 dB SPL as measured during the calibration.
A white background noise (WBN) signal was played at 50, 60, 70, and 80 dB SPL from a loudspeaker in the sound-attenuated audio booth at 1 m distance in front of the participant. The HSM sentences were then transmitted either acoustically or via WPC to the CI speech processor. The participants verbalized the HSM sentences, and every correctly verbalized word contributed to the overall HSM score (106 words = 106 points maximum score). After each measurement, participants were asked to rate the subjective speech sound quality for all four scenarios using the mean opinion score (MOS, 5-point rating scale with 1 as the lowest and 5 as the highest score), a subjective speech quality test widely used in telecommunication [9, 24].
Fig. 1 The Hochmair-Schulz-Moser (HSM) test sentences were transmitted to a mobile phone via a conventional phone call (GSM) or via Voice over Internet Protocol (VoIP) and further transmitted either acoustically or via Bluetooth and Wireless Phone Clip to the cochlear implant while the participant was exposed to wideband noise at 50, 60, 70, or 80 dB SPL from a loudspeaker at 1 m distance
We analyzed individual patient speech comprehension scores on the HSM sentence test for all four scenarios. To eliminate confounding, the measurements of 6 participants were excluded at 80 dB SPL WBN due to zero comprehension of the HSM test sentences. Speech comprehension scores as directly measured by correctly verbalized words were compared by means of a Friedman-ANOVA test. To correct for multiple testing, a Wilcoxon-Nemenyi-McDonald-Thompson post hoc test was performed with a 5% level of significance. Statistical analysis was performed with Origin Pro (Version 8.6; OriginLab Corporation, One Roundhouse Plaza, Suite 303, Northampton, MA 01060, USA). We reported the subjective comprehension of speech sound quality according to the MOS as a secondary outcome.
Fifteen individuals were identified as eligible.
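The scoring and the omnibus comparison described in the methods can be sketched in a few lines of standard-library Python. This is not the authors' analysis, which was run in Origin Pro. It is a minimal illustration that converts raw HSM word counts to percent and computes the Friedman statistic for four repeated conditions. The word counts in `example` are hypothetical, the function names are illustrative, no tie correction is applied, and the Wilcoxon-Nemenyi-McDonald-Thompson post hoc step is not implemented.

```python
import math

HSM_MAX_WORDS = 106  # maximum HSM score stated in the text


def hsm_percent(words_correct):
    """Convert a raw HSM word count to percent correct."""
    if not 0 <= words_correct <= HSM_MAX_WORDS:
        raise ValueError("word count out of range")
    return 100.0 * words_correct / HSM_MAX_WORDS


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman(scores):
    """Friedman chi-square and p-value; rows = participants, 4 columns.

    The p-value uses the closed-form chi-square survival function for
    3 degrees of freedom, so exactly four scenarios are required.
    """
    n, k = len(scores), len(scores[0])
    if k != 4 or any(len(row) != k for row in scores):
        raise ValueError("this sketch expects exactly 4 scenarios per row")
    rank_sums = [0.0] * k
    for row in scores:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    q = max(q, 0.0)  # guard against tiny negative rounding errors
    p = math.erfc(math.sqrt(q / 2)) + math.sqrt(2 * q / math.pi) * math.exp(-q / 2)
    return q, p


# Hypothetical raw HSM word counts (NOT study data).
# Columns: A-GSM, B-GSM, A-VoIP, B-VoIP
example = [
    [40, 55, 70, 62],
    [22, 35, 51, 48],
    [60, 66, 90, 81],
    [10, 30, 44, 41],
]

if __name__ == "__main__":
    q, p = friedman(example)
    print(f"Friedman chi-square = {q:.2f}, p = {p:.4f}")
```

With the hypothetical counts above, every participant orders the scenarios identically, so the statistic reaches its maximum for four participants (12.0, p of roughly 0.007). Real data with ties would call for a tie-corrected statistic from an established statistics package.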
Two participants, however, demonstrated 0% speech comprehension in all tested scenarios and were thus excluded from the study. We included 13 participants (mean age 47.1 ± 17.3 years; male:female 7:6, Table 1).
The frequency response was measured for all four scenarios from 20 to 10,000 Hz in dB V (Fig. 2). Acoustic signal transmission (A-GSM and A-VoIP) demonstrated similar frequency responses from 20 to 3200 Hz. In the low-frequency range from 70 to 200 Hz, A-GSM and A-VoIP showed a decreased frequency response compared to Bluetooth signal [...]
To quantify audio signal quality, the PESQ score was measured for the HSM sentences in all four scenarios. A-GSM and B-GSM resulted in PESQ scores of 3.3 and 3.0, respectively. A-VoIP and B-VoIP had PESQ scores of 2.8 and 2.6, respectively. Comparing acoustic to Bluetooth transmission, the application of the WPC lowered the PESQ score by 0.3 for GSM (B-GSM) and by 0.2 for VoIP (B-VoIP) (Fig. 3).
The Mean Opinion Score (MOS) represents the subjectively perceived audio signal quality of the participants with 1 as the lowest score and 5 as the highest. The participants rated A-VoIP (MOS 3.2) as the modality with the subjectively best signal quality, followed by B-VoIP (MOS 3.1). B-GSM resulted in a MOS of 2.8, whereas A-GSM resulted in a MOS of 2.3 (Fig. 3).
This study investigates cochlear implant users' speech comprehension for VoIP telephony with (B-VoIP) and without (A-VoIP) WPC compared to conventional mobile telephony (A-GSM and B-GSM). Overall, VoIP provided superior voice quality and improved speech comprehension for CIUs; however, it appears that available accessories do not fully support the extended frequency band provided by VoIP. The frequency response measurements demonstrated a broader frequency range for A-VoIP alone. All other scenarios (A-GSM, B-GSM, B-VoIP) showed a narrow band transmission not supporting frequencies > 4 kHz.
Application of the WPC resulted in a reduced signal loss in low frequencies (70-200 Hz), which are considered irrelevant for speech comprehension and may actually be interpreted by the user as additional background noise. Furthermore, the application of the WPC demonstrated lower signal intensities over a narrower frequency range, resulting in increased signal loss. Restrictions of frequency range and signal intensity are often due to the device design and chosen hardware components. This has also been illustrated in previous studies, where different telephones or mobile devices were assessed in terms of voice quality and intelligibility [9]. In addition, telecommunication providers offering GSM apply speech codecs, which compress voice signals and restrict high-frequency transmission using 13 kbit/s transmission rates (ITU recommendations, G-series). The compressed speech signals have a crucial impact on speech discrimination in the presence of noise since the high-frequency component of speech is absent [6, 7].
PESQ measurements were performed to quantify speech signal quality; however, PESQ scores were consistent neither with speech comprehension results nor with MOS scores. This discrepancy was also found in a previous study analyzing Digital Enhanced Cordless Telecommunications (DECT) phones and their sound quality [9]. This raises the question: given its original purpose of measuring sound signal quality in conventional acoustic telephony for a normal-hearing population, is the PESQ score an adequate test of sound signal quality for VoIP in CIUs? MOS scores (subjectively perceived voice quality), contrary to PESQ scores, were consistent with the speech comprehension results. However, no statistically significant difference was observed. The results suggest a loss of speech signal quality by application of the WPC for conventional as well as for VoIP telephony.
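The mismatch between objective and subjective quality can be made concrete with the four PESQ and MOS values reported in the results. The Python sketch below ranks the scenarios on each measure and computes a Spearman rank correlation. This calculation is an illustration added here, not part of the authors' analysis, and four data points are far too few for any inference; the helper names are illustrative.

```python
# Values as reported in the text for the four scenarios.
PESQ = {"A-GSM": 3.3, "B-GSM": 3.0, "A-VoIP": 2.8, "B-VoIP": 2.6}
MOS = {"A-GSM": 2.3, "B-GSM": 2.8, "A-VoIP": 3.2, "B-VoIP": 3.1}


def ranks(scores):
    """Rank scenarios from 1 (lowest score) upward; assumes no ties."""
    ordered = sorted(scores, key=scores.get)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def spearman_rho(x, y):
    """Spearman rank correlation for untied paired scores."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    d2 = sum((rx[name] - ry[name]) ** 2 for name in rx)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def best(scores):
    """Scenario with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print("best by PESQ:", best(PESQ))
    print("best by MOS:", best(MOS))
    print("Spearman rho:", round(spearman_rho(PESQ, MOS), 2))
```

On these values the two measures order the scenarios almost in reverse: A-GSM scores highest on PESQ but lowest on MOS, and the rank correlation is -0.8.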
In contrast to the PESQ scores, VoIP telephony demonstrated statistically significant improvements in speech comprehension at 50 (A-VoIP), 60 (A- and B-VoIP), and 70 dB SPL (A- and B-VoIP) compared to A-GSM. Although additional video transmission results in a further 8.5% improvement in speech comprehension, this study excluded visual cues to specifically examine the effect of the WPC [13]. Overall, the application of the WPC resulted in improved speech comprehension at all tested background noise levels compared to A-GSM but only outperformed A-VoIP at a background noise level of 80 dB SPL WBN. This finding was expected because accessories such as the WPC offer improved coupling to the implant while reducing background noise. In addition, the speech processor's built-in microphone was still active to enable CIUs to continue to be aware of their surroundings while calling. CIUs have the option to switch off the built-in microphones to block or reduce background noise. The WPC, however, did not transmit the full frequency range and full intensity of the VoIP software. Based on previous studies and on the current data, we can conclude that signal quality is reduced in GSM (due to signal restrictions from the providers) [1, 9], improves with accessories like the WPC in environmental background noise and is superior with VoIP, for which no additional benefit from the WPC can be observed.
This is the first study investigating the benefit of VoIP telephony using a phone clip for direct coupling to the cochlear implant; however, our study also had limitations. We calculated the required sample size (n = 8) based on data from a previous study done by our group [11]. A post hoc power analysis demonstrated statistical power of more than 80% for the statistically significant results of our study; however, a higher number of study participants would have been preferable to increase the overall power of this study.
All participants used the WPC for the first time and we cannot exclude a learning effect through repeated use. We also applied the phone clip only monaurally, and we would expect an increased benefit with binaural transmission to the implants [25] . Furthermore, only one VoIP application and one phone accessory were tested. Other accessories, such as induction neck loops, have not been tested. Built-in microphones remained switched on as this was the standard user setting. Other mapping strategies and microphone settings adapted to the WPC may have resulted in better voice and speech comprehension. Additionally, not all CIUs have access to high-speed mobile internet, which is a pre-requisite for utilizing VoIP. As previously discussed, there was also a discrepancy between measured and subjectively perceived voice quality. Measurement tools adapted to cochlear implant recipients are lacking. Professionals dealing with CIUs should advise the use of VoIP for distant communication. Smartphone applications in conjunction with fast and reliable mobile internet connections improve the overall communication experience of CIUs. Adding a phone accessory such as a WPC may be recommended for mobile calls in the presence of background noise such as street traffic, railway stations, etc. Future accessory developments for CIUs should support a broad frequency phone transmission range. Speech comprehension was best using a mobile phone application (A-VoIP) that took advantage of improved voice signal quality offered by newer technology. Accessories such as a WPC provide improvement with environmental background noise. Mobile VoIP calls both with and without WPC accessories result in superior speech comprehension compared to conventional mobile phone calls. 
Mobile and landline telephone performance outcomes among telephone-using cochlear implant recipients
Telephone use and understanding in patients with cochlear implants
Fine structure processing improves telephone speech perception in cochlear implant users
Telephone usage in the hearing-impaired population
Speech perception and communication ability over the telephone by Mandarin-speaking children with cochlear implants
Effect of bandwidth extension to telephone speech recognition in cochlear implant users
An investigation into the effect of limiting the frequency bandwidth of speech on speech recognition in adult cochlear implant users
Effects of low pass filtering on the intelligibility of speech in noise for people with and without dead regions at high frequencies
Influence of telecommunication modality, internet transmission quality, and accessories on speech perception in cochlear implant users
Speech perception benefits of internet versus conventional telephony for hearing-impaired individuals
How internet telephony could improve communication for hearing-impaired individuals
Mobile internet telephony improves speech intelligibility and quality for cochlear implant recipients
Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users
An investigation of telephone use among cochlear implant recipients
Hearing ability by telephone of patients with cochlear implants
Evaluation of speech recognition over the telephone with and without the Cochlear Wireless Phone Clip
Wireless streaming with the Cochlear Wireless Phone Clip improves speech understanding and reduces listening effort during telephone use in noise
The HSM sentence test as a tool for evaluating the speech understanding in noise of cochlear implant users
An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech
Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs
Revised annex A - reference implementations and conformance testing for ITU-T Recs P.862, P.862.1 and P.862.2. P.862 Amendment 2, vol P.862.2. International Telecommunication Union
Noise reduction using wavelet thresholding of multitaper estimators and geometric approach to spectral subtraction for speech coding strategy
Objective evaluation of speech signal quality by the prediction of multiple foreground diagnostic acceptability measure attributes
Studies on bilateral cochlear implants at the University of Wisconsin's binaural hearing and speech laboratory
Acknowledgements The authors express their gratitude to the Foundation Besser-Hören-Schweiz, Allmendstrasse 11, CH-6312 Steinhausen, Switzerland and Cochlear AG, Peter Merian-Weg 4, CH-4052 Basel, Switzerland for funding of this study in equal parts. We additionally thank Cochlear AG for providing the Wireless Phone Clip. Furthermore, we thank Jérémie Guignard (Cochlear AG) for his help in setting up the equipment measurements and Catherine S. Reid, MD for critical review and language correction of the manuscript. All authors contributed to the study and meet all authorship qualifying standards. The data that support the findings of this study are available from the corresponding author upon reasonable request.
Ethical approval Ethics approval was obtained from the local institutional review board (ethical commission of the canton of Bern) (KEK-BE #2016-0135). The study was conducted in full compliance with the ethical standards as stated in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Written informed consent including consent for publication was obtained from all participants.