key: cord-0846268-rymvvdw2 authors: Guo, Zhenyu; Yu, Guangzheng; Zhou, Huali; Wang, Xianren; Lu, Yigang; Meng, Qinglin title: Utilizing True Wireless Stereo Earbuds in Automated Pure-Tone Audiometry date: 2021-11-19 journal: Trends Hear DOI: 10.1177/23312165211057367 sha: 35ac87635979a1814d0671c99f0401c06c7a69b0 doc_id: 846268 cord_uid: rymvvdw2 True wireless stereo (TWS) earbuds have become popular and widespread in recent years, and numerous automated pure-tone audiometer applications have been developed for portable devices. However, most of these applications require specifically designed earphones to which the public may not have access. Therefore, the present study investigates the accuracy of automated pure-tone audiometry based on TWS earbuds (Honor FlyPods). The procedure for developing an automated pure-tone audiometer is reported. Calibration of the TWS earbuds was accomplished by electroacoustic measurements and establishing corrected reference equivalent threshold sound pressure levels. The developed audiometer was then compared with a clinical audiometer using 20 hearing-impaired participants. The average signed and absolute deviations between hearing thresholds measured using the two audiometers were 3.1 dB and 6.7 dB, respectively. The overall accuracy rate in determining the presence/absence of hearing loss was 81%. The results show that the proposed procedure for an automated air-conduction audiometer based on TWS earbuds is feasible, and the system gives accurate hearing level estimation using the reported calibration framework. The prevalence of hearing loss has increased in the past decade, and this is now a serious burden on the global healthcare system (Wilson et al., 2017) . A serious problem is that people with mild-to-moderate hearing loss often ignore it, although hearing loss is associated not only with decreased speech communication but also with cognitive decline (Manheim et al., 2018; Wayne & Johnsrude, 2015) . 
To examine hearing function and provide suitable remedies, several hearing tests are widely used in clinical practice (Kanji et al., 2018), for example, pure-tone audiometry (PTA), otoacoustic emissions, tympanometry, and auditory brainstem responses. Of these, PTA is regarded as the gold standard for hearing screening for people older than four years (Bright & Pallawela, 2016). PTA is usually performed by an audiologist using a manual audiometer connected to calibrated air- or bone-conduction transducers in a soundproof booth. Although air-conduction PTA usually takes only about 10 min, both the equipment and audiologists for PTA screening may be in short supply (Margolis & Morgan, 2008). In some underdeveloped and developing areas in particular, the extreme scarcity of hearing healthcare services means that the gap between supply and demand is large (Swanepoel et al., 2010). Unfortunately, the COVID-19 pandemic may exacerbate this gap due to the infection risk posed by physical contact (Centers for Disease Control and Prevention, 2020). Automated PTA was developed to reduce audiologists' workload (Margolis & Morgan, 2008), and both its validity and effectiveness have been evaluated comprehensively (Mahomed et al., 2013). With the rapid development of consumer electronics, many applications have been developed to provide self-administered hearing screening via mobile phones (Paglialonga et al., 2015), tablets (Thompson et al., 2015), and personal computers (Choi et al., 2007). The automated PTA applications are implemented either locally on devices or remotely through the Internet (Honeth et al., 2010). Because they are low-cost and accessible, these applications pave the way for measuring hearing levels (HLs) of people with potential hearing loss (Louw et al., 2017).
Some of the applications had impressive reliability (Margolis et al., 2018) and consistency with manual audiometry (Margolis et al., 2016; Thompson et al., 2015) , especially when researchers have imposed strict controls over the conditions (e.g., earphones, calibration, and test environment). Besides, with the increasing interest in self-fitting hearing aids (Keidser & Convery, 2016) in recent years, the requirement for accurate audiograms measured without supervision has increased. Most automated pure-tone audiometers require specifically designed hardware and standard transducers, which the general public cannot afford, thereby limiting their use. The practicality of these automated PTA applications is restricted if only standard earphones are used, although audiometer applications with some commercially available earphones can yield results consistent with those from clinical PTA (Chu et al., 2019; Foulad et al., 2013; Swanepoel et al., 2014) . In recent years, true wireless stereo (TWS) earbuds, especially those fitting in the cavum conchae (e.g., Apple AirPods, Huawei FreeBuds, Honor FlyPods), have become extremely popular globally. It would make sense to use TWS earbuds for automated hearing screening, given their popularity and portability. Moreover, since the digital audio signal is transmitted through Bluetooth (Bluetooth SIG, 2015) and since the analog audio output processing circuits are packaged in the bodies of the earphones (Floros et al., 2002) , wireless earphones can give consistent output sound levels across platforms (e.g., different mobile phones, tablets, or computers). In contrast, different platforms equipped with different audio interfaces (e.g., sound cards) may achieve inconsistent output sound levels with wired earphones (Foulad et al., 2013) . 
The consistent output across platforms means that once a pair of wireless earphones is calibrated, consistent PTA test accuracy across multiple platforms can be obtained, if used with a well-designed PTA application. In summary, the popularity and calibration advantages of TWS earbuds make them a good option for automated PTA applications, which may help numerous people via self-administered hearing screening at home and prescreening in clinics. The purpose of the present study was to develop an automated PTA procedure using a pair of calibrated TWS earbuds (first-generation Honor FlyPods) and to assess whether it gives HLs consistent with those for a standard manual air-conduction audiometer and earphones. Various factors influence the accuracy of PTA, including the measurement procedure, device calibration (Schmidt et al., 2014; Van Tasell & Folkeard, 2013), cooperativeness of participants, environmental noise (Na et al., 2014), and earphone position (Paquier et al., 2016). In the following, we focus mainly on the first three issues. To avoid disturbance by environmental noise, all experiments with participants were conducted in a soundproof booth in the First Affiliated Hospital of Sun Yat-sen University (FAH-SYU; background noise was ∼21 dBA). The earphone position was checked before testing to ensure that the earbuds were seated tightly in the cavum conchae of the participant. All human tests were approved in advance by the research ethics review committee of FAH-SYU. A critical issue in obtaining accurate HLs lies in calibrating the electroacoustic system, especially for nonstandard earphones (Corry et al., 2017).
A standard calibration procedure is (i) to use a standard ear simulator or acoustic coupler combined with a sound level meter to calibrate the output sound level of a specific earphone in units of dB SPL and then (ii) to convert the sound level (in units of dB SPL) to hearing level (in units of dB HL) using:

$$\mathrm{HL} = \mathrm{SPL} - \mathrm{RETSPL} \tag{1}$$

where RETSPL (reference equivalent threshold sound pressure level) is the average SPL corresponding to the audiometric zero level (i.e., 0 dB HL) of a group of otologically normal subjects. RETSPL values can be obtained either from existing standards (ANSI/ASA S3.6-2018, 2018; ISO 389-1:2007, 2007; ISO 389-8:2004, 2004) or by measuring the equivalent threshold sound pressure level (ETSPL, i.e., the hearing threshold in units of dB SPL measured by an electroacoustic system calibrated with specific calibrators) of a sufficient number of otologically normal subjects (Larson et al., 1988). The choice of calibrator and earphone can affect the values of RETSPL (Wilber et al., 1988). Moreover, the enclosure of the earbuds used in the present study differs from that of any standard earphones used for audiological applications, so the existing RETSPLs of those standard earphones cannot be adopted directly. Besides, there is no standard ear simulator for earbuds. As an alternative approach, Ho et al. (2017) and Kam et al. (2012) used a KEMAR manikin (Burkhard & Sachs, 1975) to calibrate the output SPL. After the electroacoustic calibration, the average hearing threshold of a group of otologically normal people needs to be measured. However, it has been shown that audiometric zero levels differ across regions (Borchgrevink et al., 2005; Wang et al., 2018). The actual average HLs of otologically normal young people were often below 0 dB HL when measured using standard earphones and audiometers in these large-sample surveys. Thus, the values of the RETSPLs could vary with demographics.
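As a minimal sketch of the arithmetic behind this calibration step, the snippet below averages per-subject ETSPLs into a provisional RETSPL table and then converts a presentation level from dB SPL to dB HL. The function names and all threshold values are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
from statistics import mean


def provisional_retspl(etspl_by_subject):
    """Average the ETSPLs (dB SPL) of normal-hearing subjects per frequency.

    etspl_by_subject: list of dicts mapping frequency (Hz) -> threshold (dB SPL).
    """
    freqs = etspl_by_subject[0].keys()
    return {f: mean(s[f] for s in etspl_by_subject) for f in freqs}


def spl_to_hl(spl_db, freq_hz, retspl):
    """Convert a level in dB SPL to dB HL: HL = SPL - RETSPL at that frequency."""
    return spl_db - retspl[freq_hz]


# Hypothetical thresholds (dB SPL) for three listeners; illustrative values only.
etspl = [
    {1000: 6.0, 4000: 10.0},
    {1000: 8.0, 4000: 12.0},
    {1000: 10.0, 4000: 14.0},
]
retspl = provisional_retspl(etspl)
hl = spl_to_hl(38.0, 1000, retspl)
print(retspl, hl)
```

A per-frequency lookup table is used here because RETSPL values are defined separately for each test frequency.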
To obtain HLs consistent with those achieved with a clinical audiometer using standard earphones, one workable strategy is to compensate the RETSPL of earbuds with average HLs measured using standard earphones, as follows (Ho et al., 2017; Wilber et al., 1988):

$$\mathrm{RETSPL}_c = \mathrm{RETSPL} - \mathrm{HearingLevel}_{ave} \tag{2}$$

where $\mathrm{RETSPL}_c$ and RETSPL are the corrected and acoustically measured RETSPL of the earbuds, respectively, and $\mathrm{HearingLevel}_{ave}$ is the average HL (in units of dB HL) of an otologically normal subject group measured using a clinical audiometer with calibrated standard earphones. In this study, the aforementioned calibration procedure was implemented with a pair of FlyPods TWS earbuds. (i) First, the output level of the earbuds was calibrated using a head and torso simulator (HATS) (Brüel & Kjaer type 5128), which had anatomical pinnae, ear canals, and ear simulators (Brüel & Kjaer type 4620). Then the SPL of a pure tone (in dB) was mapped to digital audio magnitude (in root-mean-square) (Elberling & Crone Esmann, 2017), and a conversion factor from digital magnitude to SPL was determined at each frequency. (ii) After electroacoustic calibration, the HLs of 25 otologically normal participants (aged from 18 to 25 years, mean = 20.6 years) were measured twice via the automated audiometer with FlyPods. Provisional RETSPLs were obtained via Equation (1). (iii) Sixteen of the participants took hearing tests via a conventional audiometer (Otometrics Madsen Astera 2) with a pair of TDH 39 supra-aural earphones. As Table 1 shows, the average hearing thresholds of these normal-hearing participants did not reach 0 dB HL. (iv) After that, the average HLs of participants using TDH 39 headphones were used to correct the RETSPL values via Equation (2), and then the HL of a specific participant was calculated as:

$$\mathrm{HL} = \mathrm{SPL} - \mathrm{RETSPL}_c$$

Two main kinds of methods have been used for PTA: (i) the modified Hughson-Westlake method (Carhart & Jerger, 1959) and (ii) the Békésy method (von Békésy, 1947). The former is often used for manual audiometry in clinical practice.
The sound levels of pure tone stimuli are changed according to the participants' responses, and positive responses in ascending sequences determine the hearing thresholds. The Békésy method, which can be used for mass screening (Jerger, 1962) , requires the participant to hold down or release a response button when a sound is heard or not, respectively. When collecting responses from participants in automated PTA with adaptive methods, either a yes-no task (Swanepoel et al., 2014) or a two-alternative forced-choice (2AFC) task (Schmidt et al., 2014) can be used. In this study, a specifically designed automated PTA procedure (illustrated later in Figure 1 ) was developed and the yes-no task was used. To simplify the task and reduce the testing time, only a 'heard' button was available to the subjects (Derin et al., 2016) rather than both 'heard' and 'not heard' (Louw et al., 2017; Margolis et al., 2010) . The system deemed that a participant had not perceived the pure-tone stimulus (i.e., negative response) unless a response was given within a certain time (i.e., positive response). Since participants were required to respond within a limited time, the latency of the wireless earphones and the automatic procedure were taken into account. Different people may have different reaction speeds, and the variability could be increased by the latency of the audio transmission associated with wireless earphones. In a pilot test (see Appendix), different people with different earphones showed vastly disparate reaction speeds. Therefore, a pre-test phase was used to estimate the participants' "average reaction time" (denoted by τ r and defined as the time from stimulus onset to response) before the formal HL test. In the pre-test phase, each participant underwent six trials with a 50 dB SPL 1-kHz pure tone (the level can be adjusted to suit the participant). The average reaction time across the last five trials determined the value of τ r for that participant. 
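The pre-test estimate of τ r can be illustrated with a short sketch. It assumes that every pre-test trial produced a response and that onset and response times are logged in seconds; the function name, the `n_discard` parameter, and the timestamps are hypothetical.

```python
def average_reaction_time(onsets_s, responses_s, n_discard=1):
    """Mean time from stimulus onset to response, skipping the first trial(s).

    With six pre-test trials and n_discard=1, this averages the last five,
    as described for the pre-test phase.
    """
    if len(onsets_s) != len(responses_s):
        raise ValueError("onsets and responses must have the same length")
    reaction_times = [r - o for o, r in zip(onsets_s, responses_s)]
    kept = reaction_times[n_discard:]
    if not kept:
        raise ValueError("no trials left after discarding")
    return sum(kept) / len(kept)


# Hypothetical timestamps (s) for six pre-test trials.
onsets = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
responses = [1.5, 5.8, 10.9, 15.7, 20.8, 25.8]
tau_r = average_reaction_time(onsets, responses)
print(round(tau_r, 3))
```

Because the response time is measured from the moment the program starts the stimulus, the estimate absorbs the wireless transmission latency as well as the listener's own reaction speed.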
Note that the reaction time included the times for both audio transmission latency and participants' judgment. The stimulus duration and the inter-trial silence duration were adjusted according to the value of τ r, as described below.

Figure 1. Flow diagram of the automated audiometry procedure. The dashed lines highlight disparities between the automated procedure and the standard modified Hughson-Westlake procedure. Notes: (1) A catch trial appeared after every 10 normal trials. (2) The initial sound level was 45 dB SPL for the first frequency and the previously tested threshold plus 20 dB for the subsequent frequencies. The duration of the stimuli was uniformly and randomly distributed between 0.7 and 1.3 times the pretested average reaction time τ r of the current participant. (3) The response interval was between 100 ms after stimulus onset and 1 s after the stimulus ended. (4) T wait fluctuated randomly with a normal distribution (μ = τ r + 1 s, σ = 0.5μ) with arbitrary lower and upper limits of 200 ms and 3000 ms. (5) Six successive reversals whose range did not exceed 10 dB.

In the main test, each ear was tested at eight frequencies: 0.125, 0.25, 0.5, 1, 2, 3, 4, and 8 kHz. The test frequency ascended from 1 to 8 kHz and then descended from 1 to 0.125 kHz. Note that 1 kHz was tested twice. The first run at 1 kHz was used for training, and only the threshold measured the second time was taken as the formal result. The flow diagram of measurement for each frequency (denoted a "block") is shown in Figure 1. In each block, pure-tone stimuli were presented in sequence. The participant was required to respond using the button displayed on the screen. A stimulus and its corresponding response (either positive or negative) are together called a "trial". The stimulus duration fluctuated randomly around the estimated reaction time of the participant (uniformly distributed between 0.7 τ r and 1.3 τ r). The duration was varied to prevent the participant from anticipating the next tone.
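The timing rules listed with Figure 1 can be sketched as follows. This is an illustration under stated assumptions rather than the program used in the study: the function names are hypothetical, times are in seconds, and the lower and upper limits on T wait are applied by clipping (the text only says that limits were set, so resampling would be an equally plausible reading).

```python
import random


def stimulus_duration(tau_r, rng):
    """Tone duration drawn uniformly between 0.7 and 1.3 times tau_r."""
    return rng.uniform(0.7 * tau_r, 1.3 * tau_r)


def wait_duration(tau_r, rng, lower=0.2, upper=3.0):
    """Inter-trial silence: normal(mu = tau_r + 1 s, sigma = 0.5 * mu).

    Assumption: values outside [lower, upper] are clipped to the limits.
    """
    mu = tau_r + 1.0
    sigma = 0.5 * mu
    return min(max(rng.gauss(mu, sigma), lower), upper)


def in_response_window(t_response, stim_duration):
    """True if a response (time since stimulus onset) counts as positive:
    from 100 ms after onset to 1 s after the stimulus ended."""
    return 0.1 <= t_response <= stim_duration + 1.0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tau_r = 0.8             # hypothetical pre-test reaction time
durations = [stimulus_duration(tau_r, rng) for _ in range(1000)]
waits = [wait_duration(tau_r, rng) for _ in range(1000)]
print(min(durations), max(durations), min(waits), max(waits))
```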
The initial SPL for the first trial for the first test frequency (i.e., the first 1 kHz) was 45 dB. For the following frequencies except the second 1 kHz tone, the initial level was 20 dB above the HL for the preceding frequency. For the second 1 kHz block, the initial level was 20 dB above the HL for the first 1 kHz block. The sound level was changed according to the subject's response on preceding trials. If the participant failed to respond in time (between 100 ms after stimulus onset and 1000 ms after stimulus end), then the sound level of the next stimulus was increased by 5 dB; otherwise, the sound level was decreased by 10 or 5 dB (10 dB until 2 reversals had occurred). A 'reversal' is a change from a positive response to a negative response, and vice versa. Using a larger step in the early trials tends to accelerate convergence. When the stimulus was playing, it was terminated immediately when the participant responded. The next trial began after a silence with a random duration (T wait ). The fluctuating value of T wait prevented the participant from predicting the occurrence of the stimulus; T wait was calculated using a normal distribution (mean μ = τ r + 1 s, standard deviation σ = 0.5μ), and its upper and lower limits were set arbitrarily as 3000 ms and 200 ms, respectively, to avoid extreme durations. The procedure ran until six consecutive reversals occurred within a 10 dB range. If the number of trials reached 30, the test block was also terminated. The arithmetic mean of the sound levels at the last four reversals was taken as the HL for that frequency. There are several differences between our method and the modified Hughson-Westlake method ( Figure 1 ). (i) A catch trial (Schlittenlacher et al., 2018) was inserted after every 10 normal trials. If the participant responded during the 2 s catch trial, a voice prompt (combined with a text prompt) instructing the participant to focus was presented. 
(ii) If the participant responded during the waiting time, the sound level was increased by 10 dB. The purpose of these two schemes was to decrease the false alarm rates, since a high false alarm rate could lead to an incorrect low threshold (McNicol, 2005) . (iii) The estimated reaction time was measured in the pre-test phase and the durations of stimulus, maximum response time restriction, and inter-trial silence were adjusted accordingly. An experiment was conducted to evaluate the accuracy of the automated PTA with a pair of TWS earbuds compared to conventional PTA with a set of TDH 39 headphones. The clinical audiometer and earphones were the same as those used for calibration. Twenty participants (aged from 10 to 81 years, mean = 40.6 years) with mild-to-moderate hearing loss were recruited from FAH-SYU. Each participant took ∼20 min to complete the entire experiment including both types of PTA. One participant failed to finish the tests because of severe hearing loss in one ear, and so was excluded from the statistical analysis. PTA was conducted first using the conventional ascending method and then using automated PTA. PTA was performed for the left ear first. The earbuds were connected to a Surface Pro 6 laptop running a MATLAB program for the automated PTA. Most of the participants were able to finish the hearing screening without additional guidance. An assistant monitored the whole experimental procedure and offered support if participants encountered any difficulties. The audiograms of all the participants (denoted N1 to N19) are shown in Figure 2 . In most cases, there was a high degree of consistency between HLs measured via the two methods (53% of the threshold differences were below 5 dB). Nevertheless, there were large differences in some cases. For instance, large differences were observed for the right ear at 2 and 3 kHz for N1 and N13, whereas HLs for the other frequencies were highly consistent. 
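For readers who want to reproduce the block logic described in the procedure (5 dB up after a miss, 10 dB down after a hit until two reversals have occurred and 5 dB thereafter, termination after six consecutive reversals within 10 dB or after 30 trials, threshold taken as the mean of the last four reversal levels), a minimal sketch is given below. Several details are assumptions: a reversal level is taken as the level of the trial on which the response changed, catch trials and the penalty for responses during the waiting time are omitted, and the listener is a hypothetical deterministic function.

```python
def run_block(responds, start_level, max_trials=30):
    """Run one frequency block of the simplified adaptive procedure.

    responds: callable taking a level (dB) and returning True if "heard".
    Returns (threshold, number_of_trials); threshold is None if no reversal
    occurred before the trial limit.
    """
    level = start_level
    previous = None
    reversal_levels = []
    for trial in range(1, max_trials + 1):
        heard = responds(level)
        # A reversal is a change of response relative to the previous trial.
        if previous is not None and heard != previous:
            reversal_levels.append(level)
        previous = heard
        last_six = reversal_levels[-6:]
        if len(last_six) == 6 and max(last_six) - min(last_six) <= 10:
            break
        if heard:
            # Larger step until two reversals have occurred.
            level -= 10 if len(reversal_levels) < 2 else 5
        else:
            level += 5
    last_four = reversal_levels[-4:]
    threshold = sum(last_four) / len(last_four) if last_four else None
    return threshold, trial


# Hypothetical listener who hears every tone at or above 32 dB.
threshold, n_trials = run_block(lambda level: level >= 32, start_level=45)
print(threshold, n_trials)
```

With this idealized listener the track oscillates between 30 and 35 dB, so the estimate falls between the two levels that bracket the true threshold.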
The distribution of the HLs is shown in Figure 3. Note that the orthogonal regression line is slightly above the symmetric line, which indicates that the participants tended to have higher HLs (i.e., a higher degree of hearing loss) with the automated PTA than with the standard PTA. To give a more intuitive view of how the results differ, the distributions of the automated-manual deviations are presented in Figure 4. The average deviation (Masalski et al., 2018) across frequencies and ears was 3.1 dB with a standard deviation (SD) of 8.4 dB, whereas the average absolute threshold deviation was 6.7 dB with an SD of 6.0 dB (see Table 2). The proportions of absolute deviations below 5 and 10 dB for all thresholds were 53% and 78%, respectively. There was a relatively large difference between the two methods at 125 and 250 Hz, the proportions of deviations below 5 dB at these two frequencies being 37% and 34%, respectively. For the pure-tone average, which is typically used as an evaluation metric for hearing (the average of thresholds at 0.5, 1, 2, and 4 kHz; Ho et al., 2009), 63% and 92% of the absolute deviations were below 5 and 10 dB, respectively. In the present study, an HL above 25 dB HL was considered to indicate impaired hearing (World Health Organization, 1991). With this criterion, sensitivity, specificity, and accuracy were calculated (see Table 3) with reference to HLs obtained with manual PTA. Sensitivity is defined as the proportion of thresholds classified as impaired by manual PTA that were also classified as impaired by automated PTA, and specificity is defined as the proportion of thresholds classified as normal by manual PTA that were also classified as normal by automated PTA. Accuracy is the overall proportion of correct classifications, both as impaired hearing and as normal hearing. The overall accuracy across frequencies was 81% and the results were similar to those from the HL difference analysis, that is, accuracy was higher at high frequencies.
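The agreement metrics used in this section can be computed as in the following sketch, which takes manual PTA as the reference and applies the 25 dB HL criterion. The function names and the eight threshold pairs are hypothetical and serve only to show the calculation; they are not data from this study.

```python
def deviation_stats(auto_hl, manual_hl):
    """Mean signed and mean absolute deviation (automated minus manual), in dB."""
    diffs = [a - m for a, m in zip(auto_hl, manual_hl)]
    n = len(diffs)
    return sum(diffs) / n, sum(abs(d) for d in diffs) / n


def screening_metrics(auto_hl, manual_hl, cutoff=25):
    """Sensitivity, specificity, and accuracy with manual PTA as reference.

    A threshold above `cutoff` dB HL is classified as impaired.
    """
    tp = fp = tn = fn = 0
    for a, m in zip(auto_hl, manual_hl):
        reference_impaired = m > cutoff
        test_impaired = a > cutoff
        if reference_impaired and test_impaired:
            tp += 1
        elif reference_impaired:
            fn += 1
        elif test_impaired:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Hypothetical thresholds in dB HL.
manual = [10, 20, 30, 40, 50, 15, 25, 60]
auto = [15, 30, 35, 40, 45, 10, 20, 70]
signed, absolute = deviation_stats(auto, manual)
sens, spec, acc = screening_metrics(auto, manual)
print(signed, absolute, sens, spec, acc)
```

In this toy example the single false positive comes from a threshold near the cutoff (20 dB HL manual, 30 dB HL automated), which shows how a small positive bias can lower specificity while leaving sensitivity high.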
The time cost for automated PTA mainly depends on the response time for each trial and the number of trials in each test block. The response time depends on the estimated average reaction time obtained in the pre-test phase. The average response time across all participants and trials was 1.86 s (SD = 0.60). Figure 5 shows the distribution of the number of trials across all frequencies. The participants required 11.2 trials per block on average (median = 11, SD = 4.4). Overall, 91% of the blocks needed no more than 15 trials. Only 2% of the blocks reached the predefined ceiling of 30 trials. Responses were made on 13% of catch trials. The number of trials was not normally distributed according to the Anderson-Darling test. Therefore, a Friedman test was performed to examine whether the number of trials depended on the frequency. There was no significant difference in the number of trials across frequencies when excluding the training block. There was a significant difference between the training block and the 125 Hz block (p = 0.046). The medians for these two blocks were 12 and 9, respectively.

Figure 5. Box-and-whisker plots and violin plots of the number of trials for each frequency. The upper, middle, and lower horizontal lines of each box indicate the 75th percentile, the median, and the 25th percentile, respectively. The whiskers attached to the boxes extend to the most extreme data within 1.5 times the interquartile range (IQR). The widths of the violin plots depict the distribution density of the trial number. Symbols indicate extreme cases far from the data centers. The first box on the left is for the training block, which was not taken into account in the threshold evaluation.

In this study, automated PTA was developed using commercially available TWS earbuds (Honor FlyPods). Wireless earphones of this type have achieved considerable popularity in recent years, but few studies have evaluated their feasibility for PTA. In the procedure for estimating HLs, a user-dependent maximum effective response time, which is evaluated in a pre-test phase, is used to allow for different audio latencies and reaction speeds of the participants. A short maximum effective response time could make the test difficult for the participants, leading to an overestimated proportion of negative responses. Participants can hardly stay focused for a long time, and thus a long time restriction (Barbour et al., 2019) or even no time restriction (Margolis et al., 2010) may be unacceptable to them. Moreover, due to the additional latency of wireless earphones, the automated program may misjudge if participants do not respond in time. Thus, it is reasonable to adjust the maximum effective response time for each person and each set of earphones before testing. Automated PTA is often conducted without supervision from audiologists, and so the cooperativeness of the participants can influence the results, especially for children and older people (Yeung et al., 2013). However, a widely accepted supervision scheme is yet to be established for self-screening automated PTA. The proposed procedure included catch trials (Margolis et al., 2010) and responses were detected during a waiting period. In the verification experiment, responses were made on 13% (22 of 175) of the catch trials. This indicates that the participants may respond even when no tone is presented. To calibrate the FlyPods earbuds, a Brüel & Kjaer type 5128 HATS was used and the corresponding provisional RETSPLs were determined from 25 participants with normal hearing. It has been reported that there are differences in the average normal HLs of different populations (Borchgrevink et al., 2005; Wang et al., 2018), and thus an audiometric zero level bias could exist between our participants and those using standard earphones.
Of the 25 otologically normal young participants, 16 underwent PTA using an Otometrics Madsen Astera 2 audiometer with TDH 39 headphones. Their average HL was used to compensate for the bias. It could be argued that it is unnecessary to perform an electroacoustic calibration because the RETSPLs were corrected to the levels of a standard earphone. A simpler calibration method would be to adjust the output level of non-standard earbuds (in units of dB SPL) to the output level of standard earphones driven by clinical audiometers (in units of dB HL) via calibrators directly rather than by obtaining a set of corrected RETSPLs (Kam et al., 2013). The purpose of the preliminary electroacoustic calibration is that other earbuds with similar enclosures could be calibrated easily through electroacoustic measurement and the reported RETSPLs, without requiring additional behavioral tests. In the present study, the average absolute deviation between HLs measured using automated and standard PTA was 6.7 dB across all frequencies and both ears. The proportions of absolute deviations below 5 and 10 dB were 53% and 78%, respectively. The threshold deviations between the automated PTA with FlyPods and manual PTA with the standard device increased at 125 and 250 Hz. Masalski et al. (2018) reported a similar phenomenon. Moreover, the values of RETSPL at low frequencies vary dramatically for different devices (Vencovsky et al., 2018). Given that the output level was adjusted to match the level for a conventional audiometer, the lower accuracy at 125 and 250 Hz was unexpected. The error at low frequencies may be due to earphone leakage (Welti, 2015), since the state of contact between earbuds and ears varies dramatically between participants compared with TDH 39 headphones, which have soft cushions. In addition, automated PTA tended to give slightly higher HLs (see Figure 3).
Compared with participants in the calibration procedure, a lower average capacity of attention of participants in the verification experiment may have caused this bias, since these participants' responses during the waiting intervals led to increments of the sound level. In tests with participants, there are often differences between two repeated tests even with the same device and under the same conditions. Margolis et al. (2010) reported that the average absolute deviation between test and retest for clinical PTA was 4.1 dB. A. T. Ho et al. (2009) reported that test-retest absolute deviations below 5 and 10 dB for clinical PTA occurred 74% and 93% of the time, respectively. Swanepoel et al. (2010) found that absolute deviations below 5 and 10 dB occurred 45% and 88% of the time, respectively. In summary, the HLs obtained using automated PTA with FlyPods were slightly higher than those obtained using clinical PTA with TDH 39 headphones at low frequencies, but the errors were comparable to those for conventional manual PTA. The sensitivity and specificity across frequencies for the automated PTA with FlyPods were 95% and 63%, respectively. The same two metrics were 82% and 83% in Louw et al. (2017) using hearScreen™ with Sennheiser HD 202 II, and 25% and 97% in Derin et al. (2016) using the EarTrump application with a Philips SHP 1900 headset. In Thompson et al. (2015), the sensitivity and specificity varied with frequency over the ranges 87%-100% and 90%-95%, respectively. They used TDH 39 headphones for both the clinical audiometer and the automatic audiometer application. In another comparative study, Yeung et al. (2013) also used TDH 39 headphones with automated PTA, and obtained 93% sensitivity and 95% specificity. Moreover, in a test using ER-3 earphones for both automated and clinical PTA, 89% of HL differences were below 5 dB (Mosley et al., 2019). It seems that standard earphones are more likely to give HLs consistent with those for clinical audiometry.
Compared with these studies, especially those using standard earphones, the sensitivity of the automated PTA proposed here is high, but the specificity is lower. A median of 11 trials was needed to complete a test block, and participants took an average of 1.86 s to respond to each stimulus. The first training block required more trials because the initial sound level was usually far from the threshold. The closer the initial level was to the HL of the participants, the fewer trials were required, so setting the initial level according to the preceding measured threshold could help to accelerate the measurements (Carhart & Jerger, 1959). Thus, adopting methods based on statistical estimation or machine learning (Barbour et al., 2019; Schlittenlacher et al., 2018; Schmidt et al., 2014; Song et al., 2015) could further reduce the time cost. The present study describes a method for using FlyPods TWS earbuds in automated PTA. The procedure could guide the development of an audiometer application and calibration. The results show that, with appropriate design and calibration, the system can provide accurate HL estimation without relying on an audiologist. Although the proposed system cannot replace conventional PTA entirely, people whose hearing is potentially impaired could take such a self-screening test regularly. Moreover, large clinical centers could benefit from automated audiometers by using them as a prescreening tool, which may release the audiologist from checking normal-hearing individuals. In particular, this form of contactless remote PTA could continue to provide hearing evaluations during the COVID-19 pandemic. Moreover, self-screened audiograms could be utilized in tuning self-fitting hearing aids. The automated PTA proposed here did not use a noise detector (Swanepoel et al., 2014), but in actual applications, a reliable noise detector is essential for ensuring that the automated PTA is performed in a suitable environment.
Also, the automated PTA did not use contralateral masking when there was a large (≥ 30 dB) left-right threshold disparity (Margolis et al., 2010; Swanepoel et al., 2010). However, the hearing-impaired participants involved in this study had few asymmetric HLs (only 9 of 152 samples had an across-ear difference larger than 30 dB), so this limitation had little effect on the results. The present study assessed the accuracy of automated PTA with FlyPods TWS earbuds through a subjective experiment. Nonetheless, complete electroacoustic output measurements (Van der Aerschot et al., 2016) of the FlyPods were not made, nor was the consistency across multiple pairs of the same type of earbuds evaluated. Variability of the output level could lead to additional errors in HL measurements. However, compared with the deviations between test and retest of PTA, modern earphones often have acceptably low output variability. Foulad et al. (2013) reported that the output-level differences among pairs of earphones were within 4 dB. A HATS was used for the electroacoustic calibration since no suitable ear simulator or acoustic coupler was available for the earbuds used in this study. However, a HATS with anatomical pinnae is less robust and reliable than ear simulators, because the coupling is less reliable. During the verification experiment, all participants were tested using manual PTA first and automated PTA later. This unbalanced test order may have led to biased results. Another limitation is that the number of participants (20) was relatively small. Given the aforementioned limitations, future work should consider (i) monitoring the noise environment, (ii) contralateral masking for large left-right disparities, (iii) comprehensive electroacoustic measurements and checking consistency across multiple devices, (iv) balancing the order of manual and automated testing, and (v) recruiting more participants.
This study utilized FlyPods TWS earbuds in automated PTA and verified the feasibility and validity of the method through an experiment. The verification experiment showed that, with reasonable calibration and PTA procedure design, the proposed self-screening PTA gave results consistent with those from clinical PTA. The average signed and absolute deviations between the two types of PTA were 3.1 dB and 6.7 dB, respectively, but there were large differences at 125 and 250 Hz. For hearing loss detection, the overall accuracy was 81%, but it was lower at low frequencies. Although a further performance evaluation may be needed for low frequencies, the present experiment shows that it is feasible to conduct automated air-conduction PTA with affordable TWS earbuds.
Forty participants (aged from 11 to 57 years, mean = 27 years) were recruited remotely through the Internet, and the automated PTA software was sent to them via the Internet. Each was required to perform the hearing test three times using their own device and earphones in a quiet environment. The reaction times estimated from the pre-test phase were recorded. Of the 40 participants, 26 submitted their test results. The final reaction time for each participant was determined as the average across the three repeated tests. Figure 6 shows the distribution of reaction times. The average reaction time was 968 ms with a standard deviation of 695 ms and a range from 450 to 3201 ms. Further, 42% of the reaction times were in the range 500-750 ms. The results show that different devices and participants may have very different reaction times. Therefore, using a fixed effective response time in automated PTA could lead to inaccuracies if the reaction time of a participant exceeds that limit. Note that the reaction time may depend on the device, so the values reported here cannot be used as reference data.
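One way to avoid a fixed response limit is to derive the limit from the reaction times measured in the pre-test phase. The sketch below is illustrative only and is not the procedure used in this study: the function name `response_window_ms`, the 1.5 multiplier, and the floor and ceiling values are all assumptions chosen for the example.

```python
from statistics import mean


def response_window_ms(reaction_times_ms, margin=1.5,
                       floor_ms=1000.0, ceiling_ms=5000.0):
    """Return a per-participant maximum response time in milliseconds.

    reaction_times_ms: reaction times (ms) from the pre-test repetitions.
    margin, floor_ms, ceiling_ms: hypothetical tuning values, not taken
    from the study.
    """
    if not reaction_times_ms:
        raise ValueError("at least one reaction time is required")
    # Average across repetitions, then allow some headroom above it.
    window = mean(reaction_times_ms) * margin
    # Clamp so the window is neither unrealistically short nor very long.
    return min(max(window, floor_ms), ceiling_ms)


if __name__ == "__main__":
    # Values spanning the range reported above (450 to 3201 ms).
    for times in ([450, 450, 450], [968, 968, 968], [3201, 3201, 3201]):
        print(times[0], "->", response_window_ms(times))
```

With these assumed settings, a participant at the fast end of the reported range receives the floor value, whereas a participant at the slow end receives a window well above 3201 ms, so a slow but valid response is not discarded as a miss.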
Specification for Audiometers
Online machine learning audiometry
The Nord-Trondelag Norway audiometric survey 1996-98: Unscreened thresholds and prevalence of hearing impairment for adults >20 years
Validated smartphone-based apps for ear and hearing assessments: A review
Anthropometric manikin for acoustic research
Preferred method for clinical determination of pure-tone thresholds
Interim US guidance for risk assessment and work restrictions for healthcare personnel with potential exposure to COVID-19
PC-based tele-audiometry
A mobile phone-based approach for hearing screening of school-age children: Cross-sectional validation study
The accuracy and reliability of an app-based audiometer using consumer headphones: Pure tone audiometry in a normal hearing group
Initial assessment of hearing loss using a mobile application for audiological evaluation
Calibration of brief stimuli for the recording of evoked responses from the human auditory pathway
A study of wireless compressed digital-audio transmission
Automated audiometry using Apple iOS-based application technology
Computer-assisted audiometry versus manual audiometry
Reference equivalent threshold sound pressure levels for Apple EarPods
An internet-based hearing test for simple audiometry in nonclinical settings: Preliminary validation and proof of principle
Acoustics: Reference zero for the calibration of audiometric equipment. Part 8: Reference equivalent threshold sound pressure levels for pure tones and circumaural earphones
Acoustics-Reference zero for the calibration of audiometric equipment: Part 1: Reference equivalent threshold sound pressure levels for pure tones and supra-aural earphones
Békésy audiometry
Automated hearing screening for children: A pilot study in China
Clinical evaluation of a computerized self-administered hearing test
Newborn hearing screening protocols and their outcomes: A systematic review
Self-Fitting hearing aids: Status Quo and future predictions
Reference threshold sound-pressure levels for the TDH-50 and ER-3A earphones
Smartphone-Based hearing screening at primary health care clinics
Validity of automated threshold audiometry: A systematic review and meta-analysis
Age, hearing, and the perceptual learning of rapid speech
Home hearing test™: Within subjects threshold variability
AMTAS: Automated method for testing auditory sensitivity: Validation studies
Validation of the home hearing test
Automated pure-tone audiometry: An analysis of capacity, need, and benefit
Hearing tests based on biologically calibrated Mobile devices: Comparison With pure-tone audiometry
A primer of signal detection theory
Reliability of the home hearing test: Implications for public health
Smartphone-based hearing screening in noisy environments
Apps for hearing science and care
Effect of headphone position on absolute threshold measurements
Audiogram estimation using Bayesian active learning
A user-operated audiometry method based on the maximum likelihood principle and the two-alternative forced-choice paradigm
Fast, continuous audiogram estimation using machine learning
Hearing assessment-reliability, accuracy, and efficiency of automated audiometry
Smartphone hearing screening with integrated quality control and data management
Accuracy of a tablet audiometer for measuring behavioral hearing thresholds in a clinical population
Affordable headphones for accessible screening audiometry: An evaluation of the Sennheiser HD202 II supra-aural headphone
Reliability and accuracy of a method of adjustment for self-measurement of auditory thresholds
Reference equivalent threshold sound pressure levels for nonaudiometric headphones
A new audiometer
A cross-sectional study on the hearing threshold levels among people in Qinling
A review of causal mechanisms underlying the link between age-related hearing loss and cognitive decline
Improved measurement of leakage effects for circum-aural and supra-aural headphones
Reference thresholds for the ER-3A insert earphone
Global hearing health care: New findings and perspectives. The Lancet
Report of the informal working group on prevention of deafness and hearing impairment programme planning
The new age of play audiometry: Prospective validation testing of an iPad-based play audiometer
We thank Z. Ellen Peng and Changxin Zhang for suggestions on manuscript writing. We also express our gratitude to Brian C. J. Moore, Robert H. Margolis, and an anonymous reviewer for their kind revision and helpful comments. We express our gratitude to all the participants in this study.
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
In conventional manual PTA, the audiologist can adjust the maximum response time according to the participant's reaction time. For automated PTA, it is necessary to estimate the reaction times of participants wearing specific earphones. Consequently, it is meaningful to study the variation of reaction time. Before designing the automated audiometer, a preliminary experiment was carried out to assess the variation in reaction times for different participants and devices.