key: cord-0079358-zgw2dy9p authors: Sinaki, Fatemeh Y.; Ward, Rabab; Abbott, Derek; Allen, John; Fletcher, Richard Ribon; Menon, Carlo; Elgendi, Mohamed title: Ethnic disparities in publicly-available pulse oximetry databases date: 2022-05-27 journal: Commun Med (Lond) DOI: 10.1038/s43856-022-00121-8 sha: f62409d0667c3fb9a84d9202de4cac4f3ea72ae3 doc_id: 79358 cord_uid: zgw2dy9p

Inaccuracies have been reported in pulse oximetry measurements taken from people who identified as Black. Here, we identify substantial ethnic disparities in the population numbers within 12 pulse oximetry databases, which may affect the testing of new oximetry devices and impact patient outcomes.

Although a given ethnicity can express a range of skin pigmentation, it is generally agreed that patients who self-identify as Black tend to have darker skin pigmentation than other ethnic groups. To investigate the proportion of individual ethnicities represented in publicly available pulse oximetry databases, we conducted a comprehensive assessment of accessible databases covering 1 January 2012 to 1 January 2022, using a PubMed search that combined Medical Subject Headings (MeSH) terms and Title/Abstract keywords. Applying the inclusion and exclusion criteria defined in Fig. 1 yielded 12 research articles describing 12 publicly available datasets that use pulse oximeter data to assess different medical conditions. As of 28 January 2022, these databases had together accumulated over 6214 citations according to Google Scholar, including 3544 citations for Medical Information Mart for Intensive Care III (MIMIC-III) [9]; 1049 citations for MIMIC-II [10]; 531 citations for the IEEEPPG dataset [11]; 243 citations for Multiparameter Intelligent Monitoring in Intensive Care I (MIMIC-I) [12]; 239 citations for WESAD [13]; and 5 citations for Medical Information Mart for Intensive Care IV (MIMIC-IV) [20].
We evaluated the existence of potential disparities in ethnicity based on the patient records reported in the publicly available databases. Where such information was absent, the number of subjects in each category was inferred from the location of the authors' research institutions or of the data collection, as shown in Table 1. To avoid uncertainty in the ethnic disparity analysis, databases with inferred ethnicity information were excluded from the statistical analysis. The four databases in which ethnicity was clearly stated (MIMIC-I, MIMIC-II, MIMIC-III and MIMIC-IV) were included. The distribution of ethnic groups in the four databases is shown in Fig. 2. We tested for statistical significance among all the subjects in the four databases, treating a p-value <0.05 as statistically significant, and analyzed the variance using a one-way ANOVA followed by a post hoc test for simultaneous pairwise comparisons using Tukey's honest significant difference criterion.

The results indicated significant differences between the mean distributions of all racial groups: Asian and Black (p = 0.021), Asian and white (p = 4.10 × 10⁻¹⁴), and Black and white (p = 5.01 × 10⁻¹³). The same trend was observed between Other and Asian (p = 9.43 × 10⁻⁵), Other and Black (p = 0.026), and Other and white (p = 4.82 × 10⁻¹²). The results also indicated a higher proportion of white subjects than of Asian, Black and Other subjects. These results demonstrate the existence of clear disparities in these key databases. Detailed results of the statistical separability tests for all pairs of demographic groups are provided in Table 2. In the remaining databases, in which ethnicity was not explicitly stated, the ethnic disparity is not known.
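The one-way ANOVA step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis: the text does not state which software was used or exactly which observations entered the test, so the group values below are placeholders invented for the example, and the function name `one_way_anova_f` is hypothetical. The sketch computes only the F statistic and its degrees of freedom. Turning F into a p-value, and running the Tukey post hoc comparisons, requires the F and studentized range distributions, for which a statistics library such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`) would normally be used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group name -> values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# PLACEHOLDER values (percent of subjects per database), invented for
# illustration only; they are NOT taken from Table 1 or Fig. 2.
placeholder_shares = {
    "Asian": [2.0, 3.0, 2.5, 3.5],
    "Black": [8.0, 10.0, 9.0, 11.0],
    "Other": [14.0, 16.0, 15.0, 17.0],
    "white": [70.0, 75.0, 72.0, 76.0],
}

f_stat, df_b, df_w = one_way_anova_f(placeholder_shares)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F distribution with the reported degrees of freedom indicates that at least one group mean differs; the pairwise p-values quoted in the text would then come from the post hoc test rather than from F itself.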
However, examining the demographic statistics for the location of each dataset shows that substantial potential for disparity exists. For example, the Vortal dataset was collected in the UK in 2016, and the authors did not provide the race of each participant. Using UK government ethnicity statistics, we can infer the ethnic distribution: 7.5% Asian, 3.4% Black, 0.1% Other, and 80.0% white. The same method was used to infer ethnicity for the remaining databases, as shown in Table 1. Furthermore, the fact that racial groups were not clearly defined suggests a lax approach to constructing reference databases, particularly for vascular optical measurement technology, which can be influenced by skin color.

White subjects appeared in all four MIMIC databases in which ethnicity was clearly stated, constituting an average of 73.19% of the total population. By contrast, Black subjects accounted for an average of only 9.29% of the sample population, and Asian subjects for an average of 2.67%. Such distributions highlight the potential for racial and ethnic biases in algorithms and devices, leading to possible challenges in their wider application in medicine.

Our findings highlight clear disparities in pulse oximetry databases. Because these biased databases would be used during the premarket phase to adjust pulse oximeter accuracy and to develop algorithms for oxygen saturation determination, they place subjects with darker skin pigmentation at increased risk of unrecognized health conditions [3]. Such health inequalities necessitate the development of new pulse oximeter databases with more racially balanced populations.
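The location-based inference described for the Vortal dataset amounts to applying population shares to a cohort size. The sketch below shows that arithmetic under stated assumptions: the shares are the UK figures quoted in the text, while the cohort size `N_SUBJECTS` and the function names are hypothetical, chosen only for illustration. The four quoted shares sum to 91.0%, so the sketch reports the remainder as unassigned rather than redistributing it.

```python
# UK shares quoted in the text, in percent. They do not sum to 100.
UK_SHARES_PERCENT = {"Asian": 7.5, "Black": 3.4, "Other": 0.1, "white": 80.0}


def infer_counts(shares_percent, n_subjects):
    """Expected subjects per group if the cohort mirrored the population shares."""
    return {
        group: round(n_subjects * share / 100)
        for group, share in shares_percent.items()
    }


def unassigned_share(shares_percent):
    """Percentage of the population not covered by the listed groups."""
    return round(100.0 - sum(shares_percent.values()), 1)


# HYPOTHETICAL cohort size, not the actual size of any dataset in the text.
N_SUBJECTS = 1000

for group, count in infer_counts(UK_SHARES_PERCENT, N_SUBJECTS).items():
    print(f"{group}: about {count} of {N_SUBJECTS} subjects")
print(f"Unassigned share: {unassigned_share(UK_SHARES_PERCENT)}%")
```

Counts inferred this way describe the surrounding population, not the recruited participants, which is why the text excludes such databases from the statistical analysis.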
Our recommendation does not deny the value of existing biased databases; rather, it encourages continued use of these publicly available databases when testing developed algorithms, while aiming for more balanced populations in future databases. Asian and Black populations have low representation in existing databases, and it would also be beneficial to create more databases from different geographical regions. Since last year, the US Food and Drug Administration has started to issue new guidelines to evaluate pre- and post-market pulse oximeters [3] and to increase awareness of racial and ethnic disparities that can affect the accuracy of pulse oximetry algorithms. As publicly accessible databases are commonly used to develop many biomedical algorithms and devices, our findings highlight the need to improve device algorithms and to expand these databases to better represent a diversity of skin pigmentations, regardless of racial or ethnic group. Improving diversity in public databases would help improve the general accuracy of AI algorithms, especially for measurements that frequently involve life-threatening conditions such as COVID-19.

Supplementary Data 1 contains source data for the main figures in this manuscript. Pulse oximetry databases can be accessed via the following links: MIMIC-I (https://www.physionet.org/content/mimicdb/1.0.0/); CapnoBase (https://dataverse.scholarsportal.info/dataverse/capnobase#:~:text=The%20CapnoBase%20benchmark%20dataset%20contains,that%20may%20arise%20during%20anesthesia.); MIMIC-II (https://archive.physionet.org/physiobank/database/mimic2wdb/); University of Queensland Vital Signs (https://outbox.eait.uq.edu.au/uqdliu3/uqvitalsignsdataset/index.html#:~:text=Introduction,at%20the%20Royal%20Adelaide%20Hospital.); IEEEPPG (https://zenodo.org/record/3902710#.YmsOVNrMKUk); MIMIC-III (https://physionet.org/content/mimiciii/1.4/); Vortal (https://peterhcharlton.github.io/RRest/vortal_dataset.html); Wrist PPG Signals Recorded during Exercise (https://physionet.org/content/wrist/1.0.0/); WESAD (https://archive.ics.uci.edu/ml/datasets/WESAD+%28Wearable+Stress+and+Affect+Detection%29); PPG-BP (https://figshare.com/articles/dataset/PPG-BP_Database_zip/5459299); PPG-DaLiA (https://archive.ics.uci.edu/ml/datasets/PPG-DaLiA); MIMIC-IV (https://physionet.org/content/mimiciv/1.0/).

Fig. 2 This figure combines the four publicly available pulse oximeter databases that clearly reported the distribution of ethnic groups. The data support the hypothesis that disparities exist. Significant differences are evident between white and Black (p < 0.0001), white and Asian (p < 0.0001), and Black and Asian populations (p = 0.021). All pairs of groups were tested using a simultaneous pairwise Tukey test. The bottom and the top of the box are the 25th and 75th percentiles, and the line inside the box is the 50th percentile (median). Whiskers from minimum to maximum are determined with a 95% confidence interval.

1. Response to: investigating sources of inaccuracy in wearable optical heart rate sensors. npj Digit
2. Racial bias in pulse oximetry measurement
3. FDA. Pulse oximeter accuracy and limitations: FDA safety communication
4. The use of photoplethysmography for assessing hypertension
5. A wearable tele-health system towards monitoring COVID-19 and chronic diseases
6. Pulse oximetry for monitoring patients with COVID-19 at home. Potential pitfalls and practical guidance
7. A new conceptualization of ethnicity for social epidemiologic and health equity research
8. The foundation of modern racial categories and implications for research on black/white disparities in health
9. MIMIC-III, a freely accessible critical care database
10. Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): a public-access intensive care unit database
11. TROIKA: A general framework for heart rate monitoring using wrist-type photoplethysmographic signals during intensive physical exercise
12. A database to support development and evaluation of intelligent intensive care monitoring
13. Introducing WESAD, a multimodal dataset for wearable stress and affect detection
14. An assessment of algorithms to estimate respiratory rate from the electrocardiogram and photoplethysmogram
15. CapnoBase: signal database and tools to collect, share and annotate respiratory signals
16. University of Queensland vital signs dataset: development of an accessible repository of anesthesia patient monitoring data for research
17. Deep PPG: large-scale heart rate estimation with convolutional neural networks
18. A new, short-recorded photoplethysmogram dataset for blood pressure monitoring in China
19. Description of a database containing wrist PPG signals recorded during physical exercise with both accelerometer and gyroscope measures of motion
20. MIMIC-IV (version 1.0)

M.E. designed and led this investigation. F.S., R.W., D.A., J.A., R.F., C.M. and M.E. conceived the study. All authors approved the final manuscript.

The authors declare no competing interests.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s43856-022-00121-8.

Correspondence and requests for materials should be addressed to Mohamed Elgendi.

Peer review information Communications Medicine thanks Steve Greenwald and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Peer reviewer reports are available.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.