key: cord-0791353-zzs6z49c
authors: Khanzada, Amil; Hegde, Siddhi; Sreeram, Shreya; Bower, Grace; Wang, William; Mediratta, Rishi P.; Meister, Kara D.; Rameau, Anaïs
title: Challenges and Opportunities in Deploying COVID-19 Cough AI Systems
date: 2021-09-07
journal: J Voice
DOI: 10.1016/j.jvoice.2021.08.009
sha: f1915615b842a46d01dc1e7ce16a1839fa24ad0b
doc_id: 791353
cord_uid: zzs6z49c

What if COVID-19 could be detected by cough and voice sounds? A reliable diagnosis based on cough and voice sounds would provide a fast and convenient way to detect COVID-19 in both symptomatic and asymptomatic individuals. During the COVID-19 pandemic, several groups, including OpenSigma with the Massachusetts Institute of Technology, AI4Covid-19 with the University of Oklahoma, COVID-19 Sounds with the University of Cambridge, COUGHVID with the Swiss Federal Institute of Technology Lausanne, Saama AI Research, and Wadhwani AI with the Bill and Melinda Gates Foundation, have successfully isolated acoustic patterns characteristic of COVID-19 in smartphone-recorded voluntary cough and voice samples using artificial intelligence (AI) [1-6]. However, none have been able to scale this approach to a usable algorithm for the world population. The challenges of implementing and deploying COVID-19 diagnosis from cough and voice sounds currently limit the widespread use of AI-based diagnostic tools, while opportunities such as global collaborations have fueled continued efforts in the field.

The most common limitation encountered in developing cough-based AI diagnostics stems from the small size and poor quality of the acoustic data on which the algorithm is trained. To acquire large datasets, COUGHVID, COVID-19 Sounds, Virufy, and the Indian Institute of Science's Coswara have turned to crowdsourcing [3, 4, 7, 8].
These groups have been able to obtain thousands of labeled cough sound samples, as well as user-reported polymerase chain reaction (PCR) results, via app-based collection tools [4, 8]. Although useful for training baseline AI models, such non-validated data fall short for the development of reliable and replicable screening systems, as participating users may not represent a fair sample of the population at large. Additionally, incentives for user participation beyond altruism are often lacking, and marketing for large-scale data collection requires significant investment. High data quality is also essential, and care must be taken to minimize background noise and prevent audio signal clipping. AI model developers must also take into consideration the impact of audio compression and microphone quality among various smartphone devices on algorithm performance.

Another significant and related challenge is data privacy. Regulations like the European General Data Protection Regulation (GDPR) threaten multi-million-dollar fines for inconsistent consent or data leakage, especially with respect to sensitive biometric-identifying data, such as audio samples coupled with PCR test results. In some cases, groups have collected large quantities of data only to learn later that the data could not be shared. Additionally, lacking sufficient legal coverage, several organizations, including COUGHVID and Coswara, have opted for fully anonymous data collection to avoid GDPR exposure, an approach that comes with its own pitfalls [4, 8].

Furthermore, regulatory approval of AI-based technologies presents challenges. Despite pressure by the World Health Organization to accelerate approval for COVID-19-related innovations, regulatory bodies such as the U.S. Food and Drug Administration (FDA) can be slow-moving, and historically, they have been slow with AI technologies, although efficiency has been improving in recent years [9].
Based on our research, no US-based groups working on cough sound detection of COVID-19 have yet met the FDA's bar for sensitivity and specificity for diagnostic devices, and thus researchers have been unable to deploy their tools to the general US population. Additionally, there is a lack of precedent for approval of respiratory diagnostic AI devices, with notable exceptions including ResApp, an asthma-detection smartphone app approved for Conformité Européenne (CE) marking after meeting the regulatory requirements of the European Union. Although it is CE marked in Europe and approved by the Australian Therapeutic Goods Administration (TGA), the ResApp technology was rejected by the FDA, further muddying the waters on the regulation of AI diagnostics [10].

There may be technological, regulatory, and societal roadblocks obstructing development of widespread AI-based diagnostic tools for COVID-19, but these hurdles are not insurmountable. In fact, the pandemic has provided a unique opportunity to bring together enthusiastic groups of diverse volunteers to tackle this crisis. Academic and nonprofit professionals have driven synergistic efforts leading to expanded and open-access datasets. For instance, Coswara and COUGHVID, both university-based, have published their methods, code, and data in full [4, 8]. Among open-source data efforts, a lack of standardized data collection protocols and data formats has produced inconsistencies among datasets, including in acoustic signal quality and in the demographic data collected. More recently, however, there have been increasing efforts to define a common data format and universal collection protocol through regular exchanges between medical academics, AI engineers, and legal experts across several countries.
Such a successful global endeavor relies on the shared values of maintaining open-source and transparent datasets and algorithms, and a commitment to the vision of making AI-based diagnostics a worldwide reality by overcoming regulatory and social challenges.

Leveraging innovative technologies is not sufficient for deploying a COVID-19 AI diagnostic solution. This objective is only possible through mobilizing a diverse and multi-national team of experts and volunteers committed to open-source algorithms and standardized data collection for the good of humanity.

Funding: None.

References:
COVID-19 artificial intelligence diagnosis using only cough recordings
AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app
Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data
The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms
Pay attention to the cough: early diagnosis of COVID-19 using interpretable symptoms embeddings with cough sound signal processing
Coswara - a database of breathing, cough, and voice sounds for COVID-19 diagnosis
The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database

We thank Nathaniel Braun, Jack Scala, and Kiarash Shamardani for their comments and help.