key: cord-0800317-bi8zl6zs
authors: Bandyopadhyay, Anuja; Goldstein, Cathy
title: Clinical applications of artificial intelligence in sleep medicine: a sleep clinician’s perspective
date: 2022-03-09
journal: Sleep Breath
DOI: 10.1007/s11325-022-02592-4
sha: 280a17520312406521bc5a729aa70f26c9bd918a
doc_id: 800317
cord_uid: bi8zl6zs

BACKGROUND: The past few years have seen a rapid emergence of artificial intelligence (AI)-enabled technology in the field of sleep medicine. AI refers to the capability of computer systems to perform tasks conventionally considered to require human intelligence, such as speech recognition, decision-making, and visual recognition of patterns and objects. Sleep tracking and the measurement of physiological signals during sleep are widely practiced. Therefore, sleep monitoring in both the laboratory and ambulatory environments results in the accrual of massive amounts of data that uniquely positions the field of sleep medicine to gain from AI. METHOD: The purpose of this article is to provide a concise overview of relevant terminology, definitions, and use cases of AI in sleep medicine. This was supplemented by a thorough review of relevant published literature. RESULTS: Artificial intelligence has several applications in sleep medicine including sleep and respiratory event scoring in the sleep laboratory, diagnosing and managing sleep disorders, and population health. Because the field is still in its nascent stage, several challenges limit AI’s generalizability and wide-reaching clinical application. Overcoming these challenges will help integrate AI seamlessly within sleep medicine and augment clinical practice. CONCLUSION: Artificial intelligence is a powerful tool in healthcare that may improve patient care, enhance diagnostic abilities, and augment the management of sleep disorders. However, there is a need to regulate and standardize existing machine learning algorithms prior to their inclusion in the sleep clinic.

In recent times, artificial intelligence (AI) has entered our everyday lives, for example through hyper-personalized product suggestions based on our data and virtual assistants (e.g., "Alexa" and "Siri") in our households. Tracing the history of AI in medicine (Fig. 1) demonstrates the rapid advancements over the past decade, driven by a number of changes, including the accrual of massive amounts of health data, greater computing power and storage capacity, and highly sophisticated algorithms powering AI applications.

AI refers to the capability of computer systems to perform tasks conventionally considered to require human intelligence, such as speech recognition, decision-making, and visual recognition of patterns and objects. While AI has gained popularity in several fields of medicine including radiology and oncology, the field of sleep medicine stands to greatly benefit from AI [1, 2]. Sleep is a physiological state marked by dynamic changes in a variety of organ systems, as reflected by our use of the polysomnogram, which records various physiological signals across the night. Additionally, sleep tracking over long durations is ubiquitous given the availability and popularity of fitness trackers and smart watches. Therefore, sleep monitoring in both the laboratory and ambulatory environments results in the accrual of massive amounts of data. Large and complex datasets are amenable to analysis with AI algorithms, which uniquely positions the field of sleep medicine to gain from AI. Sleep medicine is expected to benefit from artificially intelligent computer programs to effectively score polysomnograms. However, use cases transcend automation and include improved diagnosis of sleep disorders, identification of the mechanisms underlying sleep disorders, treatment selection, and prediction of sleep disorder sequelae. The greater insight provided by AI will have applications at both the level of individual patients and in population health.

The emergence of AI is well timed, as we start to realize the constraints of traditional medicine in bridging some of the knowledge gaps which challenge our ability to provide optimal patient care. The heterogeneity of endotypes, interindividual variability in treatment response, and the overreliance on the identification and quantification of specific "events" occurring during sleep studies have been widely discussed [3] . Researchers have successfully leveraged "big data" to offer new insights into sleep physiology, improve accuracy of diagnosis of sleep disorders, predict response and adherence to treatment, define endotypes, and use sleep parameters as predictors of future physical and mental health [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] . This holds promise for the future of sleep medicine, where AI will usher in an era of precision medicine with the advent of "Sleep-Omics" [14] . Integrating sleep physiological data with genetic/imaging markers will provide insight far beyond our clinic walls.

As AI rapidly evolves, it is important to demystify AI for the sleep clinician. This article reviews the basic principles that underlie machine learning algorithms and how to assess the performance of such algorithms in sleep study scoring. Additionally, we will review other applications of AI in sleep medicine and research and identify the challenges of implementing AI tools in clinical practice.

Broadly speaking, machine learning (ML) uses computer algorithms that improve with experience and prior data, without intervention from direct programming commands. Most ML tasks can be divided into supervised learning (learning to map an input x to an output y, based on a set of input-output examples [e.g., predicting human scored sleep stages from polysomnogram signals]), unsupervised learning (finding patterns or clusters in a set of inputs, without labeled output variables provided), or reinforcement learning (algorithms learn based on interacting with the environment and receiving penalties and rewards) (Fig. 2). Recent advances utilize combinations of these strategies to develop new algorithms that may not clearly fit in one of these categories. Additionally, due to the intricacies of the mathematical or statistical model used, various control systems have been designed. Control systems regulate the behavior of other systems using control loops. The efficiency and precision of control systems can be improved with the help of machine learning algorithms utilizing innovative statistical and mathematical models [15]. For instance, the random forest technique utilizes the principle of simple regression [16]. The details of control systems are beyond the scope of this review, but awareness of this technique is beneficial as the reader appraises the available literature regarding machine learning and sleep.

Multiple learning models are utilized for training the machine [17] . They are categorized in the following four types:

• Supervised learning is used when there is a training dataset that has a well-defined relationship between each input and expected output. Weights are adjusted in this training process to reduce the error of the predicted output from the expected output.
• Unsupervised learning involves the use of unlabeled inputs, given to the machine for training without known outputs. During the training process, the machine is expected to identify patterns or groupings in the data.
• Semi-supervised learning is a combination of supervised learning and unsupervised learning.
• Reinforcement learning, like supervised learning, involves the use of a measurable outcome to guide the training process, rather than a predefined expected outcome for each input. This model is typically used when the input is stimuli from the environment rather than a compiled dataset.
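The difference between the first two learning models can be illustrated with a toy example. The sketch below is not taken from any cited study: the single feature, its values, and the function names are hypothetical, and a nearest-centroid rule and a two-cluster k-means loop stand in for the far richer models used in practice. The supervised function uses the expert labels; the unsupervised function never sees them.

```python
from statistics import mean

# Hypothetical 1-D feature per epoch (for example a relative delta-power value).
# All values are invented for illustration.
features = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
labels = ["wake", "wake", "wake", "N3", "N3", "N3"]  # used only by the supervised model


def fit_centroids(xs, ys):
    """Supervised: learn one centroid per labelled class."""
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two clusters without using any labels."""
    lo, hi = min(xs), max(xs)  # initial cluster centres
    for _ in range(iterations):
        low_group = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        high_group = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(low_group), mean(high_group)
    return low_group, high_group


centroids = fit_centroids(features, labels)
supervised_prediction = predict(centroids, 0.75)  # a new, unlabelled epoch
clusters = two_means(features)
print(supervised_prediction, clusters)
```

In this toy case the unsupervised clusters happen to coincide with the expert labels, but nothing in the algorithm guarantees that; the clusters have no names until a human interprets them.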

These processes are analogous to the biological process of learning, where we would repeatedly study and memorize facts (most like supervised learning model), learn from observation (unsupervised learning model), or learn from trial and error (reinforcement learning model). Conventional machine learning algorithms involve feature extraction and classification. Feature extraction is a process by which an initial set of data is reduced by identifying key features of the data for machine learning. The inputs obtained through feature extraction would then be classified based on predetermined criteria.

In recent times, deep learning has emerged as one of the popular modalities of machine learning. Deep learning is inspired by the way a human brain works. Such biologically inspired computational networks which facilitate deep learning are known as neural networks. Unlike conventional machine learning, neural networks do not have to rely on feature extraction and can sometimes utilize raw signals as input. Thus, neural network is a step towards computers being able to perform tasks without explicit programming.

The development and optimization of ML algorithms is an iterative process involving a training dataset and previously unseen or "held-out" test data. Because ML algorithms learn from the provided data, models may overfit the training dataset. Therefore, use of a held-out test dataset is required to avoid biased (usually inflated) estimates of how well a model performs. Next, to ensure generalizability of the model, it is deployed on a completely independent test dataset (i.e., data obtained from a different study cohort or clinical population that uses different data acquisition methods). A simplified depiction of the process of ML algorithm development and testing is shown in Fig. 3.

Fig. 2 Types of machine learning
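One practical detail of holding out test data in sleep studies is that epochs from the same person are highly correlated. The following sketch assumes, as an illustration rather than a method described in this review, that the split is made at the level of subjects so that no individual contributes epochs to both sets. The function name, the 25% test fraction, and the subject identifiers are hypothetical.

```python
import random


def split_by_subject(epoch_subject_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no subject appears in both sets."""
    subjects = sorted(set(epoch_subject_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    held_out = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(epoch_subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(epoch_subject_ids) if s in held_out]
    return train_idx, test_idx


# Hypothetical cohort: 8 subjects with 5 scored epochs each.
subject_ids = [f"S{n}" for n in range(8) for _ in range(5)]
train_idx, test_idx = split_by_subject(subject_ids)
train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
print(sorted(test_subjects))
```

A split of this kind addresses only the held-out step; testing on an independent cohort recorded with different equipment remains a separate requirement.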

AI sleep staging algorithms have utilized training datasets comprised of both healthy and unhealthy subjects. Notably, an algorithm trained on healthy subjects may demonstrate reduced performance when validated in an unhealthy population, for example, patients with neurodegenerative disease who may display electroencephalogram and electromyogram findings uncharacteristic for a given sleep stage [18] [19] [20] [21] . These datasets are easy to access (some are publicly available) and provide adequate data (even smaller cohorts may provide sufficient data as 800 30-s epochs are available per night of PSG recording). Disadvantages include variability in signal preprocessing and sampling rate and study subject characteristics that may not translate to the heterogeneity of patients and disease presentations seen in real-world clinical populations.

The rapid expansion of AI in sleep is evident from a PubMed search on "Artificial Intelligence" and "polysomnogram" that shows more than 360 articles, with 204 articles (57%) published in the past 5 years.

One of the first evaluations of AI sleep scoring used a learning vector quantizer and the induction of decision trees to stage polygraphic data in eight infants and demonstrated an overall recognition accuracy of 75% [22]. An early use of neural networks specifically was presented by Schaltenbrand and colleagues in 1996 [23]. The automatic scoring of 61,949 epochs from 60 subjects with a neural network model achieved agreement comparable to that of human experts, with expert-model agreement and inter-expert agreement of 82.3% and 87.5%, respectively. This agreement improved to 90% with expert supervision for unknown or ambiguous epochs.

Over the next few years, several studies used neural network models to score sleep studies in patients with obstructive sleep apnea (OSA), epilepsy, Cheyne Stokes respirations, and Parkinson's disease [24] [25] [26] [27] [28] . While some of the studies focused on analyzing sleep spindles and power spectra of sleep for staging, others focused on integrating cardiorespiratory events to diagnose sleep-related breathing disorders while a few focused on snore signal [29] [30] [31] [32] [33] [34] . Interestingly, some of the earlier studies concentrated on use of AI to score sleep studies in infants, particularly those at risk for sudden infant death syndrome (SIDS) [35, 36] .

Additionally, during this time period, different methods to improve AI sleep scoring were explored; for example, a neural network model to identify sleep-disordered breathing events was iteratively refined with use of a supervised approach [37] . This entailed input from clinicians each time a new pattern was found. Whenever a clinician demonstrated good self-agreement, the neural network model was retrained. Over the next few years, there was a progression towards development of models which relied less on expert supervision. This led to newer approaches such as the fuzzy set theory which allowed modification of the morphological detection criteria and performed a detailed characterization of the identified events to approximate human intuition [38] .

More recently, different techniques were explored to develop ML polysomnogram scoring models (Table 1). Conventional ML for sleep staging utilizes two main components: feature extraction and classification. The traditional styles of feature extraction relied on raw signal input through a variety of methods, including Fourier transforms, wavelet analysis, and Hilbert transform [39] [40] [41]. Feature extraction can also utilize time-frequency images, generated by short-time Fourier transform or wavelet transform instead of raw signal inputs [18, 42]. Use of spectrograms has also been reported as a processing method prior to input of polysomnogram data [43]. With advances in ML, feature extraction techniques have evolved to reduce the number of features in a dataset by creating new features from existing ones. A thorough review can be found here [44].

In addition to improvements in feature extraction, advances were also made in the realm of classification. Most of the published AI sleep staging to date has utilized convolutional or recurrent neural networks or a combination of both [45] . In general, neural networks utilize a network of filters and subsampling layers. While convolutional neural networks consider only the current input, recurrent neural networks consider the current input together with the previously received inputs and, therefore, are well-suited for sequential data.

Multiple considerations are needed regarding the data substrates for AI scoring of sleep and respiratory events. The type and number of channels used, the input of raw or processed signal, and artifact are factors in the development of ML algorithms deployed on polysomnogram data.

While most studies utilize electroencephalogram (EEG) signal for sleep staging, several studies have combined EEG channels with electromyogram (EMG) and electro-oculogram (EOG) channels [32]. The general consensus is that use of multi-channel EEG improves performance.

Given the utility of home monitoring, there is growing interest in AI sleep staging from one or a few easily recorded physiological signals [46] . Electrocardiogram (ECG), respiratory effort, and photoplethysmography (PPG) have all demonstrated promise as alternative signals that can be leveraged for sleep staging [6, 8] . For example, a deep neural network that utilized both ECG and respiratory signals performed well in the classification of sleep stages and was not impacted by patient age or comorbid sleep disordered breathing. However, accuracy was lower compared to networks trained on EEG models [6] . Another group demonstrated accurate estimation of sleep time and differentiation between sleep stages with use of PPG signal, obtained from pulse oximetry [8] .

The ability of ML algorithms to estimate sleep stages from limited channels facilitates data acquisition in the ambulatory environment, particularly given the ubiquity of PPG in consumer facing technologies.

Similarly, for ML scoring of respiratory events, analysis often uses signal from traditional sensors (nasal oral thermistor, nasal pressure transducer, abdominal and thoracic respiratory inductance plethysmography, and oximetry). However, to automate scoring of data collected in the home, investigators have used ECG or PPG signal in isolation, or combined with a limited complement of traditional respiratory parameters [47] [48] [49] [50] . For example, studies have utilized ECG inter-beat intervals or heart rate variability (HRV) to detect respiratory events [6, 31] . Because PPG can estimate HRV and is widely available, ML algorithms may allow obstructive sleep apnea detection from consumer facing technologies.

ML algorithms can process raw or pre-processed signal. One of the popular options is to extract features from raw EEG signals for sleep stage classification [51]. However, an EEG spectrogram can also act as input by first calculating the power spectral density. Power spectral density is the measure of the signal's power content versus frequency. The importance of power spectral density has been highlighted in studies which have shown an increase in delta and beta EEG activity in certain sleep disorders [30].
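To make the notion of power content versus frequency concrete, the sketch below computes a simple periodogram with a naive discrete Fourier transform and sums it over two frequency bands. It is an illustration under stated assumptions: the signal is synthetic, the 64 Hz sampling rate and the band limits (0.5 to 4 Hz for delta, 16 to 30 Hz for beta) are chosen for the example, and published pipelines typically use windowed, averaged estimators rather than this bare form.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Return (frequencies, power) from a naive DFT periodogram of a real signal."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum the periodogram over the bins with low <= f < high."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


fs = 64             # Hz, assumed sampling rate
n_samples = 2 * fs  # a 2-s window keeps the O(n^2) transform quick
# Synthetic "EEG": a strong 2 Hz component plus a weak 20 Hz component.
signal = [
    1.0 * math.sin(2 * math.pi * 2 * t / fs) + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(n_samples)
]
freqs, power = power_spectrum(signal, fs)
delta = band_power(freqs, power, 0.5, 4.0)
beta = band_power(freqs, power, 16.0, 30.0)
peak_frequency = freqs[power.index(max(power))]
print(peak_frequency, delta / beta)
```

Band powers such as `delta` and `beta` are examples of the hand-crafted features that a conventional classifier would receive in place of the raw signal.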

For respiratory event scoring, several studies use raw airflow, respiratory effort, and oximetry signals as inputs [50] . However, another approach utilizes raw signals normalized based on the mean and standard deviation of the normal samples for each subject or employs a combination of raw input signals to reshape it into a matrix for classifier use [52] .
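The per-subject normalization described above can be sketched as follows. This is one plausible reading rather than the cited implementation: the mean and standard deviation are estimated from samples flagged as normal breathing and then applied to the whole recording. The function name, the airflow values, and the flags are hypothetical.

```python
from statistics import mean, pstdev


def normalize_to_baseline(samples, is_normal):
    """Scale all samples by the mean and SD of the samples flagged as normal."""
    baseline = [x for x, ok in zip(samples, is_normal) if ok]
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; cannot normalize")
    return [(x - mu) / sigma for x in samples]


# Invented airflow amplitudes for one subject; two samples fall within an event.
airflow = [1.0, 1.2, 0.8, 1.0, 0.2, 0.1, 1.1, 0.9]
is_normal = [True, True, True, True, False, False, True, True]
normalized = normalize_to_baseline(airflow, is_normal)
print([round(v, 2) for v in normalized])
```

After this step, a reduction in airflow is expressed relative to each subject's own normal breathing, which makes recordings from different subjects more comparable as classifier input.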

Raw signals can be contaminated with noise that can affect the classifier's performance. Basic band pass filters, as recommended by the technical specifications in the American Academy of Sleep Medicine (AASM) Manual for the Scoring of Sleep and Associated Events, can diminish this noise [53] . Several approaches have been utilized when using filtered signal as information source. For example, one group of 

AI algorithms require testing on unseen data to evaluate their performance, which is often achieved by a k-fold cross-validation process in which the dataset is partitioned into several equal groups (folds). A single group is retained to test model performance while the other groups are utilized for algorithm training, and the procedure is repeated so that each group serves once as the test set. Several performance metrics are available to describe performance of AI algorithms. Sleep stages and obstructive sleep apnea severity classes (no, mild, moderate, or severe disease) are categorical constructs, and therefore, results can be represented as percent agreement with the gold standard (visual scoring by a human expert). Use of Cohen's kappa instead of percent agreement is more stringent as it mitigates the effect of agreement occurring by chance.
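Both ideas in this paragraph are short enough to write out. The sketch below partitions item indices into folds and computes Cohen's kappa, defined as (observed agreement minus chance agreement) divided by (1 minus chance agreement), where chance agreement is derived from each scorer's label frequencies. The stage labels for the ten epochs are invented, and the interleaved fold assignment is an illustrative choice.

```python
from collections import Counter


def k_fold_indices(n_items, k):
    """Yield (train_idx, test_idx) pairs; each item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cohen_kappa(rater_a, rater_b):
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


# Invented labels for ten epochs.
human = ["W", "W", "N2", "N2", "N2", "N3", "REM", "REM", "N2", "W"]
model = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "N2", "N2", "W"]

agreement = sum(h == m for h, m in zip(human, model)) / len(human)
kappa = cohen_kappa(human, model)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

For these labels the percent agreement is 0.70 while kappa is about 0.59, which shows how the chance correction lowers the headline figure.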

Accuracy values should be approached with caution if used in isolation to describe algorithm performance. Specifically, accuracy can be misleading if there is an unequal number of observations in each class or more than two classes in the dataset. Use of the accuracy metric can lead to a situation where the model is completely and consistently misidentifying one class, but this misidentification is missed because on average, performance is good. A confusion matrix can overcome these issues. The confusion matrix identifies when the algorithm confuses two classes by counting the number of instances in which data are misclassified. Each row in a confusion matrix represents a predicted class, while each column represents an actual class. The number of correct and incorrect predictions for each class is calculated and represented in the confusion matrix. Therefore, the confusion matrix may provide a better gauge of performance than accuracy alone (Table 2).
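A small constructed example shows how high accuracy can hide the complete failure of one class. The epoch counts below are invented, and the matrix follows the orientation stated above (rows are predicted classes, columns are actual classes); some software uses the transposed layout.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are predicted classes and columns are actual classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


classes = ["W", "N1", "N2"]
# Invented night: 45 wake, 5 N1, and 50 N2 epochs.
actual = ["W"] * 45 + ["N1"] * 5 + ["N2"] * 50
# Hypothetical model that labels every N1 epoch as wake.
predicted = ["W"] * 45 + ["W"] * 5 + ["N2"] * 50

matrix = confusion_matrix(actual, predicted, classes)
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(actual)
# Fraction of actual N1 epochs that were predicted as N1 (column 1 of the matrix).
n1_detected = matrix[1][1] / sum(row[1] for row in matrix)

for name, row in zip(classes, matrix):
    print(f"predicted {name:>2}: {row}")
print(f"accuracy = {accuracy:.2f}, N1 correctly identified = {n1_detected:.2f}")
```

Overall accuracy is 0.95 even though no N1 epoch is identified correctly, a failure that is visible at once in the N1 column of the matrix.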

Traditional two-by-two tables can also provide descriptive statistics when comparing a binary outcome (e.g., obstructive sleep apnea versus no obstructive sleep apnea) between algorithm and human. In this case, algorithm-identified cases are compared to cases based on visual scoring of respiratory events and described by true positive (TP, obstructive sleep apnea cases correctly identified by the algorithm), false positive (FP, healthy subjects incorrectly identified as obstructive sleep apnea cases by the algorithm), true negative (TN, healthy subjects correctly identified as normal by the algorithm), and false negative (FN, obstructive sleep apnea cases incorrectly identified as normal by the algorithm) values. Table 3 lists commonly used performance metrics.
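The four cells of the two-by-two table, and the standard metrics derived from them, can be computed as in the following sketch. The twenty subjects and their classifications are invented, and the set of metrics shown uses textbook definitions that may not match the table in the article item for item.

```python
def two_by_two(reference, algorithm):
    """Count outcomes with the human-scored label (True = OSA) as the reference."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, call in zip(reference, algorithm):
        if truth and call:
            counts["TP"] += 1
        elif not truth and call:
            counts["FP"] += 1
        elif not truth and not call:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def summarize(c):
    """Standard descriptive statistics for a binary classifier."""
    return {
        "sensitivity": c["TP"] / (c["TP"] + c["FN"]),
        "specificity": c["TN"] / (c["TN"] + c["FP"]),
        "ppv": c["TP"] / (c["TP"] + c["FP"]),
        "npv": c["TN"] / (c["TN"] + c["FN"]),
        "accuracy": (c["TP"] + c["TN"]) / sum(c.values()),
    }


# Invented cohort: 8 subjects with OSA and 12 without, by visual scoring.
reference = [True] * 8 + [False] * 12
algorithm = [True] * 6 + [False] * 2 + [True] * 3 + [False] * 9

counts = two_by_two(reference, algorithm)
metrics = summarize(counts)
print(counts)
print({name: round(value, 2) for name, value in metrics.items()})
```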

One of the commonly encountered problems in classification predictive modeling is imbalanced classification. Most machine learning algorithms used for classification are designed around the assumption of an equal number of examples for each class. When classes are imbalanced, this results in models that have poor predictive performance, specifically for the minority class. This holds true in the realm of sleep studies, given that most of the nighttime period is spent asleep when healthy participants are studied. Additionally, imbalance is present among sleep stages, and N1 sleep can be misclassified since the percentage of N1 sleep is lower than that of other stages of sleep. This can be overcome by balancing classes in the training dataset or by improving classification algorithms. For imbalanced classification, how well the positive class was predicted, or sensitivity (TP/(TP + FN)), may be of more interest than how well the negative class was predicted, or specificity (TN/(FP + TN)).
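One common way to compensate for imbalance during training, offered here as an example and not as the approach of any study cited in this review, is to weight each class inversely to its frequency so that rare stages such as N1 contribute as much to the training loss as common ones. The stage distribution below is invented.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so every class has equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


# Invented distribution of 100 scored epochs.
stage_counts = {"W": 10, "N1": 5, "N2": 50, "N3": 15, "REM": 20}
labels = [stage for stage, count in stage_counts.items() for _ in range(count)]

weights = inverse_frequency_weights(labels)
for stage, weight in weights.items():
    print(f"{stage:>3}: count = {stage_counts[stage]:2d}, weight = {weight:.2f}")
```

With these counts, each N1 epoch receives a weight of 4.0 and each N2 epoch a weight of 0.4. Resampling the training set is an alternative that achieves a similar balance by changing the data rather than the loss.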

Other challenges in appraising the performance of AI sleep staging and respiratory event scoring stem from characteristics of training and testing datasets. Training datasets are often derived from healthy populations or convenience samples. To diagnose sleep disorders, training datasets should consist of patients with heterogeneous sleep problems to facilitate deep learning.

Highlighting the need for diverse data sets, researchers found that having more data sources significantly improved classification performance and generalizability. Specifically, the group noted that using 75% of the PSGs available yielded just as high performance compared to using 100% once they included PSGs from five different sources [7]. This underscores the importance of availability of public datasets from multiple heterogeneous populations. If a test dataset comes from the same sleep center, acquired with the same equipment and scored by the same human scorers, performance metrics may be falsely elevated, even with use of held-out, unseen data. Therefore, testing with use of an external, independent database is typically considered more reliable [18, 57]. There is considerable value in standardizing testing data from various sleep laboratories as well as standardizing performance metrics, which can help users compare different algorithms.

Notably, pediatric populations have been underrepresented and expansion of pediatric sleep datasets for algorithm development and testing is required.

Although the first obvious use for AI in sleep medicine is to automate the staging of sleep and scoring of respiratory events to reduce technician burden and decrease time from PSG recording to interpretation, other use cases will deepen our understanding of sleep disorders and the role of sleep in health and disease.

There is growing evidence that the underlying etiology (i.e., endotype) and clinical manifestation (i.e., phenotype) of OSA in an individual are not well described by the traditionally used AHI [58]. Artificial intelligence has paved the way for a better understanding of the various endotypes and phenotypes of OSA to form the foundation of personalized treatment for OSA. AI-assisted graphical models of the chemoreflex feedback loop have been used to identify ventilatory instability in OSA patients, which can guide treatment selection [59]. Routine polysomnographic characteristics and clinical data have been utilized to estimate upper airway collapsibility and arousal threshold using AI-assisted data-driven models [60]. Endotyping OSA through PSG is increasingly recognized as vastly important to our field and there is an increasing interest in making OSA endotyping algorithms accessible, inexpensive, and, ultimately, scalable [12]. Even adherence to treatment may be better predicted with use of ML. Classifiers of compliance with CPAP therapy have enabled early identification of patients who are likely to be compliant.

In addition to ramifications for personalized treatment, the use of AI in sleep disordered breathing is relevant for outcomes. Unsupervised and supervised clustering models were used to cluster 2277 OSA patients into six phenotypes based on their polysomnogram data. The phenotypes show different risk for the development of cardio-neuro-metabolic comorbidity, unlike the conventional single-metric apnea-hypopnea index-based phenotype [61].

Tools to improve risk stratification will also benefit from AI. Support vector machine-based models have been created utilizing clinical data for early identification of patients at risk for OSA presenting to a primary care clinic which may potentially prioritize them for sleep studies [9] .

While AI has made significant strides in the realm of sleep-disordered breathing, this innovative technology has been investigated for the evaluation and management of other sleep disorders including suspected central disorders of hypersomnolence. The objective confirmation of a central disorder of hypersomnolence requires a PSG followed by a multiple sleep latency test (MSLT). An MSLT entails 4-5 nap opportunities with recording of EEG, EOG, EMG, and EKG leads. Sleep onset latency for each nap (averaged as the mean sleep latency) and the presence of sleep onset stage REM (R) sleep are recorded. Completion of the overnight PSG and daytime series of nap opportunities is burdensome for the patient, and manual review of PSG and MSLT data is time-consuming, expensive, and subjective.

The central disorder of hypersomnolence, narcolepsy, type 1 (narcolepsy with cataplexy), is confirmed by reduced mean sleep latency on MSLT and at least 2 sleep onset stage R periods across overnight PSG and daytime MSLT. However, poor nocturnal sleep consolidation is also a characteristic feature of narcolepsy, type 1. After development of an automatic classifier capable of separating sleep and [63] . Next, the derived 2-dimensional sleep state space projection was used to distinguish patients with narcolepsy, type 1 from controls by leveraging the known sleep state dissociation in narcolepsy patients. More recently, Stephansen and colleagues utilized deep learning to diagnose narcolepsy, type 1 from overnight PSG alone [57]. First, a hypnodensity graph was generated from PSG signal, which does not enforce a single sleep stage label, but instead assigns a membership function to each of the sleep stages. Therefore, use of neural networks not only automated sleep staging but allowed for a more detailed representation of sleep trends over the course of the night. Next, deep learning was used to identify features of sleep state dissociation predictive of narcolepsy, type 1. Analysis of a single night of PSG was able to identify narcolepsy, type 1 with high sensitivity (91%) and specificity (96%) compared to the more laborious PSG-MSLT.
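The contrast between a hypnodensity representation and a conventional hypnogram can be sketched in a few lines. The example assumes, purely for illustration, that a classifier emits one raw score per stage for each epoch and that a softmax converts those scores into stage probabilities; the scores are invented and the code does not reproduce the cited model.

```python
import math

STAGES = ["W", "N1", "N2", "N3", "REM"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hypnodensity(epoch_scores):
    """One probability per stage for every epoch."""
    return [dict(zip(STAGES, softmax(scores))) for scores in epoch_scores]


def hypnogram(density):
    """Collapse each epoch to its single most probable stage."""
    return [max(epoch, key=epoch.get) for epoch in density]


# Invented classifier outputs for three 30-s epochs.
scores = [
    [4.0, 1.0, 0.0, 0.0, 0.0],  # clearly wake
    [1.0, 1.5, 1.4, 0.0, 1.3],  # ambiguous mixture of stages
    [0.0, 0.0, 0.5, 0.0, 3.0],  # mostly REM
]
density = hypnodensity(scores)
stages = hypnogram(density)
for stage, epoch in zip(stages, density):
    print(stage, {name: round(p, 2) for name, p in epoch.items()})
```

The second epoch is labelled N1 in the hypnogram although no stage reaches a probability of 0.5. That mixed membership is discarded by a single label but retained in the hypnodensity, and it is this kind of information that can reflect sleep state dissociation.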

Narcolepsy, type 2 presents a different diagnostic challenge given the lack of cataplexy and poor test-retest reliability of the MSLT for this condition [64] . A stochastic gradient boosting (SGB) model was used to explore the features characteristic of type 1 and type 2 narcolepsy based on a dataset of individuals in the European Narcolepsy Network (EU-NN) [65] . The SGB model allowed for selection of features independent from existing diagnostic criteria and demonstrated the capacity to classify narcolepsy subtypes with high accuracy. Furthermore, the model can use a mixture of clinical features and identifies the most important features. Therefore, machine learning may identify novel potential candidates for future diagnostic criteria for narcolepsy, type 1 and 2.

To employ data sources beyond polysomnogram in the evaluation of excessive daytime sleepiness (EDS), Liu and colleagues utilized an artificial neural network of modified adaptive resonance theory to differentiate subjects with and without sleep disorders that cause EDS from normal control subjects based on EEG and pupil size [66] .

Insomnia can also benefit from AI analytic techniques, and one of the initial investigations in this area assessed singular spectrum analysis (SSA) of sleep EEG to differentiate paradoxical insomnia, psychophysiological insomnia, and control groups [67] . In 2016, Chaparro-Vargas et al. used 3 tandem models to distinguish insomnia patients from controls [68] . First, a preprocessing module was used that utilized state-space time-varying autoregressive moving average (TVARMA) processes to identify the features that characterize sleep onset. Next, a hypnogram generation module used a fuzzy inference system to infer sleep stages and the macrostructure of sleep architecture. Lastly, the characterization module compared hypnograms with similarity distances and used logistic regression to distinguish controls from insomnia patients. Another group trained deep neural network classifiers with features extracted from a maximum of two EEG channels and accurately differentiated patients with insomnia from controls [69] . When compared with manual scoring, the classifier had excellent discrimination accuracy between patients and controls using both (92%) or only one EEG channel (86%).

While most of these studies use PSG signal as a substrate for machine learning algorithms, other sleep data sources outside of the laboratory have been explored. For example, natural language processing techniques were used to extract causality from Twitter messages that included stress, headache, and insomnia content [70]. Additionally, unsupervised learning has been applied to wearable data and identified 5 different clusters of insomnia activity [71].

The use of AI in insomnia has expanded to include intervention. During the COVID-19 pandemic, a group of researchers devised a smartphone app called KANOPEE that allowed users to interact with a virtual agent that screened for sleep disturbances and delivered digital behavioral interventions. The program used decision tree architecture and interacted with users through natural body motion and voice [72] . AI digital screening and intervention tools, easily deployed through smart phone applications, confer the ability to provide behavioral interventions remotely, at scale.

The circadian timing system regulates a variety of biological processes in addition to the sleep-wake cycle. Therefore, misalignment of behavioral, light-dark, sleep-wake, and peripheral rhythms can produce detrimental impacts on human health. Data that demonstrate circadian oscillation can be derived from numerous sources and the level, degree, and impact of circadian disruption may vary; therefore, AI provides a unique opportunity to improve our understanding of circadian rhythms.

For example, researchers built an expert system that identifies the characteristics that contribute to negative effects of shift work and then selects mitigation efforts according to their importance in preventing these negative effects. With a fuzzy analytic hierarchy process model, the shift "expert" prioritizes prevention advice to shift workers at the individual and organization level [73] .

Additionally, given the difficulties in measuring circadian rhythms, AI has also been used to understand and predict circadian states. The cyclic ordering by periodic structure (CYCLOPS) algorithm uses machine learning to identify circadian rhythms at a molecular level including rhythmic transcripts in human liver and lung [74] . Another group of researchers utilized machine learning to predict circadian phase within 2 h from gene expression in peripheral blood samples [75] . A particular strength of this study was excellent predictive performance with use of an independent test set, suggesting generalizability of this circadian measurement.

Utilization of machine learning to predict circadian timing from gene expression has ramifications beyond sleep disorders. An application that has drawn considerable attention is precision timing of cancer treatment based on AI estimates of circadian timing. Chemotherapy timed in accordance with the patient's internal time may reduce toxicity and improve outcomes [76] .

Machine learning has allowed circadian timing predictions not only from peripheral blood samples, but also from data collected by ubiquitous wearable devices [77]. Real-time circadian tracking in the ambulatory environment from wearable devices may hold promise as an easy-to-use, inexpensive adjunct to expert clinical evaluation and management.

Appropriate diagnosis of REM sleep behavior disorder (RBD), which includes dream enactment behavior and loss of normal atonia of stage REM sleep during PSG, is crucial given its association with both co-morbid and incident alpha-synucleinopathy neurodegenerative disease. Furthermore, identifiable characteristics that separate individuals with idiopathic RBD (RBD in the absence of a neurodegenerative disorder) from patients with RBD in the setting of alpha-synucleinopathy (e.g., Parkinson's disease, dementia with Lewy bodies, and multiple system atrophy) could assist with the development of prediction tools. Christensen et al. utilized data-driven topic modeling and unsupervised learning to characterize sleep EEG and EOG among controls, patients with periodic limb movements of sleep (PLMS), idiopathic RBD, and Parkinson's disease [28]. A Lasso regularized regression model was then used to differentiate patient groups. The most salient features were the number and stability of EEG topics linked to REM and N3, respectively, and the model was able to distinguish patients with idiopathic RBD from individuals with Parkinson's disease with a sensitivity of 91.4% and a specificity of 68.8%.
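For readers less familiar with these metrics, sensitivity is the proportion of truly positive cases that the model labels positive, and specificity is the proportion of truly negative cases that it labels negative. The following minimal Python sketch computes both from paired labels. Which patient group is treated as the "positive" class is an assumption here, and the label vectors are hypothetical rather than data from the cited study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A model with high sensitivity but modest specificity, as in the reported result, misses few positive cases but labels a larger share of negative cases as positive.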

Another dilemma in RBD is the determination of REM sleep without atonia (RSWA). Scoring criteria and quantification metrics have been delineated, but the implementation of these rules in the context of manual, visual scoring is laborious [53] . Oftentimes, qualitative assessment of EMG tone in REM sleep is made, which results in a lack of standardization across sleep laboratories. Therefore, automation of the process is an area of active research which may benefit from AI [78] . For example, a random forest classifier was developed that used established RSWA metrics along with an EMG fractal exponent ratio between sleep stages and sleep architecture measures [79] . The random forest classifier that supplemented traditional computerized metrics with novel features related to sleep architecture was able to automate RSWA scoring and identify RBD with accuracy, sensitivity, and specificity of 0.96, 0.98, and 0.94, respectively, and outperformed automated scoring that uses traditional measures in isolation (atonia index, motor activity, and STREAM).

Apart from PSG data, machine learning that incorporates other clinical features, such as olfactory loss, cerebrospinal fluid measurements, and the results of functional imaging, with a diagnosis of RBD may allow model prediction of early, or even preclinical Parkinson's disease [80] . The ability to use clinical or PSG characteristics related to RBD combined with other features to identify individuals at risk for neurodegenerative disease is essential to the development of primary prevention therapeutics.

Movements during sleep may be incidental findings during PSG or may present clinically if troublesome to patients or their bedpartners. Periodic limb movements of sleep (PLMS) are highly prevalent among patients with restless legs syndrome but rarely seen as an isolated finding causing daytime symptoms (periodic limb movement disorder). PLMS are typically scored with use of the anterior tibialis EMG lead and deep learning has been used to automate this process with 85% accuracy; however, with use of a K-nearest neighbors algorithm, investigators could identify PLMS without use of EMG [50, 81] . Additionally, with use of machine learning analysis, novel data sources that do not contact the patient such as 3D cameras and infrared sensors were able to detect 75% of PSG confirmed PLMS [82] .
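The K-nearest neighbors approach mentioned above classifies a new observation by majority vote among the most similar labeled observations. The following minimal Python sketch illustrates the idea with Euclidean distance. The two-dimensional feature vectors and the labels are hypothetical stand-ins for non-EMG PSG-derived features and are not the features used in the cited work.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already scaled feature vectors (illustration only).
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.95)]
train_y = ["no_plm", "no_plm", "no_plm", "plm", "plm", "plm"]
print(knn_predict(train_x, train_y, (0.82, 0.88)))
```

Because the vote depends on distances, features on different scales would normally be standardized before use; the example values are assumed to be scaled already.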

AI has also been used outside the sleep laboratory for sleep-related movement disorders: deep learning analysis of data from bed acceleration sensors has been applied to the diagnosis of restless legs syndrome [83].

An important use of AI beyond the diagnosis and treatment of defined sleep disorders is its application in population health, with emphasis on the relationship between disturbed sleep and morbidity and mortality. Sleep health is a multidimensional construct influenced by inherent, person-specific characteristics and external social and environmental demands. Optimal sleep health has been characterized by satisfactory subjective quality, alertness during desired wakefulness, appropriate timing, adequate duration, sufficient continuity, robust rhythmicity, and high regularity [84] . This comprehensive definition of sleep health provides a more inclusive description than isolated aspects of sleep such as duration and has relevance for individuals without diagnosable sleep disorders.

A multidimensional definition of sleep health has the potential to influence large-scale public health initiatives by informing screening programs and interventions that are more precise and comprehensive with the ultimate aim of improving not only sleep but other aspects of health and wellness. Wallace and colleagues applied three multivariable approaches to determine which sleep characteristics increased mortality risk in the Osteoporotic Fractures in Men cohort [85]. Across multivariable approaches, lower sleep-wake rhythmicity and continuity (assessed by actigraphy) increased the risk for all-cause mortality even after considering other important sleep, demographic, health, and behavioral risk factors. Notably, use of a random forest model, which is more flexible than traditional statistical models, allowed for the simultaneous consideration of potentially correlated variables and identified which facets of sleep health were the greatest drivers of outcomes [85].

AI also confers the ability to conduct scalable research, as evidenced by the analysis of over 11 million nights of wearable activity data to characterize sleep duration and timing by age and gender [5]. With a focus on younger populations, the application of structural equation modeling to almost 5000 children allowed researchers to assess repeated data and showed a bidirectional association between behavioral sleep problems and health-related quality of life [13].

In addition to ambulatory sleep information, PSG findings that may not be traditionally considered in the quantification of OSA severity, such as sleep fragmentation, oxygen desaturation magnitude, and the percentage of stage REM sleep, are independently related to mortality risk [67] . Therefore, PSG datasets can also inform population health with use of novel measures beyond the AHI. New insights on sleep microarchitecture were already obtained through automated detection of cyclic alternating pattern in older men and women from two community cohorts [4] .

AI algorithms alone will not fully delineate the role of sleep in health and disease, and a combined approach of advanced analytics, novel sensors, and measurement of sleep both in and outside of the laboratory is likely required. The Sleep and Obstructive Sleep Apnoea Monitoring with Non-Invasive Applications (SOMNIA) project supports this goal: in addition to the usual PSG signals, it simultaneously records data from sensors not typically monitored as part of PSG, including suprasternal pressure monitoring, multielectrode electromyography of the diaphragm, wrist-worn accelerometry and optical photoplethysmography, and mattress-embedded sensors. Therefore, in addition to providing a data source that can be analyzed with machine learning algorithms to yield novel insight from data typically recorded in PSG, the project may demonstrate the utility of new sensors, some of which are even adaptable for ambulatory use [86].

Despite the huge advances made, there are some critical challenges to consider in the implementation of AI in clinical sleep medicine and sleep research. These include (1) logistics of creating datasets, (2) standardization of commercial algorithms, (3) limited data available for research, (4) regulation, and (5) integration of "omics" data.

1) One of the biggest challenges is creating training datasets. Most of the existing datasets using polysomnogram data are research datasets collected from a subgroup meeting certain inclusion criteria. Hence, they are not generalizable and not representative of what the clinician encounters in real practice. Another challenge is ensuring optimal data quality by reducing external noise and artifact. Finally, algorithm validation requires independent datasets that are sequestered and not available for training purposes.

2) With multiple commercial companies developing FDA-cleared algorithms, there is a need to standardize commercial algorithms through certification by an accredited regulatory body. While FDA approval ensures that the algorithms are safe to use, the approval does not ensure clinical validity. This can be overcome by creating standardized certification programs, which will test the algorithms and disclose performance metrics on independent test sets. For appropriate use and generalization, the circumstances in which the data were collected and the characteristics of the population from which the data were derived should be well described.

3) There is an acute need for larger-scale research trials that can relate machine learning algorithm-generated measures to clinically significant outcomes. This prompts the need for research datasets with heterogeneity in signals, patient demographics, sleep disorders, and clinical outcomes. Projects like SOMNIA are strides in that direction.

4) There is a strong need for policies and best clinical practices regarding use of AI in sleep medicine.

5) There is a need to integrate data obtained through "omics" technology (transcriptomics, proteomics, metabolomics) with traditional health and demographic data and with polysomnographically derived data [14]. This further emphasizes the need for a universal database formed by collaborative efforts across the sleep community.
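The requirement noted in the first challenge, that validation data be sequestered from training, is often addressed by splitting recordings at the level of the patient rather than the individual night or epoch, so that no individual contributes data to both sets. The following minimal Python sketch illustrates one such subject-wise split. The record structure, the field name `subject_id`, and the 20% test fraction are assumptions made for illustration only.

```python
import random


def split_by_subject(records, test_fraction=0.2, seed=0):
    """Split records so that each subject appears in exactly one partition."""
    subjects = sorted({r["subject_id"] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r["subject_id"] not in test_subjects]
    test = [r for r in records if r["subject_id"] in test_subjects]
    return train, test


# Hypothetical dataset: 10 subjects with 3 recorded nights each.
records = [
    {"subject_id": f"S{s:02d}", "night": n}
    for s in range(10)
    for n in range(3)
]
train, test = split_by_subject(records)
print(len(train), len(test))
```

A truly independent test set would ideally also come from a different site or population than the training data; a subject-wise split of a single dataset is only a minimal safeguard against leakage.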

In addition to concerns regarding development, testing, and certification, clinical implementation of AI tools for sleep staging and respiratory event scoring will also require user interface improvements to streamline use [57]. Additionally, as many programs require upload of sleep data to external servers, security of protected health information is required. Issues regarding bias and health disparities require continued evaluation and mitigation to avoid scaling inequities.

Artificial intelligence in sleep medicine undoubtedly holds promise. FDA-cleared AI scoring software is currently available on the market. With regulation and careful standardization, these software tools can facilitate scoring. However, in their present form, they will still require health care provider oversight, and clinical correlation will be strongly recommended. As the machines continue to learn, it will be imperative to continuously regulate these scoring systems.

With continued advancement in technology, AI scoring can be further utilized to identify polysomnogram features that are not easily identified by humans or are time- or labor-intensive to score. Examples include microspindles, sleep-wake transitions, and thoracoabdominal asynchrony. These features may assist in diagnosis as well as in monitoring the progression of several sleep disorders. Big data analysis of wearable/nearable devices can be a very useful tool in the hands of the sleep clinician in determining an individual's sleep health. This can be utilized at the population health level to generate ideas on how to improve health issues including sleep deprivation. AI can improve clinic flow through voice-assisted documentation and automated organization of available clinical information from multiple sources, thereby allowing more time for physician-patient interaction. This in turn will augment the physician-patient relationship.

In summary, AI has made considerable advancements in sleep medicine. Polysomnograms result in the acquisition of robust data, and AI applications will allow for improved understanding, screening, diagnosis, and management of sleep disorders. AI augmentation of the polysomnogram scoring process will allow for diversion of human effort and time from repetitive, laborious tasks to face-to-face patient care. Wearable technology and large-scale clinical databases can supplement the novel information extracted from polysomnograms with AI to improve our understanding of the role of sleep in human health and disease. However, there are certain challenges which preclude AI's generalizability and wide-reaching clinical application.

Ethics approval This article does not contain any studies with human participants or animals performed by any of the authors.

This study does not contain any human or animal subjects. Institutional IRB or informed consent was not required.

Author Anuja Bandyopadhyay declares that she has no conflict of interest. Author Cathy Goldstein is on the medical advisory boards of Huxley Medical and eviCore. She receives royalties from UpToDate. She is 5% inventor of a circadian mobile application licensed to Arcascope, LLC.

Artificial intelligence and sleep: advancing sleep medicine

Artificial intelligence in sleep medicine: background and implications for clinicians

Metrics of sleep apnea severity: beyond the apnea-hypopnea index

Characterization of cyclic alternating pattern during sleep in older men and women using large population studies

Gender differences in nighttime sleep patterns and variability across the adult lifespan: a global-scale wearables study

Sleep staging from electrocardiography and respiration with deep learning

Automatic sleep stage classification with deep residual networks in a mixed-cohort setting

Deep learning enables sleep staging from photoplethysmogram for patients with suspected sleep apnea

Support vector machine prediction of obstructive sleep apnea in a large-scale Chinese clinical sample

Beyond K-complex binary scoring during sleep: probabilistic classification using deep learning

Lower socioeconomic status and co-morbid conditions are associated with reduced continuous positive airway pressure adherence among older adult medicare beneficiaries with obstructive sleep apnea

A scalable method of determining physiological endotypes of sleep apnea from a polysomnographic sleep study

Sleep problems, internalizing and externalizing symptoms, and domains of health-related quality of life: bidirectional associations from early childhood to early adolescence

Sleep and Big Data: harnessing data, technology, and analytics for monitoring sleep and improving diagnostics, prediction, and interventions-an era for Sleep-Omics?

Machine learning in control systems: an overview of the state of the art. International Conference on Innovative Techniques and Applications of Artificial Intelligence

Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants

Basics of machine learning

An end-to-end framework for real-time automatic sleep stage classification

PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals

Montreal Archive of Sleep Studies: an open-access resource for instrument benchmarking and exploratory research

The sleep heart health study: design, rationale, and methods

AI-based approach to automatic sleep classification

Sleep stage scoring using the neural network model: comparison between visual and automatic analysis in normal subjects and patients

Validity of neural network in sleep apnea

Sleep studies of adults with severe or profound mental retardation and epilepsy

The utility of neural network in the diagnosis of Cheyne-Stokes respiration

Obstructive sleep apnea detection using SVM-based classification of ECG signal features

Data-driven modeling of sleep EEG and EOG reveals characteristics indicative of pre-Parkinson's and Parkinson's disease

Automated frequency analysis of synchronous and diffuse sleep spindles

Cardiorespiratory-based sleep staging in subjects with obstructive sleep apnea

ECG biomarkers for simultaneous detection of obstructive sleep apnea and Cheyne-Stokes breathing

Sleep versus wake classification from heart rate variability using computational intelligence: consideration of rejection in classification models

Automatic and unsupervised snore sound extraction from respiratory sound signals

An evaluation of cardiorespiratory and movement features with respect to sleep-stage classification

A fuzzy logic based apnoea monitor for SIDS risk infants

Multivariate analysis of full-term neonatal polysomnographic data

A new method for sleep apnea classification using wavelets and feedforward neural networks

Fuzzy structural algorithms to identify and characterize apnea and hypopnea episodes

A novel, fast and efficient single-sensor automatic sleep-stage classification based on complementary cross-frequency coupling estimates

Multi-class sleep stage analysis and adaptive pattern recognition

Sleep stage classification using single-channel EOG

Automatic sleep stage scoring with single-channel EEG using convolutional neural networks

Automated sleep stage scoring of the sleep heart health study using deep neural networks

A review on current trends in automatic sleep staging through bio-signal recordings and future challenges

Automated sleep scoring: a review of the latest approaches

Automatic sleep stage classification based on convolutional neural network and fine-grained segments

Deep learning approaches for automatic detection of sleep apnea events from an electrocardiogram

Development of a minimally invasive screening tool to identify obese pediatric population at risk of obstructive sleep apnea/ hypopnea syndrome

Deep recurrent neural networks for automatic detection of sleep apnea from single channel respiration signals

Expert-level sleep scoring with deep neural networks

A new method for automatic sleep stage classification

Convolutional neural networks on multiple respiratory channels to detect hypopnea and obstructive apnea events

The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications

Single channel ECG for obstructive sleep apnea severity detection using a deep learning approach

DeepSleepNet: a model for automatic sleep stage scoring based on raw single-channel EEG

Automatic human sleep stage scoring using deep neural networks

Neural network analysis of sleep stages enables efficient diagnosis of narcolepsy

More than the sum of the respiratory events: personalized medicine approaches for obstructive sleep apnea

Advancing symptom science through symptom cluster research: expert panel proceedings and recommendations

A novel model to estimate key obstructive sleep apnea endotypes from standard polysomnography and clinical data and their contribution to obstructive sleep apnea severity

Combined unsupervised-supervised machine learning for phenotyping complex diseases with its application to obstructive sleep apnea

Sleep-Wake transition in narcolepsy and healthy controls using a support vector machine

Diagnostic value of sleep stage dissociation as visualized on a 2-dimensional sleep state space in human narcolepsy

Twice is nice? Test-retest reliability of the Multiple Sleep Latency Test in the central disorders of hypersomnolence

Exploring the clinical features of narcolepsy type 1 versus narcolepsy type 2 from European Narcolepsy Network database with machine learning

A neural network method for detection of obstructive sleep apnea and narcolepsy based on pupil size and EEG

Singular spectrum analysis of sleep EEG in insomnia

Insomnia characterization: from hypnogram to graph spectral theory

Deep learning and insomnia: assisting clinicians with their diagnosis

Extracting health-related causality from twitter messages using natural language processing

Clustering insomnia patterns by data from wearable devices: algorithm development and validation study

Smartphone-based virtual agents to help individuals with sleep concerns during COVID-19 confinement feasibility study

Expert system application for prioritizing preventive actions for shift work: shift expert

CYCLOPS reveals human transcriptional rhythms in health and disease

Universal method for robust detection of circadian state from gene expression

An optimal time for treatment-predicting circadian time by machine learning and mathematical modelling. Cancers (Basel)

Predicting circadian phase across populations: a comparison of mathematical models and wearable devices

Precision medicine in rapid eye movement sleep behavior disorder

Detection of REM sleep behaviour disorder by automated polysomnography analysis

High-accuracy detection of early Parkinson's disease through multimodal features and machine learning

Detection of periodic leg movements by machine learning methods using polysomnographic parameters other than leg electromyography. Computational and mathematical methods in medicine

Contactless recording of sleep apnea and periodic leg movements by nocturnal 3-D-video and subsequent visual perceptive computing

A domestic diagnosis system for early restless legs syndrome based on deep learning

Sleep health: can we define it? Does it matter?

Which sleep health characteristics predict all-cause mortality in older Men? An application of flexible multivariable approaches

Protocol of the SOMNIA project: an observational study to create a neurophysiological database for advanced clinical sleep monitoring

U-Sleep: resilient high-frequency sleep staging

Automatic sleep-stage scoring in healthy and sleep disorder patients using optimal wavelet filter bank technique with EEG signals

Expert-level automated sleep staging of long-term scalp electroencephalography recordings using deep learning

Automated multi-model deep neural network for sleep stage scoring with unfiltered clinical data

Automatic analysis of single-channel sleep EEG in a large spectrum of sleep disorders

Erratum: Author Correction: Deep learning for automated sleep staging using instantaneous heart rate

Convolution-and attention-based neural network for automated sleep stage classification

Sleep stage classification using time-frequency spectra from consecutive multi-time points

Automated sleep stage scoring of the Sleep Heart Health Study using deep neural networks

A deep learning model for automated sleep stages classification using PSG signals

SeqSleepNet: end-to-end hierarchical recurrent neural network for sequence-to-sequence automatic sleep staging

Complex-valued unsupervised convolutional neural networks for sleep stage classification

A convolutional neural network for sleep stage scoring from raw single-channel EEG

A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series

Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring