key: cord-0284608-s0zgy1mn
authors: Diepen, A. van; Bakkes, T. H. G. F.; Bie, A. J. R. De; Turco, S.; Bouwman, R. A.; Woerlee, P. H.; Mischi, M.
title: A Model-Based Approach to Synthetic Data Set Generation for Patient-Ventilator Waveforms for Machine Learning and Educational Use
date: 2021-03-29
journal: nan
DOI: 10.1007/s10877-022-00822-4
sha: 6c688c2378ec98bd57fb44d4931b4b6d7bd32db5
doc_id: 284608
cord_uid: s0zgy1mn

Although mechanical ventilation is a lifesaving intervention in the ICU, it has harmful side-effects, such as barotrauma and volutrauma. These harms can be caused by asynchronies, which are defined as a mismatch between the ventilator timing and the patient's respiratory effort. Automatic detection of these asynchronies, and subsequent feedback, could improve lung ventilation and reduce the probability of lung damage. Neural networks to detect asynchronies provide a promising new approach but require large annotated data sets, which are difficult to obtain and require complex monitoring of inspiratory effort. In this work, we propose a model-based approach to generate a synthetic data set for machine learning and educational use by extending an existing lung model with a first-order ventilator model. The physiological nature of the derived lung model allows adaptation to various disease archetypes, resulting in a diverse data set. We generated a synthetic data set using 9 different patient archetypes, which are derived from measurements in the literature. The model and synthetic data quality have been verified by comparison with clinical data, review by a clinical expert, and an artificial intelligence model that was trained on experimental data. The evaluation showed that it is possible to generate patient-ventilator waveforms, including asynchronies, that have the most important features of experimental patient-ventilator waveforms.

Mechanical ventilation is the most common means of life support applied in the intensive care unit (ICU) [1]. It supports ventilation in many patients after major surgery and in critically ill patients with respiratory failure, such as during the recent COVID-19 pandemic [2, 3]. The goal of mechanical ventilation is to maintain gas exchange that sustains life while minimizing ventilator-induced lung injury (VILI) and the work of breathing. Different modes can be applied to mechanically ventilated patients. A mandatory mode is often chosen in patients with more severe respiratory problems because it allows clinicians to control ventilation completely; these patients therefore require sedation. A supportive mode of ventilation, such as pressure support ventilation (PSV), is preferred when a patient's pulmonary condition improves. Because this mode is triggered (initiation of a breath) and cycled (termination of a breath) by the patient, it is usually perceived as more comfortable and allows weaning from sedation and eventually from mechanical ventilation.

Mismatches between the patient and the mechanical ventilator during both modes of ventilation can undermine the objective of minimizing VILI and the work of breathing. These so-called patient-ventilator asynchronies are associated with worse outcomes, such as discomfort and increased mortality. However, a direct causal relationship has not yet been scientifically established, nor has it been demonstrated that reducing asynchronies results in better outcomes [4, 5]. The identification and quantification of asynchronies are therefore crucial to clarify whether asynchronies are a direct causative factor. However, detecting patient-ventilator asynchronies is extremely challenging, even for trained clinicians. Recognizing asynchronies from bedside analysis of waveforms (Figure 1) is difficult, and clinicians cannot assess these waveforms continuously, while asynchronies can occur at any moment. Continuous monitoring with computer algorithms can overcome these barriers and help us detect, analyze, and possibly even predict asynchronies.

Studies indicate that machine learning algorithms can be used to autonomously detect asynchronies [4] . However, classic machine learning approaches are presently insufficiently accurate for practical application [6] . Neural networks provide an interesting opportunity to elevate the quality of automated asynchrony detection.

Unfortunately, neural networks are notorious for requiring large amounts of labeled training data, which are in short supply [7]. Data acquisition in this field is expensive and greatly complicated by the need for advanced monitoring and by regulatory issues. Apart from the acquisition difficulties, labeling has to be performed manually by an expert, which is labor-intensive and prone to errors and ambiguities [6]. The scarce labeled data that are available do not contain sufficient examples for training and independent testing. Further complications arise from the variety of asynchronies, which manifest themselves differently in the data. On top of this, different disease archetypes also alter the measurement data and the distribution of asynchronies throughout the data, further increasing the need for a large and diverse data set.

A common approach when dealing with insufficient data is to augment measurements with synthetic simulations. In computer vision, researchers have successfully applied this technique on multiple occasions [8, 9, 10, 11] . The advantages of synthetic data are that they are generated by an independent source, labeling is always accurate, and there is full control over the data generation process. However, to obtain accurate and sufficient synthetic data, precise yet fast simulation models are imperative. The goal of this study was to investigate the feasibility of generating a labeled synthetic data set of patient-ventilator waveforms for training and testing machine learning algorithms and use in education.

Starting from the validated nonlinear one-compartment lung model by Athanasiades et al. [12], this work derives a fast-executing lung model with limited loss of accuracy and extends this lung model with a simple ventilator circuit. The model parameters are based on clinical measurements from previous studies of mechanically ventilated patients.

We propose to use this model to generate a sufficiently large, rich synthetic data set that can be used for training and testing machine learning, and that has possible applications in education. The physiological origin of the model allows adaptation to various asynchrony types combined with various disease archetypes, such as chronic obstructive pulmonary disease (COPD), acute respiratory distress syndrome (ARDS), idiopathic fibrosis, and morbid obesity.

The contributions of this work can be summarized as follows:

• Adaptation of the nonlinear lung model of Athanasiades et al. [12] for fast evaluation in an electronic circuit simulator.
• Combination of the adapted lung model with a simple mechanical ventilator model to simulate patient-ventilator interactions. This enables automatic timing labels for the synthetic data and allows modeling of different types of asynchronies.
• Modeling of different patient archetypes, including various types of lung conditions, based on experimental data.
• Validation of the proposed model by visual comparison of the synthetic data with clinical data, by evaluation by experienced clinicians, and by a validated machine learning model.

The remainder of this work is structured as follows: Section II contains a study of related work and the current state of the art. The adaptation of the lung model, the derivation of the ventilator model, and their implementation are described in Section III. The model validation and the results are presented in Section IV. Section V provides an in-depth discussion, detailing the implications and limitations of our approach as well as expert opinions. Finally, Section VI concludes this work.

An approach based on deep learning and neural networks is promising for detecting asynchronies [13, 14]. Instead of a designer deriving rules, as in rule-based algorithms, neural networks derive their own classification mechanisms. However, neural networks are known to require many labeled training examples to reach good performance without overfitting.

To the best of our knowledge, the usage of virtual models to generate patient-ventilator waveforms for asynchrony detection and machine learning training has not been studied. Lino et al. [15] review the currently available virtual mechanical ventilator simulators that are focused on educational purposes. Holanda et al. [16] simulate mechanical ventilator waveforms with asynchronies solely for educational purposes. The advantage of simulating a lung with software compared to an approach based on mechanical test lungs is that software allows for easy changes of the properties of the lung with no constraints due to ventilator needs. However, with test lungs it is easier to test the setup for different types of ventilators [17] .

Isolated lung models are a well-studied topic in the literature. Many lung models are based on the one-compartment model, which represents the airways and lung as a pipe and a balloon [18]. We have chosen to base our work on the validated nonlinear one-compartment model proposed by Athanasiades et al. [12], which in turn builds on the work of Liu et al. [19]. This model is a more accurate, yet also more complex, version of the regular linear one-compartment model. In contrast to simpler models, it uses nonlinear equations to emulate the collapse of the middle airways, the turbulence in the upper airways, the small airways, and the lung and chest wall compliance, which is important for accurate ventilation modeling.

Although it has been suggested by Athanasiades et al. [12] that their work can be used to model different types of respiratory diseases, there has not been any research on this topic. Fortunately, there is a vast amount of literature on changes in lung mechanics under the influence of different disease types [20, 21, 22, 23, 24, 25, 26, 27] . We use this literature to adapt the model by Athanasiades et al. to different disease archetypes since the model parameters represent real physiological properties of the lung.

To model asynchronies, a ventilator model that is compatible with the lung model is required. The ventilator model incorporates only the most important features of the ventilator and is kept as simple as possible. Most properties of the ventilator are well studied, e.g., by Kaczka et al. [28], but may vary between ventilator brands. The literature is sufficient to serve as a basis for deriving a simple ventilator model that can be paired with the selected lung model.

The first step towards generating synthetic patient-ventilator data is specifying the underlying model. The model is divided into two parts: the lung model and the ventilator model. As stated before, the lung circuit is based on the model by Athanasiades et al. [12] and is altered to allow fast evaluation. The first part of this section describes the lung model and the alterations. After this, different parameter sets for modeling the different patient archetypes are proposed and explained. The next part introduces the simple ventilator extension, and the last part describes how we combine and implement the lung model and the ventilator model to generate the synthetic data, including asynchronies, as well as the validation methods.

Fig. 2. The patient lung model, an adapted version of [12]. Note that the components have nonlinear dynamics and are coupled with each other through these dynamics. The Kelvin body and the chest wall compliance are switched compared with Athanasiades et al. [12], and an extra resistor R_d and voltage source P_ip,PEEP are added to ensure the correct functional residual capacity.

The work of Athanasiades et al. includes a nonlinear one-compartment lumped element model of the lung. Lumped element models simplify the description of spatial systems by discrete components that can together approximate the behavior of the system. In our case, we map a mechanical system (the lung) to the electrical domain (the model). The translations between mechanical properties and electrical elements are as follows:

• The compliance (sometimes called flexibility; a measure of how much a material deforms when stress is applied) is modeled by a nonlinear capacitance in the electrical domain.
• The pressure at a certain point in the respiratory system is analogous to the voltage at a node in the circuit, so a voltage source can be seen as a pressure source. Note that for both pressure and voltage we usually work with differences; we treat ground as the electrical analog of atmospheric pressure.
• The resistance to airflow is modeled by a resistor. In general, the smaller the diameter of the tube (here, the airway), the higher the resistance.
• Volume is analogous to charge in the electrical domain. An important consequence is that the charge on a capacitor represents the air volume in a specific region of the lung.
• The airflow rate is analogous to electric current, since it is the movement of charge (and thus air volume) from one part of the respiratory system to another.

The advantage of this approach is that the rules that apply in the electrical domain, such as Kirchhoff's laws, also translate correctly to the mechanical domain. Both the lung model and the ventilator model make use of this principle. However, the analogy is not perfect, and caution is sometimes needed: for example, negative volume does not exist physically, whereas negative charge does. We have taken this into account when using the model.
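To make the electrical analogy concrete, the sketch below simulates the simplest linear one-compartment (R-C) circuit, not the nonlinear model of Fig. 2: a pressure source drives flow (current) through a single airway resistance R into a single compliance C, and volume (charge) accumulates on the capacitor. The function name, the units, and the forward-Euler integration are illustrative assumptions.

```python
import numpy as np


def simulate_rc_lung(p_drive, dt, resistance, compliance):
    """Forward-Euler simulation of a *linear* one-compartment (R-C) lung.

    Electrical analogy: pressure ~ voltage, flow ~ current, volume ~ charge.
    p_drive: pressure above PEEP at the airway opening, one value per step (cmH2O)
    dt: time step (s); resistance: R (cmH2O*s/L); compliance: C (L/cmH2O)
    Returns (flow in L/s, volume above FRC in L), one value per step.
    """
    n = len(p_drive)
    flow = np.zeros(n)
    volume = np.zeros(n + 1)  # volume[0] = 0: start at FRC
    for k in range(n):
        # Ohm's law over R: current = (source voltage - capacitor voltage) / R,
        # where the capacitor voltage V/C is the elastic recoil pressure.
        flow[k] = (p_drive[k] - volume[k] / compliance) / resistance
        # Charge on the capacitor = time integral of the current.
        volume[k + 1] = volume[k] + flow[k] * dt
    return flow, volume[1:]
```

With R = 10 cmH2O·s/L and C = 0.05 L/cmH2O (time constant RC = 0.5 s), a 10 cmH2O step fills the compartment towards C·ΔP = 0.5 L and reaches about 99% of that volume after five time constants. The nonlinear model replaces the fixed R and C with pressure- and volume-dependent elements, which is why a circuit simulator is used instead.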

By combining the basic elements, a model of the lung is created, which is shown in Figure 2. The lung model can be divided into four parts:

• The top part consists of three resistors that model the airways: the upper airway resistance R_u, the collapsible airway resistance R_c, and the small airway resistance R_s. The upper airways are the airways supported by cartilage; because of their large diameter, turbulence is present at high flows. This turbulence is included in the model by adding a Rohrer resistance, as described by Athanasiades et al. [12]. The small airways are usually held open by the elastic recoil of the lungs: when the lung volume increases, the small airways are stretched open, which decreases their resistance. This effect is embedded in the formula for the small airway resistance. The collapsible airways are the middle airway segment; this part of the airway usually has less support and may therefore collapse during expiration. This is an important factor in various disease types and is usually not included in simpler lung models. A capacitor C_c is added to model the compliance of the collapsible airways.
• Underneath the three resistances, a capacitor C_l is added, which represents the compliance of the lung tissue. Its charge represents the current lung volume.
• The visco-elastic structure of the lung tissue and chest wall is modeled by a Kelvin body with resistance R_ve and capacitance C_ve.
• The nonlinear chest wall compliance is modeled by a variable capacitance C_cw. The charge of this capacitor represents the current chest wall volume.
• The pressure applied by the muscles of the chest wall, P_mus, is represented by a voltage source. This voltage source may also be used to model cardiac oscillations, although these have a different origin.
• To ensure that the volumes of the chest wall and the lungs both equal the functional residual capacity (FRC) at the start of the simulation for a given positive end-expiratory pressure (PEEP), an extra voltage source P_ip,PEEP is added in series with a resistance R_d. The resistance R_d is very high so that it has no influence on the rest of the simulation.

Although the circuit may seem simple at first sight, its components are not static scalars. To model the airways accurately, the components have complex, inter-coupled nonlinear dynamics. This rules out an analytical solution, so simulation tools are needed to obtain one. The original model of Athanasiades et al. is implemented in the C programming language for four different test subjects. We chose to implement it in a circuit simulator instead. Some minor modifications to the original model are made to improve speed and overcome the limitations of the circuit simulator without losing accuracy:

• We replace the piecewise-defined function that models the volume of the collapsible airways (V_c) by a function that has the same shape in the physiological region but, in contrast, is continuous everywhere and has continuous first and second derivatives:

V_cmax, B_c, A_c and D_c ∈ ℝ are patient-dependent parameters that determine the shape of the curve, and P_t is the transmural pressure. This modification does not change the outcome of the model, but it enables implementation of the model in a circuit simulator. The function for R_c is likewise changed to a continuous function:

with K_c ∈ ℝ.
• We replace the sine function used to model the muscle pressure P_mus in [12] by a rounded trapezoid with different slopes for the rising and falling edges. This is more realistic and corresponds better to the shape of the muscle pressurization during ventilation [29, 30].
• The order of the visco-elastic Kelvin body and the chest wall compliance is switched to enforce the condition V_lung + V_collapsible = V_chestwall.

The remainder of the model for the healthy patient archetype is kept the same as the model by Athanasiades et al. [12] and can be found in Table A.1 in Appendix A. For the disease archetypes, further modifications are made, which are addressed in the next section.
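A rounded trapezoid with unequal edge durations can be sketched as follows. The half-cosine edges, the parameter names, and the choice to return a positive effort magnitude (whose sign must follow the circuit's convention for P_mus) are our assumptions; the exact rounding function of the model is not specified here.

```python
import numpy as np


def rounded_trapezoid(t, onset, rise, hold, fall, amplitude):
    """Hypothetical muscle-effort profile: half-cosine rise, plateau, half-cosine fall.

    t: time points (s); onset: start of effort (s); rise, hold, fall: durations
    of the rising edge, plateau, and falling edge (s), so the two edges can
    have different slopes; amplitude: peak effort magnitude (cmH2O).
    The profile is zero before onset and after onset + rise + hold + fall.
    """
    t = np.asarray(t, dtype=float)
    x = t - onset
    hold_end = rise + hold
    effort_end = hold_end + fall
    p = np.zeros_like(t)
    # Rising edge: 0 -> amplitude along half a cosine period (smooth at both ends).
    up = (x >= 0) & (x < rise)
    p[up] = amplitude * 0.5 * (1 - np.cos(np.pi * x[up] / rise))
    # Plateau at full effort.
    flat = (x >= rise) & (x < hold_end)
    p[flat] = amplitude
    # Falling edge: amplitude -> 0, again along half a cosine period.
    down = (x >= hold_end) & (x < effort_end)
    p[down] = amplitude * 0.5 * (1 + np.cos(np.pi * (x[down] - hold_end) / fall))
    return p
```

With this parameterization, the time label "end of patient inspiration" (muscle effort back to zero) falls at onset + rise + hold + fall, and a longer fall than rise gives the slower relaxation that distinguishes this shape from a symmetric sine.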

Athanasiades et al. [12] estimate the model parameters from measurements of four healthy test subjects. A synthetic data set suitable for machine learning has to be diverse and realistic. In clinical practice, different patient archetypes are present, and these should therefore be part of the data set. In particular, since the health of mechanically ventilated patients is usually compromised, it is important that the synthetic data set includes data of patients with different disease archetypes. Therefore, we selected from the literature four common disease types, with different degrees of severity, that are known to change lung mechanics significantly: COPD with severity "1" and "2" ("2" being the most severe), obesity "1" and "2", ARDS with severity "1", "2" and "3", and idiopathic fibrosis. Previous studies have made a large effort to capture the change in mechanics for these diseases using different measurement techniques. The main features of these changes can be incorporated into the proposed lung model by changing its parameters. The differences between the patient archetypes are thus based on clinical measurements of mechanically ventilated patients.

We propose the following changes based on experimental data from previous studies to model the patient archetypes:

• ARDS patients have a loss of aerated lung tissue caused by inflammation of the lungs, fluid accumulation in the lungs, and partial collapse of the lung (atelectasis). For this reason, their lungs become stiff and less compliant [31, 23]. This results in a smaller residual volume, functional residual capacity, and total lung capacity. ARDS is more complex and inhomogeneous than the model proposed in this paper can describe; moreover, it differs from patient to patient [32]. However, the model can capture the most important ARDS features when the lungs are open [33]. The smaller lung volume is modeled by changing the shape of the equation for the lung volume V_l. The original equation for V_l described in [12] is replaced by the sigmoidal curve of Venegas et al. [34], which was developed specifically to describe the lung volume of patients with ARDS. The constants A_l, B_l and D_l determine the exact shape of the curve and are shown in Table I in Appendix C. The change in the shape of the pressure-volume curve compared with the healthy archetype is shown in Figure 3. Although this curve is used for the ARDS archetype, it can also be used for all other disease archetypes [35, 36]. The airway resistances are moderately increased compared with a healthy individual [37], as also shown in Figure 3. Finally, the time constant of the Kelvin body is increased to model the atelectasis and the opening of the lungs. These changes are sufficient to capture the most important features of ARDS "1", "2", and "3".
• The term COPD is used for all patients with non-reversible airflow obstruction; however, there is great variation between the different phenotypes of COPD [38]. For this work, we have selected common changes that are found in COPD patients compared with the healthy archetype. COPD is characterized by high airway resistance and low lung elastic recoil [24, 25]. The low elastic recoil corresponds to a high compliance of the lung tissue, which in turn results in higher lung volumes. For this reason, the total lung capacity, residual volume, and functional residual capacity are all higher than in a healthy person [39], as depicted in Figure 3. Note that COPD patients breathe in a lower range of transmural pressure than the healthy archetype. The expiratory resistance is much higher in patients with COPD due to excessive central airway collapse, especially during expiration [40]. The small airway resistance is also higher than in the healthy archetype because of the loss of elastic recoil, which normally holds the small airways open. Late cycling is more prominent in the COPD archetype than in other archetypes.
• In obesity, due to closure of the peripheral airways and the diaphragm being pushed towards the head in the supine position [20], the total lung capacity and functional residual capacity are lower than in healthy subjects. The respiratory compliance, and therefore the lung volume, is reduced, and the upper airway resistance is strongly increased, mostly due to increased turbulence [20, 21, 22, 41].
• In idiopathic fibrosis, the lung tissue is very stiff and has low compliance. The residual volume, functional residual capacity, and total lung capacity are all lower than in healthy individuals [42, 43]. The airway resistance is slightly lower than in healthy individuals.

Fig. 3. The pressure-volume and pressure-resistance curves for the healthy archetype (blue), the COPD archetype (red), and the ARDS archetype (yellow). Note that the different patient archetypes breathe in different ranges of transmural pressure.
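The sigmoidal pressure-volume equation of Venegas et al. [34] is commonly written with four parameters, V = a + b / (1 + exp(-(P - c)/d)), where a is the lower volume asymptote, b the volume span, c the pressure at the inflection point, and d a width parameter. The ARDS description above names three constants (A_l, B_l, D_l) whose exact mapping onto this form is not given in this section, so the sketch below uses the generic four-parameter form with hypothetical parameter names. It also returns the analytic compliance dV/dP, which peaks at P = c with the value b/(4d).

```python
import math


def _logistic(z):
    # 1 / (1 + exp(-z)) written with tanh, which avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def venegas_volume(p, a, b, c, d):
    """Sigmoidal P-V curve V = a + b / (1 + exp(-(P - c) / d)).

    p: transmural pressure (cmH2O); a, b: volumes (L); c: inflection
    pressure (cmH2O); d: width (cmH2O). Parameter names are illustrative.
    """
    return a + b * _logistic((p - c) / d)


def venegas_compliance(p, b, c, d):
    """Analytic dV/dP of the sigmoid, i.e. the pressure-dependent compliance (L/cmH2O)."""
    s = _logistic((p - c) / d)
    return b * s * (1.0 - s) / d
```

Because the compliance falls off on both sides of c, a patient whose tidal breath sits high on the curve (over-distension) or low on it (recruitment) sees a stiffer lung, which is the qualitative behavior the ARDS archetype needs.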

In patients with idiopathic fibrosis, the asynchrony "early cycling" is more prevalent than in other patient archetypes. Table I lists all changes per patient archetype that we use in this paper. In clinical practice, more phenotypes of these diseases exist; however, they are out of the scope of this work. The model equations for the disease archetypes can be found in

To model patient-ventilator interactions, the adapted lung model by Athanasiades et al. [12] is extended with the simple ventilator circuit shown in Figure 4 . The model consists of the following parts:

• The endotracheal tube (the tube inserted into the trachea through the mouth) is modeled as a variable resistance R_et to include turbulent airflow; it adds an extra resistance in series with the patient's airway resistances. We take the same approach as Flevari et al. [44], who model the resistance of the endotracheal tube with Rohrer's equation, a model for turbulent flow. Moreover, Flevari et al. determine the equation parameters for endotracheal tubes of different diameters. In our simulation setup, we can easily switch between these parameter sets and therefore simulate different endotracheal tubes.
• The tubing system connects the endotracheal tube and the ventilator. This system is split into an inspiratory circuit and an expiratory circuit, each modeled by an inductor (inertance), a capacitor (compliance), and a resistor (see Figure 4). Wenzel et al. [45] model the resistance of breathing circuits of different brands and types with Rohrer's equation. This equation accounts for the turbulence in the tubes, which is important and should not be neglected; therefore, we take the same approach. The inertance and compliance of these breathing tubes were also measured by Wenzel et al. [45] and are included in the model.
• The ventilator itself is modeled by two pressure sources, unidirectional valves, and parasitic elements. The pressure sources set the positive end-expiratory pressure (PEEP) and the peak inspiratory pressure (P_insp), the pressures at which the ventilator operates; in the circuit, they are represented by two voltage sources. The valves are modeled by switches and by unidirectional diode-like elements, which prevent inspiratory flow through the contaminated expiratory limb. The valves open and close when the ventilator triggers and cycles; this switching is not drawn in Figure 4.
This results in a block (square) or trapezoidal pressure waveform from the ventilator.
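Rohrer's equation expresses the pressure drop over a tube as ΔP = K1·V' + K2·V'·|V'|, so the effective resistance K1 + K2·|V'| grows with flow V'; this is the form referred to above for the endotracheal tube and the breathing circuit. The sketch below uses V'·|V'| so that the pressure drop keeps the sign of the flow during expiration; K1 and K2 are placeholders, not the values published by Flevari et al. [44] or Wenzel et al. [45].

```python
def rohrer_pressure_drop(flow, k1, k2):
    """Rohrer's equation: dP = K1*V' + K2*V'*|V'| (sign follows the flow).

    flow: V' in L/s (positive = inspiration); k1: laminar coefficient
    (cmH2O*s/L); k2: turbulent coefficient (cmH2O*s^2/L^2).
    Works for floats and NumPy arrays. Returns dP in cmH2O.
    """
    return k1 * flow + k2 * flow * abs(flow)


def rohrer_resistance(flow, k1, k2):
    """Flow-dependent (chord) resistance dP / V' = K1 + K2*|V'| in cmH2O*s/L."""
    return k1 + k2 * abs(flow)
```

With the placeholder coefficients K1 = 5 and K2 = 10, doubling the flow from 0.5 to 1 L/s triples the pressure drop (from 5 to 15 cmH2O), which is why a turbulent term cannot be replaced by a fixed resistor at high inspiratory flows.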

The pressure and the flow in ventilated patients are usually measured at the beginning of the endotracheal tube. 

The model generates noise-free measurements. For machine learning data, it is important that the simulations have the same characteristics as experimental data. For this reason, we add filtered white noise with a bandwidth of 0-15 Hz to the pressure and flow curves, with an amplitude chosen to mimic the noise in experimental waveforms. Another artifact that is often observed in ventilator waveforms originates from cardiac oscillations, which are caused by changes in pulmonary blood volume due to the beating heart [46]. We model this as a sine wave that is added to the muscle pressure.
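A minimal sketch of both artifacts follows. It assumes a user-chosen sampling rate, an ideal (brick-wall) FFT low-pass filter at 15 Hz (the filter actually used is not specified), and a target noise RMS as the amplitude parameter; the heart rate and cardiac amplitude are likewise hypothetical inputs.

```python
import numpy as np


def bandlimited_noise(n, fs, f_max, rms, rng):
    """White Gaussian noise restricted to 0..f_max Hz and scaled to a target RMS.

    n: number of samples; fs: sampling rate (Hz); rng: numpy Generator.
    Uses an ideal FFT low-pass filter (a sketch, not the paper's filter).
    """
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[freqs > f_max] = 0.0  # remove all content above f_max
    noise = np.fft.irfft(spectrum, n=n)
    noise -= noise.mean()  # zero-mean, so std equals RMS
    return noise * (rms / noise.std())


def cardiac_oscillation(t, heart_rate_bpm, amplitude):
    """Sinusoidal cardiac artifact (cmH2O), to be added to the muscle pressure P_mus."""
    return amplitude * np.sin(2.0 * np.pi * (heart_rate_bpm / 60.0) * np.asarray(t))
```

In use, the noise would be generated separately for the pressure and the flow signals (with amplitudes matched to experimental recordings), while the cardiac term enters upstream through P_mus, so its imprint on pressure and flow is shaped by the lung mechanics rather than added directly.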

Since both the ventilator and the patient are fully controlled by the model, the data can be labeled automatically. In addition to the pressure, flow, and volume waveforms, every regular breath with correct pressurization has four time labels, which are defined as follows:

• The start of patient inspiration: the start of the patient's muscle contraction. This is the same definition as often employed in the literature, e.g., in [47] and [48].
• The end of patient inspiration: the moment when the patient's muscle effort becomes zero again. This is the same definition as used in Fabry et al. [48]; note that it differs from the definition used in Mojoli et al. [30], who use half relaxation as the optimal cycling (end-of-inspiration) point.
• The start of ventilator pressurization (triggering): the moment the ventilator switches to inspiration.
• The end of ventilator pressurization (cycling): the moment the ventilator switches to expiration.

To determine whether asynchronies are present, we classify breaths by the difference between the start of patient inspiratory effort and the start of ventilator pressurization (the start-inspiration delay, or inspiratory response delay [48], used to detect trigger asynchronies) and by the difference between the end of patient inspiratory effort and the end of ventilator pressurization (the end-inspiration delay, or expiratory response delay [48], used to detect cycling asynchronies). Both delays are the ventilator time minus the patient time, so a ventilator that cycles before the patient's effort ends yields a negative end-inspiration delay.

It should be noted that the inspiratory response delay increases when the inspiratory trigger is less sensitive, when there is a large tidal volume or when expiratory gas flow is restricted [48] . The expiratory response delay is increased due to lower inspiratory peak flow, a higher level of pressure support, or less sensitive cycling criteria [48] .

To classify whether a breath is an asynchrony or a regular breath, these delays need to fall into a certain margin. We employ the same margins as Bakkes et al. [49] since the machine learning algorithm that we use for validation uses the same criteria: A normal breath has an end-inspiration delay larger than -100 ms and smaller than 300 ms. The start-inspiration delay must be lower than 250 ms. All other breaths are asynchronies.

Although there are more asynchronies described in the literature, for this study we focus exclusively on the most common asynchronies during PSV and use the following criteria to classify these asynchronies:

• Early cycling: the duration of ventilator pressurization is shorter than the patient's inspiratory effort. More specifically, the end-inspiration delay is below -100 ms.
• Late cycling: the duration of ventilator pressurization is longer than the patient's inspiratory effort; the end-inspiration delay is longer than 300 ms.
• Delayed inspiration: there is a significant trigger delay between the patient inspiration and the ventilator inspiration, i.e., the ventilator inspiration starts late compared with the patient inspiration; the start-inspiration delay is longer than 250 ms.
• Ineffective effort is an exception to the asynchronies above: a patient effort is not followed by ventilator pressurization. In other words, the patient makes an effort but the ventilator is not triggered, so the start-inspiration delay and end-inspiration delay are not defined.

Fig. 5. Classification margins for the asynchronies, the same as in Bakkes et al. [49]. x-axis: start-inspiration delay; y-axis: end-inspiration delay.
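The delay definitions and margins above translate into a small labeling rule. The sketch below computes both delays as ventilator time minus patient time and reports every asynchrony whose criterion is met. The data class, the function name, and returning several labels when a breath meets more than one criterion are our own illustrative choices, not taken from Bakkes et al. [49]. Values exactly at -100, 300, or 250 ms are counted as asynchronies, since only the normal ranges are defined with strict inequalities and all other breaths are asynchronies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BreathLabels:
    """Hypothetical container for the four time labels of one breath (seconds).

    vent_trigger / vent_cycle are None when the ventilator was not triggered.
    """
    patient_start: float
    patient_end: float
    vent_trigger: Optional[float] = None
    vent_cycle: Optional[float] = None


def classify_breath(b: BreathLabels) -> Tuple[str, ...]:
    """Label one breath using the start-/end-inspiration delay margins (ms)."""
    # Ineffective effort: patient effort without ventilator pressurization,
    # so neither delay is defined.
    if b.vent_trigger is None or b.vent_cycle is None:
        return ("ineffective effort",)
    # Delays are ventilator time minus patient time, converted to ms.
    start_delay = (b.vent_trigger - b.patient_start) * 1000.0
    end_delay = (b.vent_cycle - b.patient_end) * 1000.0
    labels = []
    if start_delay >= 250.0:  # normal requires < 250 ms
        labels.append("delayed inspiration")
    if end_delay <= -100.0:  # normal requires > -100 ms
        labels.append("early cycling")
    elif end_delay >= 300.0:  # normal requires < 300 ms
        labels.append("late cycling")
    return tuple(labels) if labels else ("normal",)
```

For example, a breath whose effort runs from 1.0 s to 2.0 s, with ventilator triggering at 1.1 s and cycling at 1.7 s, has an end-inspiration delay of -300 ms and is labeled early cycling.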

The margins are visualized in Figure 5. Because the machine learning algorithm estimates the start-inspiration delay and the end-inspiration delay themselves, the margins could be tuned; this is, however, out of the scope of this paper.

The asynchronies are generated in a random sequence throughout the synthetic data, taking into account that some types of asynchronies are more prominent in particular patient archetypes.

The lung and ventilator models are implemented in LTspice XVII [50], an analog electronic circuit simulator. A wrapper in MATLAB R2019b [51] is written to facilitate rapid changes of the parameters, improved plotting, and automated runs. The input muscle waveform P_mus and the ventilator triggering and cycling points are also generated in MATLAB R2019b. The MATLAB wrapper works in three stages. In the first stage, an LTspice run is performed with only PEEP applied; this run is used to calculate the triggering times. In the second run, the ventilator is triggered at these times and kept on until a certain time limit is reached; from this data, the correct cycling times are calculated. The last run uses both the correct triggering times and the correct cycling times. The resulting timing points are saved together with the generated waveforms. The simulation runs about 70 times faster than real time on a 7th-generation Intel Core i7.
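The first two wrapper stages extract trigger and cycling times from simulated waveforms. The ventilator's trigger and cycling criteria are not stated here, so the sketch below assumes two common pressure-support criteria: a flow trigger (inspiratory flow exceeding a threshold in the PEEP-only run) and flow cycling (flow falling below a fraction of its peak, 25% by default, in the run where the ventilator stays on). Both functions, their names, and the defaults are hypothetical, and they are written in Python rather than in MATLAB.

```python
import numpy as np


def find_trigger_time(t, flow, threshold):
    """Hypothetical flow trigger: first time inspiratory flow exceeds `threshold`.

    t, flow: 1-D arrays from a PEEP-only run (flow in L/s, inspiration > 0).
    Returns None if the effort never produces enough flow (ineffective effort).
    """
    t = np.asarray(t)
    idx = np.flatnonzero(np.asarray(flow) > threshold)
    return float(t[idx[0]]) if idx.size else None


def find_cycle_time(t, flow, t_trigger, fraction=0.25):
    """Hypothetical flow cycling: first time after the inspiratory flow peak
    (searched from t_trigger onwards) at which flow drops below fraction * peak."""
    t = np.asarray(t)
    flow = np.asarray(flow)
    after = t >= t_trigger
    ts, fs = t[after], flow[after]
    k_peak = int(np.argmax(fs))
    below = np.flatnonzero(fs[k_peak:] < fraction * fs[k_peak])
    return float(ts[k_peak + below[0]]) if below.size else None
```

Splitting the work this way mirrors the staged runs: trigger times depend only on the patient's effort against PEEP, whereas cycling times depend on the flow produced once the ventilator pressurizes, so they can only be computed from a run in which the ventilator is already on.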

To check the feasibility of the model and the generated data, we generate a data set with the model described above. Before each simulation, the initial value of the intrapleural pressure for the set PEEP (P_ip,PEEP, see Figure 2) needs to be calculated such that the chest wall volume and the lung volume are equal.
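Finding P_ip,PEEP is a one-dimensional root-finding problem. A minimal sketch, assuming the lung volume is a function of transmural pressure (PEEP minus pleural pressure), so it falls as pleural pressure rises, while the chest wall volume rises with pleural pressure; their difference then changes sign once, and bisection can locate the crossing. The collapsible airway volume is ignored, and the volume curves, function names, and bracketing interval are hypothetical inputs.

```python
def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Plain bisection for a continuous f with a sign change on [lo, hi]."""
    f_lo = f(lo)
    if f_lo == 0:
        return lo
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid  # root lies in the left half
        else:
            lo, f_lo = mid, f_mid  # root lies in the right half
    return 0.5 * (lo + hi)


def initial_pleural_pressure(peep, lung_volume, chest_wall_volume, lo=-50.0, hi=50.0):
    """Pleural pressure (cmH2O) at which lung and chest wall volumes agree at PEEP.

    lung_volume(p_transmural) and chest_wall_volume(p_pleural) are
    hypothetical callables returning volumes in L.
    """
    def mismatch(p_ip):
        return lung_volume(peep - p_ip) - chest_wall_volume(p_ip)

    return bisect_root(mismatch, lo, hi)
```

With linear placeholder curves V_lung = 1.0 + 0.1·P_t and V_cw = 2.0 + 0.1·P_ip at a PEEP of 5 cmH2O, the two volumes agree at P_ip = -2.5 cmH2O. Bisection is chosen here because it only needs a sign change and therefore also works for the nonlinear, sigmoid-shaped curves of the model.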

We generate the simulations for the five different patient archetypes and distribute the asynchronies randomly throughout the data set. Because some asynchronies are more prominent in particular disease archetypes, we specify a different distribution of asynchronies for each disease archetype.

All simulations are 120 seconds long and contain 30-40 breaths. Longer simulations are possible but may lead to numerical errors in some cases. The PEEP and inspiratory pressure are chosen such that the tidal volume is always around 500 mL, which corresponds to approximately 7 mL/kg for an average patient of 70 kg [48, 6].
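For reference, 500 mL / 70 kg ≈ 7.1 mL/kg, and 30-40 breaths in 120 s correspond to 15-20 breaths per minute. To get a rough starting value for the inspiratory pressure, one can use a linear single-compartment approximation (an assumption; the paper's lung model is nonlinear), in which a constant pressure step ΔP above PEEP fills the lung as V(t) = C·ΔP·(1 - exp(-t/(R·C))). Solving for ΔP at the end of the inspiratory time gives the sketch below; the compliance, resistance, and inspiratory time are illustrative values.

```python
import math


def driving_pressure_for_vt(vt_ml, compliance_ml_per_cmh2o, resistance_cmh2o_s_per_l, ti_s):
    """Pressure above PEEP (cmH2O) that delivers vt_ml within ti_s in a linear
    single-compartment model: V(t) = C * dP * (1 - exp(-t / (R * C)))."""
    tau_s = resistance_cmh2o_s_per_l * compliance_ml_per_cmh2o / 1000.0  # mL -> L
    return vt_ml / (compliance_ml_per_cmh2o * (1.0 - math.exp(-ti_s / tau_s)))


# Illustrative parameters: C = 50 mL/cmH2O, R = 10 cmH2O.s/L (tau = 0.5 s), Ti = 1 s.
dp = driving_pressure_for_vt(500.0, 50.0, 10.0, 1.0)
print(f"P_insp - PEEP = {dp:.1f} cmH2O")  # prints "P_insp - PEEP = 11.6 cmH2O"
```

In the nonlinear model the delivered volume will differ from this estimate, so such a value would only serve as an initial guess to be refined in simulation.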

We are particularly interested in three aspects of the model: whether the main features of the waveforms are modeled correctly, whether the asynchronies are modeled correctly, and whether the different patient archetypes are modeled correctly. To evaluate the model and the data it generates, we propose three methods to check the validity of the data.

• The synthetic waveforms are compared by visual inspection to clinical ventilator waveforms found in the literature [52, 7, 47, 53, 54, 55].

• A clinical expert (AdB) reviewed and commented on the synthetic waveforms. The review is used to check whether there are visual inconsistencies in the data.

• A machine learning algorithm trained on validated clinical data is tested on the synthetic data. We used the algorithm proposed by Bakkes et al. [49], which was trained on clinical data obtained at Fondazione I.R.C.C.S. Policlinico San Matteo (Pavia, Italy). These data contain 4275 breaths from 15 patients who were not able to breathe independently and were labeled by a clinical expert who indicated the inspiratory and expiratory efforts of the patients. The algorithm is trained to recognize the time points described in Section III-F in unlabeled data. A difference between the results of the algorithm on the clinical data and on the synthetic data may indicate a disagreement between the two data sets.
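One way to quantify the agreement between the algorithm's estimates and the synthetic ground truth is to compute per-breath errors (estimate minus truth) and summarize them per breath type by their median and interquartile range, as in a box plot. The sketch below does this with NumPy; the Tukey 1.5 × IQR outlier rule and the function names are assumptions, not necessarily how Figure 9 was produced.

```python
import numpy as np


def delay_error_summary(true_delays, est_delays):
    """Median, IQR and Tukey-outlier count of per-breath errors (estimate - truth)."""
    err = np.asarray(est_delays, dtype=float) - np.asarray(true_delays, dtype=float)
    q25, median, q75 = np.percentile(err, [25, 50, 75])
    iqr = q75 - q25
    outliers = (err < q25 - 1.5 * iqr) | (err > q75 + 1.5 * iqr)
    return {"median": float(median), "iqr": float(iqr), "n_outliers": int(outliers.sum())}


def summary_by_breath_type(types, true_delays, est_delays):
    """Apply delay_error_summary separately to each breath type (e.g. normal, DI)."""
    types = np.asarray(types)
    true_delays = np.asarray(true_delays, dtype=float)
    est_delays = np.asarray(est_delays, dtype=float)
    return {str(bt): delay_error_summary(true_delays[types == bt], est_delays[types == bt])
            for bt in np.unique(types)}


print(delay_error_summary([0, 0, 0, 0, 0], [0.0, 0.1, 0.2, 0.3, 5.0]))
```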

Figure 6 shows three simulated breaths. The markers in the figure indicate the automatically generated annotations. Figure 6a shows a breath with patient effort using only PEEP (the ventilator is not triggered), as occurs during continuous positive airway pressure (CPAP) or during an ineffective effort. The start-of-inspiration, maximum-effort, and end-of-effort markers are indicated. The expiratory phase starts when the flow changes sign (dotted line). Figure 6b shows the simulated waveforms for a late cycling event followed by an ineffective effort during expiration. For the late-cycled breath, the ventilator triggering and cycling markers are shown along with the markers for the start and end of patient effort. Both the pressure and the flow waveforms show the characteristic features of late cycling. The pressure at the airway opening is initially lower than the ventilator inspiratory pressure; the difference depends on the inspiratory tube resistance, the airway resistance, and the airway flow rate. During active inspiration, the flow waveform has a nonexponential shape; at the end of the patient's inspiratory effort, it changes to an exponential shape. This change in shape is an important marker for the end of patient effort: the time at which it occurs can be extracted from the flow waveform and coincides with the end of muscle effort. During the expiratory phase, both pressure and flow decay exponentially. Note that an ineffective effort occurs during expiration in this figure; both the pressure and flow waveforms show its characteristic features, namely a drop in pressure and a decrease in flow. The flow rate is limited by the unidirectional valve in the expiratory tube. Figure 6c shows an early cycling asynchrony: the patient's inspiratory effort ends after the ventilator cycles, resulting in a pattern in the pressure and flow waveforms that is characteristic of early cycling.
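The exponential decay described here can be illustrated with a linear single-compartment approximation (an assumption; the paper's lung model is nonlinear). During passive expiration such a model gives flow(t) = -(V_T/tau)·exp(-t/tau) with tau = R·C, so the logarithm of the flow magnitude is a straight line in time. The sketch below recovers tau from such a segment with a least-squares line fit; the parameter values are illustrative. A segment whose log-flow departs from that line would point to active muscle effort, which is one simple way, not necessarily the paper's, to think about the change in flow shape used to mark the end of effort.

```python
import numpy as np


def fit_time_constant(t, flow):
    """Estimate tau (s) from a passively decaying expiratory flow segment.

    Fits log|flow| = a - t / tau by least squares, assuming a single-exponential
    decay as in a linear single-compartment model.
    """
    t = np.asarray(t, dtype=float)
    flow = np.asarray(flow, dtype=float)
    keep = np.abs(flow) > 1e-6  # avoid log(0)
    slope, _intercept = np.polyfit(t[keep], np.log(np.abs(flow[keep])), 1)
    return -1.0 / slope


# Passive expiration in a linear model: flow(t) = -(V_T / tau) * exp(-t / tau), tau = R * C.
R, C, VT = 10.0, 0.05, 0.5   # cmH2O.s/L, L/cmH2O, L (illustrative values)
tau_true = R * C             # 0.5 s
t = np.linspace(0.0, 2.0, 201)
flow = -(VT / tau_true) * np.exp(-t / tau_true)
tau_est = fit_time_constant(t, flow)
print(round(tau_est, 3))
```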
When cycling occurs between the start of inspiration and the maximum inspiratory effort, a minimum in pressure and a maximum in flow can be observed after cycling. The exponential decay of pressure and flow starts at the end of the inspiratory effort.

Figure 7 shows the simulated pressure, flow, and tidal volume for five patient archetypes. The PEEP, the maximum inspiratory pressure $P_{insp}$, and the durations of inspiration and expiration are chosen to comply with a lung-protective ventilation scheme with a tidal volume of 500 mL (corresponding to a 70 kg adult male). Three breaths are shown for each archetype, combining normal and asynchronous breaths. Two late cycling events are shown for the COPD archetype, an early cycling event for the fibrosis archetype, a delayed trigger and an ineffective effort for the ARDS archetype, and two late cycling events for the obese archetype.

Figure 8a shows the true start-inspiration delay and end-inspiration delay (as defined in Section III-F) for 2000 breaths, illustrating the distribution of asynchronies in the simulated data set. Figure 8b shows the start-inspiration and end-inspiration delays estimated by the machine learning algorithm. The difference between the simulated ground truth and the machine learning estimates is shown in Figure 9. For most breath types, the median error is close to zero and the interquartile range is small, although some outliers are observed; the number of outliers is much smaller than the number of breaths within the interquartile range. The median start-inspiration error is small for all breath types. For the end-inspiration error, however, the median shows a systematic deviation of around +200 ms for the delayed inspiration (DI) types. Table II shows the performance metrics per asynchrony class.
It shows a lower true positive rate for delayed inspiration and a lower positive predictive value for delayed inspiration + late cycling breaths, which is consistent with the observations in the previous figures.
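Per-class metrics of this kind can be computed one-vs-rest from the true and predicted breath labels. The sketch below computes the true positive rate (TPR), positive predictive value (PPV), and balanced accuracy, taken here as (TPR + TNR) / 2; this definition, the label names, and the toy labels are assumptions for illustration and do not reproduce Table II.

```python
def per_class_metrics(y_true, y_pred):
    """One-vs-rest TPR, PPV and balanced accuracy ((TPR + TNR) / 2) per class."""
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    metrics = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = n - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else float("nan")
        tnr = tn / (tn + fp) if tn + fp else float("nan")
        ppv = tp / (tp + fp) if tp + fp else float("nan")  # undefined if never predicted
        metrics[c] = {"TPR": tpr, "PPV": ppv, "balanced_accuracy": 0.5 * (tpr + tnr)}
    return metrics


# Toy labels (hypothetical): DI = delayed inspiration, LC = late cycling.
labels = ["normal", "DI", "DI", "DI+LC", "normal", "LC"]
predicted = ["normal", "normal", "DI", "DI+LC", "normal", "DI+LC"]
report = per_class_metrics(labels, predicted)
print(report["DI"])
```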

Given the emerging need for large, labeled data sets to train and test machine learning algorithms for mechanical ventilation, this paper explored the generation of a synthetic data set of automatically labeled ventilator waveforms, including different types of asynchronies. The method combined an existing nonlinear lung model with a simple ventilator model and, after some small adjustments, generated synthetic ventilator waveforms from this combined lung-ventilator model using a circuit simulator. Both the comparison with clinical data by an expert and the machine learning results suggest that it is possible to generate an automatically labeled synthetic data set with the most important features of a clinical data set.

The evaluation of the clinical expert shows that the waveforms are recognizable as patient-ventilator [17]. The ventilator model in this work is not complex enough to model these differences and requires three runs to simulate simple control loops, which leads to artifacts in the waveforms. The lung model is also less complex than a real patient. The main finding of the machine learning results is an overestimation of the median end-inspiration error (Figure 9b) for all asynchrony types. Still, for most asynchrony classes the machine learning algorithm achieves a balanced accuracy higher than 0.85. However, the delayed inspiration asynchrony in particular suffers from a lower accuracy. This is also visible to some extent in the results on the experimental data set in Bakkes et al. [49], although it is more pronounced in the synthetic data. It was caused by the method used to calculate the triggering time, which required three runs and did not take the effect of adjacent breaths into account; this has since been fixed. Another contributing factor was that the patient effort was manipulated for delayed inspiration breaths to obtain a delayed trigger. In future work, the muscle waveform should be further refined. The combinations of asynchronies and patient archetypes were chosen to correspond to experimental data, but discrepancies might still be present, and the experimental data set might contain more variation. Finally, generating the data set from three runs differs from the clinical case, which may cause some interactions between adjacent breaths even when pressure triggering and flow cycling are used.

Fig. 7. The subfigures depict pressure, flow, and volume waveforms of 5 different patient archetypes. In the normal archetype, the middle breath is early cycling. In the COPD2 archetype, two late cycling events are shown. In the idiopathic fibrosis archetype, early cycling is shown. The ARDS2 archetype shows delayed triggering and an ineffective effort, while the OBESE2 archetype shows two late cycling events.

The results are consistent with the hypothesis that machine learning algorithms could be trained on the synthetic data set. Moreover, generating the synthetic data set has given us new insight into the mechanisms of asynchronies and might be useful for educational purposes.

We want to emphasize that the model used in this paper is a simplification of the real interaction between the patient and the ventilator. The model omits the patient's response to being mechanically ventilated [56], and longer-term effects of mechanical ventilation are not modeled. Ideally, the parameters in Table C would change over time, depending on what happens to the patient. Whether this limits the use of this data set for machine learning still needs to be investigated.

The methodology of validating the synthetic data by visual inspection and by machine learning is limited: it is sometimes difficult to judge whether a particular observation in the machine learning results is caused by the data or by the algorithm itself. To study the similarities between the clinical and the synthetic data sets in more depth, future research could include additional similarity metrics. However, we did not aim to recreate the clinical data exactly.

This study demonstrates how an accurately labeled synthetic data set of patient-ventilator waveforms can be generated for training and testing machine learning algorithms that detect patient-ventilator asynchronies. The patient-ventilator model used in this paper is, however, still a simplification, which might introduce artifacts in the synthetic waveforms and makes it impossible to incorporate certain effects in the data. Future research will focus on testing the effect of synthetic data in the training and testing phases of machine learning algorithms.

Fig. 9. Left: error between the true and the estimated start-inspiration delay. Right: error between the true and the estimated end-inspiration delay, as estimated by the machine learning algorithm.

ICU occupancy and mechanical ventilator use in the United States

Ventilator-induced lung injury

COVID-19 pneumonia: different respiratory treatments for different phenotypes?

Patient-ventilator asynchronies during mechanical ventilation: current knowledge and research priorities

Asynchronies during mechanical ventilation are associated with mortality

Detection of patient-ventilator asynchrony from mechanical ventilation waveforms using a two-layer long short-term memory neural network

Replicating human expertise of mechanical ventilation waveform analysis in detecting patient-ventilator cycling asynchrony using machine learning

Virtual worlds as proxy for multi-object tracking analysis

On the benefit of synthetic data for company logo detection

A plea for utilising synthetic data when performing machine learning based cyber-security experiments

DermGAN: Synthetic Generation of Clinical Skin Images with Pathology

Energy analysis of a nonlinear model of the normal human lung

A machine learning model for real-time asynchronous breathing monitoring

Minimizing asynchronies in mechanical ventilation: current and future trends

A critical review of mechanical ventilation virtual simulators: is it time to use them?

Patient-ventilator asynchrony

Bench testing of pressure support ventilation with three different generations of ventilators

The linear single-compartment model

Airway mechanics, gas exchange, and blood flow in a nonlinear model of the normal human lung

Altered respiratory physiology in obesity

Respiratory system mechanics in sedated, paralyzed, morbidly obese patients

Effects of obesity on respiratory resistance

Respiratory system mechanics in acute respiratory distress syndrome

Respiratory mechanics in ventilated COPD patients: forced oscillation versus occlusion techniques

Lung and chest wall mechanics in mechanically ventilated COPD patients

Simulation of late inspiratory rise in airway pressure during pressure support ventilation

Parameters for simulation of adult subjects during mechanical ventilation

Inspiratory lung impedance in COPD: effects of PEEP and immediate impact of lung volume reduction surgery

Noninvasive estimation of respiratory mechanics in spontaneously breathing ventilated patients: a constrained optimization approach

Is the ventilator switching from inspiration to expiration at the right time? Look at waveforms!

Close down the lungs and keep them resting to minimize ventilator-induced lung injury

Phenotypes in ARDS: Moving Towards Precision Medicine

Recruitment maneuvers and higher PEEP, the so-called open lung concept, in patients with ARDS

A comprehensive equation for the pulmonary pressure-volume curve

Pattern of lung emptying and expiratory resistance in mechanically ventilated patients with chronic obstructive pulmonary disease

A sigmoidal fit for pressure-volume curves of idiopathic pulmonary fibrosis patients on mechanical ventilation: clinical implications

Analysis of behavior of the respiratory system in ARDS patients: effects of flow, volume, and time

Clinical phenotypes of COPD: identification, definition and implications for guidelines

Lung compliance and chronic obstructive pulmonary disease

Tracheobronchomalacia and excessive dynamic airway collapse

Reference equations for respiratory system resistance and reactance in adults

Physiology of the lung in idiopathic pulmonary fibrosis

Ventilatory support and mechanical properties of the fibrotic lung acting as a "squishy ball"

Rohrer's constant, K2, as a factor of determining inspiratory resistance of common adult endotracheal tubes

Coaxial tubing systems increase artificial airway resistance and work of breathing

Cardiogenic oscillation and ventilator autotriggering in brain-dead patients: a case series

Patient-ventilator trigger asynchrony in prolonged mechanical ventilation

An analysis of desynchronization between the spontaneously breathing patient and ventilator during inspiratory pressure support

A Machine-Learning Method for Automatic Detection and Classification of Patient-Ventilator Asynchrony

SPICE (Simulation Program with Integrated Circuit Emphasis)

Efficacy of ventilator waveforms observation in detecting patient-ventilator asynchrony

Patient-ventilator asynchrony during assisted mechanical ventilation

Bedside waveforms interpretation as a tool to identify patient-ventilator asynchronies

Monitoring patient-ventilator asynchrony

Patient-ventilator interaction

Healthy lung model equations

The authors would like to thank Professor Francesco Mojoli. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. All the authors have no conflict of interest to declare.