key: cord-0156964-4m7er6gc authors: Mo, Haimiao; Ding, Shuai; Yang, Shanlin; Zheng, Xi; Vasilakos, Athanasios V. title: The Role of Edge Robotics As-a-Service in Monitoring COVID-19 Infection date: 2020-11-17 journal: nan DOI: nan sha: 2934c371a58903c532c42b171c2c507e3ab37d70 doc_id: 156964 cord_uid: 4m7er6gc Deep learning technology has been widely used in edge computing. However, pandemics like COVID-19 require deep learning capabilities at mobile devices (detecting the respiratory rate with mobile robotics or conducting CT scans with a mobile scanner), which are severely constrained by the limited storage and computation resources at the device level. To solve this problem, we propose a three-tier architecture, including robot layers, edge layers, and cloud layers. We adopt this architecture to design a non-contact respiratory monitoring system that breaks down respiratory rate calculation tasks. Experimental results of respiratory rate monitoring show that the proposed approach significantly outperforms other approaches. This is supported by computation time costs of 2.26 ms per frame for the convolution operation, 27.48 ms per frame for the similarity calculation, and 0.78 seconds for processing a one-minute respiratory signal. Moreover, the computation time costs of our three-tier architecture are lower than those of the edge+cloud architecture and the cloud architecture. We also use our three-tier architecture for CT image diagnosis task decomposition. The evaluation on a CT image dataset of COVID-19 shows that our three-tier architecture is useful for resolving deep learning tasks with edge equipment. It has broad application scenarios in future smart hospitals. In recent years, physical health monitoring has received extensive attention [1]. Respiratory rate (RR) monitoring is one of the crucial indicators for evaluating physiological status [2], especially when doctors diagnose respiratory diseases such as chronic obstructive pulmonary disease (COPD), asthma, interstitial lung disease, pulmonary sarcoidosis, pneumoconiosis [3], and sleep apnea [4]. The recent advances in artificial intelligence and big data technology led to the development of robot-based health monitoring systems [5], [6]. Health monitoring robots could help us understand physical and mental health [7], [8], [9], [10], and are widely applied to medical human-robot interaction scenarios, intervention for children with autism spectrum disorder (ASD) [11], efficient human activity recognition [12], and other fields of healthcare. At present, the coronavirus disease 2019 (COVID-19) is spreading quickly in various countries around the world. Medical personnel on the frontline of the epidemic face extremely difficult situations such as the high risk of cross-infection, shortage of protective clothing, and high work pressure. Traditional physiological monitoring equipment could not sufficiently meet the needs of this unusual scenario. There is an urgent need for a robot system with non-contact physiological monitoring capabilities. Existing medical robots generally support ward inspections and remote interactions, but they could not monitor physiological indicators in real-time [4]. In the medical scene of non-contact monitoring of physiological indicators, mobile robots need to process video frames captured by the onboard camera at near real-time speed in order to track the target patient, obtain physiological signals, and assess physical health.
These mobile robots have limited resources and could not meet the needs of unique medical scenarios [21]. In such a situation, task decomposition of a lightweight target tracking network with physiological monitoring is an effective way to handle these issues [5]. To achieve real-time diagnosis in actual medical scenarios and make up for the limited computing power of mobile robots, we use a three-tier architecture to decompose the different computing tasks. In this way, we not only realize remote diagnosis of patient health but also effectively avoid cross-infection of frontline medical staff. The study of edge robotics as-a-service in monitoring COVID-19 infection therefore has essential research significance. We design a three-tier architecture based on deep learning task decomposition. The three-tier architecture consists of robot layers, edge layers, and cloud layers, which are applied to data collection tasks, data processing tasks, and decision support tasks, respectively. We summarize our main contributions as follows. i) We implement a three-tier architecture, including robot layers, edge layers, and cloud layers, and divide tasks among the tiers according to the computing resources of different devices. Heavy tasks are put on edge layers; light tasks are placed on robot layers and cloud layers. ii) We collected thermal imaging face video data of 15 subjects, including medical staff, researchers, and patients in the hospital. We verified that the Deep Learning-based Respiratory Rate Monitoring System (DLRRMS), based on our three-tier architecture, still works when the face is blocked or the head moves unconsciously. iii) We adopted the three-tier architecture to extract respiratory signals and provide sufficient information for doctors to achieve rapid diagnoses. Moreover, we verified the three-tier architecture's usefulness on a public dataset of CT COVID-19 images. The paper is organized as follows. Section 2 presents the related works on respiratory rate monitoring methods (including contact and non-contact monitoring methods), robot-based health monitoring, and deep learning on edges. Section 3 describes our three-tier architecture, including data collection at robot layers, data processing at edge layers, and decision support at cloud layers. Sections 4, 5, and 6 present our three-tier architecture in a hospital case study, a simulation study, and a COVID-19 patient study, respectively. Sections 7 and 8 present our discussion and conclusion, respectively. The contact monitoring method collects breathing signals through contact sensors. Contact monitoring devices, such as conventional electrocardiograms (ECG) [22], pulse oximetry [23], and innovative wearable devices [24], are currently available in the fields of medicine [25]. These methods require physical contact between the skin and electrodes, infrared sensors, or pressure sensors [26]. As for RR monitoring, in-person visits to hospitals or special mobile types of equipment, for example, a thermistor [27], a spirometer [28], or a breathing belt sensor [29], are required. These types of equipment usually measure only one of the following parameters: breathing sound [30], breathing airflow, breathing-related chest or abdominal movement, or breathing carbon dioxide emissions. Although traditional methods could effectively monitor the physiological parameters of the human body, most methods are cumbersome.
The non-contact monitoring methods include visible light technology, Doppler effect technology, and infrared thermography technology. Visible light technologies mainly include remote PPG (rPPG). The technology of rPPG mainly collects face videos remotely through a visible light camera to obtain PPG signals [31] [32]. However, the motion artifact (MA) contaminating rPPG signals seriously interferes with the estimation of physiological indicators. Changes in blood flow caused by exercise are the leading cause of MA. So far, many noise reduction techniques have been proposed, such as independent component analysis [33], wavelet denoising, and empirical mode decomposition. According to the Doppler effect principle, radar technology, electromagnetic induction technology, and Wi-Fi technology could monitor RR [34]. Radar technology uses bio-radar to transmit a signal of wavelength λ toward the human thoracic cavity, which generates an echo signal due to the chest cavity's undulating motion. There is a phase difference between the echo signal and the transmitted signal, and the phase difference changes with the displacement of the thoracic cavity. Echo signals could then be used to extract respiration signals for respiration rate estimation. In normal physiological activities of the human body, breathing motion may cause changes in the lungs' conductivity. Electromagnetic induction technology could be applied to detect tissue conductivity changes to monitor the RR [35]. With the emergence of a new generation of detectors, near-infrared and mid-infrared regions have also been used for medical thermal imaging [36]. Thermal imaging, also known as infrared thermal imaging (IRT), a remote non-contact monitoring method, has become a promising monitoring and diagnostic technology in the medical field [37]. RR monitoring technology based on thermal imaging mainly collects thermal imaging face images in successive frames and tracks the ROI (such as the nose and carotid artery) to obtain the breathing signals. G. Scebba et al. proposed a novel method based on multispectral data fusion that aims at estimating RR and addressing apnea detection tasks [38]. Contact monitoring equipment needs to be in direct contact with the human body, and prolonged contact would cause much discomfort to the subject. Non-contact physiological health monitoring, on the other hand, has the characteristics of non-interference, non-invasiveness, and remote control. In non-contact monitoring, methods other than IRT are susceptible to environmental factors, such as light, noise, and magnetic fields. IRT images, however, are highly blurred with unclear contours, which makes it difficult to track the ROI to extract physiological signals. In previous research, the infrared face ROI could not be tracked accurately when the subject spoke, the head moved, or the face was partially obscured. Moreover, deep learning methods have rarely been considered for tracking the infrared face ROI to extract physiological signals. Due to the rapid development of artificial intelligence technology and the convenience of remote interaction with robots, health monitoring robots are widely used in the fields of medicine [8] [10] [39]. Michaelis et al. designed a learning companion robot to expand guided reading activities and studied the impact of robots on the family reading experience [8]. K. Lin et al. proposed a multi-sensor fusion method in medical human-computer interaction scenarios to improve fusion decision-making performance [40]. Al-Taee et al.
designed a new eHealth platform that incorporates humanoid robots to support emerging multi-dimensional care methods to treat diabetes [10]. Social robots are applied to intervention research on children with ASD in order to improve their social skills [11]. The children interact with caregivers and robots in a three-way interaction for 30 minutes per day to complete activities related to emotional storytelling, opinion acquisition, and sorting. At present, most of the robot research in the field of public health is aimed at non-contact ultraviolet (UV) surface disinfection, remote interaction, and improvement of mental health [13] [14] [15]. They are deployed in different wards to reduce the risk to frontline medical staff. In addition, CT images are usually used to screen for the coronavirus by artificial intelligence technology [41] [42] [43]. Among patients with COVID-19, the most common predictors of novel coronavirus pneumonia include age, body temperature, signs, and symptoms [44]. However, there are few reports of real-time health monitoring robots for the special public health scenarios of COVID-19. As miniaturization continues and computing power increases, edge computing becomes more powerful. It paves the way for autonomous decision-making in the periphery. Ning et al. [44] proposed a transmission strategy based on deep learning at edges to reduce waiting time and improve the throughput of data transmission between vehicles. Zhang et al. proposed a novel cyber-physical systems (CPS) edge computing platform based on joint learning [45]. They trained machine learning models in a trusted joint learning framework to realize smart services and ensured the trustworthiness of smart services that used joint learning frameworks to test and monitor CPS behavior. Yang et al. designed a finger vein recognition system with template protection based on deep learning to improve its security [46]. Due to the limited processing power of existing edge nodes, Li et al. [20] designed a novel offloading strategy to optimize the performance of internet of things (IoT) deep learning applications through edge computing. Zhang et al. designed a voting strategy so that fog nodes can be selected as coordinators based on distance and computing power indicators to help speed up the training process [47]. Based on distributed deep learning, Tian et al. [48] designed a distributed deep learning system for web attack detection on edges. Chen et al. [49] adopted a distributed intelligent video surveillance (DIVS) system using deep learning (DL) algorithms and deployed it in an edge computing environment to reduce the huge network communication overhead. Zhang et al. proposed a fog-based democratic, collaborative learning program [50], which used fog to obtain a deep learning model with good performance in a cloud-free IoT environment and reduced the data locality problem of each fog node. In medical scenarios, higher real-time performance is required. A large number of interactions between users and the cloud may cause communication delays, which could lead to medical safety accidents and even affect the patient's health diagnosis or life safety, especially for patients diagnosed with acute diseases. In this case, we need to consider not only real-time performance but also the accuracy of physiological indicators. Only when both requirements are met simultaneously does it become possible to provide doctors with decision-making assistance in real-time scenarios.
The three-tier architecture for task decomposition is shown in Fig.1. We employ a lightweight computing architecture to divide the different tasks across a three-tier architecture: robot layers, edge layers, and cloud layers. They are used for data collection, data processing, and decision support, respectively. For respiratory rate monitoring, the respiration rate calculation tasks include facial feature extraction, Spatio-temporal feature tracking, respiratory signal extraction, and preprocessing. We break down the respiratory rate calculation tasks and place them on robot layers, edge layers, and cloud layers. For CT image diagnosis, the diagnosis tasks include CT image acquisition, feature extraction, ReLU-Pooling, and fully connected layers. They are employed on robot layers, edge layers, and cloud layers, respectively. We deploy robot layers to mobile robots. Suppose there are N mobile robots. Each mobile robot is mainly responsible for collecting data. In non-contact respiratory rate monitoring scenarios, robots are used to collect infrared face videos. Each video lasts for one minute and contains m picture frames. We regard the feature extraction of these m picture frames as one task. We stipulate that a robot can only handle one subtask. The picture frames of one minute are divided into K subtasks for feature extraction by CNN. Each subtask is calculated on the corresponding robot. The process of decomposing the i-th task of the robot layers into several subtasks is described as

$$task_i^{R} = \{subtask_1^{R}, subtask_2^{R}, \ldots, subtask_K^{R}\}, \quad K \le N$$

where $subtask_k^{R}$ is the k-th subtask of the robot layers. We deploy the task of facial feature extraction to robot layers. Robot layers adopt a siamese network to extract facial features. The siamese network consists of two lightweight convolutional neural networks (CNN) with five convolutional layers, ReLU layers, and pooling layers. One CNN is used to extract features of the search image Z, and the other CNN is used to extract features of the sample image X. They are prepared for the next layers to measure the similarity and track Spatio-temporal features. In this case, Z is a picture containing an infrared face. X is the target area of interest in image Z, such as the nose. One input of the siamese network is a sample image X pre-selected by the user, and the other input is a larger search image Z. In the scene of CT image diagnosis, robot layers are deployed to the CT scanner of mobile robots, such as the CT-SOMATOM On.site. We adopt robot layers to break down tasks of CT image acquisition or original CT image format conversion. We deploy edge layers to edge equipment. Suppose there are M edges. Edge equipment is used to resolve the task of data processing. For respiratory rate monitoring, we deploy the task of Spatio-temporal feature tracking to edge layers. We use these idle edges to decompose similarity calculation tasks. The process of decomposing the similarity calculation task is described as

$$task^{E} = \{task_1^{E}, task_2^{E}, \ldots, task_M^{E}\}$$

where $task_j^{E}$ is the j-th similarity calculation task of the edge layers. We adopt edge layers to calculate the similarity between the search image Z and the sample image X to track face ROIs when robot layers complete the feature extraction of Z and X. By measuring the similarity between the sample image X and each part of the search image Z, edge layers could give a similarity score map. The target is located and tracked according to the similarity score map. According to the maximum value of similarity, we could locate and track the nose ROI.
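To make the similarity-score step concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of how a SiameseFC-style score map can be computed by cross-correlating the template features of X with the search features of Z; the feature shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def score_map(template_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    """Cross-correlate template features (sample image X, e.g. the nose ROI)
    with search features (search image Z); the argmax of the returned map
    gives the tracked target position."""
    # conv2d does not flip its kernel, so using the template as the kernel
    # is exactly a cross-correlation over the search features.
    return F.conv2d(search_feat, template_feat)

# Hypothetical feature shapes from the two CNN branches:
z_feat = torch.randn(1, 256, 22, 22)   # search image Z features
x_feat = torch.randn(1, 256, 6, 6)     # sample image X (nose ROI) features
scores = score_map(x_feat, z_feat)     # -> (1, 1, 17, 17) similarity map
```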
In order to reduce the computational cost, the process of calculating the similarity score of each region could be replaced with a single cross-correlation layer in a fully convolutional network (FCN) [50]. The fully-convolutional siamese (SiameseFC) network [51] is used to evaluate the similarity between the current frame X and the search image Z [52]. Accurately tracking the face ROI is one of the key factors in obtaining respiratory signals. We use the SiameseFC network to track the thermal imaging face ROI to ensure the accurate extraction of the temporal-spatial features and obtain high-quality breathing signals. For CT image diagnosis, we use densenet169 to realize the classification tasks [42]. The densenet169 is divided into two parts. One part is the feature extraction layers, and the other part is the ReLU-Pooling and fully connected layers. The feature extraction layers of densenet169 are deployed on edge layers. In addition, each edge processes one or more CT image feature extraction tasks. Simultaneously, we adopt edge equipment at edge layers to break down the tasks of CT image feature extraction, which prepares for decision support at the cloud layers. We deploy cloud layers to clouds. Suppose there are M' clouds. We use clouds to break down the tasks of decision support. Once the subtasks of edge layers are completed, cloud layers begin to calculate tasks from edge layers. For respiration signals, we deploy the tasks of respiration signal extraction and preprocessing to cloud layers. Once the face ROI information in a frame is obtained from edge layers, respiratory signal feature extraction tasks are decomposed at cloud layers. After the respiratory signal feature extraction of one minute is completed, initial respiration signals are obtained. We get the respiration rate by preprocessing the initial respiration signal. Respiratory signal preprocessing includes detrending, normalization, and Butterworth filtering. Respiratory signal extraction mainly converts the face ROI into a gray image according to thermal fluctuations under the nostrils [27] [53], and then calculates the gray image's average pixel by formula 4:

$$s_i = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} G_i(x, y), \quad RS = \{s_1, s_2, \ldots, s_m\} \qquad (4)$$

where $G_i$ is the grayscale ROI image of the i-th frame, W and H are its width and height, and m is the total number of frames of a thermal infrared video. The tasks of calculating the average pixel are implemented at cloud layers. Suppose there are L clouds at cloud layers. Tasks of calculating the pixel average are divided into L subtasks, which are calculated in different clouds. The initial respiratory signal RS has a certain degree of oscillation, and low-frequency components may also affect the respiratory signal quality. In addition, due to the instability of the signal acquisition device and its sensitivity to interference from the surrounding environment, the signal often deviates from the baseline over time. The entire process of deviating from the baseline over time is called the "trend term" of the signal. It would affect the quality and accuracy. The initial respiratory signals are detrended using a technique based on a smoothness priors approach [54]. The respiratory signal then needs to be normalized. At present, there are many methods of data normalization, which could be divided into linear methods (such as the extreme value method and the standard deviation method), polyline methods (such as the trifold method), and curvilinear methods (such as the semi-normal distribution). We adopt the following approach to normalize the respiration rate signal [54]:

$$SN' = \frac{SN - \mu}{\sigma}$$

where σ is the standard deviation, μ is the mean of the original signal, SN is the signal before normalization, and SN' is the signal after normalization.
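Stepping back to the signal-extraction step, the sketch below (with illustrative function and parameter names, not the authors' code) shows one possible realization of formula 4, reducing each tracked grayscale nose ROI to its mean pixel value:

```python
import numpy as np

def extract_respiration_signal(frames: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """frames: (m, H, W) grayscale thermal frames; boxes: (m, 4) tracked nose
    ROIs as (x, y, w, h) returned by the edge layers. Implements formula 4:
    s_i is the average pixel of the ROI G_i, and RS = {s_1, ..., s_m}."""
    signal = []
    for frame, (x, y, w, h) in zip(frames, boxes):
        roi = frame[y:y + h, x:x + w]   # grayscale nose ROI of the i-th frame
        signal.append(roi.mean())       # s_i: average over the W x H ROI pixels
    return np.asarray(signal)
```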
In the RR monitoring method based on PPG, frequency domain features are widely used to obtain heart rate information. This method usually uses a bandpass filter to obtain signals in the frequency range related to heart rate. The normal RR is 9-42 bpm, and the corresponding frequency band is 0.15-0.7 Hz. We use a fast Fourier transform (FFT) to convert the time-domain signal into the frequency domain. We retain the data from 0.15 to 0.7 Hz through bandpass filtering and set the data in other frequency ranges to zero. The noise frequencies could be eliminated in this way. Moreover, the frequency domain information useful for respiration rate analysis should be extracted. The waveform of breathing is relatively stable, and it corresponds to a low-frequency signal. We decompose the original signal to achieve the purpose of denoising. The low-frequency part is retained, and the high-frequency part is filtered out. Butterworth filters have the maximally flat amplitude characteristic in the passband, and the amplitude at positive frequencies decreases monotonically with increasing frequency [55]. It is usually used for low-pass filtering, which could filter out the noise of respiratory signals. The squared magnitude response of the Butterworth filter is defined as

$$|H(j\omega)|^2 = \frac{1}{1 + (\omega/\omega_c)^{2N}}$$

where $\omega_c$ is the cut-off frequency and N is the Butterworth filter's order; for band-pass filtering, the prototype is transformed so that the passband is delimited by the upper and lower cut-off frequencies $\omega_p$ and $\omega_s$. However, we could add algorithms through plugins at cloud layers to adapt to different scenarios. After Butterworth filtering, respiration signals need a false peak estimation algorithm for additional processing to improve the respiration rate accuracy. For CT image classification, we adopt cloud layers to resolve the calculation tasks of the ReLU-Pooling and fully connected layers of densenet169. The ReLU-Pooling layer is used for downsampling. The fully connected layer of densenet169 is used as a classifier to classify CT images. Similarly, once the feature extraction subtasks from edge layers are completed, cloud layers begin to break down the calculation tasks of the ReLU-Pooling and fully connected layers. When the CT image classification task is completed, the doctor can be provided with decision-making support based on the classification results. After signal preprocessing, initial breathing signals are obtained. We need to detect peaks in the initial breathing signal and eliminate false peaks. The signal of RR peaks always fluctuates within a specific range. Moreover, a peak point should satisfy two properties: i) its position is near the maximum amplitude point in the current cycle; ii) its amplitude is greater than that of other points in its neighborhood. We only need to find such a point. If the peaks at points A and B are equal and both satisfy the two conditions above, only one of them could be used as a peak; we mark the first occurrence as the default peak. The peak detection algorithm detects the peaks of the preprocessed signal and marks the peak information according to the above principle. An example of RR calculation is shown in Fig.3. Due to various factors, some erroneously detected peaks may still be retained after preprocessing the respiratory signals. These erroneous peaks need to be eliminated. We set an amplitude threshold to eliminate false peaks.
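Before turning to the false-peak removal steps, the preprocessing chain described above (detrending, normalization, and 0.15-0.7 Hz Butterworth band-pass filtering) can be sketched as follows. This is a minimal sketch, not the authors' pipeline: scipy's linear detrend stands in for the smoothness-priors detrending of [54], and the assumed 25 fps frame rate follows from the 1500 frames per minute reported later.

```python
import numpy as np
from scipy.signal import detrend, butter, filtfilt

def preprocess_respiration(rs: np.ndarray, fs: float = 25.0, order: int = 4) -> np.ndarray:
    """rs: raw respiration signal (one mean-pixel value per frame); fs: frame rate in Hz."""
    x = detrend(rs)                     # remove the slow "trend term" of the signal
    x = (x - x.mean()) / x.std()        # z-score normalization, SN' = (SN - mu) / sigma
    # Band-pass 0.15-0.7 Hz, i.e. the normal RR range of 9-42 bpm.
    b, a = butter(order, [0.15, 0.7], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)            # zero-phase filtering avoids shifting the peaks
```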
The main steps for removing false peaks are the following: i) Detect the peaks $\{p_1, p_2, \ldots, p_N\}$ of the filtered signal, where N is the total number of peaks of the filtered signal, and eliminate those whose amplitude is less than zero, which yields the candidate peak position information $\{p_1, p_2, \ldots, p_M\}$, where M is the number of peaks that remain. Due to the influence of low-frequency signals, error peaks different from the breathing signal would be generated; the value of these peaks is less than zero. ii) Calculate the average peak amplitude avg_value_peaks of the candidate peaks by formula 8:

$$avg\_value\_peaks = \frac{1}{M} \sum_{i=1}^{M} A(p_i) \qquad (8)$$

where $A(p_i)$ is the amplitude of the i-th candidate peak. iii) Calculate the peak threshold low_value_peak according to formula 9:

$$low\_value\_peak = \varepsilon \cdot avg\_value\_peaks \qquad (9)$$

If the amplitude of a candidate peak is greater than or equal to low_value_peak, one breath is accumulated. ε is an amplitude parameter (adjusted through experimentation) during the process of error peak detection. Suppose peak_num is the number of peaks obtained after removing the wrong peaks. Then we mark the indices of the first peak and the last peak as peak_start and peak_end, respectively. The average distance between two adjacent peaks is avg_dis_peak. Suppose the total number of frames of thermal infrared video acquired in one minute is total_frame. The final respiration rate is calculated as follows [56]:

$$RR = peak\_num + \frac{peak\_start + (total\_frame - peak\_end)}{avg\_dis\_peak}$$

Since the breathing information in [0, peak_start] and [peak_end, total_frame] is not covered by the detected peaks, the number of breaths in these parts needs to be accounted for by the second term. In the Second Affiliated Hospital of Anhui Medical University, medical staff participated in collecting clinical data. Fifteen healthy subjects were recruited. They are between 20 and 31 years old and weigh between 46 and 105 kg. Participation in the study was voluntary, and all participants provided informed consent. The local ethics committee approved this study. We collected data in an environment with a temperature of 23-26°C. We started collecting data on March 14th, 2020. The amount of data collected per day was 5 to 30 minutes. Subjects were allowed to perform unconscious head movements in the process of data collection. As of April 14th, we had collected a total of 305 minutes of data. Due to equipment and subject issues, a portion of the data collected was invalid. After removing invalid data, 274 minutes of useful video were used for our experiment. Firstly, the purpose of our comparative experiments is to evaluate the accuracy of our DLRRMS against advanced methods. The advanced methods include DLRRMS without eliminating erroneous peaks (DLRRMS-EEP), the Gaussian window (GW) method [31], and the frequency domain analysis method (FDAM) [57]. DLRRMS-EEP is a variant of DLRRMS without the step of eliminating error peaks; it is used to evaluate the effectiveness of false peak elimination. GW is a scheme with a fixed-width Gaussian window to process the respiratory signal after Butterworth filtering. FDAM is a method of converting time-domain signals into the frequency domain to calculate RR. We use the mean absolute error (MAE), root mean square error (RMSE), and mean square error (MSE) to evaluate the performance of DLRRMS. MAE represents the average value of the absolute error between the predicted value and the observed value. RMSE represents the sample standard deviation of the difference between the predicted value and the observed value, which illustrates the degree of dispersion of the sample. MSE is the square of RMSE.
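A minimal sketch of the false-peak elimination and RR calculation just described (steps i-iii and the final formula); the value of ε and the helper names are illustrative assumptions, not the authors' tuned settings:

```python
import numpy as np
from scipy.signal import find_peaks

def respiration_rate(filtered: np.ndarray, total_frame: int, eps: float = 0.6) -> float:
    """filtered: preprocessed respiration signal; total_frame: frames in one minute."""
    peaks, _ = find_peaks(filtered)                    # all N peaks of the filtered signal
    candidates = peaks[filtered[peaks] > 0]            # step i: drop peaks below zero
    avg_value_peaks = filtered[candidates].mean()      # step ii: formula 8
    low_value_peak = eps * avg_value_peaks             # step iii: formula 9
    kept = candidates[filtered[candidates] >= low_value_peak]
    peak_num = len(kept)
    peak_start, peak_end = kept[0], kept[-1]
    avg_dis_peak = (peak_end - peak_start) / max(peak_num - 1, 1)
    # Final formula: account for the breaths in [0, peak_start] and
    # [peak_end, total_frame] that the detected peaks do not cover.
    return peak_num + (peak_start + (total_frame - peak_end)) / avg_dis_peak
```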
Secondly, the purpose of our comparative experiments is to evaluate the consistency between the respiration rate monitored by the advanced methods and the real respiration rate. The consistency evaluation usually uses the Bland-Altman method. The problem of comparing two detection methods is often faced in clinical research; for example, A is a classic method, and B is a new method. Consistency is judged by the 95% limits of agreement of the measurement results of the two methods. The average of the ground truth and the predicted value is the horizontal axis, and the difference between the measurement results is the vertical axis. We draw a scatterplot and mark the 95% limits of agreement. Finally, combining with the maximum error allowed by clinical practice, it is concluded whether the two methods are consistent. The nose ROI was selected to extract the respiration rate signal for comparison with existing RR monitoring methods, including DLRRMS without eliminating erroneous peaks (DLRRMS-EEP), the Gaussian window (GW) method [31], and the frequency domain analysis method (FDAM) [57]. The width of the Gaussian window was set to 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 to process the respiratory signal. The results are shown in TABLE 1. The overall results were the best when the width of the Gaussian window was 130; at this setting, the MAE and RMSE were 0.8359 and 1.1734, respectively. TABLE 2 summarizes the comparison results of total MAE and RMSE for the four methods, which were evaluated for RR estimation accuracy on a total of 274 minutes (13 subjects) of spontaneous respiration tasks. DLRRMS has the best performance among the methods compared in this paper. Fig.4 shows the comparison of MAE and RMSE for the four methods. Due to the unbalanced amount of data collected from each subject, MAE and RMSE are more volatile. Furthermore, each subject's height, weight, and age are different. The differences in pixel intensity due to the temperature of the nostril airflow may cause different respiratory signals. Due to motion artifacts and the absence of erroneous peak elimination, the MAE and RMSE obtained by DLRRMS-EEP fluctuate significantly. FDAM is greatly affected by factors such as individual differences between subjects and motion artifacts. In general, DLRRMS outperforms the other three schemes for most subjects, except for the 5th and 11th subjects. This is supported by the lowest MAE and RMSE in most instances. The Bland-Altman method was used to evaluate the consistency between RR monitoring by DLRRMS and the real respiration rate. The average of the ground truth and the predicted value is the x-axis, and the difference between the measurement results is the y-axis. We draw a scatterplot and mark the 95% limits of agreement. Fig.5 shows that DLRRMS has the best consistency, supported by the lowest mean difference and the narrowest 95% limits of agreement.
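The Bland-Altman quantities used here can be computed with a short sketch (a generic implementation of the method, with hypothetical input arrays of per-minute RR values):

```python
import numpy as np

def bland_altman(ground_truth: np.ndarray, predicted: np.ndarray):
    """Return the plot axes, the bias, and the 95% limits of agreement."""
    diff = predicted - ground_truth              # vertical axis (y)
    mean_pair = (predicted + ground_truth) / 2   # horizontal axis (x)
    bias = diff.mean()                           # mean of the differences
    half_width = 1.96 * diff.std(ddof=1)         # 95% limits: bias +/- 1.96*SD
    return mean_pair, diff, bias, (bias - half_width, bias + half_width)
```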
The robot layers consist of many mobile robots. There are three cameras and a display on a mobile robot. Two visible light cameras are on the bottom (Logitech HD Webcam C310) and top of the robot (c922 Pro Stream Webcam). The visible light camera on the top enables interaction with subjects, while the one on the bottom is designed to provide a view for robot navigation. Another thermal infrared camera (Guide Sensmart IPT640) on the top is used to collect thermal infrared videos. The wavelength range of this camera is 8~14 μm, the temperature measurement range is -20 to 150 degrees Celsius, and the working environment temperature is -10 to 50 degrees Celsius. The resolution of the three cameras is 640×480 pixels. The computing tasks of edge layers are undertaken by a workstation, which simulates the computing power of the cloud. Its configuration is an Intel (R) Core (TM) i7-8565 with a main frequency of 1.8 GHz and a highest turbo frequency of 4.6 GHz, an NVIDIA GeForce MX150, and 8 GB of RAM. The computing tasks of cloud layers are implemented by a workstation with an Intel (R) Core (TM) i7-10510U, a main frequency of 1.8 GHz, a highest turbo frequency of 4.80 GHz, and 16.0 GB of RAM. The purpose of our experiments is to analyze the differences between our "Robot+Edge+Cloud" architecture, the "Edge+Cloud" architecture, and the cloud architecture. Moreover, we use time costs and system utility to evaluate the performance of each architecture. Time costs are composed of two parts: communication time costs and computation time costs. The former refers to network delay and the time costs of data exchange between subtasks. The latter refers to the computational cost of performing calculation tasks related to RR calculation at robot layers, edge layers, and cloud layers. In computing task decomposition, we use the number of decomposed tasks to evaluate system utility. Our goal is to maximize the number of tasks decomposed per unit time. It is described as

$$U = \frac{N_{task}}{T}$$

where $N_{task}$ is the number of decomposed tasks completed and T is the corresponding time. We have done simulation experiments to simulate different numbers of robots, edges, and clouds to decompose calculation tasks. The aim is to explore the impact of different architectures on time costs and system utility. The experimental results of time costs are shown in Table 3. One breathing signal is obtained by calculating 1500 picture frames through our RR monitoring system. We regard this process as one task. When NR=1, NE=1, and NC=1, the costs of the convolution operation, similarity calculation, signal processing, communication time, and computation time are 2.255 ms per frame, 27.483 ms per frame, 0.78 seconds (for processing one-minute length respiratory signals), 12.86 seconds, and 58.25 seconds, respectively. The "Robot+Edge+Cloud" (REC) architecture could ensure real-time monitoring of the respiratory rate, supported by its computation time costs of 58.25 seconds. We have done simulation experiments using different numbers of robots, edges, and clouds. The effect of different numbers of devices on the three architectures is shown in Fig.6 to Fig.8. Fig.6 shows the communication time costs of the three architectures. Their communication time costs increase linearly as the number of frames increases. The communication time costs of the "Robot+Edge+Cloud" architecture are less than those of the cloud architecture and the "Edge+Cloud" (EC) architecture. Fig.7 shows the computation time costs of the three architectures to process tasks. Overall, the computation time costs of the REC architecture and the EC architecture are not much different. The results of Fig.8 show that we could increase the number of devices to increase computing power and system utility, which reduces the computation time costs of task decomposition. This is supported by the REC curve (Robot=5, Edge=20, Cloud=20) and the REC curve (Robot=5, Edge=10, Cloud=10). Communication time costs increase, and system utility is reduced when subtasks compete for computing resources in the early stage. This is supported by the curve of REC (Robot=5, Edge=20, Cloud=20) and the curve of EC (Edge=20, Cloud=20). When Robot=5, Edge=20, and Cloud=20, the size of each task queue is 5000 frames, and each task queue is divided into ten equal subtasks. When Edge=20 and Cloud=20, the size of each task queue is 7500 frames, and each task queue is divided into ten equal subtasks.
When the proportion of edges and mobile robots increases, our respiratory monitoring system shows good scalability, which is supported by the curve of REC (Robot=10, Edge=20, Cloud=40; the size of each task queue is 7500 frames, and each task queue is divided into ten equal subtasks) and the curve of REC (Robot=5, Edge=10, Cloud=10). As computing equipment increases, the overall computing power increases. The performance of the REC architecture is better than that of the other two architectures. We use our three-tier "Robot+Edge+Cloud" architecture to decompose classification tasks of CT images for COVID-19 patients. The public COVID-CT dataset includes 349 CT images positive for COVID-19 from 216 patients and 397 CT images that are negative for COVID-19 [42]. The CT COVID-19 images are collected from COVID-19 related papers in medRxiv, bioRxiv, NEJM, JAMA, and Lancet. The dataset is accessible at https://github.com/UCSD-AI4H/COVID-CT. We adopt the pre-trained network structure of densenet169 for the classification tasks. The classification network is divided into feature extraction layers, a ReLU-Pooling layer, and a fully connected layer. We put the feature extraction layers, which carry the heavier computing task, on the edges. The ReLU-Pooling and fully connected layers are deployed on mobile robots and clouds, respectively. In the "Edge+Cloud" architecture, the feature extraction layers and the ReLU-Pooling layer are deployed on edges while the fully connected layer is deployed on clouds. In the cloud architecture, the feature extraction layers, the ReLU-Pooling layer, and the fully connected layer of the classification model are all deployed in the cloud. The public dataset of CT images for COVID-19 patients is used to verify the effectiveness of the proposed "Robot+Edge+Cloud" architecture. We design comparative experiments on classification tasks to analyze differences in communication time costs, computation time costs, and system utility among the "Robot+Edge+Cloud" architecture, the "Edge+Cloud" architecture, and the cloud architecture. We adopt a classification model to diagnose COVID-19 patients, based on our "Robot+Edge+Cloud" architecture to decompose classification tasks. The experimental results are shown in Fig.9 to Fig.11. As shown in Fig.9 and Fig.10, the experimental results indicate that the communication time costs and computation time costs of the three architectures increase as the number of diagnosed pictures increases. Among the three architectures, the "Robot+Edge+Cloud" architecture has the best performance. The REC architecture uses the least communication time and computation time costs when the number of diagnosed pictures is the same. Since the computing power at edges is the best, the computation time costs of the "Robot+Edge+Cloud" and "Edge+Cloud" architectures are not much different. The experimental results in Fig.11 show that the "Robot+Edge+Cloud" architecture has the best system utility among the three architectures. The "Robot+Edge+Cloud" and "Edge+Cloud" architectures decompose more computing tasks onto edge layers with better computing power, and their overall system utility is not much different.
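To make the deployment split concrete, the sketch below partitions torchvision's pre-trained densenet169 into the two parts described above; the tier assignment in the comments follows the REC deployment, the function names are illustrative, and in practice the 1000-way ImageNet head would be replaced by a two-way COVID-19/non-COVID classifier.

```python
import torch
import torch.nn.functional as F
from torchvision import models

net = models.densenet169(pretrained=True)

def edge_stage(x: torch.Tensor) -> torch.Tensor:
    """Edge layers: the heavy dense-block feature extraction."""
    return net.features(x)

def robot_cloud_stage(feat: torch.Tensor) -> torch.Tensor:
    """Robot/cloud layers: ReLU-Pooling (downsampling) and the fully connected
    classifier, mirroring densenet169's own forward pass."""
    out = F.relu(feat)
    out = F.adaptive_avg_pool2d(out, (1, 1)).flatten(1)
    return net.classifier(out)

# A CT slice arrives from the robot layer as a normalized 3x224x224 tensor:
logits = robot_cloud_stage(edge_stage(torch.randn(1, 3, 224, 224)))
```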
In the first comparative experiments, GW with a fixed width processes the respiratory signal after Butterworth filtering. It is not suitable for peak filtering of any width when the instantaneous breathing frequency changes, and it may miss some peaks. In addition, unconscious head movement would produce motion artifacts and introduce noise into the respiratory signal, which may affect the RR accuracy of FDAM. DLRRMS-EEP calculates RR without eliminating false peaks, and it counts more peaks that should not be retained. However, DLRRMS effectively overcomes the defects of the above existing methods by setting an amplitude threshold, which is verified through the experimental results of TABLE 2 and Fig.5. Fig.12 shows the plots of GW, DLRRMS, and the original signal. Both GW and DLRRMS use the same signal preprocessing methods, such as detrending, standardization, and Butterworth filtering. However, the steps applied to eliminate error peaks are different: the former uses a polynomial Gaussian fitting method, and the latter eliminates false peaks by setting an amplitude peak threshold. Since GW and DLRRMS use the same signal preprocessing method and only differ in how they eliminate false peaks, the experimental results between them are very close. Moreover, it can be seen from Fig.12 that the correlation of the peaks retained between DLRRMS and the original signal is higher than that between GW and the original signal. This is the reason why the proposed DLRRMS is better than GW. There are broad application prospects in the medical field. However, due to the imbalance of data and individual differences, we need to collect more data and explore related issues in future work. Moreover, we need to integrate more non-contact and contact physiological parameters into our system to provide a scientific basis for realizing comprehensive health monitoring. In the second comparative experiment, as the number of robots, edges, and clouds increases, the computing resources of the architecture increase. We decompose time-consuming computing tasks onto devices with relatively good computing capabilities. In the "Robot+Edge+Cloud" and "Edge+Cloud" architectures, the most time-consuming Spatio-temporal feature tracking calculation tasks are placed on edges, and respiration signal extraction and preprocessing tasks are placed on clouds. In the "Edge+Cloud" architecture, we put facial feature extraction tasks on edges. Moreover, respiratory signal feature extraction and preprocessing tasks are placed on clouds. In the cloud architecture, all computing tasks are placed on clouds. In this case, the overall system utility increases by increasing the number of edges. However, when the number of devices increases to a certain extent, communication delays and communication time costs increase. Communication time costs are mainly induced by information interaction and resource competition between robots, edges, and clouds. When the communication network is congested, the system utility is reduced. Assuming that CT images of COVID-19 patients are collected by the robot, we have adopted our "Robot+Edge+Cloud" architecture to break down the tasks of diagnosing patients with COVID-19. Additionally, we adopt the "Robot+Edge+Cloud" architecture, the "Edge+Cloud" architecture, and the cloud architecture for a comparative experiment. The time costs and system utility are used to evaluate the system performance of the three architectures. Computing performance at edges is better than that of robots and clouds. In the REC architecture and the EC architecture, because the computing tasks of feature extraction are all placed on the edge, the communication time costs and computation time costs between them are not much different. Similarly, the system utility between the REC architecture and the EC architecture is not significantly different. However, there is still much work to be done before this could be realistically applied to special medical scenarios.
Firstly, the CT images of COVID-19 patients were not collected by our mobile robots, so we have not considered the delay in the process of CT acquisition. At the same time, since most of the public CT images are collected from COVID-19 related papers, they are not original CT images, which could not reflect the actual amount of calculation for diagnosing COVID-19 in a medical scene. In addition, we have not tested our three-tier architecture under network congestion or network instability, which is something we need to strengthen in the future. Furthermore, our three-tier structure lacks privacy protection for confirmed patients with COVID-19. Finally, in real medical application scenarios, we still need to consider the dynamic allocation of computing resources in future work. In this paper, we designed a deep learning-based respiratory rate monitoring system using mobile robots and edge equipment. Due to the limited computing resources of mobile robots and the real-time requirements of physiological parameter monitoring in medical scenarios, a three-tier architecture with robot layers, edge layers, and cloud layers is adopted to decompose the different computing tasks. We deployed feature extraction tasks, Spatio-temporal feature tracking tasks, and respiration signal extraction and preprocessing tasks at robot layers, edge layers, and cloud layers, respectively. This architecture meets the needs of real-time tracking of the infrared nose ROI and extraction of breathing signals while considering the limitations of mobile robots' computing resources. Compared with the "Edge+Cloud" architecture and the cloud architecture, our three-tier "Robot+Edge+Cloud" architecture has better performance in task decomposition; it spends fewer time costs than the other two architectures. In this situation, real-time and effective breathing signals could provide data support for medical staff to make decisions if more physiological data is integrated into the system. In terms of accuracy, compared with the current mainstream GW, FDAM, and DLRRMS-EEP, the MAE and RMSE of the DLRRMS proposed in this paper are reduced by at least 5.12% and 5.06%, respectively. Furthermore, the consistency evaluation of DLRRMS is better than that of the other comparison methods. In short, our DLRRMS could not only calculate the respiratory rate in real-time but also ensure the accuracy of respiratory monitoring. Furthermore, we have conducted experiments on a public dataset of CT COVID-19 images to verify the three-tier architecture's effectiveness. Our three-tier architecture will have broad application scenarios in future unmanned hospitals. Haimiao Mo is a PhD student at the School of Management, Hefei University of Technology, China. His current research is in the area of non-contact health monitoring, data mining, and machine learning.
Sleep-disordered breathing after targeted ablation of preBötzinger complex neurons
Situ: A situation-theoretic approach to context-aware service evolution
Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: A systematic analysis for the Global Burden of Disease Study
Apnea MedAssist: Real-time sleep apnea monitor using single-lead ECG
Lightweight Privacy-preserving Medical Diagnosis in Edge Computing
A Double Deep Q-Learning Model for Energy-Efficient Edge Scheduling
Human-computer collaboration for skin cancer recognition
Reading socially: Transforming the in-home reading experience with a learning-companion robot
Expert discovery and interactions in mixed service-oriented systems
Robot Assistant in Management of Diabetes in Children Based on the Internet of Things
Improving social skills in children with ASD using a long-term, in-home social robot
Efficient human activity recognition using a single wearable sensor
COVID-19 infection: the China and Italy perspectives
Combating COVID-19 - The role of robotics in managing public health and infectious diseases
The potential of socially assistive robots during infectious disease outbreaks
Multi-User Multi-Task Computation Offloading in Green Mobile Edge Cloud Computing
A New Deep Learning-Based Food Recognition System for Dietary Assessment on An Edge Computing Service Infrastructure
Edge Computing: Vision and Challenges
Privacy-Preserving Reputation Management for Edge Computing Enhanced Mobile Crowdsensing
Learning IoT in Edge: Deep Learning for the Internet of Things with Edge Computing
Offloading optimization in edge computing for deep learning enabled target tracking by internet-of-UAVs
A Wireless Health Monitoring System Using Mobile Phone Accessories
All-organic optoelectronic sensor for pulse oximetry
A soft and transparent visuo-haptic interface pursuing wearable devices
Wearable devices for precision medicine and health state monitoring
Hand Gesture Recognition and Finger Angle Estimation via Wrist-Worn Modified Barometric Pressure Sensing
Thermistor at a distance: Unobtrusive measurement of breathing
Tidal Volume and Instantaneous Respiration Rate Estimation using a Volumetric Surrogate Signal Acquired via a Smartphone Camera
Noninvasive respiration movement sensor based on distributed Bragg reflector fiber laser with beat frequency interrogation
Estimation of Respiratory Rates Using the Built-in Microphone of a Smartphone or Headset
Robust tracking of respiratory rate in high-dynamic range scenes using mobile thermal imaging
Robust respiration detection from remote photoplethysmography
Respiratory rate estimation from the built-in cameras of smartphones and tablets
Psychoacoustic Annoyance Implementation with Wireless Acoustic Sensor Networks for Monitoring in Smart Cities
A Noninvasive, Electromagnetic, Epidermal Sensing Device for Hemodynamics Monitoring
Tissue viability by multispectral near infrared imaging: A fuzzy C-means clustering analysis
A reappraisal of the use of infrared thermal image analysis in medicine
Multispectral Video Fusion for Non-contact Monitoring of Respiratory Rate and Apnea
Modeling engagement in long-term, in-home socially assistive robot interventions for children with autism spectrum disorders
Multi-sensor fusion for body sensor network in medical human-robot interaction scenario
A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)
COVID-CT-Dataset: a CT scan dataset about COVID-19
Predicting COVID-19 malignant progression with AI techniques
Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal
Deep Learning in Edge of Vehicles: Exploring Trirelationship for Data Transmission
A Democratically Collaborative Learning Scheme for Fog-enabled Pervasive Environments
A Distributed Deep Learning System for Web Attack Detection on Edge Devices
Distributed Deep Learning Model for Intelligent Video Surveillance Systems with Edge Computing
Achieving democracy in edge intelligence: a fog-based collaborative learning scheme
Fully Convolutional Networks for Semantic Segmentation
Fully-convolutional siamese networks for object tracking
When Correlation Filters Meet Siamese Networks for Real-Time Complementary Tracking
Remote monitoring of breathing dynamics using infrared thermography
Remote detection of photoplethysmographic systolic and diastolic peaks using a digital camera
A method for respiration rate detection in wrist PPG signal using Holo-Hilbert spectrum
Multiparameter respiratory rate estimation from the photoplethysmogram
Noncontact Vision-Based Cardiopulmonary Monitoring in Different Sleeping Positions
The authors would like to thank Professor Chao Lu from the Second Affiliated Hospital of Anhui Medical University for supporting data collection, and the anonymous reviewers for their detailed and thoughtful feedback, which improved the quality of this paper significantly. This work is fully supported by the National Natural Science