title: Unsupervised Domain Adversarial Self-Calibration for Electromyographic-based Gesture Recognition
authors: Côté-Allard, Ulysse; Gagnon-Turcotte, Gabriel; Phinyomark, Angkoon; Glette, Kyrre; Scheme, Erik; Laviolette, François; Gosselin, Benoit
date: 2019-12-21

Surface electromyography (sEMG) provides an intuitive and non-invasive interface from which to control machines. However, preserving the myoelectric control system's performance over multiple days is challenging, due to the transient nature of this recording technique. In practice, if the system is to remain usable, a time-consuming and periodic re-calibration is necessary. In the case where the sEMG interface is employed every few days, the user might need to perform this re-calibration before every use, severely limiting the practicality of such a control method. Consequently, this paper tackles the especially challenging task of adapting to sEMG signals when multiple days have elapsed between recordings, by presenting SCADANN, a new, deep learning-based, self-calibrating algorithm. SCADANN is ranked against three state of the art domain adversarial algorithms and a multiple-vote self-calibrating algorithm on both offline and online datasets. Overall, SCADANN is shown to systematically improve classifiers' performance over no adaptation and ranks first in almost all the cases tested.

Robots have become increasingly prominent in human life. As a result, the way in which people interact with machines is constantly evolving toward a better synergy between human intention and machine action. The ease of transcribing intention into commands is highly dependent on the type of interface and its implementation [1]. Within this context, muscle activity offers an attractive and intuitive way to perform gesture recognition as a guidance method [2], [3].
Such activity can be recorded from surface electromyography (sEMG), a non-invasive technique widely adopted both for prosthetic control and in research as a way to seamlessly interact with machines [4], [5]. sEMG signals are non-stationary and represent the sum of subcutaneous motor action potentials generated through muscular contractions [3]. Artificial intelligence can then be leveraged as the bridge between these biological signals and robot input guidance. Current state of the art algorithms in gesture recognition routinely achieve accuracies above 95% for the classification of offline, within-day datasets [6], [7]. However, many practical issues still need to be solved before implementing these types of algorithms in practical applications [4], [8]. Electrode shift and the transient nature of the sEMG signal are among the main obstacles to a robust and widespread implementation of real-time sEMG-based gesture recognition [4].
† These authors share senior authorship.
In practice, this means that users of current myoelectric systems need to perform periodic recalibration of their device so as to retain its usability. To address the issue of real-time myoelectric control, researchers have proposed rejection-based methods where a gesture is predicted only when a sufficient level of certainty is achieved [9], [10]. While these types of methods have been shown to increase online usability, they do not directly address the inherent decline in the classifier's performance over time. One way to address this issue is to leverage transfer learning algorithms to periodically re-calibrate the system with less data than normally required [11], [12]. While these types of methods reduce the burden on the user, they still require said user to periodically record labeled data. This work focuses on the problem of across-day sEMG-based gesture recognition in both an offline and online setting.
In particular, this work considers the setting where several days have elapsed between each recording session. Such a setting naturally arises when sEMG-based gesture recognition is used for video games, artistic performances or, simply, to control a non-essential device [13], [5], [14]. In contrast to within-day or even day-to-day adaptation, this work's setting is especially challenging, as the change in signal between two sessions is expected to be substantially greater and no intermediary data exists to bridge this gap. The goal is then for the classifier to be able to adapt over time using the unlabeled data obtained from the myoelectric system. Such a problem can be framed within an unsupervised domain adaptation setting [15], where there exists an initial labeled dataset on which to train, but the classifier then has to adapt to data from a different, but similar, distribution. Huang et al. [16] propose to use this setting to update a support vector machine by replacing old examples forming the support vectors with new unlabeled examples which are close to the old ones (assigning them the same label as the examples they replace). Other authors [17] propose instead to periodically retrain an LDA by updating the training dataset itself. The idea is to replace old examples with new, near (i.e. at a small distance within the feature space) ones. Such methods, however, are inherently restricted to single-day use, as they rely on a smooth and small signal drift to update the classifier. Additionally, these types of methods do not leverage the potentially large quantity of unlabeled data generated. Deep learning algorithms, however, are well suited to scale to large amounts of data and were shown to be more robust to between-day signal drift than LDA, especially as the amount of training data increases [18]. Within the field of image recognition, deep learning-based unsupervised domain adaptation has been extensively studied.
A popular approach to this problem is domain adversarial training, popularized by DANN [15], [19]. The idea is to train a network on the labeled training dataset while also trying to learn a feature representation which makes the network unable to distinguish between the labeled and unlabeled data (see Section III for details). Building on this idea, VADA [20] also tries to minimize violations of the cluster assumption [21] (i.e. decision boundaries should avoid areas of high data density). Another state of the art algorithm is DIRT-T which, starting from the output of VADA, removes the labeled data and iteratively continues to minimize cluster assumption violations. Detailed explanations of VADA and DIRT-T are also given in Section III. DANN, VADA and DIRT-T are state of the art domain adversarial algorithms which achieve two-digit accuracy increases on several difficult image recognition benchmarks [20]. This work thus proposes to test these algorithms on the challenging problem of multiple-day sEMG-based gesture recognition in both an offline and online setting. An additional difficulty of the setting considered in this work is that real-time myoelectric control imposes strict limitations on the amount of temporal data which can be accumulated before each new prediction. The window's length requirement has a direct negative impact on classifiers' performance [22], [10]. This is most likely because temporally neighboring segments tend to belong to the same class [23], [24]. In other words, provided that predictions can be deferred, it should be possible to generate a classification algorithm with improved accuracy (compared to the real-time classifier) by looking at a wider temporal context of the data [10].
Consequently, one potential way of coping with electrode shift and the non-stationary nature of EMG signals for gesture recognition is for the classifier to self-calibrate using pseudo-labels generated from this improved classification scheme. The most natural way of performing this re-labeling is to use a majority vote around each of the classifier's predictions. The authors of [24] have shown that such a recalibration strategy significantly improves intra-day accuracy on an offline dataset for both able-bodied and amputee participants (tested on the NinaPro DB2 and DB3 datasets [25]). However, for real-time control, such a majority vote strategy will increase latency, as transitions between gestures inevitably take longer to be detected. Additionally, trying to re-label every segment, even when no clear gesture is detected by the classifier, will necessarily introduce undesirable noise in the pseudo-labels. Finally, the domain divergence over multiple days is expected to be substantially greater than within a single day. Consequently, ignoring this gap before generating the pseudo-labels might negatively impact the self-recalibrated classifier. To address these issues, the main contribution of this paper is the introduction of SCADANN (Self-Calibrating Asynchronous Domain Adversarial Neural Network), a deep learning-based algorithm which leverages domain adversarial training and the unique properties of real-time myoelectric control for inter-day self-recalibration. This paper is organized as follows. An overview of the datasets and the deep network architecture employed in this work is given in Section II. Section III presents the domain adaptation algorithms considered in this work, while Section IV thoroughly describes SCADANN. Finally, the results and their associated discussions are given in Sections V and VI respectively.
This work leverages the 3DC Dataset [26] for architecture building and hyperparameter optimization, and the Long-term 3DC Dataset [12] for training and testing the algorithms presented in this work. Both datasets were recorded using the 3DC Armband [26]; a wireless, 10-channel, dry-electrode, 3D-printed sEMG armband. The device samples data at 1000 Hz per channel, allowing it to take advantage of the full spectrum of sEMG signals [27]. As stated in [26], the data acquisition protocols of the 3DC Dataset and Long-term 3DC Dataset were approved by the Comités d'Éthique de la Recherche avec des êtres humains de l'Université Laval (approval numbers: 2017-0256 A-1/10-09-2018 and 2017-026 A2-R2/26-06-2019 respectively), and informed consent was obtained from all participants. The Long-term 3DC Dataset features 20 able-bodied participants (5F/15M) aged between 18 and 34 years old (average 26 ± 4 years old) performing eleven gestures (shown in Figure 1). Each participant performed three recording sessions over a period of fourteen days (in seven-day increments). Each recording session is divided into a Training session and two Evaluation sessions.
1) Training Session: During the training session, each participant was standing and held their forearm, unsupported, parallel to the floor, with their hand relaxed (neutral position). Starting from this neutral position, each participant was asked to perform and hold each gesture for a period of five seconds. This was referred to as a cycle. Two more such cycles were recorded. In this work, the first two cycles are used for training, while the last one is used for testing (unless specified otherwise). Note that in the original dataset, four cycles were recorded for each participant, with the second one recording the participant performing each gesture with maximal intensity. This second cycle was removed for this work to reduce confounding factors.
In other words, cycles two and three in this work correspond to cycles three and four in the original dataset. In addition to the eleven gestures considered in the Long-term 3DC Dataset, a reduced dataset from the original Long-term dataset containing seven gestures is also employed. This Reduced Long-term 3DC Dataset is considered as it could more realistically be implemented on a real-world system given the current state of the art of EMG-based hand gesture recognition. The following gestures are selected to form the reduced dataset: neutral, open hand, power grip, radial/ulnar deviation and wrist flexion/extension. They were selected as they were shown to be sufficient, in conjunction with orientation data, to control a 6 degree-of-freedom robotic arm in real-time [14].
2) Evaluation Session: In addition to the offline datasets (i.e. the normal and the reduced datasets from the training sessions), the evaluation sessions represent a real-time dataset. Each evaluation session lasted three and a half minutes. During that time, the participants were asked to perform a specific gesture at a specific intensity and at a specific position. A new gesture, intensity and position were randomly requested every five seconds. These evaluations were also recorded over multiple days, and the participants were the ones placing the armband on their forearm at the beginning of each session. As such, the evaluation sessions provide a real-time dataset which includes the four main dynamic factors [28] in sEMG-based gesture recognition. Note that while the participants received visual feedback within the VR environment in relation to the performed gesture, gesture intensity and limb position, the performed gestures were classified using a Leap Motion camera [29] so as not to bias the dataset towards a particular EMG-based classifier.
In this work, when specified, the first evaluation session of a given recording session was employed as the unlabeled training dataset for the algorithms presented in Sections III and IV, while the second evaluation session was used for testing.
3) Data Pre-processing: This work aims at studying unsupervised re-calibration of myoelectric control systems. Consequently, the input latency is a critical factor to consider. The optimal guidance latency was found to be between 150 and 250 ms [22]. Consequently, within this work, the data from each participant is segmented into 150 ms frames with an overlap of 100 ms. Each segment thus contains 10 × 150 (channel × time) data points. The segmented data is then band-pass filtered between 20-495 Hz using a fourth-order Butterworth filter. Given a segment, the spectrograms for each sEMG channel are then computed using a 48-point Hann window with an overlap of 14 samples, yielding a matrix of 4 × 25 (time × frequency). The first frequency band is then removed in an effort to reduce baseline drift and motion artifacts. Finally, following [30], the time and channel axes are swapped such that an example is of the shape 4 × 10 × 24 (time × channel × frequency). The 3DC Dataset features 22 able-bodied participants and is used for architecture building and hyperparameter selection. This dataset, presented in [26], features the same eleven gestures as the Long-term 3DC Dataset. Its recording protocol closely matches the training session description (Section II-A), with the difference being that two such sessions were recorded for each participant (within the same day). This dataset was preprocessed as described in Section II-A3. Note that when recording the 3DC Dataset, participants were wearing both the Myo and the 3DC Armband; however, in this work, only the data from the 3DC Armband is employed. Spectrograms were selected as the input fed to the ConvNet as they were shown to be competitive with the state of the art [6], [24].
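The pre-processing pipeline described above can be sketched as follows. The filter and spectrogram parameters are those stated in the text; the specific SciPy calls (e.g. zero-phase `sosfiltfilt`) are assumptions of this sketch, not necessarily the authors' implementation.

```python
import numpy as np
from scipy import signal

FS = 1000  # Hz, sampling rate of the 3DC Armband

def preprocess_segment(segment):
    """Turn one 150 ms sEMG segment (10 channels x 150 samples) into a
    4 x 10 x 24 (time x channel x frequency) spectrogram example."""
    # Fourth-order Butterworth band-pass filter, 20-495 Hz.
    sos = signal.butter(4, [20, 495], btype="bandpass", fs=FS, output="sos")
    filtered = signal.sosfiltfilt(sos, segment, axis=-1)

    # 48-point Hann window with an overlap of 14 samples: for 150 samples
    # this yields 4 time steps and 25 frequency bins per channel.
    _, _, spec = signal.spectrogram(
        filtered, fs=FS, window="hann", nperseg=48, noverlap=14
    )  # shape: (10 channels, 25 frequencies, 4 time steps)

    # Remove the first frequency band (baseline drift / motion artifacts).
    spec = spec[:, 1:, :]  # (10, 24, 4)

    # Swap axes to (time, channel, frequency), following [30].
    return np.transpose(spec, (2, 0, 1))  # (4, 10, 24)
```

With `nperseg=48` and `noverlap=14`, the hop is 34 samples, so a 150-sample frame produces exactly floor((150 - 48) / 34) + 1 = 4 spectrogram columns, matching the 4 × 25 matrix described above.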
A simple ConvNet architecture, inspired by [31] and presented in Figure 2, was selected so as to reduce potential confounding factors. The ConvNet's architecture contains four blocks followed by a global average pooling layer and two heads. The first head is used to predict the gesture held by the participant. The second head is only activated when employing domain adversarial algorithms (see Sections III and IV for details). Each block encapsulates a convolutional layer [32], followed by batch normalization [33], leaky ReLU [34] and dropout [35]. ADAM [36] is employed for the ConvNet's optimization, with an initial learning rate of 0.0404709 and a batch size of 512 (as used in [31]). Early stopping, with a patience of 10 epochs, is also applied by using 10% of the training dataset as a validation set. Additionally, learning rate annealing, with a factor of five and a patience of five, was also used. Dropout is set to 0.5 (following [31]). The architecture choices and hyperparameter selections were derived from the 3DC Dataset and previous literature using it (mainly [31], [26]). Note that the ConvNet's architecture implementation, written with PyTorch [37], is made readily available here (https://github.com/UlysseCoteAllard/LongTermEMG). This work considers three calibration methods for long-term classification of sEMG signals: No Calibration, Re-Calibration and Unsupervised Calibration. In the first case, the network is trained solely on the data of the first session. In the Re-Calibration case, the model is re-trained at each new session with the new labeled data. Unsupervised Calibration is similar to Re-Calibration, but the dataset used for re-calibration is unlabeled. Sections III and IV present the unsupervised calibration algorithms considered in this work.
Domain adaptation is a research area in machine learning which aims at learning a discriminative predictor from two datasets coming from two different, but related, distributions [19] (referred to as D_s and D_t). In the unsupervised case, one of the datasets is labeled (and comes from D_s), while the second is unlabeled (and comes from D_t). Within the context of myoelectric control systems, labeled data is obtained through a user's conscious calibration session. However, due to the transient nature of sEMG signals [28], [38], classification performance tends to degrade over time. This naturally creates a burden for the user, who needs to periodically recalibrate the system to maintain its usability [38], [39]. During normal usage, however, unlabeled data is constantly generated. Consequently, the unsupervised domain adaptation setting naturally arises by defining the source dataset as the labeled data of the calibration session and the target dataset as the unlabeled data generated by the user during control. The PyTorch implementation of the domain adaptation algorithms is mainly based on [40]. The Domain-Adversarial Neural Network (DANN) algorithm proposes to predict on the target dataset by learning a representation from the source dataset that makes it hard to distinguish examples from either distribution [15], [19]. To achieve this objective, DANN adds a second head to the network. This head, referred to as the domain classification head, receives the features from the last feature extraction layer of the network (in this work's case, from the global average pooling layer). The goal of this second head is to learn to discriminate between the two domains (source and target). However, during backpropagation, the gradient computed from the domain loss is multiplied by a negative constant (-1 in this work). This gradient reversal explicitly forces the feature distributions of the two domains to be similar.
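The gradient reversal at the heart of DANN can be sketched framework-agnostically: the layer is the identity in the forward pass and flips (and optionally scales) the gradient in the backward pass. The function names and the NumPy formulation below are illustrative assumptions, not the paper's PyTorch implementation.

```python
import numpy as np

def grad_reverse_forward(features):
    # Forward pass: the gradient-reversal layer is the identity.
    return features

def grad_reverse_backward(grad_from_domain_head, lam=1.0):
    # Backward pass: the gradient coming from the domain head is multiplied
    # by a negative constant (-1 in this work, i.e. lam = 1) before it
    # reaches the shared feature extractor.
    return -lam * grad_from_domain_head

# The shared feature extractor's parameters therefore receive
#     dL_y/dtheta_f - lam * dL_d/dtheta_f
# as their update direction: the network descends the prediction loss while
# *ascending* the domain loss, pushing the source and target feature
# distributions together.
```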
The backpropagation algorithm proceeds normally for the original head (classification head). The two losses are combined as follows:

L(θ) = L_y(θ) + λ_d L_d(θ)

where θ is the classifier's parametrization, and L_y and L_d are the prediction and domain loss respectively. λ_d is a scalar that weights the domain loss (set to 0.1 in this work). Decision-boundary Iterative Refinement Training with a Teacher (DIRT-T) is a two-step domain-adversarial training algorithm which achieves state of the art results on a variety of domain adaptation benchmarks [20].
1) First Step: During the first step, referred to as VADA (for Virtual Adversarial Domain Adaptation) [20], training is done using DANN as described previously (i.e. using a second head to discriminate between domains). However, with VADA, the network is also penalized when it violates the cluster assumption on the target. This assumption states that data belonging to the same cluster in the feature space share the same class. Consequently, decision boundaries should avoid crossing dense regions. As shown in [41], this behavior can be achieved by minimizing the conditional entropy with respect to the target distribution:

L_c(θ; D_t) = -E_{x~D_t}[ h_θ(x)^T ln h_θ(x) ]

where θ is the parametrization of a classifier h. In practice, L_c must be estimated from the available data. However, as noted by [41], such an approximation breaks down if the classifier h is not locally-Lipschitz (i.e. an arbitrarily small change in the classifier's input can produce an arbitrarily large change in the classifier's output). To remedy this, VADA [20] proposes to explicitly incorporate the locally-Lipschitz constraint during training via Virtual Adversarial Training (VAT) [42]. VAT generates new "virtual" examples at each training batch by applying small perturbations to the original data. The average maximal Kullback-Leibler divergence (D_KL) [43] between the real and virtual examples is then minimized to enforce the locally-Lipschitz constraint.
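The conditional-entropy term L_c can be estimated from a batch of target-domain softmax outputs. A minimal sketch follows; the batch-mean estimator and the epsilon added for numerical stability are assumptions of this sketch.

```python
import numpy as np

def conditional_entropy(softmax_outputs, eps=1e-12):
    """Empirical estimate of L_c = -E[ h(x)^T ln h(x) ] over a batch of
    target-domain softmax outputs (shape: batch x classes)."""
    p = np.asarray(softmax_outputs)
    return float(-np.mean(np.sum(p * np.log(p + eps), axis=1)))
```

Confident (near one-hot) predictions yield near-zero entropy, while predictions sitting on a decision boundary (near uniform) yield the maximum, ln(n_classes); minimizing L_c therefore pushes decision boundaries out of dense regions of the target data, as the cluster assumption demands.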
In other words, VAT adds the following function to minimize during training:

L_v(θ; D) = E_{x~D}[ max_{||r|| ≤ ε} D_KL( h_θ(x) || h_θ(x + r) ) ]

As VAT can be seen as a form of regularization, it is also applied to the source data. In summary, the combined loss function to minimize during VADA training is:

L(θ) = L_y(θ; D_s) + λ_d L_d(θ; D_s, D_t) + λ_vs L_v(θ; D_s) + λ_vt L_v(θ; D_t) + λ_c L_c(θ; D_t)

where the importance of each loss function is weighted by the hyperparameters (λ_d, λ_vs, λ_vt, λ_c). A diagram of VADA is given in Figure 3.
2) Second Step: During the second step, the signal from the source is removed. The idea is then to find a new parametrization that further minimizes the target cluster assumption violation while remaining close to the classifier found during the first step. This process can then be repeated by updating the original classifier with the classifier's parametrization found at each iteration. The combined loss function to minimize during the nth iteration thus becomes:

L(θ_n) = λ_vt L_v(θ_n; D_t) + λ_c L_c(θ_n; D_t) + β E_{x~D_t}[ D_KL( h_{θ_{n-1}}(x) || h_{θ_n}(x) ) ]   (Eq. 4)

where β is a hyperparameter which weighs the importance of remaining close to h_{θ_{n-1}}. In practice, the optimization problem of Eq. 4 can be approximately solved with a finite number of stochastic gradient descent steps [20]. Note that both DANN and VADA are conservative domain adaptation algorithms (i.e. the training algorithms try to generate a classifier that is able to discriminate between classes from both the source and target simultaneously). In contrast, DIRT-T is non-conservative, as it ignores the source's signal during training. In cases where the gap between the source and the target is large, this type of non-conservative algorithm is expected to perform better than its conservative counterparts [20]. In principle, this second step could be applied as a refinement step to any other domain adaptation training algorithm. Following [20], the hyperparameter values are set to λ_d = 10^-2, λ_vs = 1, λ_vt = 10^-2, λ_c = 10^-2, β = 10^-2. Within an unsupervised domain adaptation setting, the classifier's performance is limited by the unavailability of labeled data from the target domain.
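The VAT penalty compares the network's prediction on an example with its prediction on a slightly perturbed copy via the KL divergence. The sketch below uses a random unit-norm perturbation for simplicity; actual VAT approximates the worst-case perturbation within the ε-ball via a power-iteration step, which is omitted here.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # D_KL(p || q) between two discrete distributions.
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def vat_penalty(predict_fn, x, epsilon=0.1, rng=None):
    """Simplified VAT term: KL between predictions on x and on x + r,
    with r a random perturbation of norm epsilon. (Proper VAT instead
    finds the *worst-case* r within the epsilon-ball.)"""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.standard_normal(np.shape(x))
    r = epsilon * r / (np.linalg.norm(r) + 1e-12)
    return kl_divergence(predict_fn(x), predict_fn(x + r))
```

A classifier that is locally-Lipschitz around x changes its output little under such perturbations, so the penalty stays small; a classifier whose decision boundary passes through x incurs a large penalty.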
However, real-time EMG-based gesture recognition offers a particular context in which pseudo-labels can be generated from the recorded data by looking at the context of the classifier's predictions. These pseudo-labels can then be used as a way for the classifier to perform self-recalibration. [24] proposed to leverage this special context by re-labeling the network's predictions. Let P(i, j) be the softmax value of the network's output for the jth gesture (associated with the jth output neuron) of the ith example of a sequence. The heuristic considers an array composed of the t segments surrounding example i (included). For each j, the median softmax value over this array is computed:

P̃(i, j) = median( P(i - t/2, j), ..., P(i, j), ..., P(i + t/2, j) )

The pseudo-label of i then becomes the gesture j associated with the maximal P̃(i, j). The median of the softmax outputs is used instead of the predictions' mean to reduce the impact of outliers [24]. This self-calibrating heuristic will be referred to as MV (for Multiple Votes) from now on. The hyperparameter t was set to 1 second, as recommended in [24]. This work proposes to improve on MV with a new self-calibrating algorithm, named SCADANN. SCADANN is divided into three steps: 1) Apply DANN to the network using the labeled and newly acquired unlabeled data. 2) Using the adapted network, perform the re-labeling scheme described in Section IV-A. 3) Starting from the adapted network, train the network with the pseudo-labeled data and labeled data while continuing to apply DANN to minimize the domain divergence between the two datasets. The first step aims at reducing the domain divergence between the labeled recording session and the unlabeled recording so as to improve the classification performance of the network. The second step uses the pseudo-labeling heuristic described in Section IV-A.
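The MV re-labeling can be sketched as follows, with the window size t expressed in segments; truncating the window at the sequence edges is an assumption of this sketch.

```python
import numpy as np

def median_vote_pseudo_labels(softmax_seq, t):
    """Re-label each segment with the gesture whose *median* softmax value
    over the t surrounding segments is highest (the median rather than the
    mean, to reduce the impact of outliers).

    softmax_seq: (n_segments, n_classes) softmax outputs in temporal order.
    """
    p = np.asarray(softmax_seq)
    n = len(p)
    half = t // 2
    labels = np.empty(n, dtype=int)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        labels[i] = int(np.argmax(np.median(p[lo:hi], axis=0)))
    return labels
```

With the 150 ms frames and 100 ms overlap used in this work (a 50 ms stride between predictions), t = 1 s corresponds to roughly 20 segments.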
In addition to using the predictions' context to enhance the re-labeling process, the proposed heuristic introduces two improvements compared to [24]. First, the heuristic tries to detect transitions from one gesture to another. Already re-labeled predictions falling within the transition period are then vetted and possibly re-labeled to better reflect when the actual transition occurred. This improvement aims at addressing two problems. First, the added latency introduced by majority-voting pseudo-labeling is removed. Second, this re-labeling can provide the training algorithm with gesture-transition examples. This is of particular interest as labeled transition examples are simply too time consuming to produce. In fact, given a dataset with g gestures, the number of transitions which would need to be recorded is g × (g - 1), something that is simply not viable considering the current need for periodic re-calibration. Introducing pseudo-labeled transition examples within the target dataset could allow the network to detect transitions more rapidly and thus reduce the system latency. In turn, thanks to this latency reduction, the window's length could be increased to improve the overall system's performance. The second improvement introduces the notion of stability to the network's predictions. Using this notion, the heuristic removes from the pseudo-labeled dataset examples that are more likely to be re-labeled incorrectly. This second improvement is essential for a realistic implementation of self-calibrating algorithms, as otherwise the pseudo-labeled dataset would rapidly be filled with a large quantity of noise. This would result in a rapidly degenerating network as self-calibration is performed iteratively. The third step re-calibrates the network using the labeled and pseudo-labeled datasets in conjunction. DANN is again employed to try to obtain a similar feature representation between the source and target datasets.
The source dataset contains the labeled dataset alongside all the pseudo-labeled data from prior sessions, while the target dataset contains the pseudo-labeled data from the current session. The difference from the first step is that the network's weights are also optimized with respect to the cross-entropy loss calculated from the pseudo-labels. Early stopping is performed using only examples from the pseudo-labeled dataset. If only the pseudo-labeled dataset was employed for re-calibration, the network's performance would rapidly degrade from being trained only with noisy labels and possibly without certain gestures (i.e. nothing ensures that the pseudo-labeled dataset is balanced or even contains all the gestures). For concision's sake, the pseudo-code for the proposed re-labeling heuristic is presented in Appendix A-Algorithm 1. Note also that a Python implementation of SCADANN (alongside the pseudo-labeling heuristic) is available in the previously mentioned repository. The main idea behind the heuristic is to look at the network's predictions one after the other, so that when the next prediction is different from the previous one, the heuristic goes from the stable state to the unstable state. During the stable state, the prediction of the considered segment is added to the pseudo-label array. During the unstable state, all the network's outputs (after the softmax layer) are instead accumulated in a second array. When this second array contains enough segments (a hyperparameter set to 1.5 s in this work), the class associated with the output neuron with the highest median value is defined as the new possible stable class. The new possible stable class is confirmed if the median percentage of this class (compared with the other classes) is above a certain threshold (85% and 65% for the seven- and eleven-gesture datasets respectively, selected using the 3DC Dataset).
If this threshold is not reached, the oldest element in the second array is removed and replaced with the next segment. Note that the computation of the new possible stable class using the median is identical to MV. When the new possible class is confirmed, the heuristic first checks whether it was in the unstable state for too long (2 s in this work). If it was, all the predictions accumulated during the unstable state are removed. If the unstable state was not too long, the heuristic can then take two paths: 1) the new stable state class is the same as before, or 2) they are different. If they are different, it means that a gesture transition probably occurred. Consequently, the heuristic goes back in time to before the instability began (a maximum of 500 ms in this work) and looks at the derivative of the entropy calculated from the network's softmax output to determine when the network started to be affected by the gesture transition. All the segments from this instability period (adding the relevant segments from the look-back step) are then re-labeled as the newly found stable state class. If instead the new stable state class is identical to the previous one, only the segments from the instability period are re-labeled. The heuristic then returns to its stable state. As suggested in [44], a two-step statistical procedure is employed whenever multiple algorithms are compared against each other. First, Friedman's test ranks the algorithms amongst each other. Then, Holm's post-hoc test is applied (n = 20) using the No Calibration setting as a comparison basis. Additionally, Cohen's d [45] is employed to determine the effect size of using one of the self-supervised algorithms over the No Calibration setting. In this subsection, all training was performed using the first and second cycles of the relevant training session, while the third cycle was employed for testing.
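The stable/unstable mechanics described above can be sketched as the following simplified state machine. Buffer length, threshold and maximum unstable duration are expressed in segments rather than seconds; the entropy-derivative look-back step is omitted, and interpreting the "median percentage" criterion as the candidate's share of the total median mass is this sketch's assumption.

```python
import numpy as np

def stable_unstable_relabel(softmax_seq, buffer_len, threshold, max_unstable_len):
    """Simplified sketch of the stable/unstable re-labeling heuristic.

    softmax_seq: (n_segments, n_classes) softmax outputs in temporal order.
    Returns one pseudo-label per segment; -1 marks segments discarded as
    too uncertain (in the paper: unstable for longer than ~2 s).
    """
    p = np.asarray(softmax_seq)
    preds = p.argmax(axis=1)
    pseudo = []
    stable_class = int(preds[0])
    buffer = []          # softmax outputs accumulated while unstable
    unstable_count = 0   # total segments spent in the unstable state
    for i in range(len(p)):
        if unstable_count == 0:
            # Stable state: copy the prediction to the pseudo-label array.
            if preds[i] == stable_class:
                pseudo.append(stable_class)
            else:
                buffer, unstable_count = [p[i]], 1
        else:
            # Unstable state: accumulate softmax outputs in a sliding buffer.
            buffer.append(p[i])
            unstable_count += 1
            if len(buffer) > buffer_len:
                buffer.pop(0)  # replace the oldest element with the newest
            if len(buffer) == buffer_len:
                med = np.median(buffer, axis=0)
                candidate = int(med.argmax())
                # Confirm when the candidate holds enough of the median mass.
                if med[candidate] / med.sum() >= threshold:
                    if unstable_count <= max_unstable_len:
                        # Re-label the whole unstable stretch (transition).
                        pseudo.extend([candidate] * unstable_count)
                    else:
                        pseudo.extend([-1] * unstable_count)  # too long: drop
                    stable_class = candidate
                    buffer, unstable_count = [], 0
    pseudo.extend([-1] * unstable_count)  # still unstable at the end: drop
    return np.array(pseudo)
```

A confident gesture change is thus confirmed and back-filled as a transition, while a long ambiguous stretch never reaches the threshold and is dropped, which is precisely what keeps heavy pseudo-label noise out of the re-calibration dataset.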
Training sessions one through three contain data from 20 participants, while the fourth session contains data from six participants. The time gap between each training session is around seven days (a 21-day gap between sessions 1 and 4). Note that for the first session, all algorithms are equivalent to the no re-calibration scheme and consequently perform the same.
1) Offline Seven Gestures Reduced Dataset: The average test-set accuracy obtained from the first training session across all subjects is 92.71% ± 5.46%. This accuracy for a ConvNet using spectrograms as input is consistent with other works using the same seven gestures with similar datasets [30], [6]. Table I shows a comparison of the No Calibration setting alongside the three DA algorithms, MV (using the best performing All-Session recalibration setting [24]) and SCADANN. Figure 4 shows a point-plot of the No Calibration, SCADANN and Re-Calibration classifiers.
2) Offline Eleven Gestures Dataset: The average test-set accuracy obtained from the first training session across all subjects is 82.79%, which is consistent with accuracies obtained on the 3DC Dataset [26], [31]. Table II presents the corresponding comparison for the eleven-gesture dataset.
In this subsection, training using labeled data was conducted using the first, second and third cycles of the relevant training session. Table III compares the No Calibration setting with the three DA algorithms, MV and SCADANN on the second evaluation session of each experimental session, when the labeled and unlabeled data leveraged for training come from the offline dataset. The average accuracy obtained on the second evaluation session of each experimental session across all participants is 39.91% ± 14.67% and 48.89% ± 10.95% for the No Calibration and Re-Calibration settings respectively. Table IV presents the comparison between the No Calibration setting and using the first evaluation session of each experimental session as the unlabeled dataset for the three DA algorithms, MV and SCADANN.
A point-plot of the online accuracy of the No Calibration, Re-Calibrated, SCADANN and Re-Calibrated SCADANN classifiers, using the first evaluation session of each experimental session as unlabeled data, is shown in Figure 5. The task of performing adaptation when multiple days have elapsed is especially challenging. As a comparison, on the within-day adaptation task presented in [24], MV was able to enhance classification accuracy by 10% on average compared to the No Calibration scheme. Within this work, however, the greatest improvement achieved by MV was 3.07%, on the reduced offline dataset. Overall, the best improvement shown in this paper was achieved by SCADANN on the same task, with an improvement of 8.93%. All three tested domain adversarial algorithms were also able to consistently improve the network's accuracy compared to the No Calibration scheme. When used to adapt to online unsupervised data, they were even able to achieve higher overall accuracy than SCADANN. This decrease in performance from SCADANN and MV on harder datasets is most likely due to the reduction in the classifier's overall performance. This phenomenon is perhaps best shown by looking at Tables III and IV, where all algorithms were tested on the same data in both tables. Note how SCADANN was the best ranked adaptation method and MV was the second best in Table III, whereas in Table IV they degenerated into being the worst (with MV being even worse than the No Calibration setting on sessions 0 and 2). Even more so than the general performance of the classifier, however, the type of error that the classifier makes has the potential to affect the self-calibrating algorithms the most. In other words, if the classifier is confident in its errors and the errors span a large amount of time, the pseudo-labeling heuristic cannot hope to re-label the segments correctly. 
This can rapidly make the self-calibrating algorithm degenerate, as the adaptation might occur when a subset of gestures is completely misclassified in the pseudo-labeled dataset. To address this issue, future work will also leverage a hybrid IMU/EMG classifier, which has also been shown to achieve state of the art gesture recognition [46], [47]. The hope of this approach is that using two completely different modalities will result in a second classifier which makes mistakes at different moments, so that SCADANN's re-labeling heuristic is able to generate the pseudo-labels more accurately. Note that, overall, this re-labeling heuristic substantially enhanced pseudo-label accuracy compared to the one used with MV. As an example, consider the supervised Re-Calibrating classifier trained on all the training cycles of the relevant training session and tested on the evaluation sessions. This classifier achieves an average accuracy of 48.36% over 544 263 examples. In comparison, the MV re-labeling heuristic achieves 54.28% accuracy over the same examples, while the SCADANN re-labeling heuristic obtains 61.89% and keeps 478 958 examples using the 65% threshold. When using a threshold of 85%, the accuracy reaches 68.21% and 372 567 examples are retained. SCADANN's improved re-labeling accuracy compared to MV is due in part to the look-back feature of the heuristic (when de-activated, SCADANN's re-labeling accuracy drops to 65.23% for the 85% threshold) and to its ability to remove highly uncertain sub-sequences of predictions. Within this study, the 65% threshold was chosen due to the limited availability of unlabeled data within each session, as removing too many examples might completely erase some gestures. 
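The stability rule that this threshold governs (per-gesture medians of the recent softmax outputs, normalized to shares, with the top gesture accepted only above the threshold) can be sketched as follows. Names and return convention are illustrative assumptions, not the authors' code:

```python
import numpy as np

def stable_gesture(softmax_window, threshold):
    """Decide whether the window of recent softmax outputs forms a stable
    state: take the per-gesture median, normalize the medians to sum to one,
    and accept the top gesture only if its share exceeds `threshold`
    (e.g. 0.65 or 0.85 in the paper). Illustrative sketch."""
    window = np.asarray(softmax_window)
    medians = np.median(window, axis=0)   # median activation per gesture
    shares = medians / medians.sum()      # normalized median shares
    best = int(np.argmax(shares))
    return (best, True) if shares[best] > threshold else (best, False)
```

Raising the threshold discards more uncertain windows, which trades the size of the pseudo-labeled set for its accuracy, consistent with the 65% versus 85% figures above.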
It is suspected that in real application scenarios, where the amount of unlabeled data is not limited as it is on a finite dataset, higher thresholds should be preferred, as discarding more examples can be afforded and would most likely further enhance the performance of the self-calibrated classifier. The main limitation of this work is that the self-calibration algorithms were tested without the participants being able to react in real-time to the classifier's updates. While the effect size was often small, the tested self-calibrating algorithms, and in particular SCADANN, consistently outperformed the No Calibration scheme. However, due to the type of data available, the self-calibration could only occur once every seven days, substantially increasing the difficulty of this already challenging problem. In comparison, MV [24] was shown to almost completely counteract the signal drift when used on a within-day offline dataset. Future work will thus focus on using SCADANN to self-calibrate the classifier in real-time, so as to measure its ability to adapt in conjunction with the participant over longer periods of real-time use. This paper presented SCADANN, a self-calibrating domain adversarial algorithm for myoelectric control systems. Overall, SCADANN was shown to improve the network's performance compared to the No Calibration setting in all the tested cases, and the difference was significant in almost all of them. This work also tested three widely used, state of the art, unsupervised domain adversarial algorithms on the challenging task of EMG-based self-calibration. These three algorithms were also found to consistently improve the classifier's performance compared to the No Calibration setting. MV, a previously proposed self-calibrating algorithm designed specifically for EMG-based gesture recognition, was also compared to the three DA algorithms and SCADANN. 
Overall, SCADANN was shown to consistently rank amongst the best (and was often the best) of the tested algorithms on both offline and online datasets.

FUNDING

Algorithm 1 Pseudo-labeling Heuristic
1: procedure GENERATEPSEUDOLABELS(unstable_len, threshold_stable, max_len_unstable, max_look_back, threshold_derivative)
2:   pseudo_labels ← empty array
3:   arr_preds ← network's predictions
4:   arr_net_out ← network's softmax output
5:   begin_arr ← the first unstable_len elements of arr_net_out
6:   stable ← TRUE; arr_unstable_output ← empty array
7:   current_class ← the label associated with the output neuron with the highest median value in begin_arr
8:   for i from 0..length of arr_preds do
9:     if current_class is different than arr_preds[i] AND stable is TRUE then
10:      stable ← FALSE
11:      first_index_unstable ← i
12:      arr_unstable_output ← empty array
13:    if stable is FALSE then
14:      APPEND arr_net_out[i] to arr_unstable_output
15:      if length of arr_unstable_output is greater than unstable_len then
16:        REMOVE the oldest element of arr_unstable_output
17:      if length of arr_unstable_output is greater or equal to unstable_len then
18:        arr_median ← the median value in arr_unstable_output for each gesture
19:        arr_percentage_medians ← arr_median / the sum of arr_median
20:        gesture_found ← the label associated with the gesture with the highest median percentage in arr_percentage_medians
21:        if arr_percentage_medians[gesture_found] is greater than threshold_stable then
22:          stable ← TRUE
23:          if current_class is gesture_found AND the time spent within instability is less than max_len_unstable then
24:            Add the predictions which occurred during the unstable time to pseudo_labels with the gesture_found label
25:          else if current_class is different than gesture_found AND the time spent within instability is less than max_len_unstable then
26:            index_start_change ← GETINDEXSTARTCHANGE(arr_net_out, first_index_unstable, max_look_back, threshold_derivative)
27:            Add the predictions which occurred during the unstable time to pseudo_labels with the gesture_found label
28:            Re-label the predictions from pseudo_labels starting at index_start_change with the gesture_found label
29:          current_class ← gesture_found
30:          arr_unstable_output ← empty array
31:    else
32:      Add the current prediction to pseudo_labels with the current_class label
33:  return pseudo_labels

Algorithm 2 Find Index of Transition Start Heuristic
1: procedure GETINDEXSTARTCHANGE(arr_net_out, first_index_unstable, max_look_back, threshold_derivative)
2:   data_uncertain ← the elements of arr_net_out from index first_index_unstable − max_look_back to index first_index_unstable
3:   discrete_entropy_derivative ← calculate the entropy of each element of data_uncertain, then create an array of their derivatives
4:   index_transition_start ← 0
5:   for i from 0..length of data_uncertain do
6:     if discrete_entropy_derivative[i] is greater than threshold_derivative then
7:       index_transition_start ← i
8:       Break out of the loop
9:   return first_index_unstable + index_transition_start

REFERENCES
[1] Intuitive adaptive orientation control for enhanced human-robot interaction
[2] A convolutional neural network for robotic arm guidance using semg based frequency-features
[3] Myoelectric control systems-a survey
[4] Electromyogram pattern recognition for control of powered upper-limb prostheses: state of the art and challenges for clinical use
[5] Engaging with robotic swarms: Commands from expressive motion
[6] Deep learning for electromyographic hand gesture signal classification using transfer learning
[7] Gesture recognition by instantaneous surface emg images
[8] Current state of digital signal processing in myoelectric interfaces and related applications
[9] Confidence-based rejection for improved pattern recognition myoelectric control
[10] Adaptive windowing framework for surface electromyogram-based pattern recognition system for transradial amputees
[11] Transfer learning for rapid re-calibration of a myoelectric prosthesis after electrode shift
[12] Virtual reality to study the gap between offline and real-time emg-based gesture recognition
[13] Game-based myoelectric training
[14] A convolutional neural network for robotic arm guidance using semg based frequency-features
[15] Domain-adversarial neural networks
[16] A novel unsupervised adaptive learning method for long-term electromyography (emg) pattern recognition
[17] Robust emg pattern recognition in the presence of confounding factors: features, classifiers and adaptive learning
[18] Multiday emg-based classification of hand motions with deep learning techniques
[19] Domain-adversarial training of neural networks
[20] A dirt-t approach to unsupervised domain adaptation
[21] Introduction to semi-supervised learning
[22] Determining the optimal window length for pattern recognition-based myoelectric control: balancing the competing effects of classification error and controller delay
[23] Self-correcting pattern recognition system of surface emg signals for upper limb prosthesis control
[24] Self-recalibrating surface emg pattern recognition for neuroprosthesis control based on convolutional neural network
[25] Electromyography data for non-invasive naturally-controlled robotic hand prostheses
[26] A low-cost, wireless, 3-d-printed custom armband for semg hand gesture recognition
[27] A feature extraction issue for myoelectric control based on wearable emg sensors
[28] Electromyogram pattern recognition for control of powered upper-limb prostheses: state of the art and challenges for clinical use
[29] Electronic sensor
[30] Transfer learning for semg hand gestures recognition using convolutional neural networks
[31] Interpreting deep learning features for myoelectric control: A comparison with handcrafted features
[32] Deep learning
[33] Batch normalization: Accelerating deep network training by reducing internal covariate shift
[34] Empirical evaluation of rectified activations in convolutional network
[35] Dropout as a bayesian approximation: Representing model uncertainty in deep learning
[36] Adam: A method for stochastic optimization
[37] Automatic differentiation in pytorch
[38] Reduced daily recalibration of myoelectric prosthesis classifiers based on domain adaptation
[39] Surface emg-based inter-session gesture recognition enhanced by deep domain adaptation
[40] Github repository for pytorch implementation of a dirt-t approach to unsupervised domain adaptation
[41] Semi-supervised learning by entropy minimization
[42] Virtual adversarial training: a regularization method for supervised and semi-supervised learning
[43] Information theory and statistics. Courier Corporation
[44] Statistical comparisons of classifiers over multiple data sets
[45] Statistical power analysis for the behavioral sciences
[46] Recognizing hand and finger gestures with imu based motion and emg based muscle activity sensing
[47] Combined influence of forearm orientation and muscular contraction on emg pattern recognition