title: pin-TSVM: A Robust Transductive Support Vector Machine and its Application to the Detection of COVID-19 Infected Patients
authors: Singla, Manisha; Ghosh, Debdas; Shukla, K. K.
date: 2021-07-17
journal: Neural Process Lett
DOI: 10.1007/s11063-021-10578-8

Abstract

Training a machine learning model on data sets with missing labels is a challenging task. Not all models can handle missing labels, and if such data sets are further corrupted with label noise, training becomes even more difficult. We propose to use a transductive support vector machine (TSVM) for semi-supervised learning in this situation, and we make the model robust to label noise by pairing it with a truncated pinball loss function. We name our approach pin-TSVM. We provide both the primal and the dual formulations of the resulting robust TSVM for linear and non-linear kernels. We also perform experiments on synthetic and real-world data sets, both small and large scale, to demonstrate the superior robustness of our model compared to existing approaches. We show that the model can train in the presence of label noise and recover the missing labels of the data samples, and we use this property of pin-TSVM to detect coronavirus patients based on their chest X-ray images.

1 Introduction

Support Vector Machine (SVM) [39] covers a considerable part of the machine learning literature. It is one of the best-performing models in the family of supervised machine learning models. For this reason, researchers have applied SVM in various application areas, such as bioinformatics [8], medical sciences [17], time-series prediction [32], image classification [11], and signal processing [33]. Besides its advantages, SVM also has several drawbacks, one of which is sensitivity to noise [35, 36]. Over the years, researchers have proposed several variants of SVM, such as the transductive support vector machine (TSVM) [40], twin SVM (TWSVM) [20], and one-class SVM (OCSVM) [15]. These variants inherit the sensitivity to noise because they, too, use the conventional hinge loss function, which is sensitive to noise and outliers [35]. Many works in the SVM literature focus on overcoming this noise sensitivity, both for SVM itself and for its classification and regression variants [37]; this body of work is best described in [35].

All the above works assume that all class labels are available during training, i.e., they fall under supervised machine learning. However, label assignment during data set creation is a costly and error-prone task, so in practice we often encounter data sets with missing labels. Training models on such data sets falls under semi-supervised learning. The transductive support vector machine (TSVM) is a semi-supervised variant of SVM [40]. It was first proposed in [41] and implemented in [6]. TSVM has been used for learning in various applications where some data samples are unlabeled; the survey article [14] best describes this rich literature. Similar to SVM, TSVM is also sensitive to label noise.
This is due to the presence of a noise-sensitive loss function, namely the hinge loss function. The novelty of the present study lies in the fact that we propose to use the truncated pinball loss function with TSVM and solve the corresponding optimization problem by implementing both the primal and the dual forms. Next, in Subsect. 1.1, we describe the conventional TSVM and the existing robust TSVMs, where robust TSVMs are those that handle noise sensitivity. In Subsect. 1.2, we state the motivation behind this work and describe its main contributions.

1.1 The Conventional TSVM and Robust TSVMs

Consider a set L = {(x_1, y_1), ..., (x_L, y_L)}, with x ∈ R^d and y ∈ {+1, −1}, of L labeled training instances, and a set U = {x_{L+1}, ..., x_{L+U}} of U unlabeled instances. We need to find an optimal separating hyperplane defined by θ = (w, b), where w is the weight vector and b is the bias term. A decision function of the form

    f_θ(x) = w·φ(x) + b    (1)

is used to label new samples, where the kernel function φ maps the original data into a higher-dimensional feature space. We train SVM using L, and the trained SVM provides the best separating hyperplane with the largest possible margin. It then assigns labels to the U unlabeled instances of the set U.

TSVM combines the SVM classifier with the constraint that the unlabeled samples should lie as far as possible from the margin [41]. The optimization formulation is

    min_θ  (1/2)‖w‖² + C ∑_{i=1}^{L} ξ_i + C* ∑_{i=L+1}^{L+U} ξ_i
    s.t.   y_i f_θ(x_i) ≥ 1 − ξ_i,  i = 1, ..., L,
           |f_θ(x_i)| ≥ 1 − ξ_i,   i = L+1, ..., L+U,
           ξ_i ≥ 0,    (2)

where C and C* are the weight-controlling parameters corresponding to the labeled and unlabeled instances, respectively. The minimization problem (2) can be written as an unconstrained optimization problem of the form [6]

    J(θ) = (1/2)‖w‖² + C ∑_{i=1}^{L} H_1(y_i f_θ(x_i)) + C* ∑_{i=L+1}^{L+U} H_1(|f_θ(x_i)|),    (3)

where H_1(z) = max{0, 1 − z} is the hinge loss function [35] and z = y f_θ(x). In TSVM, H_1(z) is used for the labeled samples, while H_1(|z|) is used for the unlabeled samples. These are shown in Fig. 1. (A small numerical sketch of the objective (3) is given at the end of this subsection.)

TSVM has the limitation of assigning all the unlabeled samples to one of the classes, which leads to abysmal accuracy. To solve this problem, Chapelle and Zien [13] used a relaxed balancing constraint:

    (1/U) ∑_{i=L+1}^{L+U} f_θ(x_i) = (1/L) ∑_{i=1}^{L} y_i.    (4)

TSVM is used in many applications, such as cancer classification [26], classification of mammographic abnormalities [46], glaucoma classification [47], image retrieval [10], and ship category recognition [27]. Li et al. [23] proposed a robust TSVM for multi-view classification; they observed that representing the data from multiple views can effectively improve the generalization performance [23]. Training a model on huge data sets is a tedious task, and training on small labeled sets is also challenging since the model has few learning instances. Xu et al. [44] proposed an improved version of TSVM that learns well from a small labeled training set and applied it to a motor imagery based brain-computer interface.

Besides these, there are also many formulations in which researchers added robustness to the conventional TSVM by changing the loss functions. These methods are tabulated in Table 1, which lists the loss functions for labeled and unlabeled samples that various researchers used to make TSVM robust to noise. These loss functions include the conventional hinge loss function, the symmetric sigmoid loss function [24], and the ramp loss function [25]. A recent work on TSVM addressed its use with Universum data [42]. In that work, Xiao et al. followed two steps: selecting informative examples from the Universum data, and then using that data for semi-supervised classification [42]. They used the Lagrangian method to solve the resulting problem.
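As a concrete illustration of the unconstrained objective (3), the following minimal Python sketch evaluates J(θ) for a linear kernel. It is an illustrative reimplementation, not the paper's MATLAB code; the function and variable names are our own.

```python
import numpy as np

def hinge(z):
    # H1(z) = max(0, 1 - z); applied to y*f(x) for labeled points
    # and to |f(x)| for unlabeled points, as in (3)
    return np.maximum(0.0, 1.0 - z)

def tsvm_objective(w, b, X_lab, y_lab, X_unl, C, C_star):
    """Unconstrained linear-kernel TSVM objective (3):
    0.5*||w||^2 + C*sum H1(y_i f(x_i)) + C**sum H1(|f(x_j)|)."""
    f_lab = X_lab @ w + b
    f_unl = X_unl @ w + b
    return (0.5 * (w @ w)
            + C * hinge(y_lab * f_lab).sum()
            + C_star * hinge(np.abs(f_unl)).sum())
```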
Another recent work on TSVM handled the lack of sparsity in LapSVM [5]: Zheng et al. [49] used the L1 norm in LapSVM, and the method performed well (in terms of accuracy) on UCI data sets. Recently, semi-supervised learning has also been extended to various applications, such as fault identification in electricity distribution networks [22], intrusion detection systems [30], and enhanced prediction of heart disease [38].

1.2 Motivation and Contributions

In this work, we focus on the following three challenges: (i) to train the model in the presence of a significant number of unlabeled data; (ii) to train the model to handle small as well as large data sets effectively; (iii) to train the model under varying amounts of label noise in the data. The robust behavior of the pinball loss function [34] is the primary motivation behind this work. The use of the pinball loss function in other variants of SVM, such as TWSVM [43], also made those models robust to label noise. Since the pinball loss function affects the sparsity of a classifier [34], we use the truncated pinball loss function in this work, which leads to less computational time (shown experimentally in Sect. 3). The main contributions of this work are: a robust TSVM, named pin-TSVM, obtained by replacing the hinge loss with the truncated pinball loss; its primal formulation, solved with CCCP and stochastic gradient descent, and its dual formulation for linear and non-linear kernels; an experimental comparison with existing approaches on synthetic and real-world data sets under varying levels of label noise; and an application of the model to the detection of COVID-19 infected patients from chest X-ray images.

In the next section, we describe the proposed robust TSVM with a truncated pinball loss function. In Sect. 3, we report the results of the experiments performed using the proposed approach and compare them with the existing approaches. In Sect. 4, we show that the proposed approach can be used to predict COVID-19 infected patients. Finally, we conclude the work in Sect. 5.

2 The Proposed Robust TSVM: pin-TSVM

In this section, we describe the robust TSVM formulation using the truncated pinball loss function. The truncated pinball loss function is

    L_τ^s(z) = { 1 − z,      z ≤ 1,
                 τ(z − 1),   1 < z ≤ 1 + s,
                 τs,         z > 1 + s,    (5)

where 0 ≤ τ ≤ 1. It is shown in Fig. 2. Please note that s > 0 is the hinge point [34]. In (3), we replace the hinge loss function by the truncated pinball loss function; accordingly, J(θ) in (3) becomes (6). To avoid poor classification of the unlabeled samples, we also impose the balancing constraint (4) described earlier. Putting (4) into (6), we get (7). Next, we represent each unlabeled sample as two instances, one labeled with the positive class and one with the negative class. This leads to the creation of new samples [9] with

    y_i = +1,  i ∈ {L+1, ..., L+U},
    y_i = −1,  i ∈ {L+U+1, ..., L+2U}.    (8)

We can now split the objective into a convex part J_convex(θ) and a concave part J_concave(θ) [34], given in (9) and (10). To minimize J(θ) with respect to θ = (w, b), we use the concave-convex procedure (CCCP) [45], as given in Algorithm 1. CCCP decomposes the non-convex function into a concave and a convex part and uses an iterative procedure in which, at each iteration, the concave part is approximated by its tangent [9]. In Algorithm 1, J′(θ) represents ∂J(θ)/∂θ. The convergence of the CCCP algorithm is established in [45].

Algorithm 1 The Concave-Convex Procedure (CCCP) [45]
Input: J_concave(θ) and J_convex(θ)
1: Initialize θ^0.
2: repeat
3:    θ^{t+1} = argmin_θ ( J_convex(θ) + J′_concave(θ^t)·θ )
4: until convergence of θ^t

Next, we find the gradient of J_concave(θ) with respect to θ, which yields the coefficients β_i defined in (12). Therefore, problem (7) can be restated [45] as the minimization of (13). By introducing slack variables ξ into (13), we get our final constrained minimization problem (14) over (θ, ξ) [34]. We solve (14) using the stochastic gradient descent (SGD) method [7], given in Algorithm 2 (a code sketch of the truncated pinball loss and the CCCP loop follows below). To implement (14) using SGD, we require the data set, from which we get the values of L and U. We also input λ, the learning rate of SGD, and ε, the tolerance value required for the convergence of Algorithm 2.
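To make the loss and the CCCP loop concrete, here is a minimal Python sketch. It follows the truncated pinball form reconstructed in (5) and the generic CCCP template of Algorithm 1; the function names and the solver callback are our own illustrative choices, not the paper's MATLAB implementation.

```python
import numpy as np

def trunc_pinball(z, tau, s):
    """Truncated pinball loss of (5): hinge-like for z <= 1, slope tau on
    the margin side up to the hinge point s, then flat at tau*s."""
    u = 1.0 - z  # margin violation
    return np.where(u >= 0, u, np.where(u >= -s, -tau * u, tau * s))

def cccp(theta0, solve_convex, grad_concave, T=5, tol=1e-4):
    """Generic CCCP (Algorithm 1): linearize the concave part at theta_t,
    then minimize the convex part plus that linear term, until theta
    stabilizes or T iterations are reached."""
    theta = theta0
    for _ in range(T):
        g = grad_concave(theta)       # tangent of the concave part
        theta_new = solve_convex(g)   # argmin_theta Jvex(theta) + g @ theta
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```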
Since the CCCP algorithm converges quickly [9], within at most five iterations in our experiments, we set T = 5 in all the algorithms. However, we also state the convergence conditions in the algorithms (Step 12 in Algorithm 2, Step 13 in Algorithms 3 and 4). The time complexity of Algorithm 2 is mainly due to Step 2 and the conventional steps of SGD (Steps 8 and 9). Step 2 is executed using svmtrain() (LIBSVM), which has a time complexity of O(n³) [12]. The time complexity of SGD is O(d̄/(λε)) [9], where d̄ denotes the number of non-zero attributes of the data set, λ is the learning rate of SGD, and ε is the tolerance value. Therefore, the overall time complexity of Algorithm 2 is O(n³ + d̄/(λε)). We also implement pin-TSVM using the dual form of (14).

Algorithm 2 (pin-TSVM via SGD, primal form) takes as input the tolerance value ε, the numbers L and U of labeled and unlabeled instances in the data set D, and the SGD learning rate λ_0 > 0, and outputs the optimal weight vector w_t and bias term b_t. It proceeds as follows: split the data set D into a training set and a test set; train SVM on the training set to get w_0 and b_0 (Step 2); initialize t = 0 and ε > 0; compute β_i^0 using (12); then, in each pass, compute the sub-gradient of (13) with respect to w and b and update ŵ_t and b̂_t (Steps 8 and 9), setting w_t ← ŵ_t and b_t ← b̂_t; finally, recompute β_i^{t+1} using (12) and test the convergence condition (Step 12), set t = t + 1, and repeat.

To get the dual form of (14), we construct the Lagrangian function (15), where α_i, ν_i ≥ 0 for i = 1, 2, ..., (L + 2U). We then write the necessary Karush-Kuhn-Tucker (KKT) optimality conditions for (15), given by (16)-(20). For simplification, we define a new sample indexed by 0, with β_0 = 0 and y_0 = 1, and from (16) we obtain (21). Substituting (21) into (15), simplifying, adding and subtracting the term in α_0 y_0, and then simplifying (23) using (18) and (20), we arrive at the dual problem (24). Writing it in terms of the kernel matrix K gives the final dual problem (25).

To solve (25), we follow Algorithm 3 to find the optimal weight vector and bias term, which we then use to compute sign(w^T x + b) in the case of a linear kernel. In Algorithm 3, we first train the SVM using the svmtrain() function of LIBSVM, whose time complexity is O(n³), where n is the number of instances in the data set [12]; we then use mlcv_quadprog() [9] to implement Step 7 of Algorithm 3, whose time complexity is again O(n³) [48]. Therefore, the overall time complexity of Algorithm 3 is O(n³). Similarly, we can solve the dual optimization problem (25) using non-linear kernels; Algorithm 4 shows the steps to be followed in that case.

Algorithm 3 (pin-TSVM with a linear kernel) takes as input the maximum number of iterations T, the tolerance values ε_1 and ε_2, and the numbers L and U of labeled and unlabeled instances of the data set D, and outputs the optimal weight vector w_t and bias term b_t. It splits D into a training set and a test set, trains SVM to obtain w_0 and b_0, initializes t = 0 and ε_1, ε_2 > 0, and computes β_i^0 using (12). It then iterates: solve the convex optimization problem (25) (Step 7); compute w and set w_{t+1} = w; compute b from the KKT constraints; recompute β_i^{t+1} using (12); set t = t + 1 and repeat until the convergence condition of Step 13 is met.

Algorithm 4 (pin-TSVM with a non-linear kernel) follows the same pattern: it solves the convex optimization problem (25) using α_0, b_0, and the initial support vectors SV_initial; finds ᾱ and the support vectors S; computes b from the KKT constraints; and recomputes β_i^{t+1} using (12), repeating until convergence and finally reporting the accuracy.
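The O(n³) quadratic-programming step that mlcv_quadprog() performs in MATLAB can be illustrated in Python with cvxopt. The sketch below solves a standard kernel-SVM dual as a stand-in; the pin-TSVM dual (25) has the same QP shape but different linear terms and box constraints, which we do not reproduce here.

```python
import numpy as np
from cvxopt import matrix, solvers

def solve_svm_dual(K, y, C):
    """Solve the standard kernel-SVM dual
        min  0.5 * a^T (yy^T * K) a - sum(a)
        s.t. 0 <= a_i <= C,  y^T a = 0
    with a generic QP solver (illustrative stand-in for mlcv_quadprog()).
    K must be a positive semidefinite kernel matrix."""
    n = len(y)
    P = matrix(np.outer(y, y).astype(float) * K)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))   # -a <= 0 and a <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1).astype(float))       # equality y^T a = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()                # dual coefficients
```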
It is noteworthy that the time complexity of Algorithm 3 and Algorithm 4 is the same, as we use mlcv_quadprog() [9] to implement both.

3 Experiments and Results

In this section, we report the results obtained by pin-TSVM on various data sets. We also compare our model with the standard SVM, TSVM, and TSVM with the ramp loss function (Ramp-TSVM).

Firstly, we evaluate the model's performance on synthetic data sets. We generate a two-dimensional synthetic data set of 100 samples, with 50 samples each for the positive and negative classes. We add different amounts of label noise to this data set to test the performance of the proposed method against the existing TSVM methods. To add k% noise to the data set, we switch k% of the labels of the labeled training data from −1 to +1 and vice versa (see the code sketch after the data set descriptions below). We consider k = 10, 15, 20, and 25 for the synthetic data set. The results are reported in Table 2, where the best accuracy in each row is marked in bold. Note that we use only the labeled set to train SVM, since it is a supervised learning model; for the remaining techniques, the unlabeled test data is also used for training. For all the experiments, we select the weight-adjusting parameters C and C* from the sets {1, 2, 3, 4, 5} and {0.1, 0.2, 0.3, 0.4, 0.5}, respectively, and we perform cross-validation on 10% of the training data to select the values of C and C* optimally. We observe that the proposed method, pin-TSVM-SG, shows better accuracy in most cases; however, the dual form of the proposed method lags in accuracy on this small synthetic data set. These experiments are performed with the linear kernel, although we provide algorithms for both linear and non-linear kernels (see Algorithms 2, 3 and 4). Similar trends were observed with other kernels as well.

In this subsection, we compare the performance of the existing TSVM techniques with the proposed technique on real-world data sets. We first report experiments on small real-world data sets and then evaluate the techniques on large real-world data sets. The small real-world data sets are listed in Table 3, arranged in increasing order of the number of instances. A short description of each data set follows:

- Sonar [4]: To classify two types of sonar signals: those bounced off a roughly cylindrical rock and those bounced off a metal cylinder.
- Cleveland Heart [4]: To classify patients based on the presence or absence of heart disease.
- Haberman [31]: Classification based on the survival status of patients (those who died within 5 years versus those who survived 5 years or longer) who had undergone surgery for breast cancer.
- WDBC [4]: Features describe the characteristics of the cell nuclei present in the images; the task is to classify a case as benign or malignant.
- Australian [4]: The task is to classify credit card applications as approved or not.
- Pima Indians: Based on women living in Arizona; the task is to classify them as diabetic or non-diabetic.
- CMC [4]: The task is to predict the contraceptive method choice of a woman based on her socio-economic and demographic characteristics.
- Spambase [4]: To classify an email as spam or non-spam.
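The label-noise injection used throughout the experiments is simple label flipping. A minimal Python sketch follows; the sampling scheme (uniform flipping without replacement, fixed seed) is our assumption, since the paper only states that k% of the labels are switched.

```python
import numpy as np

def flip_labels(y, k, seed=None):
    """Return a copy of y in {-1, +1} with k% of the entries flipped."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    n_flip = int(round(len(y) * k / 100.0))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = -y_noisy[idx]
    return y_noisy

# e.g., 15% label noise on the labeled training part only:
# y_train_noisy = flip_labels(y_train, k=15, seed=0)
```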
To perform the experiments on these data sets, we divide each data set in the ratio 55:45, where 55% of the data is used for training and the remaining 45% for testing. Only the 55% portion carries labels during training; the 45% test portion is treated as the unlabeled set, which lets us test the performance of all the methods under greater complexity. We train the SVM model on the labeled training set (as for the synthetic data set) using svmtrain() from LIBSVM [12], and we use the weight vector and bias term obtained from training the SVM to initialize the other models. To check the robustness of the proposed model, we add noise to the labeled training data. In this part of the work, we evaluate the models' performance with 0%, 15%, and 30% noise in the data sets. To add k% noise, we change k% of the labels of the labeled training data from −1 to +1 and vice versa; we consider k = 0, 15, and 30 for both the small and the large real-world data sets. We compare the performance of all the models in terms of accuracy, precision [3], and recall [3], and we also report the computational time (in seconds) of all the methods. These experiments on small data sets are performed on a Lenovo laptop running Windows 10 with 4 GB RAM and RADEON graphics. All the codes are written in MATLAB and are available at https://github.com/manisha1427/TruncpinTSVM.

First, we report the results on the data sets with 0% noise in Table 4. The boldfaced accuracy, precision, and recall values represent the best values for each data set. We observe that the dual form of pin-TSVM performs better than the rest of the techniques on most data sets. Note that for SVM and TSVM we implement the dual forms on the small data sets only, as their computational time grows with the number of examples [48], so we cannot use them for large-scale data sets; for the large real-world data sets, we implement the primal form using SGD (Algorithm 2).

Next, we add 15% label noise to the data sets and report the results in Table 5. pin-TSVM-dual outperforms the other techniques in most cases. We also observe that the drop in accuracy after adding 15% noise is smaller for pin-TSVM-dual than for the other techniques. We further increase the label noise in the training data to 30% and report the results in Table 6, where pin-TSVM-dual still outperforms the rest of the methods in terms of accuracy, precision, and recall, while remaining comparable to TSVM in computational time. In Tables 4, 5 and 6, the computational time for SVM is significantly lower because SVM is trained only on the labeled training set, while the remaining techniques use both the labeled and the unlabeled sets for training.

We next perform experiments on large real-world data sets to compare the results of the proposed technique with the existing models. The data sets used here are listed in Table 7. We implement the CCCP form of TSVM, Ramp-TSVM, and the proposed approach to perform these experiments; pin-TSVM is implemented using Algorithm 2.
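The evaluation protocol above, a 55:45 split with the test part entering training unlabeled and then being scored on accuracy, precision, and recall, can be sketched in Python as follows. The scikit-learn calls are an illustrative stand-in; the paper's experiments are in MATLAB, and predict_fn is a hypothetical wrapper around a trained transductive model.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

def split_and_score(X, y, predict_fn, seed=0):
    """55:45 split; the 45% part enters training unlabeled and its
    held-back labels are used only to score the transductive predictions."""
    X_lab, X_unl, y_lab, y_hidden = train_test_split(
        X, y, train_size=0.55, random_state=seed, stratify=y)
    y_pred = predict_fn(X_lab, y_lab, X_unl)  # e.g., a pin-TSVM wrapper
    return (accuracy_score(y_hidden, y_pred),
            precision_score(y_hidden, y_pred),
            recall_score(y_hidden, y_pred))
```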
Please note that we run the models on the master node of the IIT (BHU), Varanasi server with 96 GB RAM to perform these experiments; the results are reported in Table 8. For image data sets such as CIFAR-10, we follow steps similar to [9]: we obtain a feature set from the images, and the numbers of instances and attributes of this feature set are listed in Table 7. A short description of the data sets listed in Table 7 follows:

- Banana [28]: Classification of two types of bananas based on shape.
- Page Blocks [4]: The task is to classify the page blocks of a document detected by a segmentation process.
- Musk (version 2) [4]: To classify whether new molecules are musks or non-musks.
- Cats vs Dogs [18]: To classify a new image as a cat image or a dog image.
- CIFAR-10 [21]: CIFAR-10 contains 60,000 color images of size 32×32 pixels belonging to ten classes. To use this data set, we extract features from the images.
- MNIST [4]: The MNIST database comprises images of handwritten digits; the task is to identify the digit in a new image.
- Cover Type [4]: The task is to predict the forest cover type from cartographic variables [4]. It includes a total of seven classes, marked as integers 1 to 7 in the data set.

As in the experiments on the small data sets, we add different levels of noise here as well. For multi-class classification, we follow the one-versus-rest approach. From Table 8, we observe that the proposed approach is close to the rest of the approaches on the noise-free data sets; however, when we add noise to the data, the proposed approach significantly outperforms the others. Please note that empty values in Table 8 indicate methods that did not produce any result within one month. We also compare the above techniques with a convolutional neural network (CNN) on the image data sets. To compare the techniques on computational time, we choose the data set with the maximum number of instances, Forest Cover Type: the computational time of SVM is 6.67 × 10³ minutes, of TSVM 6.81 × 10⁵ minutes, of Ramp-TSVM 2.82 × 10⁴ minutes, and of the proposed method 4.51 × 10³ minutes. Thus, the training time of the proposed pin-TSVM is lower than that of the others.

As the proposed approach performs well on real-world data sets, we also apply it to detecting the disease caused by the presence of the novel coronavirus in the human body. Based on chest X-ray images, we determine whether a person is infected or not. Since assigning these labels manually is time-consuming and difficult, especially during this pandemic, we can predict the labels using our proposed robust semi-supervised learning framework, pin-TSVM. (Fig. 4 shows the steps to extract features from the COVID-19 data set and to train pin-TSVM; a sketch of this feature-extraction pipeline is given below.)

4 Detection of COVID-19 Infected Patients

In this section, we discuss the use of the pin-TSVM model to predict whether a person is infected by COVID-19. To do this, we train the model using chest X-ray images of humans. It has been observed that early detection of the disease, when symptoms are mild, can help the patient recover; therefore, it is important to detect the disease in its early stage.
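We extract features from the chest X-ray images with a pre-trained VGG19 network and optionally reduce them with PCA, as detailed in the following paragraphs. Below is a minimal Python sketch of such a pipeline; the 224×224 input size, the average-pooling readout, the Keras API, and n_components=100 are our assumptions, and the paper's reference implementation is in MATLAB.

```python
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.decomposition import PCA

# Convolutional part of VGG19 only (through the last max-pooling block),
# with global average pooling to produce one feature vector per image.
base = VGG19(weights='imagenet', include_top=False, pooling='avg')

def extract_features(paths):
    """Load each X-ray image, preprocess it for VGG19, and return the
    stacked feature matrix (one row per image)."""
    feats = []
    for p in paths:
        img = image.load_img(p, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
        feats.append(base.predict(x, verbose=0).ravel())
    return np.vstack(feats)

def reduce_features(X, n_components=100):
    # Optional PCA step, as in the paper's second variant
    # (the number of components is an assumed value, not from the paper).
    return PCA(n_components=n_components).fit_transform(X)
```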
In this work, we use a semi-supervised machine learning model, TSVM, to detect the disease in humans from their chest X-ray images. Labels come from human experts, who do make mistakes, particularly in a pandemic-like situation where they are under considerable stress due to a large number of severe cases. We therefore use the robust TSVM, pin-TSVM, to detect the presence of COVID-19 in the human body. We first create a data set using chest X-ray images of COVID patients, normal humans, and patients with bacterial infections; example images are shown in Fig. 3. We use the pre-trained VGG19 model to extract features from the images [29]. In VGG19, the feature-extraction part extends from the input layer to the last max-pooling layer, while the remaining part of VGG19 is used for classification. VGG19 processes multi-channel array signals and has proved superior to other machine learning models for classification tasks of this kind [50]; therefore, we use VGG19 for feature extraction. To perform the experiments on the COVID-19 data set, we follow the steps shown in Fig. 4.

In these experiments, we also switch some of the labels of the training data (as described earlier in Sect. 3) to test the robustness of pin-TSVM on the COVID-19 data set. When a few labels in the training set are wrong, the task is to formulate a model that is robust enough to maintain its accuracy to some extent, i.e., to degrade gracefully rather than catastrophically. pin-TSVM has proved its robustness through its performance on the real-world data sets discussed in Sect. 3, and we now use it on the COVID-19 data set of chest X-ray images. We compute the results in two ways: directly using the features obtained from the VGG19 model, and additionally extracting the essential features from this step using principal component analysis (PCA) [1]. The results are reported in Table 9, which gives the accuracies and the computational times (in parentheses) of the various techniques; bold entries mark the best accuracy in each row. The last column of Table 9 lists the value of C used in these experiments; note that we use the same value for C and C*. From Table 9, we observe that the proposed model outperforms the existing techniques even after increasing the noise in the data set. The method can thus also be used to assign labels to the unlabeled samples efficiently.

5 Conclusion

In this paper, we proposed an improved TSVM that is robust to label noise in the data set. We used the truncated pinball loss function instead of the conventional hinge loss function to introduce robustness into this framework (Sect. 2). We implemented both the primal and the dual forms of the proposed technique: we applied CCCP to the primal form and implemented it using SGD (see Algorithm 2), while the dual form is implemented using the mlcv_quadprog() function [9] in MATLAB (see Algorithms 3 and 4). We provided algorithms for both linear and kernelized pin-TSVM. We compared our technique with the existing techniques on both synthetic and real-world data sets, and the proposed technique outperformed the others on the majority of the data sets. We also extended the use of pin-TSVM to the detection of coronavirus-infected patients from their chest X-ray images; the proposed technique yielded better accuracy, precision, and recall even in the noisy setting. We find that the method can be used efficiently to detect coronavirus-infected patients from their chest X-ray images.
In continuation of this study, we will attempt to apply the proposed method to other real-world applications in order to find the missing labels.

References

[1] Principal component analysis
[2] Understanding of a convolutional neural network
[3] An exact analytical relation among recall, precision, and classification accuracy in information retrieval
[4] UCI machine learning repository
[5] Manifold regularization: a geometric framework for learning from labeled and unlabeled examples
[6] Semi-supervised support vector machines
[7] Large-scale machine learning with stochastic gradient descent
[8] Support vector machine applications in bioinformatics
[9] Large-scale robust transductive support vector machines
[10] Large-scale image retrieval using transductive support vector machines
[11] Survey on SVM and their application in image classification
[12] LIBSVM: a library for support vector machines
[13] Semi-supervised classification by low density separation
[14] Optimization techniques for semi-supervised support vector machines
[15] One-class SVM for learning in image retrieval
[16] Large scale transductive SVMs
[17] Survey of machine learning algorithms for disease diagnostic
[18] Improving shape deformation in unsupervised image-to-image translation
[19] Transductive inference for text classification using support vector machines
[20] Twin support vector machines for pattern classification
[21] CIFAR10-DVS: an event-stream dataset for object classification
[22] Semi-supervised learning for fault identification in electricity distribution networks
[23] Robust transductive support vector machine for multi-view classification
[24] A note on margin-based loss functions in classification
[25] Ramp loss least squares support vector machine
[26] Cancer classification through filtering progressive transductive support vector machine based on gene expression data
[27] Transductive attributes for ship category recognition
[28] Benchmark repository. Intelligent Data Analysis Group
[29] Pre-trained convolutional neural network features for facial expression recognition
[30] Enhanced transductive support vector machine classification with grey wolf optimizer cuckoo search optimization for intrusion detection system
[31] The network data repository with interactive graph analytics and visualization. In: Association for the Advancement of Artificial Intelligence
[32] Time series prediction using support vector machines: a survey
[33] A brief survey of machine learning methods and their sensor and IoT applications
[34] Support vector machine classifier with truncated pinball loss
[35] Robust statistics-based support vector machine and its variants: a survey
[36] Improved sparsity of support vector machine with robustness towards label noise based on rescaled α-hinge loss with non-smooth regularizer
[37] Robust twin support vector regression based on rescaled hinge loss
[38] Enhanced prediction of heart disease using particle swarm optimization and rough sets with transductive support vector machines classifier
[39] Pattern recognition using generalized portrait method
[40] Vapnik V (2006) Transductive inference and semi-supervised learning
[41] On structural risk minimization or overall risk in a problem of pattern recognition
[42] A new transductive learning method with Universum data
[43] A novel twin support-vector machine with pinball loss
[44] Improved transductive support vector machine for a small labelled set in motor imagery-based brain-computer interface
[45] The concave-convex procedure
[46] Automated classification of mammographic abnormalities using transductive semi-supervised learning algorithm
[47] Robust feature selection algorithm based on transductive SVM wrapper and genetic algorithm: application on computer-aided glaucoma classification
[48] Algorithm of ε-SVR based on a large-scale sample set: step-by-step search
[49] L1-norm Laplacian support vector machine for data reduction in semi-supervised learning
[50] Multisignal VGG19 network with transposed convolution for rotating machinery fault diagnosis based on deep transfer learning

Acknowledgements The authors truly appreciate the comments and suggestions by the anonymous reviewers, which have substantially increased the quality of the paper. The first author would like to acknowledge a research fellowship from the IIT (BHU) Varanasi.

Conflict of interest The authors declare that they have no conflict of interest.