key: cord-0150353-y5a2n9xp authors: Cao, Xu; Chen, Zijie; Lai, Bolin; Wang, Yuxuan; Chen, Yu; Cao, Zhengqing; Yang, Zhilin; Ye, Nanyang; Zhao, Junbo; Zhou, Xiao-Yun; Qi, Peng title: VeniBot: Towards Autonomous Venipuncture with Automatic Puncture Area and Angle Regression from NIR Images date: 2021-05-27 journal: nan DOI: nan sha: 02337abae350a54df4963af886f2383d680acea3 doc_id: 150353 cord_uid: y5a2n9xp

Venipuncture is a common step in clinical practice, and automating it with robotics is of high practical value. Only a few off-the-shelf robotic systems have been developed so far, and for various reasons they cannot yet meet practical needs. In this paper, we develop a compact venipuncture robot, VeniBot, with four parts, six motors and two imaging devices. For the automation, we focus on the positioning part and propose a Dual-In-Dual-Out network based on two-step learning and two-task learning, which achieves fully automatic regression of the suitable puncture area and angle from near-infrared (NIR) images. The regressed suitable puncture area and angle can then navigate the positioning part of VeniBot, which is an important step towards a fully autonomous venipuncture robot. Validation on 30 VeniBot-collected volunteers shows a high mean Dice coefficient (DSC) of 0.7634 and a low angle error of 15.58° on suitable puncture area and angle regression respectively, indicating potentially wide and practical application in the future.

Venipuncture is often the first step of clinical practice and is widely performed in both routine examination and emergency care. Statistics show that venipuncture is performed 1.4 billion times per year in the United States [1]. In the standard setting, after visually locating a proper superficial vein in the antecubital area, a needle is used to pierce the skin and is introduced into the lumen parallel to the longitudinal axis of the vein.
Once venous blood is extracted, venous access is established and further procedures become practicable [2]. Although the procedure is not very complex, reports have shown that the success rate of venipuncture can be below 50% [3]. Venipuncture is difficult even for professional clinicians, particularly on elderly people, infants, or patients in a state of shock. Complications and infections also accompany venipuncture, for example hematoma, phlebitis, artery or nerve damage [4], [5], [6], airborne diseases such as coronavirus [7], and hepatitis and human immunodeficiency virus (HIV) [8]. Hence, venipuncture is a procedure that takes time, manpower and experience, and there is a strong need to automate it.

This work is supported by the National Natural Science Foundation of China (Number 51905379) and Shanghai Science and Technology Development Funds (Number 20QC1400900). 1 Xu Cao, Zijie Chen, Zhengqing Cao, Zhilin Yang, Yu Chen, Yuxuan Wang and Peng Qi are with Tongji University, Shanghai, China. pqi@tongji.edu.cn 2 Bolin Lai is with PingAn Technology Co. Ltd., Shanghai, China. 3 Nanyang Ye is with Shanghai Jiao Tong University, Shanghai, China. 4 Junbo Zhao is with Zhejiang University, Hangzhou, China. 5 Xiao-Yun Zhou is with PAII Inc., MD, USA. * corresponding author.

A few robotic platforms have been developed: (1) Veebot, which includes a puncturing part and a robotic arm, is navigated by infrared light and ultrasound, and needs a medical staff member to assist in attaching the appropriate test tube or intravenous bag (IV bag) [9]; (2) Venouspro from Vasculogic, which is smaller and more portable than Veebot, includes a six degree of freedom (DoF) positioning part and a three DoF distal manipulator, and is navigated by 3D NIR and ultrasound [10]; (3) an anatomical structure tracking system, which controls a wireless ultrasound scanner to scan the vein with a robotic arm (KUKA LBR Med, KUKA AG, Germany) [11].
In this paper, we develop a compact venipuncture robot, VeniBot. It includes an imaging part, a puncturing part, a positioning part and a supporting part. We also develop an automatic positioning algorithm that finds the suitable puncture areas and their longitudinal angles on the NIR image.

For the problem of determining the suitable puncture area and angle from the NIR image, traditional methods usually utilize hand-crafted features, e.g., the location, size and curvature of vein sections. However, these methods require experts or experienced clinicians to build a specialized feature extractor and classifier according to clinical experience. Recently, deep convolutional neural networks (DCNNs) have made it possible to extract and classify features automatically with multiple non-linear modules [12]. Among DCNN tasks, including image classification [13], object detection [14], etc., segmentation is the closest to our aim. Early segmentation-related DCNN works usually relied on sliding-window methods, e.g., DeepMedic [15]. However, the sliding-window method wastes a lot of computational resources by repeatedly computing the network activations of overlapping regions between image patches. Hence, such methods were later replaced by fully convolutional networks (FCNs) [16]. On medical images and volumes, UNet [17], [18], a typical FCN, has shown outstanding performance and is used in this paper as the basic network structure. Because suitable puncture area and angle determination is a highly abstract task, plain segmentation cannot achieve reasonable performance, as shown in Sec. IV. In this paper, we propose a new Dual-In-Dual-Out network based on ResNeXt50-UNet, which exploits the advantages of two-step learning and two-task learning to perform automatic puncture area and angle regression and hence to automate the positioning part.

In the following sections, we introduce the hardware design of VeniBot in Sec. 
II-A and the automatic algorithm for determining the puncture area and angle in Sec. II-B, describe the experimental setup in Sec. III and show the results in Sec. IV.

A. Hardware Design of VeniBot

VeniBot includes four parts: (1) the imaging part, which holds the NIR and ultrasound devices; (2) the puncturing part, which accesses the vein; (3) the positioning part, which delivers the puncturing part to suitable puncture areas; (4) the supporting part, which holds the imaging, puncturing and positioning parts. The SolidWorks (CAD) design of VeniBot is illustrated in Fig. 2. The workflow of VeniBot is as follows: first, the positioning part delivers the puncturing part to a suitable puncture area; second, the puncturing part accesses the vein. In this paper, we mainly focus on the automation of the positioning part, while the automation of the puncturing part is addressed in a separate IROS submission.

On top of the supporting part, we assemble the positioning part, which has four DoFs and is shown in light brown in Fig. 2. The main function of the positioning part is to deliver the puncturing part to suitable puncture areas, including the puncture position and angle in the xOy plane, under the navigation of NIR images. The positioning part consists of four motors: three of them (motors 1, 2, and 3) each provide one translational DoF along the y, z, and x axes respectively, and one of them (motor 4) provides one rotational DoF about the z axis. An NIR device (Projection Vein Finder VIVO500S from Shenzhen Vivolight Medical Device & Technology Co., Ltd) is mounted at the base support on the imaging part. It scans the forearm of a volunteer, and from its NIR image, the proposed Dual-In-Dual-Out network in Sec. II-B automatically regresses the suitable puncture area and angle in the xOy plane. Then motors 1 and 3 move along the y and x axes under the guidance of the xOy coordinates and angle of the puncture area. As shown in Fig. 
3, motor 4 rotates about the z axis under the guidance of the puncture angle. After motors 1, 3, and 4 have moved to the target xOy position, motor 2 moves downward along the z axis until the ultrasound probe mounted on the imaging part properly touches the volunteer's skin, so that a clear ultrasound image can be collected for further navigating the puncturing part.

The bottleneck in automating the positioning part is determining the suitable puncture area and angle from the NIR images; once these are known, VeniBot can easily deliver the puncturing part to the target site. We randomly select six NIR images as examples and show the vein segmentation and suitable puncture areas in Fig. 4. First, unlike existing research on artery-vein segmentation in fundus images [19], forearm vein segmentation faces noise caused by injury sites, dark skin blemishes, and hairs on the skin, as shown in the yellow boxes in Fig. 4-top. Second, as shown in Fig. 4-bottom, the suitable puncture area is a straight, large-diameter and long venous area. While the red boxes in Fig. 4 area is very difficult, as it is too close to a segmentation task, and distinguishing between segmenting all or only part of the vein is difficult for a DCNN. Moreover, learning abstract semantic information, such as the bifurcation and tortuosity of veins, is hard for a DCNN. This is illustrated by the results of the traditional Single-In-Single-Out and Single-In-Dual-Out networks in Sec. IV, where neither network outputs suitable puncture areas properly. Suitable puncture angle determination is even more difficult than suitable puncture area determination, as it is more abstract. This is illustrated in Sec. IV, where a traditional Single-In-Single-Out network cannot output any useful angle information.
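The positioning sequence described earlier in this subsection (motors 1 and 3 translate in the xOy plane, motor 4 rotates about the z axis, and motor 2 then descends until the ultrasound probe touches the skin) can be sketched as an ordered command list. This is only an illustration of the ordering: the names `PunctureTarget`, `plan_positioning`, and the command strings are hypothetical, the units are assumed, and no real motor-control API of VeniBot is implied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (motor name, motion, value); value is None when the stop condition is contact-based.
Command = Tuple[str, str, Optional[float]]


@dataclass(frozen=True)
class PunctureTarget:
    """Hypothetical container for the regressed target in the xOy plane."""
    x: float          # target x coordinate (units assumed, e.g. mm)
    y: float          # target y coordinate (units assumed, e.g. mm)
    angle_deg: float  # puncture angle about the z axis, in degrees


def plan_positioning(target: PunctureTarget) -> List[Command]:
    """Return the motor commands in the order described in the text."""
    return [
        ("motor1", "translate_y", target.y),
        ("motor3", "translate_x", target.x),
        ("motor4", "rotate_z", target.angle_deg),
        # Motor 2 moves last: it lowers the probe until it touches the skin.
        ("motor2", "descend_z_until_contact", None),
    ]


if __name__ == "__main__":
    for command in plan_positioning(PunctureTarget(x=12.5, y=40.0, angle_deg=15.0)):
        print(command)
```

Keeping the descent of motor 2 as the final, contact-terminated step mirrors the text, where the z motion starts only after the in-plane position and angle have been reached.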
In this paper, we propose a Dual-In-Dual-Out network with two-step learning and two-task learning to determine the suitable puncture area and angle from the NIR image inputs. A visual illustration of the proposed network is shown in Fig. 5. Training contains two steps: first, a Single-In-Single-Out network is trained to segment the vein from the NIR image; second, both the NIR image and the vein segmentation from the first step are fed into the Dual-In-Dual-Out network to regress the suitable puncture area and angle. Considering the popularity of UNet [17], [20] and ResNeXt [21], we choose a UNet model with a ResNeXt50 backbone as the basic network architecture; details of each layer are shown in Tab. I. In the first step of training, binary cross entropy (BCE) loss is used as the loss function, while DSC is used to evaluate the segmentation accuracy. Considering that our input image size is 128 × 208 (208 is not divisible by 32), we omit the first max-pooling layer in a standard ResNeXt50. We follow the idea of UNet: copy and crop both the first and second block to the corresponding up-sampling block. In detail, the encoder, which consists of multiple convolutional and down-sampling layers, extracts features at different scales from the input image, while the decoder, which consists of multiple convolutional and up-sampling layers, recovers the feature map to the size of the original input image. In the second step of training, we use regression instead of segmentation for suitable puncture area and angle determination, as the segmentation strategy always outputs the concrete vein segmentation rather than the abstract puncture area and angle prediction. For the output feature We binarize the output of the ResNeXt50-UNet network with a threshold chosen empirically. L2 distance is used as the loss function. DSC is also used to evaluate the suitable puncture area regression.
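The quantities named above (thresholding of the network output, the DSC used for evaluation, and an L2-type loss) can be made concrete with a small pure-Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the threshold of 0.5 is an assumed placeholder for the empirically chosen value, the L2 distance is written in its common mean-squared-error form, and the function names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def binarize(pred: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Turn a map of network outputs into a 0/1 mask (threshold is an assumed value)."""
    return [[1 if value >= threshold else 0 for value in row] for row in pred]


def dice_coefficient(mask_a: Sequence[Sequence[int]],
                     mask_b: Sequence[Sequence[int]]) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    intersection = sum(a * b
                       for row_a, row_b in zip(mask_a, mask_b)
                       for a, b in zip(row_a, row_b))
    total = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_squared_error(pred: Grid, target: Grid) -> float:
    """Mean of squared pixel differences, a common form of the L2 loss."""
    diffs = [(p - t) ** 2
             for row_p, row_t in zip(pred, target)
             for p, t in zip(row_p, row_t)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    prediction = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.0]]
    ground_truth = [[1, 1, 0], [0, 0, 0]]
    mask = binarize(prediction)
    print(mask)                                          # [[1, 1, 0], [0, 1, 0]]
    print(dice_coefficient(mask, ground_truth))          # 0.8
    print(mean_squared_error(prediction, ground_truth))
```

In the toy example the binarized prediction has three foreground pixels, the ground truth has two, and two overlap, so DSC = 2·2 / (3 + 2) = 0.8.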
Besides the proposed Dual-In-Dual-Out network, we also tried a few traditional methods; details of each are listed below:

1) Single-In-Single-Out network: To predict the suitable puncture area and angle, the most common idea is to train a ResNeXt50-UNet to output the suitable puncture area and angle directly. However, in our experiments this Single-In-Single-Out network does not perform well on suitable puncture area regression and fails completely on suitable puncture angle regression, because the feature of 'proper for puncture' is too abstract for a Single-In-Single-Out network to learn. We therefore turn to learning abstract features through two-step learning and two-task learning.

2) Single-In-Dual-Out network: We found that directly training a Single-In-Single-Out network to output the suitable puncture angle may lead to non-convergence of the network. We also found that the Single-In-Single-Out network performs well in segmenting the vessel from the NIR image. Considering that there may be a strong correlation between vessel segmentation and suitable puncture area/angle regression, we let the network output both the segmentation result and the puncture area/angle regression. We hope that this two-task learning with shared variables for dual outputs helps the network converge and yields a promising puncture area/angle regression result. We modify the model based on the Single-In-Single-Out network. At first, we only changed the last layer from one output to two outputs. However, too many shared variables caused the two outputs to be indistinguishable. Hence, we modified the scheme so that the model is split into two branches from the third-to-last block. The Single-In-Dual-Out network performs better than the Single-In-Single-Out network.

3) Dual-In-Single-Out network: The two-task learning achieves some improvements. However, the accuracy is not high enough for medical usage.
We further include two-step learning from concrete to abstract and design a Dual-In-Single-Out network. One input is the ResNeXt50-UNet output of the vein segmentation before the sigmoid layer, which provides the vessel information. The other is the original input image, which provides complete information that may be lost during vein segmentation. This Dual-In-Single-Out network shows a competitive result.

4) Dual-In-Dual-Out network: Inspired by the success of two-step learning and two-task learning, we further propose a Dual-In-Dual-Out network, where both the original image and the vein segmentation are used as inputs. To save training time, we set the two tasks as suitable puncture area regression and suitable puncture angle regression respectively. Hence, training one Dual-In-Dual-Out network regresses both the suitable puncture area and angle.

To verify the performance of the proposed VeniBot and Dual-In-Dual-Out network, we conduct experiments on 30 volunteers, 14 females and 16 males, aged between 18 and 60 years. All images are collected by the proposed VeniBot and the NIR device on it. The data collection process is demonstrated in Fig. 6. In total, we collected 900 NIR images, with 30 images of the forearm vein per volunteer. Because the device is not equipped with a memory card, we added a composite video broadcast signal (CVBS) video transmission connector to the NIR device and established a connection through CVBS to high definition multimedia interface (HDMI) signal conversion, which allows the NIR vein images to be collected and stored. The ground truth (GT) of vein segmentation was labeled by a self-defined labeling pipeline, which mainly consists of operations such as Gaussian filtering, erosion and dilation, brightness adjustment, histogram normalization, Hessian feature detection and binarization. By manually adjusting the parameters image by image, the optimal segmentation GT of each NIR image is obtained.
The suitable puncture area GT was labeled manually by erasing areas that are not suitable for puncture, such as vein bifurcations, vein sections with large curvature, vein sections close to the imaging edge of the NIR camera, and short vein sections. Since the suitable puncture areas are straight and without obvious curvature or vein bifurcation, they are generally spindle-shaped. We applied the ellipse fitting function of OpenCV to each vein section and obtained the corresponding angle. As shown in Fig. 7(b), the angle value is not continuous at the junction of 0° and 180°, which may cause unstable training of neural networks. Hence, we calculate continuous angles by: where γ is the longitudinal axis angle of the fitted ellipse, θ is the continuous angle and is shown in Fig. 7. In addition, θ cannot distinguish between clockwise and counterclockwise angles relative to the x-axis. Hence, the difference between the y-values of points A and B is used to determine whether θ is clockwise or counterclockwise.

To avoid over-fitting and to improve the generalization ability of the model, we regularize the networks with rich data augmentation of two kinds: spatial data augmentation and intensity data augmentation. Spatial data augmentation includes random resize and crop, random horizontal and vertical flip, and random rotation. Intensity data augmentation includes random brightness, contrast and saturation adjustments within a specified range. Details are listed in Tab. II.

All experiments, including baselines, were carried out on a Windows 10 (64-bit) computer equipped with an Intel i7-8750 CPU @ 2.2 GHz, 16 GB DDR4 memory and a 4 GB NVIDIA GTX 1050Ti GPU. All networks are built with PyTorch. Adam [22] with a weight decay of $10^{-5}$ was adopted to optimize the model. The batch size is 2. Training runs for 5 epochs, that is, 1575 iterations. The training process took around 1 hour on an NVIDIA GTX 1050Ti GPU.
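To make the training budget above concrete, the following sketch works through the arithmetic implied by the reported numbers (5 epochs, 1575 iterations, batch size 2, 900 images in total). The resulting training-set size and the roughly 70% share are inferences from these figures, not values stated in the text, and the function names are hypothetical.

```python
def iterations_per_epoch(total_iterations: int, epochs: int) -> int:
    """Number of optimizer steps in one epoch, assuming all epochs are equal."""
    if total_iterations % epochs != 0:
        raise ValueError("total iterations are not evenly divisible by epochs")
    return total_iterations // epochs


def max_samples_per_epoch(steps_per_epoch: int, batch_size: int) -> int:
    """Upper bound on images seen per epoch (the last batch could be partial)."""
    return steps_per_epoch * batch_size


if __name__ == "__main__":
    TOTAL_IMAGES = 900       # 30 volunteers x 30 images each (from the text)
    EPOCHS = 5               # from the text
    TOTAL_ITERATIONS = 1575  # from the text
    BATCH_SIZE = 2           # from the text

    steps = iterations_per_epoch(TOTAL_ITERATIONS, EPOCHS)
    samples = max_samples_per_epoch(steps, BATCH_SIZE)
    print(steps)                   # 315
    print(samples)                 # 630
    print(samples / TOTAL_IMAGES)  # 0.7
```

Under these assumptions, each epoch covers at most 630 images, which would correspond to about 70% of the 900 collected images (for example, the images of 21 of the 30 volunteers) being used for training; the actual split is not specified here.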
The learning rate was set to $10^{-3}$ at the beginning of training. During the experiments, we found that the number of iterations required for model convergence was generally below 1000, so a relatively small validation interval was chosen: 20 iterations in the second-step training of the Dual-In-Single-Out and Dual-In-Dual-Out networks and 50 iterations in all other model training. In addition, ReduceLROnPlateau was adopted to schedule the learning rate with a decay factor of 0.5 and a patience of 5. For vein segmentation, the learning rate was scheduled according to the DSC of each validation, while for suitable puncture area and angle regression, it was scheduled according to the mean square error (MSE) of each validation. Hence, the Single-In-Single-Out, Single-In-Dual-Out, Dual-In-Single-Out, and Dual-In-Dual-Out networks are compared for suitable puncture area and angle prediction, while only the Single-In-Single-Out network is used for vein segmentation.

For the segmentation of veins from NIR images, a Single-In-Single-Out network is used. The mean and std of DSC and several visual segmentation results are shown in Tab. III and Fig. 8, respectively. We can see that, even with only a simple Single-In-Single-Out network, a high mean DSC of 0.767 and reasonable vein segmentation results are achieved. This vein segmentation result is used as an additional input to improve the subsequent suitable puncture area and angle regression.

For the regression of suitable puncture areas, the Single-In-Single-Out, Single-In-Dual-Out, Dual-In-Single-Out, and Dual-In-Dual-Out networks are used. The mean and std of DSC are shown in Tab. III. For Single-In-Single-Out vs. Dual-In-Single-Out, and Single-In-Dual-Out vs. Dual-In-Dual-Out, we can see that adding the vein segmentation as an additional input improves the performance of suitable puncture area regression significantly, by almost 0.2 DSC. For Single-In-Single-Out vs. 
Single-In-Dual-Out, and Dual-In-Single-Out vs. Dual-In-Dual-Out, we can see that adding the vein segmentation or suitable puncture angle regression as an additional output improves the performance of suitable puncture area regression slightly, by almost 0.01 DSC. Five examples of the suitable puncture area regression by the four methods are shown in Fig. 9. We can see visually that both the Dual-In-Single-Out and Dual-In-Dual-Out networks distinguish between suitable and non-suitable puncture areas better than the Single-In-Single-Out and Single-In-Dual-Out networks, indicating the value of adding the vein segmentation to the network's input.

For the regression of the suitable puncture angle, the Single-In-Single-Out, Single-In-Dual-Out, Dual-In-Single-Out, and Dual-In-Dual-Out networks are used. The mean and std of the angle error are shown in Tab. IV. Among the four methods, the Single-In-Single-Out network fails to predict any useful angle information. From the results of the Single-In-Dual-Out, Dual-In-Single-Out, and Dual-In-Dual-Out networks, we can see that, similar to the trend in suitable puncture area regression, both the dual-input and dual-output strategies improve the performance of suitable puncture angle regression notably. Overall, these results indicate the value and advantage of the two-step learning and two-task learning in the proposed Dual-In-Dual-Out network.

In this paper, we build a compact venipuncture robot, VeniBot, and propose a novel Dual-In-Dual-Out network for suitable puncture area and angle regression. The Dual-In-Dual-Out network builds a pipeline that includes two models, that is, vein segmentation and puncture area/angle regression. It enables the end-to-end determination of the vein, puncture area, and puncture angle simultaneously. We evaluate it on a newly VeniBot-collected NIR dataset, and it outperforms the other three baselines by remarkable margins.
This paper focuses on the automation of the positioning part, while the automation of the puncturing part is introduced in another submission. In the future, we will work on integrating these two parts to form a fully automatic VeniBot system.

[1] Ce: Continuing education article vascular access management 1: An overview
[2] A practical guide to venepuncture and management of complications
[3] Methods of obtaining peripheral venous access in difficult situations
[4] Phlebotomy-related lateral antebrachial cutaneous nerve injury
[5] Incidence, severity and risk factors of peripheral intravenous cannula-induced complications: An observational prospective study
[6] The incidence and risk of infusion phlebitis with peripheral intravenous catheters: A meta-analysis
[7] Vascular access in covid-19 patients: smart decisions for maximal safety
[8] Worldwide prevalence of occupational exposure to needle stick injury among healthcare workers: A systematic review and meta-analysis
[9] Profile: Veebot drawing blood faster and more safely than a human can
[10] Deep learning robotic guidance for autonomous vascular access
[11] Robot-assisted ultrasound-guided tracking of anatomical structures for the application of focused ultrasound
[12] Deep learning
[13] Imagenet classification with deep convolutional neural networks
[14] Fast r-cnn
[15] Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation
[16] Fully convolutional networks for semantic segmentation
[17] U-net: Convolutional networks for biomedical image segmentation
[18] Learning dense volumetric segmentation from sparse annotation
[19] Artery-vein segmentation in fundus images using a fully convolutional network
[20] Normalization in training u-net for 2-d biomedical semantic segmentation
[21] Aggregated residual transformations for deep neural networks
[22] Adam: A method for stochastic optimization