key: cord-0626905-jeda37of authors: Goswami, Dipam; Aggrawal, Hari Om; Gupta, Rajiv; Agarwal, Vinti title: Urine Microscopic Image Dataset date: 2021-11-19 journal: nan DOI: nan sha: 9c5154832ca16267256ae6257d4f01cc139456fb doc_id: 626905 cord_uid: jeda37of

Urinalysis is a standard diagnostic test for detecting problems related to the urinary system. Automating urinalysis will reduce the overall diagnostic time. Recent studies have used urine microscopic datasets to design deep learning based algorithms that classify and detect urine cells, but these datasets are not publicly available for further research. To address the lack of public urine datasets, we prepare our urine sediment microscopic image dataset (UMID), comprising around 3700 cell annotations in 3 categories of cells, namely RBC, pus, and epithelial cells. We discuss the challenges involved in preparing the dataset and the annotations. We make the dataset publicly available.

Analysis of urine sediment particles from microscopic images plays a vital role in the diagnosis of urinary and kidney diseases. The constituents of urine samples are Red Blood Cells (RBCs), White Blood Cells (WBCs), referred to as pus cells, epithelial cells, casts, bacteria, crystals, and other artifacts. A routine urine test involves chemical, physical, and microscopic analysis of the sample. We are concerned with the microscopic analysis of the urine sediment. Routine microscopic analysis generally looks for the presence of RBCs, pus cells, and epithelial cells. A urine sample may also contain indeterminate objects, out-of-focus objects, and lens artifacts. Classifying cells is vital for detecting and diagnosing diseases. This work aims to provide medical support to medical centres in rural areas, where there is a shortage of skilled lab technicians to analyze patients' urine samples.
To the best of our knowledge, UMID is the first urine image dataset to be made publicly available to all researchers. The dataset has been annotated under the supervision of medical professionals and cross-verified. We expect that its availability will accelerate research in the urine sediment cell detection domain.

Urine Sediment Datasets
Various datasets have been used to study cell detection in urine sediments, though none of them are publicly available. Some of these studies, such as [1], focus on classifying only RBC and WBC cells, while other works cover more categories: [2] classified objects into erythrocytes, leukocytes, epithelial cells, crystals, casts, mycetes, epithelial nuclei, and noise; [3] used sub-classes of epithelial cells such as low-transitional epithelium and squamous epithelial cells; and [4] also considered bacteria and sperm. Some papers [4, 5] have highlighted the regular occurrence of WBC and RBC clusters in urine datasets, but they are limited to detecting the clusters [4] or predicting segmentation mask boundaries for RBC and WBC clumps [5]. Detecting individual cells within these clusters has not been studied yet, and our work focuses on detecting densely packed cells using weak instance-level information. With improved precision in detecting clustered cells, urine test reports can give a better count of the different cells, leading to better diagnosis of diseases. In our dataset, we have observed RBCs, pus cells, and epithelial cells as the main components. We have labelled confusing and poorly focused components with a missed label category, and we exclude these instances when training the deep learning models.

A physician at our university hospital collected all urine microscopic images in the course of diagnosing his patients. He followed the normal routine procedure for data collection.
His primary objective was to differentiate and count urine sediments manually from the images for diagnosis, not to acquire good-quality images for a dataset. Hence, the acquired images are not the best set one would want for training a machine learning model to achieve remarkable performance. However, the acquired dataset aligns quite well with our objective: in this study, our goal is to build a neural network model that generalizes well and works satisfactorily even with low-quality images. We explain what image quality means in our context later in this section. In practice, doctors and lab technicians do not follow the recommended protocols [6] very precisely due to a shortage of time, especially in highly populated countries where the daily influx of patients is quite high. Moreover, microscopes are not serviced regularly, which leads to many flaws in the images. Instead of assuming an ideal situation, we should design a robust learning framework that can withstand these issues to some extent. Automatic urine analyzers all come with a special microscope attached to the analyzer, which is why they work well in practice. However, our long-term goal is to develop a system that works with the existing lab setup, with only the minor addition of a camera or a smartphone on one of the eyepieces. Due to the unavailability of proper infrastructure, patient information is not stored anywhere; hence, the images are inherently anonymized at the source. The images are subject to the doctor's bias arising from his acquisition habits; this is one of the limitations of our study. A routine urine diagnostic report mentions the numbers of erythrocytes/RBCs, leukocytes/pus cells, epithelial cells, and casts per high power field of a urine sample. Other urine sediment constituents, such as crystals, bacteria, yeast cells, salts, and spermatozoa, are not counted but are instead reported as crosses [7]. Casts appear very rarely in a urine sample [7].
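Because the report states counts per high power field, a per-sample figure has to be accumulated over the fields that were examined. The following is a minimal sketch under stated assumptions: per-field counts for each class are already available (from manual counting or a detector), and the reported value is taken as the mean count per HPF. The function `per_hpf_summary` and its input format are hypothetical illustrations, not part of UMID or of any reporting standard.

```python
from collections import Counter
from typing import Dict, List

# Classes counted per HPF in this sketch (casts are omitted: UMID contains none).
REPORTED_CLASSES = ("RBC", "pus", "epithelial")


def per_hpf_summary(fields: List[Dict[str, int]]) -> Dict[str, float]:
    """Mean count of each reported class over all examined high power fields.

    `fields` holds one dict per HPF mapping a class name to its cell count
    (a hypothetical input format); missing classes count as zero.
    """
    if not fields:
        raise ValueError("at least one HPF is required")
    totals: Counter = Counter()
    for counts in fields:
        for cls in REPORTED_CLASSES:
            totals[cls] += counts.get(cls, 0)
    return {cls: totals[cls] / len(fields) for cls in REPORTED_CLASSES}


if __name__ == "__main__":
    fields = [
        {"RBC": 4, "pus": 10},
        {"RBC": 2, "pus": 6, "epithelial": 1},
    ]
    print(per_hpf_summary(fields))  # {'RBC': 3.0, 'pus': 8.0, 'epithelial': 0.5}
```

A real report may instead give ranges per HPF; the mean is used here only to keep the accumulation step explicit.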
During our data collection period, we did not get any urine sample with casts. Hence, in this study, we focus on detecting and classifying only three types of urine sediment: RBCs, pus cells, and epithelial cells. These three classes can also be sub-classified; for example, epithelial cells can be further classified into squamous and transitional epithelial cells. Although these sub-classifications have clinical relevance, they are hardly ever reported in a routine urine diagnostic report.

Morphology
RBCs are round cells that contain no nucleus or granules, are smaller than pus cells, and have relatively sharp boundaries. Depending on the pH of the urine, they appear as biconcave discs (pH = 6), thorn-apple shapes (pH < 6), or light discs (pH > 6). They sometimes also appear ring-shaped. Pus cells are generally larger than RBCs. They are round, have a dark and granular surface, and may have a segmented nucleus. Pus cells lie either individually or in clusters. Epithelial cells are irregular in shape, generally larger than pus cells, and have a small nucleus; sometimes they have two nuclei. These cells can also lie individually or in clusters.

We used two bright-field microscopes to acquire the images. Images from one microscope were of lower quality than those from the other. Most notably, we observe a half-circular ring artifact in the images. We could not find the exact reason for the artifact, but we suspect that it is mainly due to misalignment between the lens and the light source. In our dataset, the ratio of images from the old to the new microscope is approximately 10:2. We could acquire only a limited number of images from the hospital because fewer patients visited the hospital under COVID-19 restrictions. The urine sample on a glass slide has a certain depth, and the cells are present at different depths. Hence, we say that the urine slide has a multi-layer structure.
Because of this, not all cells are clearly visible within the depth of field of a lens focused at a particular focal plane, and numerous focus adjustments are required to identify every cell in the sample. Moreover, a microscope can examine only a certain area of the whole urine sample. Hence, it is recommended to examine 20-30 high power fields (HPF) per urine sample and accumulate the findings to build the final diagnostic report. This is a very time-consuming task. We observed that, in practice, doctors follow the recommendations only for a few sensitive cases. Otherwise, doctors spend only a few seconds analysing a urine slide, covering approximately 10 HPF, and adjust the focus of the lens only to the level needed to differentiate the cell features and sizes. Consequently, most acquired images contain a certain amount of blur: object edges are not very sharp and cell granularity deteriorates. A few more limitations originate from the multi-layer structure of urine. Cells in an image can have different degrees of blur depending on their depth relative to the lens, and a few cells lying beyond the depth of field of the lens can be highly out of focus. This irregular blurring sometimes makes it very hard to differentiate cells, even for an experienced doctor. To handle such cases, we introduced a missed label class.

Both microscopes are manufactured by MAGNUS OPTO SYSTEMS INDIA PVT. LTD, model number CH20iBIMF. The eyepieces have 10× magnification and an 18 mm field number, and the achromatic objective lenses have 40× magnification. Each high power field is therefore examined at 400× magnification (a 10× eyepiece and a 40× objective). We connected a COSLAB 5 megapixel microscope digital camera (model number COSUSB5000) to one of the eyepieces of the microscope to acquire digital images of the urine sample and store them on the desktop.
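The optical figures above follow from two standard relations: total magnification is the product of the eyepiece and objective magnifications, and the diameter of the visible specimen field is the eyepiece field number divided by the objective magnification. The sketch below assumes no additional intermediate (tube-lens) magnification, which the text does not state, and it describes the field seen through the eyepiece rather than the area captured by the camera.

```python
import math


def total_magnification(eyepiece: float, objective: float) -> float:
    """Total magnification of a compound microscope: eyepiece x objective."""
    return eyepiece * objective


def field_of_view_mm(field_number_mm: float, objective: float,
                     intermediate: float = 1.0) -> float:
    """Diameter of the specimen field visible through the eyepiece.

    `intermediate` is an assumed extra (tube-lens) magnification factor;
    it defaults to 1.0 because the text does not state one.
    """
    return field_number_mm / (objective * intermediate)


if __name__ == "__main__":
    mag = total_magnification(10, 40)   # 10x eyepiece, 40x objective
    fov = field_of_view_mm(18, 40)      # 18 mm field number
    area = math.pi * (fov / 2) ** 2     # area of one circular HPF in mm^2
    print(mag, fov, round(area, 3))     # 400 0.45 0.159
```

Under these assumptions, one HPF covers a circle about 0.45 mm across (roughly 0.16 mm²) of the slide, which illustrates why several HPFs are examined per sample.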
The images are acquired at a lower resolution of 1280 × 720 because the video feed from the camera to the desktop was quite slow at higher resolutions, so the doctor avoided them for image acquisition. When a camera is attached to the microscope, doctors prefer to differentiate cells using the video feed on the desktop rather than observing directly through the eyepiece, since the video feed provides more flexibility and makes it easier to scan the entire urine slide.

Annotating a urine microscopic image is a challenging task. Optical blurring, overlapping cells, low contrast, and small cell size are some sources of difficulty. It is easy to overlook RBCs because of their small size and very smooth surface, which result in very low contrast with the surroundings of the cell. Three authors of this paper, including a doctor, jointly annotated the cells in the images using the Microsoft VoTT software. We primarily followed three annotation strategies, of which the cluster and missed label strategies are novel approaches proposed in this work. These approaches are based on the properties of urine microscopic images described below, and they greatly reduce the overall annotation time. Doctors are generally occupied with patients and have very busy schedules, so reducing the annotation time greatly helped us obtain feedback from the doctors.

Standard annotation strategy
For an object detection and classification task, the standard approach is to train the neural network on the ground truth of all foreground objects present in the image. Each object is assigned a class and a bounding box that encodes its location and size in the image. We follow the same annotation strategy in our work, but only for cells that lie individually in the image.

Novel missed label class
In some circumstances, doctors cannot make a reliable guess about the appropriate class of a cell.
Hence, in theory, we can place such cells neither in the main classes nor in the background. If we treated them as background, we would in principle force the network to learn a wrong classification. Instead, we place such cells in a separate missed label class, exclude members of this class from training, and leave it to the network to predict the class of such cells.

Point annotations for clusters
In urine images, cells are also found in clusters. Inside a cluster, cells are densely packed and overlap each other, so annotating them is a difficult, puzzling, and time-consuming task that can result in inconsistent bounding boxes. Hence, we avoid drawing a box for each cell inside a cluster. Instead of deciding boundaries and drawing boxes, we propose to mark a single point on each cell in a cluster (approximately at its centre) to localize it. The size of the cells can be estimated from these points, and the generated pseudo boxes can be used during training. Annotating a point is a much simpler task and reduces time, effort, and ambiguity.

Training, testing, and validation set
The UMID dataset has 366 urine images with a resolution of 1280 × 720, which we divided into three sets for training (≈ 76%), validation (≈ 10%), and testing (≈ 14%); see Table ??. The division is based on the total of 3733 annotated cells in the dataset rather than on the number of images, since the cell features are the primary source of learning. As discussed in the previous section, cells inside clusters are difficult to annotate; hence, in the training set we annotate cells with boxes (outside clusters) and points (inside clusters). For the validation and test sets, however, we purposely annotate cells lying inside clusters with boxes in order to evaluate the detection performance of the ML model. Pus and epithelial cells generally form clusters; hence, these classes have more point annotations than RBCs.
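The exact rule for estimating cell size from a point is not specified here, so the following is only a sketch of one plausible scheme. It assumes that each point marks the approximate cell centre and that a pseudo box takes the median width and height of the box-annotated cells of the same class; it also drops missed label annotations before training. The `Annotation` record, the label string `"missed"`, and both helper functions are hypothetical names chosen for illustration, and the sketch assumes every class has at least one box-annotated cell.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

IMAGE_W, IMAGE_H = 1280, 720  # UMID image resolution
MISSED_LABEL = "missed"       # hypothetical label string for the missed label class


@dataclass
class Annotation:
    """Hypothetical annotation record: either a box or a point is set."""
    label: str
    box: Optional[Box] = None
    point: Optional[Tuple[float, float]] = None  # approximate cell centre


def class_sizes(anns: List[Annotation]) -> Dict[str, Tuple[float, float]]:
    """Median (width, height) per class, computed from box annotations only."""
    sizes: Dict[str, List[Tuple[float, float]]] = {}
    for a in anns:
        if a.box is not None and a.label != MISSED_LABEL:
            x0, y0, x1, y1 = a.box
            sizes.setdefault(a.label, []).append((x1 - x0, y1 - y0))
    return {
        label: (median(w for w, _ in wh), median(h for _, h in wh))
        for label, wh in sizes.items()
    }


def training_targets(
    anns: List[Annotation], sizes: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, Box]]:
    """Drop missed-label cells and turn point annotations into pseudo boxes."""
    targets: List[Tuple[str, Box]] = []
    for a in anns:
        if a.label == MISSED_LABEL:
            continue  # excluded from training
        if a.box is not None:
            targets.append((a.label, a.box))
        elif a.point is not None:
            w, h = sizes[a.label]  # KeyError if the class has no box-annotated cell
            cx, cy = a.point
            # Centre the class-typical box on the point and clip it to the image.
            pseudo = (
                max(0.0, cx - w / 2),
                max(0.0, cy - h / 2),
                min(float(IMAGE_W), cx + w / 2),
                min(float(IMAGE_H), cy + h / 2),
            )
            targets.append((a.label, pseudo))
    return targets


if __name__ == "__main__":
    anns = [
        Annotation("pus", box=(100, 100, 140, 140)),
        Annotation("pus", box=(200, 200, 236, 236)),
        Annotation("pus", point=(10, 300)),            # cell inside a cluster
        Annotation(MISSED_LABEL, box=(0, 0, 20, 20)),  # class unsure
    ]
    sizes = class_sizes(anns)
    print(sizes)  # {'pus': (38.0, 38.0)}
    for target in training_targets(anns, sizes):
        print(target)
```

A median is used rather than a mean so that a few unusually large or small boxes do not distort the pseudo box size, and clipping keeps pseudo boxes inside the 1280 × 720 frame.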
In total, 20% of the cells are annotated with points, for which pseudo ground truth boxes are required during training. The dataset contains approximately 44% RBCs, 33% pus cells, and 23% epithelial cells. Although the classes are slightly imbalanced, the performance of the ML model is not biased towards any one class. For approximately 11% of the cells, the annotators were unsure about the correct class; for these cells, only bounding boxes are drawn, and they are assigned to the missed label class. These cells are not used when training the models.

References
[1] Urine sediment detection based on deep learning
[2] An end-to-end system for automatic urinary particle recognition with convolutional neural network
[3] Inspection of visible components in urine based on deep learning
[4] Research on urine sediment images recognition based on deep learning
[5] Improved extraction of objects from urine microscopy images with unsupervised thresholding and supervised u-net techniques
[6] Preanalytics of urine sediment examination: effect of relative centrifugal force, tube type, volume of sample and supernatant removal
[7] Urine Sediment