key: cord-1043428-oe4kl79k
authors: Mun Li, Megan; Kuo, Tsung-Ting
title: Previewable Contract-Based On-Chain X-Ray Image Sharing Framework for Clinical Research
date: 2021-09-28
journal: Int J Med Inform
DOI: 10.1016/j.ijmedinf.2021.104599
sha: 12cc671cbe60e119f77a66ce0c067e71608f10c8
doc_id: 1043428
cord_uid: oe4kl79k

BACKGROUND: An image sharing framework is important to support downstream data analysis, especially for pandemics like Coronavirus Disease 2019 (COVID-19). Current centralized image sharing frameworks become dysfunctional if any part of the framework fails. Existing decentralized image sharing frameworks do not store the images on the blockchain; thus, the data themselves are not highly available, immutable, and provable. Meanwhile, storing images on the blockchain provides availability/immutability/provenance to the images, yet produces challenges such as large-image handling, high latency while viewing images, and software inconsistency while storing/loading images.

OBJECTIVE: This study aims to store chest x-ray images using a blockchain-based framework to handle large images, improve viewing latency, and enhance software consistency.

BASIC PROCEDURES: We developed a splitting and merging function to handle large images, a feature that allows previewing an image earlier to improve viewing latency, and a smart contract to enhance software consistency. We used 920 publicly available images to evaluate the storing and loading methods through time measurements.

MAIN FINDINGS: The blockchain network successfully shares large images up to 18 MB and supports smart contracts to provide code immutability, availability, and provenance. Applying the preview feature successfully shared images 93% faster than sharing images without the preview feature.

PRINCIPAL CONCLUSIONS: The findings of this study can guide future studies to generalize our framework to other forms of data to improve sharing and interoperability.
As the COVID-19 pandemic persists, it is crucial for healthcare institutions to share COVID-19 related data representing symptoms and side-effects to aid downstream processes that find and maintain the best prevention methods and treatments [1-6]. An important type of COVID-19 related data to be shared is chest x-ray images [7, 8], which can be investigated in pictorial reviews to determine prognostic COVID-19 pneumonia features and characteristics with more sample data [1, 2] or be used to build more generalizable machine learning or deep learning models, such as Convolutional Neural Networks (CNNs), for COVID-19 detection [3, 9-11]. Therefore, there is a need for image sharing between medical institutions, which requires a trustworthy data interoperability framework that can share large amounts of data, ideally independent of a single controller [12, 13]. Current public centralized image sharing mechanisms, such as hospital image databases or open-source image sharing websites, enable collaborative and shareable image repositories [14]. However, they present a single-point-of-failure, as seen in Figure 1A. That is, any corruption or maintenance of the central repository would block access from other institutions to the medical images stored in the centralized server.

Figure 1. [Caption partially lost in extraction; surviving panel labels: Immutability, Provenance, Availability.] (B) [...] or pointers to the images on-chain such that when any site becomes unavailable, other sites can still access one another. However, only the stored hashed images or pointers to the images, and not the images themselves, receive the blockchain benefits of availability, immutability, and provenance. (C) Our solution handles large images by splitting and merging images, high viewing latency by scaling images for a preview feature, and code inconsistency by using a smart contract to provide availability/immutability/provenance to the code.
To address the single-point-of-failure issue above, prior studies [15-18] have proposed blockchain-based solutions, which rely on blockchain, a decentralized, distributed ledger based on peer-to-peer networks and various consensus algorithms [19]. Blockchain has been proposed for various applications, such as genomic data access logging [20], pharmaceutical supply chain [21], and privacy-preserving predictive modeling on clinical research data [22-25], because of the three main benefits it gives to data stored on-chain: availability, immutability, and provenance [26]. First, the decentralized architecture of blockchain contributes to the continuous availability of medical images without a single-point-of-failure. Second, the block creation process generates an immutable audit trail (i.e., an unalterable ledger), which is crucial for storing medical images. Lastly, possessing traceable and verifiable records and transactions ensures legitimacy, which is important in medical image sharing so that future procedures and findings using those images are valid [26].

However, existing proposals [15-18] only store hashed medical images or pointers to the images on the chain rather than the images themselves (Figure 1B). Therefore, the images themselves are not highly available, immutable, and provable. Although storing the images directly on the blockchain can provide availability, immutability, and provenance to the images, several challenges (Figure 1C) still exist that could preclude existing proposals from adopting this solution:

(1) Large image size. Blockchain platforms usually have a limit on the transaction size (e.g., Ethereum [27] can only support up to around 20 to 30 KB per transaction [28]), which could be smaller than the size of the medical images. Hence, a mechanism to handle large-sized images is important.

(2) Viewing latency.
The block creation times of blockchain may be slower than the write times of traditional databases and thus would hinder the ability to quickly access the images to be shared. Therefore, a way to quickly preview the images is desirable.

(3) Code inconsistency. Although the blockchain can guard the data from being altered, the software to store/load the images may be changed accidentally or maliciously and thus become inconsistent across different healthcare institutions. Thus, it would be desirable if the computer programs were also immutable, provable, and highly available to improve consistency.

We aim to utilize the blockchain benefits of immutability, provenance, and availability while addressing the (1) large-sized image handling, (2) image viewing latency, and (3) code inconsistency issues that emerge from sharing images through the blockchain. Our image sharing framework will (1) handle large images, (2) reduce viewing latency, and (3) enhance code consistency. To achieve these three goals, we devised a framework with three corresponding components: (1) splitting and merging, (2) scaling and previewing, and (3) smart contract (Figure 1C).

(1) Splitting and merging. To handle large images, we split images into smaller "image pieces" that are within the blockchain transaction size limit when storing the images, and then merge the pieces back into images when loading the images.

(2) Scaling and previewing. To reduce viewing latency, we created, stored, and loaded "preview" images, which are downscaled versions of their corresponding images and thus allow users to quickly glance over the preview before the original image is stored and loaded (similar to the preview images on Internet websites).

(3) Smart Contract.
To improve code consistency across multiple sites, we developed a smart contract, which is a digital and immutable set of programs deployed on certain blockchain platforms such as Ethereum [29], to store and load image pieces and preview images on the blockchain.

The design of our framework is displayed in Figure 2. Images are split when stored (Figure 2A) and merged when loaded (Figure 2C). Storing and loading are supported through a smart contract (Figure 2B). The details of the storing, smart contract, and loading parts are introduced in the following:

[Figure 2: framework design with panels "Store Images" and "Load Images"; for an image of X KB, the last image piece has size (X modulo C) KB.]

• Storing previews/images (Figure 2A). The inputs of this step are the uploaded images, and the outputs are the preview images and image pieces to be recorded on the blockchain. Each patient is stored in a patient structure with the patient ID as its unique identifier. If a patient has multiple images, these images can be stored at different times by using the patient's ID as an identifier to link the images to the patient. First, each image is initialized as an image structure mapped to a patient structure using its filename as a key. We scale each image down to a preview that is less than or equal to C KB (C = 30 in our experiments). Then, we split each original image into image pieces of at most C KB, where the sum of the sizes of all these pieces is the size of the image. The scaled preview image and the image pieces are stored on the blockchain.

• Smart contract (Figure 2B). The input and output of this step are both the preview images and the image pieces recorded on the blockchain. We created a smart contract with the specifically designed data structures, Image and Patient, along with functions to store and load pieces and getter functions to retrieve information (Figure 3). This allows us to store/load the preview images and image pieces while maintaining patient/image relationships.

• Loading previews/images (Figure 2C).
The inputs of this step are the preview image and the image pieces retrieved from the blockchain, and the outputs are the preview images and the merged images. First, we load the preview image to allow fast glancing of the image on another site. Next, we load the original image by extracting its relevant image pieces. Finally, we merge these image pieces back together to form the original image.

The architecture of our implementation is shown in Figure 4. Based on prior reviews [30, 31], we chose the Ethereum platform [27] because it executes smart contracts, is open-source, and is supported by a community [29, 32]. Ethereum has been adopted for medical applications such as medical records management [33] and gene-drug interaction data sharing [32]. We configured Ethereum as a private/permissioned blockchain [34] (i.e., one that can be joined only by allowed blockchain nodes/computers) to emulate the scenario of an early-stage image sharing platform where only a few authorized institutions can participate in the blockchain network. Also, we adopted Clique [34], a Proof-Of-Authority (PoA) consensus protocol [35] that is specifically designed for a permissioned blockchain. PoA is used instead of other consensus protocols like Proof-Of-Work (PoW) [32] because it assumes that the nodes in the network are already authorized participants, which reduces computational cost and energy use and thus can improve the sustainability of our proposed solution.

We implemented our smart contract in Solidity 0.5.10 [36] in Remix IDE [37] and deployed it on Ethereum [27]. We coded off-chain processes in Java and used Web3j [38] to work with the Ethereum blockchain network. We set C to 30, so that the size of each piece stored on-chain is at most 30 KB. We used two virtual machines to represent two medical imaging institutions, each [...]. We extracted n = 920 chest x-ray images [...] (Table 1).
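The splitting and merging steps of the framework can be illustrated with a minimal off-chain sketch. This is not the authors' code (their off-chain processes were written in Java and the contract in Solidity); it is a Python illustration under stated assumptions: 1 KB is taken as 1024 bytes, and `split_image`, `merge_pieces`, and `InMemoryImageStore` are hypothetical names. The in-memory class only mimics the patient ID to filename to pieces relationship described in the text; it does not talk to a blockchain.

```python
from typing import Dict, List

# C = 30 KB in the text; treating 1 KB as 1024 bytes is an assumption.
PIECE_BYTES = 30 * 1024


def split_image(data: bytes, piece_bytes: int = PIECE_BYTES) -> List[bytes]:
    """Split raw image bytes into consecutive pieces of at most piece_bytes."""
    if piece_bytes <= 0:
        raise ValueError("piece_bytes must be positive")
    return [data[i:i + piece_bytes] for i in range(0, len(data), piece_bytes)]


def merge_pieces(pieces: List[bytes]) -> bytes:
    """Concatenate pieces in their stored order to rebuild the image."""
    return b"".join(pieces)


class InMemoryImageStore:
    """Hypothetical stand-in for the on-chain Patient/Image mapping."""

    def __init__(self) -> None:
        # patient ID -> filename -> ordered list of image pieces
        self._patients: Dict[str, Dict[str, List[bytes]]] = {}

    def store_image(self, patient_id: str, filename: str, data: bytes) -> int:
        """Split the image, record its pieces, and return the piece count."""
        pieces = split_image(data)
        self._patients.setdefault(patient_id, {})[filename] = pieces
        return len(pieces)

    def load_image(self, patient_id: str, filename: str) -> bytes:
        """Fetch the pieces of one image and merge them back together."""
        return merge_pieces(self._patients[patient_id][filename])
```

With these assumptions, a 100 KB image yields three full 30 KB pieces and one final piece of 100 modulo 30 = 10 KB, and merging the pieces in order reproduces the original bytes.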
To understand the performance of our proposed method ("patient-level with preview"), we compared it with a variant without the preview feature ("patient-level without preview"). Additionally, to further investigate the extreme situation of "one image per patient", we removed the patient-image relationship to form another pair of methods ("image-level with preview" and "image-level without preview"). All of the compared methods share the core functionalities of splitting, merging, and utilizing a smart contract. Also, we stored and loaded images sequentially by the patient IDs and then by the patient's image filenames.

For each method, we measured its storing (i.e., time required to publish the whole original image to the blockchain), loading (i.e., time required to retrieve the whole original image from the blockchain), first-viewable storing, first-viewable loading, and total first-viewable times. The first-viewable storing time indicates the time taken for a researcher from a site to store the first "viewable" image (i.e., the preview image for methods with the preview feature, and the original image for methods without the preview feature). Similarly, the first-viewable loading time indicates the time taken for a researcher from another site to load the first "viewable" image. Finally, the total first-viewable time is the sum of the first-viewable storing and first-viewable loading times; this time represents how long it would take for a researcher to preview an image being stored by another researcher, and therefore is our main metric to compare the methods with/without the preview feature. We further conducted a paired two-sample t-test and calculated the Pearson Correlation Coefficient (PCC) for the two pairs of methods (i.e., between the two patient-level methods and between the two image-level methods).
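The metrics above can be made concrete with a short sketch. It is an illustration only, not the authors' analysis code: the function names are hypothetical, the inputs are assumed to be per-item times in seconds with one value per method for each paired item, and only the paired t statistic is computed (obtaining the p-value additionally needs the t distribution with n - 1 degrees of freedom, for example from a statistics package).

```python
import math
from statistics import mean, stdev
from typing import Sequence


def total_first_viewable(store_s: float, load_s: float) -> float:
    """Total first-viewable time: first-viewable storing plus loading time."""
    return store_s + load_s


def percent_reduction(without_preview: float, with_preview: float) -> float:
    """Relative reduction (in percent) achieved by the preview feature."""
    return 100.0 * (without_preview - with_preview) / without_preview


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of a paired two-sample t-test on the differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    # Mean difference divided by its standard error (sample stdev / sqrt(n)).
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson Correlation Coefficient between two equal-length samples."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Here `a` would hold the total first-viewable times of a method without preview and `b` those of the matching method with preview, paired item by item.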
All times for the two patient-level methods are summarized in Table 2, and all times for the image-level methods are listed in Table 3. The detailed comparison of the average total first-viewable times, as well as the p-value and PCC of the two pairs of methods, is shown in Figure 5. The time improvement of the "patient-level with preview" method over the "patient-level without preview" method has a p-value < 10^-9 and a PCC = 0.887, while the time improvement of the "image-level with preview" method over the "image-level without preview" method has a p-value < 10^-44 and a PCC = 0.626. To understand the impact of the number of images per patient, we further analyzed our proposed "patient-level with preview" method (Figure 6). In general, a larger total image size per patient, and especially a larger number of images per patient, lengthens the total first-viewable time.

We have the following major findings:

(1) Large image handling from splitting and merging. As the sizes of images increase, so does the length of time needed to store and load the images (Figure 6). We were able to store all images in the COVID-19 image dataset, where the largest image in our experiment was 18.5 MB (Table 1). Furthermore, after stratifying first-viewable times by the number of images per patient, the first-viewable times for patients with more images were longer despite those patients having the same total image size (KB) as other patients.

(2) Reducing viewing latency from preview feature. The preview feature increases storing and loading time; however, there is a significant reduction in the user viewing time, which is crucial in real-world applications.
There is a 93.2% reduction in first-viewable time for patient-level methods (Table 2E) and a 92.7% reduction in first-viewable time for image-level methods ([...]). [...] This increased traffic is relatively insignificant when compared to the approximately 93% reduction in first-viewable time, which could represent the perceived daily usage experience of the researchers and the super users.

(3) Enhancing code consistency from the smart contract. We were able to deploy a smart contract so that the code, as well as the images stored on-chain, has immutability, provenance, and availability.

The limitations of our study include:

(a) Blockchain Configuration. We have successfully applied this framework on a 2-node permissioned blockchain network as a proof-of-concept prototype. In a real-world clinical data research network such as pSCANNER [41], there could be more institutions willing to participate in image data sharing. Also, other permissioned blockchain platforms such as Hyperledger Fabric [42] could be adopted in place of Ethereum. Hence, simulation with more nodes and with different blockchain platforms may warrant investigation.

(b) Image Scope. We were able to store/load images up to 18 MB [...] images are yet to be included in our framework.

Our results support the use of permissioned blockchain as a solution to share images through on-chain image storage to provide immutability, availability, and provenance to the images themselves while addressing the challenges of on-chain storage. All images, including large images up to 18 MB, were handled by our splitting and merging method. The preview feature effectively resolved the issue of high viewing latency. Specifically, because the patient-level experiments suit real-world applications that need to retain patient-level data, the success of the "patient-level with preview" method reinforces the use of blockchain along with the preview feature. In addition to image immutability, availability, and provenance, the smart contract ensured code consistency.
Although we only worked with the clinical data consisting of chest x-ray images related to COVID-19 [...]

Overall, this study supports the functionalities in blockchain-based methods that store data on-chain, which can suit various healthcare needs such as mass image sharing between multiple institutions. Our contributions can be summarized as: (a) designing an image sharing blockchain that provides immutability, availability, and provenance to the images; (b) handling large images through a split/merge method, improving viewing latency through a preview feature, and enhancing software consistency through use of a smart contract; and (c) creating a framework generalizable to other image types to improve sharing/interoperability.

What was already known?
• Centralized repositories can share images but possess a single-point-of-failure.
• Decentralized blockchain-based frameworks can share images but may not provide immutability, provenance, and availability to the images themselves.

What did this study add to our knowledge?
• Storing images on the blockchain allows blockchain to directly provide immutability, availability, and provenance to the images.
• Splitting and merging images addresses the large image issue.
• A preview feature in the blockchain-based image sharing framework reduces viewing latency.
• Using a smart contract allows blockchain to directly provide immutability, availability, and provenance to the code to improve its consistency.
Chest CT manifestations of new coronavirus disease 2019 (COVID-19): a pictorial review
The Clinical and Chest CT Features Associated With Severe and Critical COVID-19 Pneumonia
Deep learning approaches for COVID-19 detection based on chest X-ray images
A Comprehensive Review of the COVID-19 Pandemic and the Role of [...]
Deep learning and medical image processing for coronavirus (COVID-19) pandemic: A survey
COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches
Amith Khandakar COVID-19 Radiography Disease [...]
Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks
Can AI Help in Screening Viral and COVID-19 Pneumonia?
A deep learning approach to detect Covid-19 coronavirus with X-Ray images
A need for open public data standards and sharing in light of COVID-19
Data sharing for novel coronavirus (COVID-19)
The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository
On the Design of a Blockchain-Based System to Facilitate Healthcare Data Sharing
Med-PPPHIS: Blockchain-Based Personal Healthcare Information System for National Physique Monitoring and Scientific Exercise Guiding
MedChain: A Design of Blockchain-Based System for Medical Records Access and Permissions Management
Blockchain as a Foundation for Sharing Healthcare Data
Research Handbook on Digital Transformations
Blockchain Genomic Data Access Logging, Homomorphic Encryption on GWAS, and DNA Segment Searching
'Fit-for-purpose?' - challenges and opportunities for applications of blockchain technology in the future of healthcare
Privacy-Preserving Model Learning on Blockchain Network-of-networks
EXpectation Propagation LOgistic REgRession on permissioned blockCHAIN (ExplorerChain): decentralized online healthcare/genomics predictive model learning
The Anatomy of a Distributed Predictive Modeling Framework: Online Learning, Blockchain Network, and Consensus Algorithm
Fair compute loads enabled by blockchain: sharing models by alternating client and server roles
Blockchain distributed ledger technologies for biomedical and health care applications
A next-generation smart contract and decentralized application platform
What's the Maximum Ethereum Block Size?
Blockchain-enabled smart contracts: architecture, applications, and future trends
Comparison of Smart Contract Blockchains for Healthcare Applications
Comparison of blockchain platforms: a systematic review and healthcare examples
Benchmarking Blockchain-Based Gene-Drug Interaction Data Sharing Methods: A Case Study from the iDASH 2019 Secure Genome Analysis Competition Blockchain Track
Using Blockchain for Medical Data Access and Permission Management
Two-Tier Permission-ed and Permission-Less Blockchain for Secure Data Sharing
Pbft vs proof-of-authority: applying the cap theorem to permissioned blockchain
The Solidity Contract-Oriented Programming Language
Web3j: Web3 Java Ethereum Dapp
Covid-19 image data collection: Prospective predictions are the future
Covid-19 image data collection
pSCANNER: patient-centered Scalable National Network for Effectiveness Research
Hyperledger fabric: a distributed operating system for permissioned blockchains
Storage media for computers in radiology. The Indian journal of radiology & imaging
Semi-automatic tool for segmentation and volumetric analysis of medical images
Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session
Introduction to the DICOM standard
Robust Encryption of Quantum Medical Images
Selective encryption techniques of JPEG2000 codestream for medical images transmission
Fragile watermarking for copyright authentication and tamper [...]

The authors T-TK and MML would like to thank Lucila Ohno-Machado, MD, PhD and Jejo Koola, MD, MS, for very helpful discussions. The authors would like to thank Cyd Burrows-Schilling, MS for the technical support of the UCSD Campus AWS cloud network. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The use of the UCSD Campus AWS cloud network was supported by Michael Hogarth, MD.