A Self-Supervised Machine Learning Approach for Objective Live Cell Segmentation and Analysis

Michael C. Robitaille1, Jeff M. Byers1, Joseph A. Christodoulides1, Marc P. Raphael*1
1 Materials Science and Technology Division, U.S. Naval Research Laboratory, Washington D.C.
* Corresponding author: Marc.Raphael@nrl.navy.mil

Abstract

Machine learning algorithms hold the promise of greatly improving live cell image analysis by (1) analyzing far more imagery than can be achieved by traditional manual approaches and (2) eliminating the subjectivity of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model-based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly from the live cell imagery data, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities: fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.

Key Words: live cell imaging, segmentation, phenotyping, machine learning, unsupervised, classification

Introduction

Live cell phenotyping is an information-rich experimental approach, capable of providing mechanistic insights into cell biology1,2, guiding drug development3 and elucidating disease pathologies4,5. The wealth of information available from live cell microscopy results from the fact that numerous optical modalities can be integrated within a given experiment - from fluorescence imaging, which provides spatio-temporal information on specific signaling pathways and organelles, to label-free techniques such as phase contrast and differential interference contrast (DIC), which enable the visualization of whole cellular morphologies and dynamics. Each of these modalities provides its own outcome measures, which can be viewed as static snapshots or dynamic variations within the four-dimensional space of x, y, z and time6. However, compared to genotyping - its synergistic partner technique - live cell phenotyping remains a far more subjective science. The generation of genomic sequencing data and its analysis can now be achieved autonomously by employing a combination of robotics and microfluidics for sample preparation and
machine learning algorithms for data collection and interpretation. In contrast, the extraction of quantitative information from live cell imagery by manual means is still commonplace in live cell microscopy, a fact which speaks to the human visual system's adeptness at detecting small changes and low contrast features with high fidelity. But with automated live cell microscopes now able to collect high resolution imagery for days on end, the resulting data files can quickly grow to tens of gigabytes, leaving the analyst with an overwhelming amount of imagery to work through.7 Furthermore, if the analyst is not blinded to the experimental design, unconscious bias can creep into the data extraction process.

Enter computational algorithms capable of extracting the relevant outcome variables from the imagery in an automated fashion.8-10 Broadly speaking, these algorithms are often classified as model-based approaches (e.g. CellProfiler)11 and machine learning algorithms (e.g. U-Net, ilastik)12-14. Neither approach is completely autonomous when it comes to cell segmentation: model-based approaches require the manual tuning of multiple parameters, while machine learning requires that the user provide curated data from which the algorithm is trained. Once tuned or trained, the software is able to process far more imagery than could be achieved manually - but there is still a human-in-the-loop. The manual contribution has simply been moved to the front end for training purposes and is then continuously reapplied by the algorithm. Algorithms that are tuned or trained at the onset can problematically miss relevant features as the cellular phenotypes or background characteristics evolve, inadvertently skewing the analysis. For instance, variations in label intensity (e.g. photobleaching, quenching) or new morphological features that were not present during the initial training (e.g. differentiation, mitosis, blebbing) can go undetected unless the algorithm is retrained with a freshly curated data set or with parameters that capture the offending features. In the same way, temporal variations in the background illumination intensity or homogeneity can also result in improper cell segmentation.7 Especially concerning is that the user-supervised training process is inherently subjective in nature and can cause unconscious biases to be effectively baked into the extracted data by the training process. To optimize objectivity and efficiency, an essential goal is to develop software that can accept imagery from any optical modality, labeled or unlabeled, and extract the cellular features of interest without input from the user.

As participants in a synthetic biology real-time reproducibility project administered by the U.S. Defense Advanced Research Projects Agency (DARPA), referred to as Independent Verification & Validation (IV&V), we have recently experienced all of these algorithmic limitations and how they can result in large amounts of data being incorrectly segmented, subjectively segmented, or left unanalyzed due to time constraints.15 The program involves a wide range of cell types (amoeboid to eukaryotic) from multiple cell biology laboratories; multiple imaging modalities, both fluorescent and label-free; and objective magnifications ranging from 10X to 100X.
The cumbersome process of retraining supervised machine learning software to match this variety of conditions proved impractical, and a human-in-the-loop training step was deemed too subjective. The challenge, then, was to develop a completely automated segmentation algorithm for live cell microscopy applications. In particular, the image analysis software should be 'self-supervised', meaning it trains itself to classify cells versus background and then regularly updates this training so that it can adapt to evolving intensities and morphologies. The software was required to segment a variety of cell types from live cell imagery given the most common imaging modalities as inputs - phase contrast, transmitted light, DIC, fluorescence and interference reflection microscopy (IRM) - and to do so without user-adjustable parameters or user-selected training imagery. It was additionally required that the generated models adapt to changing cell phenotypes and lighting conditions for long-term imaging applications (hours to days).

Methods

To replace more manual model-based and machine learning training approaches for segmenting cells with an automated, self-supervised algorithm, we took advantage of the one phenotypic feature which is present in live cell microscopy no matter what the modality: motion. From the nanoscale diffusion of proteins and vesicles to the migration of cells that are tens of microns in length, the ever-present dynamics captured by live cell microscopy make it ideal for applying optical flow (OF) algorithms, which are designed to identify not just spatial intensity features in a given frame but also the variation or 'flow' of those features from frame to frame. The central assumption in optical flow algorithms is that the overall image intensity will remain constant if the time difference between frames is reasonably small.16 This leads to the following time-derivative constraint equation:

$$\frac{d}{dt}I(x,y,t) = 0 \;\;\rightarrow\;\; \frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = u\,\frac{\partial I}{\partial x} + v\,\frac{\partial I}{\partial y} + \frac{\partial I}{\partial t} = 0$$

where $I(x,y,t)$ is the in-plane image intensity at time $t$, with $u = dx/dt$ and $v = dy/dt$ being the optical flow in the x and y directions, respectively. The methods used to solve this constraint equation are matched to the imaging goal, such as reducing jitter in imagery taken from helicopters, aligning medical imagery or, in the case of this study, cell motion segmentation. In testing a range of optical flow algorithms for cell segmentation, we found the Farnebäck method to be the most robust due to its sensitivity to object deformation - a natural fit for cells, which are morphologically variable.17,18

OF assumptions may or may not be met for fluorescence time-lapse imagery applications in which extended time intervals are sometimes employed to avoid phototoxicity or photobleaching.6,19 For this reason, it was important that our technique be co-validated with label-free techniques such as transmitted light and phase contrast, which are minimally invasive. Overlaying less frequently acquired fluorescence imagery onto cells segmented using a label-free imaging channel is then straightforward. Furthermore, there has been an increased appreciation for the morphological information label-free approaches can provide as a result of algorithmic-based phenotyping.20-22
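The authors describe their implementation as a stand-alone MATLAB script built on the Image Processing, Statistics and Machine Learning, and Computer Vision Toolboxes; their source code is not reproduced here, but a minimal sketch of the Farnebäck flow step might look as follows (file names are placeholders for two consecutive grayscale frames):

```matlab
% Minimal sketch: dense Farneback optical flow between consecutive frames
% (t-1, t), using MATLAB's Computer Vision Toolbox. File names are
% hypothetical placeholders.
framePrev = im2double(imread('frame_t-1.tif'));   % frame at t-1 (grayscale)
frameCurr = im2double(imread('frame_t.tif'));     % frame at t

opticFlow = opticalFlowFarneback;          % default pyramid/window parameters
estimateFlow(opticFlow, framePrev);        % prime the estimator with frame t-1
flow = estimateFlow(opticFlow, frameCurr); % dense flow field between t-1 and t

u   = flow.Vx;           % optical flow in x (the 'u' of the constraint eq.)
v   = flow.Vy;           % optical flow in y (the 'v' of the constraint eq.)
mag = flow.Magnitude;    % per-pixel flow speed, used for self-labeling below
```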
Our approach to self-supervised learning and automated model generation begins with utilizing the Farnebäck OF method as a means of classification bootstrapping (Fig 1). Typical segmentation strategies utilize static information in a single image at time frame (t), which can have difficulty distinguishing 'cell' from 'background' pixels in a generalizable manner (Fig 1a). In contrast, our approach begins with an OF calculation based on images from consecutive time frames (t-1, t). This enables us to leverage the ubiquitous nature of intracellular motion and build a dynamics-based feature vector: pixels with the highest flow are automatically labeled as 'cell' pixels, those with the lowest flow are automatically labeled as 'background' pixels, and those that fit neither category remain unlabeled (Fig 1b,c). We note that this automatic self-labeling is broadly applicable in that it does not depend on the principles of any specific optical modality, cell type or phenotype. The OF-based self-labeling approach outputs a set of 'cell' and 'background' labeled pixels, which are then used to generate additional entropy and gradient feature vectors at each time point. These static feature vectors are used to train and generate a classifier model which, in the final step, is applied to all pixels in the image for cell segmentation. The algorithm is written as a stand-alone MATLAB script and utilizes functions from the Image Processing, Statistics and Machine Learning, and Computer Vision Toolboxes.

Fig. 1 Overview of the optical flow self-labeling strategy. (a) The vast majority of cell segmentation techniques utilize single image frames and the static information contained within as a means to distinguish 'cell' from 'background', oftentimes represented in a histogram. The self-supervised algorithm instead utilizes optical flow as a means to self-label pixels in an automated fashion. (b) Due to the prevalence of intracellular dynamics in time-lapse live cell imagery, optical flow can be calculated for each pair of consecutive images (t-1, t). The optical flow can then be represented as vectors associated with each pixel (right). (c) The magnitude of the optical flow then offers a means to distinguish cells from their background, as shown in the bivariate histogram, which co-plots the pixel intensity of a single image at t against the optical flow vector magnitudes calculated between consecutive images (t-1, t). Pixels with the highest flow can be automatically labeled 'cell' (left of the green dashed line) and those with the lowest flow can be labeled 'background' (right of the yellow dashed line). Pixels that meet neither criterion remain unlabeled, while the self-labeled pixels are used to create a training data set for classification. Time increment: 600 sec, scale bar = 20 µm.
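Continuing the sketch above, the self-labeling step of Fig 1 can be illustrated by thresholding the flow magnitude. The percentile cutoffs below are illustrative assumptions, not the algorithm's self-tuned thresholds:

```matlab
% Sketch of the OF self-labeling step of Fig 1: the highest-flow pixels become
% 'cell' training examples, the lowest-flow pixels 'background', and the rest
% stay unlabeled. 'mag' is the flow magnitude from the previous sketch; the
% percentile cutoffs are illustrative assumptions.
hiCut = prctile(mag(:), 99);    % conservative cutoff for 'cell'
loCut = prctile(mag(:), 60);    % liberal cutoff for 'background'

cellPix   = mag >= hiCut;       % self-labeled 'cell' training pixels
bgPix     = mag <= loCut;       % self-labeled 'background' training pixels
unlabeled = ~cellPix & ~bgPix;  % left for the trained model to classify
```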
The self-supervised training approach is illustrated in Fig 2 using time-lapse DIC imagery of multiple (top) and a single highlighted (bottom) MDA-MB-231 cell. In the raw imagery (Fig 2a,b), many portions of individual cells appear to blend in with the background. However, when the OF self-labeling strategy is applied, the algorithm automatically identifies pixels with high flow magnitude, highlighted as green pixels (Fig 2c,d), which are selected as having the highest probability of correctly being labeled 'cell'. To automatically label the background, the algorithm over-segments; that is, a liberal (low) OF threshold is employed which captures motion not only from the cell but also from nearby background pixels. The algorithm sets these pixel values to zero and labels the pixels in which no significant motion was detected as 'background' (Fig 2c,d, yellow pixels). Once labeled 'cell' or 'background' in this unsupervised manner by OF (dynamic features from image pair (t-1, t)), entropy and gradient feature vectors (static features from the image at t) are generated for each of these training pixels using their local neighborhood of pixels (S.I., Fig S2). These additional feature vectors are then used to train and generate a Naïve Bayes classifier model which is applied to the entire image in a pixel-wise fashion. The information gained from the entropy and gradient feature vectors enables pixels which were left unlabeled in the OF training steps (Fig 2c,d, grey pixels) to be classified. The contrast enhanced image (Fig 2b) and model-generated segmentation (Fig 2f, teal pixels) show that the algorithm is able to segment the cell with high fidelity (DIC image/segmented boundary overlay, Fig 2g). Importantly, this labeling, training and classifying procedure occurs recursively on each successive pair of (t-1, t) images, enabling the classifier model to adapt to changing backgrounds and phenotypes. By using optical flow to label the highest flow pixels as 'cell' and the lowest flow pixels as 'background', the labeling process has become automated (or 'self-supervised') and no manual inputs or training images are needed. For extremely low contrast imagery there can be too few training pixels labeled 'cell' for robust segmentation to occur given the initial OF threshold setting. In such cases, the algorithm calculates the entropy associated with 'cell' pixels and iteratively reduces the OF threshold until the associated 'cell' entropy feature vector is well distinguished from that of the 'background' entropy feature vector.
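As an illustration of this training step, a minimal sketch using the entropy, gradient and Naïve Bayes functions available in the toolboxes the authors cite might read as follows (the 9x9 entropy neighborhood is an assumption, not the paper's value):

```matlab
% Sketch of the supervised stage bootstrapped from the OF labels: static
% entropy and gradient features are computed on the frame at t, a Naive Bayes
% model is fit to the self-labeled pixels, and the model then classifies every
% pixel. 'frameCurr', 'cellPix' and 'bgPix' come from the sketches above.
E = entropyfilt(frameCurr, true(9));   % local entropy feature (per pixel)
G = imgradient(frameCurr);             % gradient magnitude feature

Xtrain = [E(cellPix), G(cellPix);      % feature rows for 'cell' pixels
          E(bgPix),   G(bgPix)];       % feature rows for 'background' pixels
ytrain = categorical([repmat({'cell'},       nnz(cellPix), 1);
                      repmat({'background'}, nnz(bgPix),   1)]);

mdl  = fitcnb(Xtrain, ytrain);         % Naive Bayes classifier model
pred = predict(mdl, [E(:), G(:)]);     % classify every pixel in the frame
mask = reshape(pred == 'cell', size(frameCurr));   % binary cell segmentation
```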
Fig. 2 Overview of the automated self-supervised learning algorithm. a. Contrast enhanced DIC image of several MDA-MB-231 cells and b. a single highlighted cell, illustrating the range of intensities inherent within the cells (20X objective). c. & d. Unsupervised learning via OF: high threshold OF is used to select only those pixels exhibiting the highest flow magnitudes and labels them as 'cell' (green pixels). Similarly, low threshold OF is used to identify pixels with a much wider range of flow magnitudes than the high flow regime. The lowest flow magnitude pixels are labeled 'background' (yellow pixels). Pixels that exhibit OF in between these regimes remain unlabeled (gray pixels). e. & f. Supervised learning via self-labeled training data: the self-labeled pixels (green and yellow) are used to generate static feature vectors, which are in turn used to train the classifier model. g. The blue outline is the resulting segmentation, which outlines all pixels classified by the OF-trained model as 'cell' and is also overlaid on the image in b. This process is repeated at every time step, thereby using the most recent imagery to update the training data. Scale bar: 25 µm (20X objective, time increment: 300 sec).

Results

The Fig 3 imagery shows the generality of this approach and also demonstrates how the self-supervised algorithm additionally automates commonly required manual inputs such as size filtering and hole filling. The segmented cells were processed from imagery acquired from a range of cell types, imaging modalities, magnifications and time increments (S.I. Table S1). The OF algorithm enabled a straightforward approach to automated size filtering, which is a common user-adjustable parameter in supervised machine learning approaches. To accomplish this, a stand-alone application of OF was applied to the imagery, lacking the added steps of self-tuning and model building described above. While some cell features are missed, this simpler, faster approach was found to be more than precise enough to estimate average cell size and to exclude much smaller objects, thus automating the size filtering process. Because extraneous debris often lacked the motion of live cells, this debris was also automatically labeled as background by the OF algorithm. Figs 3a and 3b demonstrate the self-supervised code's ability to size filter, while also adapting to cell types of differing sizes, by comparing the segmentation of human fibroblasts (10X, phase contrast) to that of the much smaller Dictyostelium amoeboid cells (10X, transmitted light), respectively. Extraneous debris features in the Hs27 imagery (Fig 3a, white arrows) are correctly identified as 'background', even though they are similar in size and intensity to the Dictyostelium cells of Fig 3b. The background inhomogeneities observed in Figs 3a and 3b, which could potentially be mislabeled as 'cell', are correctly identified because they remain relatively constant from frame t-1 to frame t.
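A minimal sketch of such OF-derived size filtering, assuming the typical cell area is estimated from the mask's own connected components (the 25% cutoff fraction is an illustrative assumption):

```matlab
% Sketch of automated size filtering: a typical cell area is estimated from
% the segmentation mask itself, and far smaller objects (debris, noise) are
% discarded. 'mask' comes from the classification sketch above.
stats = regionprops(mask, 'Area');           % areas of all connected objects
typicalArea = median([stats.Area]);          % self-estimated typical cell size
maskFiltered = bwareaopen(mask, round(0.25 * typicalArea));  % drop small debris
```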
The segmentation results for the MDA-MB-231 cells (10X, phase contrast) in Fig 3c illustrate the algorithm's ability to adapt to a wide range of phenotypes, from rounded (Fig 3c(i)) to spread (Fig 3c(ii)), which is enabled without need for user input by continuously retraining the model on consecutive image pairs. The current instantiation of the software does not attempt to separate cells that are touching or close enough to be segmented as a single object; well-developed approaches such as watershed transforms23 and level-set methods24 can be employed for such purposes. The algorithm works robustly for a range of optical modalities and magnifications, as shown in Figs 3d-f. Figs 3d and 3e are segmentation results from IRM imagery (40X, Hs27 cell) and DIC imagery (20X, MDA-MB-231), respectively. As a fluorescence imaging example, a self-supervised segmentation of a GFP-actin labeled A549 cell at 100X magnification is shown in Fig 3f. As an additional option, OF can be applied not only as an algorithm labeling element but also as a measurement tool, as shown in the Fig 3f vector plot. The plotted OF vectors (blue) display the magnitude and direction of the measured GFP-labeled actin flow between frames. Such measurements have been shown to be useful for quantifying intracellular protein and calcium signaling dynamics.25-27

Fig. 3 Self-supervised segmentation for a range of cell types, microscope modalities, time resolutions and magnifications. a. phase contrast of Hs27 fibroblasts (10X objective, time increment: 1200 sec) b. transmitted light of Dictyostelium (10X objective, time increment: 60 sec) c. phase contrast of MDA-MB-231 (10X objective, time increment: 600 sec) d. IRM image of a single Hs27 cell (40X objective, time increment: 600 sec) e. DIC image of MDA-MB-231 cells (20X objective, time increment: 120 sec) f. fluorescence image of a single LifeAct (GFP-actin conjugate) transfected A549 cell (pseudo-colored) with the associated optical flow vector plot (100X objective, time increment: 10 sec). Insets i, ii, iii highlight boxed image regions. White arrows point to examples of debris that was correctly labeled 'background' due either to lack of motion or automated size filtering. Images have been contrast enhanced to highlight low contrast features and background inhomogeneities. DIC image (e) was additionally enhanced with a sharpen filter to highlight interference-induced shadowing of cell features. Scale bars: a, b, c: 50 µm; d, e: 25 µm; f: 10 µm.

Hole filling, another often required manual input for model-based and machine learning algorithms, has also been automated by this approach. Common examples of when hole filling input is required include fluorescent labels that do not penetrate the nucleus or, for label-free microscopy modes such as phase contrast, large spread cells in which the algorithm has a difficult time associating the interference-enhanced cell edges with the enclosed lamellipodia. We found that motion within cells was ubiquitously detected by OF, regardless of imaging modality or whether imaging the cell membrane, nucleus or cytoplasm. Because motion detection was far more common than not for a given pixel within an area labeled 'cell', a fixed morphological blurring tool (circular with a radius of 5 pixels) was found to robustly hole fill regardless of cell type or microscope configuration. The calculated cell area was found to be invariant for a range of blurring tool radii (Fig S2). In all cases, the use of optical flow to identify motion and the 5 pixel radius blurring tool was sufficient to correctly fill in the cell.
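A minimal sketch of this hole-filling step, interpreting the 'circular blurring tool' as a disk-shaped morphological closing (an assumption about the exact operation used):

```matlab
% Sketch of automated hole filling. The paper's 'circular blurring tool with a
% 5 pixel radius' is interpreted here as a disk-shaped morphological closing,
% which is an assumption. 'maskFiltered' comes from the size-filtering sketch.
se = strel('disk', 5);                       % 5-pixel-radius disk element
maskClosed = imclose(maskFiltered, se);      % bridge small gaps in 'cell' areas
maskFilled = imfill(maskClosed, 'holes');    % fill enclosed holes (e.g. nuclei)
```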
By re-training on every pair of consecutive images, the self-supervised algorithm remains accurate throughout long-term imaging applications, despite changes in background or cell phenotypes. This allows a rich record of dynamic morphology and migration behavior to be readily collected and analyzed - a key point given the known inter-relationship between cellular shape and function.2,28,29 Furthermore, the emerging role that not just cell shape, but cell shape dynamics, play in fundamental biological processes is becoming increasingly clear.30 Fig 4 demonstrates how such quantitative morphological information is readily mined in a long-term imaging application. Figs 4a-c show the tracking of several MDA-MB-231 cells segmented via the self-supervised approach under 10X phase contrast microscopy on cRGD-functionalized gold coverslips.31 Fig 4a shows the labeled tracks of the cells' centroids over the course of 400 minutes, with the corresponding initial and final images shown in Figs 4b,c. The cell associated with track 2 undergoes mitosis at approximately 320 minutes, creating two new tracks (5 and 6) for the daughter cells. Because the self-supervised approach automatically re-trains continuously on consecutive frame pairs, the morphological changes from Fig 4b to Fig 4c are quantified with high fidelity, as can be seen by plotting the segmented boundaries as a function of time (Fig 4d).

Fig. 4 Tracking of MDA-MB-231 cells under 10X phase contrast microscopy and time evolution of cell morphology through mitosis. a. The resulting tracks of multiple segmented cells from a single field of view over the course of 400 minutes b. corresponding images at times t = 0 min and c. 400 min. Track 2 undergoes mitosis resulting in tracks 5 and 6 of the daughter cells (blue line). d. (left) Time evolution of the segmented morphology of track 2 (black), with the centroid of each shape denoted by an open circle until mitosis, after which the track splits into 5 (green) and 6 (blue), with the cell separation event denoted by a single red open circle. d. (right) Selected images showing raw data overlaid with the self-supervised segmentation throughout the mitosis event. (10X objective, time increment: 600 sec) Scale bar: 100 μm.
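For illustration, the per-frame centroid and morphology measurements underlying such tracks might be extracted as sketched below, where 'masks' is a hypothetical cell array of per-frame segmentations; linking centroids into tracks (e.g. nearest-neighbor frame-to-frame matching) is omitted:

```matlab
% Sketch of the per-frame measurements behind Fig 4: centroids and simple
% morphology descriptors are extracted from each segmented frame. 'masks' is a
% hypothetical cell array of per-frame binary segmentations; track linking
% across frames is not shown.
nFrames = numel(masks);
centroids = cell(nFrames, 1);
for k = 1:nFrames
    stats = regionprops(masks{k}, 'Centroid', 'Area', 'Perimeter');
    centroids{k} = vertcat(stats.Centroid);   % one (x, y) row per cell
end
```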
Discussion & Conclusions

There are numerous advantages to this self-supervised machine learning approach. The most obvious is that, because the training data is generated by tracking motion, the approach can be used with any live cell imaging microscopy technique, whether labeled or label-free. Also unique is the use of the optical flow labeled pixels to self-supervise the building of a classifier model, which in turn is modular with regards to the incorporated feature vectors. While we have employed only two feature vectors in this current instantiation of the classification code (gradient and entropy), there are many additional image features that can be added based on the application. We have also shown that the incorporation of OF enables the straightforward automation of morphological operations such as size filtering and hole filling, eliminating the need for manually tuning these parameters.

The automation described here is markedly different from machine learning approaches that require user-assisted training. The most time consuming aspect of model-based tuning and machine learning approaches is the training process. The process is one of trial and error, requiring retraining if the model's performance is not deemed adequate. The complete automation of both the training and segmentation algorithms not only saves time but also removes the chance of unconscious bias entering the training process. Because the training is conducted recursively with each new image, evolutions in phenotype and background structure over extended time periods are accounted for without the need for preprocessing. The sum of all these advantages is segmentation under a wide range of magnifications, time resolutions, cell types and optical modalities that is both automated and robust. The result is the ability to track cells for hours or days and quantify a range of morphological and phenotypic features without the need for user input, giving the approach broad applicability throughout live cell microscopy. The crux of the introduced self-supervised approach is its use of the dynamic information embedded in each pixel - motion characterized via optical flow - as an elegant means to self-label cells versus background in time-lapse imagery. While cellular dynamics have long been appreciated as information-rich with regards to understanding cell function, our approach demonstrates that they also provide the means for robust segmentation - a foundational step for achieving quantitative and objective live cell analysis.

Acknowledgements

The authors gratefully acknowledge the Devreotes laboratory of Johns Hopkins University for the Dictyostelium discoideum cell line. M.C.R. gratefully acknowledges support from the National Research Council Research Associateship Program and the Jerome and Isabella Karle Distinguished Scholar Fellowship Program. Funding for this project was provided by the Office of Naval Research through the Naval Research Laboratory's Basic Research Program and by the Biological Technology Office of the Defense Advanced Research Projects Agency.

Author Contributions

Michael C. Robitaille: conceptualization, methodology, investigation, data curation, software, visualization, and writing. Jeff M. Byers: conceptualization, methodology, formal analysis, and software. Joseph A. Christodoulides: resources, validation, and writing. Marc P. Raphael: conceptualization, funding acquisition, methodology, investigation, software, visualization, and writing.
Financial Conflicts of Interest

The authors do not have any conflicts of interest with this work.

References

1 Caicedo, J. C., Singh, S. & Carpenter, A. E. Applications in image-based profiling of perturbations. Current Opinion in Biotechnology 39, 134-142, doi:10.1016/j.copbio.2016.04.003 (2016).
2 Cadart, C., Zlotek-Zlotkiewicz, E., Le Berre, M., Piel, M. & Matthews, H. K. Exploring the Function of Cell Shape and Size during Mitosis. Developmental Cell 29, 159-169, doi:10.1016/j.devcel.2014.04.009 (2014).
3 Zhou, X. B. & Wong, S. T. C. High content cellular imaging for drug development. IEEE Signal Processing Magazine 23, 170-174, doi:10.1109/msp.2006.1598095 (2006).
4 Zhong, J. et al. Persistent hepatitis C virus infection in vitro: Coevolution of virus and host. Journal of Virology 80, 11082-11093, doi:10.1128/jvi.01307-06 (2006).
5 Zhu, N. et al. Morphogenesis and cytopathic effect of SARS-CoV-2 infection in human airway epithelial cells. Nature Communications 11, doi:10.1038/s41467-020-17796-z (2020).
6 Skylaki, S., Hilsenbeck, O. & Schroeder, T. Challenges in long-term imaging and quantification of single-cell dynamics. Nature Biotechnology 34, 1137-1144, doi:10.1038/nbt.3713 (2016).
7 Caicedo, J. C. et al. Data-analysis strategies for image-based cell profiling. Nature Methods 14, 849-863, doi:10.1038/nmeth.4397 (2017).
8 Deep learning gets scope time. Nature Methods 16, 1195, doi:10.1038/s41592-019-0670-x (2019).
9 Grys, B. T. et al. Machine learning and computer vision approaches for phenotypic profiling. Journal of Cell Biology 216, 65-71, doi:10.1083/jcb.201610026 (2017).
10 Moen, E. et al. Deep learning for cellular image analysis. Nature Methods 16, 1233-1246, doi:10.1038/s41592-019-0403-1 (2019).
11 Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biology 7, doi:10.1186/gb-2006-7-10-r100 (2006).
12 Al-Kofahi, Y., Zaltsman, A., Graves, R., Marshall, W. & Rusu, M. A deep learning-based algorithm for 2-D cell segmentation in microscopy images. BMC Bioinformatics 19, doi:10.1186/s12859-018-2375-z (2018).
13 Falk, T. et al. U-Net: deep learning for cell counting, detection, and morphometry (erratum; vol 16, pg 67, 2019). Nature Methods 16, 351, doi:10.1038/s41592-019-0356-4 (2019).
14 Sommer, C., Straehle, C., Kothe, U. & Hamprecht, F. A. Ilastik: Interactive learning and segmentation toolkit. in 2011 8th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 230-233 (2011).
15 Raphael, M. P., Sheehan, P. E. & Vora, G. J. A controlled trial for reproducibility. Nature 579, 190-192, doi:10.1038/d41586-020-00672-7 (2020).
16 Beauchemin, S. S. & Barron, J. L. The computation of optical flow. ACM Comput. Surv. 27, 433-467, doi:10.1145/212094.212141 (1995).
17 Farneback, G. Two-frame motion estimation based on polynomial expansion. in Image Analysis, Proceedings Vol. 2749 Lecture Notes in Computer Science (eds J. Bigun & T. Gustavsson) 363-370 (2003).
18 Robitaille, M. C., Byers, J. M., Christodoulides, J. A. & Raphael, M. P. Robust Optical Flow Algorithm for General, Label-free Cell Segmentation. bioRxiv, 2020.10.26.355958, doi:10.1101/2020.10.26.355958 (2020).
19 Schroeder, T. Long-term single-cell imaging of mammalian stem cells. Nature Methods 8, S30-S35, doi:10.1038/nmeth.1577 (2011).
20 Jaccard, N. et al. Automated Method for the Rapid and Precise Estimation of Adherent Cell Culture Characteristics from Phase Contrast Microscopy Images. Biotechnol. Bioeng. 111, 504-517, doi:10.1002/bit.25115 (2014).
21 Ounkomol, C., Seshamani, S., Maleckar, M. M., Collman, F. & Johnson, G. R. Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy. Nature Methods 15, 917-920, doi:10.1038/s41592-018-0111-2 (2018).
22 Vicar, T. et al. Cell segmentation methods for label-free contrast microscopy: review and comprehensive comparison. BMC Bioinformatics 20, 25, doi:10.1186/s12859-019-2880-8 (2019).
23 Wang, M. et al. Novel cell segmentation and online SVM for cell cycle phase identification in automated microscopy. Bioinformatics 24, 94-101, doi:10.1093/bioinformatics/btm530 (2008).
24 Nath, S. K., Palaniappan, K. & Bunyak, F. in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2006, Pt 1 Vol. 4190 Lecture Notes in Computer Science (eds R. Larsen, M. Nielsen & J. Sporring) 101-108 (2006).
25 Buibas, M., Yu, D., Nizar, K. & Silva, G. A. Mapping the Spatiotemporal Dynamics of Calcium Signaling in Cellular Neural Networks Using Optical Flow. Annals of Biomedical Engineering 38, 2520-2531, doi:10.1007/s10439-010-0005-7 (2010).
26 Delpiano, J. et al. Performance of optical flow techniques for motion analysis of fluorescent point signals in confocal microscopy. Machine Vision and Applications 23, 675-689, doi:10.1007/s00138-011-0362-8 (2012).
27 Lee, R. M. et al. Quantifying topography-guided actin dynamics across scales using optical flow. Mol. Biol. Cell 31, 1753-1764, doi:10.1091/mbc.E19-11-0614 (2020).
28 Meyers, J., Craig, J. & Odde, D. J. Potential for control of signaling pathways via cell size and shape. Current Biology 16, 1685-1693, doi:10.1016/j.cub.2006.07.056 (2006).
29 Rangamani, P. et al. Decoding Information in Cell Shape. Cell 154, 1356-1369, doi:10.1016/j.cell.2013.08.026 (2013).
30 Akanuma, T., Chen, C., Sato, T., Merks, R. M. H. & Sato, T. N. Memory of cell shape biases stochastic fate decision-making despite mitotic rounding. Nature Communications 7, doi:10.1038/ncomms11963 (2016).
31 Robitaille, M. C. et al. Problem of Diminished cRGD Surface Activity and What Can Be Done about It. ACS Applied Materials & Interfaces 12, 19337-19344, doi:10.1021/acsami.0c04340 (2020).