key: cord-0865576-v079kur6
authors: Amaro, Rommie E.; Mulholland, Adrian J.
title: A Community Letter Regarding Sharing Biomolecular Simulation Data for COVID-19
date: 2020-04-07
journal: J Chem Inf Model
DOI: 10.1021/acs.jcim.0c00319
sha: 312953a329b3a1103beb30c8480d053e283ca769
doc_id: 865576
cord_uid: v079kur6

There is an urgent need to share our methods, models, and results openly and quickly to test findings, ensure reproducibility, test significance, eliminate dead-ends, and accelerate discovery. Sharing of data for COVID-19 applications will help connect scientists across the global biomolecular simulation community, and also improve connection and communication between simulation and experimental and clinical data and investigators.

We, as a community, commit to the following principles and offer our support to others already working on open data efforts in the hope that others working on COVID-19 in biomolecular simulation and other areas will adopt similar best practices.

• We commit to making results from our work on the SARS-CoV-2 virus available as preprints as quickly as possible, using preprint servers such as arXiv, bioRxiv, and ChemRxiv, and open access data repositories such as Zenodo.

• We commit to making available the input files, model building/processing scripts (e.g., Jupyter notebooks) required to set up, run, and analyze the simulations, and data necessary to repeat analysis upon deposition to the preprint sites following the FAIR data principles (findable, accessible, interoperable, reusable). 1 Doing so will also enable others to test, extend, and augment developed models without duplicating efforts, delivering results more rapidly and developing and testing hypotheses.

• We will make models and trajectories available as soon as possible through open data sharing platforms such as the SARS-CoV-2 Biomolecular Simulation Data and Algorithm Store, 2 the Open Science Framework, 3 and the European Open Science Cloud. 4

• Where appropriate, we will also share algorithms and methods in order to accelerate reuse and innovation. Well-validated and functional machine learning methods and heuristic property calculators would be especially desirable, as are Monte Carlo models of infectious disease spread and prediction of the impact of different NPI strategies. Custom code will be made rapidly available in appropriate repositories (e.g., GitHub).

• We commit to applying thoughtful permissive (and open source) licensing strategies (such as those recommended by Reproducible Research Standard) 5 to ensure that our models and data can be maximally reused, modified, and redistributed to rapidly advance the field in developing new therapies, while appropriately recognizing and acknowledging original authors and contributors.

In support of these efforts, the SARS-CoV-2 Biomolecular Simulation Data and Algorithm Store draws on the expertise and discussion of several recent workshops, as well as ongoing community discussions and emerging lists of research efforts and resources. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 The NSF MolSSI has recently created a special call for Seed Fellowship applications for students and postdocs that focus on software development, data science, workflows, and machine learning challenges that are especially relevant to the ongoing COVID-19 research. Furthermore, MolSSI in collaboration with BioExcel 16 is setting up a centralized GitHub and file sharing service to provide a centralized site for community data and is also working with Zenodo 17 and the Open Science Grid 18 to help store data and data analysis outcomes for this global initiative. Data storage and high performance computation can also be linked and integrated (e.g., with biomedical data) by e-infrastructures such as Fenix/ICEI 19 developed for the Human Brain Project.
Our community should also be aware of the high performance computing resources made available for COVID-19 research (through, e.g., the Partnership for Advanced Computing in Europe (PRACE), 20 HECBioSim 7 and CCP-BioSim 21 in the UK, the COVID-19 High Performance Computing Consortium in the United States, 22 and other similar initiatives worldwide, 23 including by supercomputer centers and cloud providers).

We recognize that we represent only a cross section of our community and encourage others to follow these principles; all are welcome to join this effort. We offer our support to others already working on open data efforts in the hope that others working on COVID-19 in biomolecular simulation and other areas will adopt similar best practices.

Signed,

Email: ramaro@ucsd

Notes
The authors declare no competing financial interest.