title: AI for science and global citizens
author: Austin Clyde
journal: Patterns (N Y)
date: 2022-02-11
DOI: 10.1016/j.patter.2022.100446

Artificial intelligence (AI) for science is a growing area of interdisciplinary computer science research focused on solving some of the most pressing global issues. While many cite AI's technical advances as the innovative force of the endeavor, I argue that interdisciplinarity, democratization, and cogent justification toward global citizens are driving forces to be fostered in the program's development.

AI for science (AI4Science) reimagines the scientific discovery process to incorporate advances in artificial intelligence (AI), machine learning, and high-performance computing (HPC). To meet the challenges of the current generation, discovery must be significantly accelerated, whether to create new materials for climate goals or to develop drugs rapidly for emergent pandemics and diseases. But should sheer technological development through AI and HPC be imagined as the driving force toward this future? To meet AI4Science's goal, I argue that three main hurdles must be overcome and represent necessary investment points: interdisciplinarity, democratization, and an orientation toward cogent justification. Any technical discoveries led by AI4Science must be supported by disciplinary perspectives, by the global scientific community, and by a communicative stance toward citizens. Unless these hurdles are met, discoveries will fail to reach the scale desired, the creativity needed, and the uptake sufficient for their success. The long path to democratizing science rests not only on the principled FAIR data sharing and reproducibility efforts currently underway but also on investment in institutions and practices for interdisciplinary engagement in which AI is an aide, not the focal point, for resource sharing and collective, open science. We should interpret AI broadly as algorithmic thinking aimed at producing structures for collective intelligence, a view of science that goes back to Ludwik Fleck, rather than as opaque algorithms that threaten interdisciplinarity, resource sharing, and, most importantly, a communicative stance toward citizens and theory.

AI4Science broadly represents "the next generation of methods and scientific opportunities in computing, including the development and application of AI methods ... to build models from data and to use these models ... to advance scientific research" [1]. The titular technology invokes the recent advances of AI, but scientific informatics laid the groundwork more than 30 years ago through translational research. Informatic disciplines such as cheminformatics and bioinformatics translate fuzzy, competing scientific theories into computerized schematic models. The informatician must satisfy the scientific theorist's doubts about axiomatic assumptions and generalizations, challenge HPC workflows, and extract knowledge from the scientific community. For example, cheminformatics began by interpreting small molecules as graphs, with atoms as nodes and bonds as edges. This is a reductive translation that loses many of the nuances of chemical theory, as the sketch below makes concrete.
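As a concrete illustration of this graph translation, here is a minimal Python sketch assuming the RDKit and networkx libraries; the helper name mol_to_graph and the aspirin example are illustrative choices of mine, not the article's.

from rdkit import Chem
import networkx as nx

def mol_to_graph(smiles: str) -> nx.Graph:
    """Translate a SMILES string into a plain graph: atoms as nodes, bonds as edges."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    g = nx.Graph()
    for atom in mol.GetAtoms():
        g.add_node(atom.GetIdx(), element=atom.GetSymbol())
    for bond in mol.GetBonds():
        g.add_edge(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(),
                   order=bond.GetBondTypeAsDouble())
    return g

g = mol_to_graph("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
print(g.number_of_nodes(), "atoms,", g.number_of_edges(), "bonds")

Note how much the reduction discards: stereochemistry, 3D conformation, charge, and electronic structure are all absent from the node-and-edge view.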
Interdisciplinary translation requires erasing, adding, modifying, hiding, mutating, and co-designing, and it requires constant communication back and forth. The classical assumption of molecules as graphs has produced many results and is promising for accelerating the future of molecular dynamics simulations. At the same time, complacency with the model has led many computer scientists toward performing only in silico validation. State-of-the-art benchmark performance, reproducibility studies, novel architectures, and so forth are essential pieces of computer science research, but they often fail to meet and engage other disciplines. This is not to put down such research (in which I continue to work) but rather to reorient focus toward interdisciplinary engagement, which benchmark tasks can lose sight of. Metrics in learning problems often conceal the underlying workflow and its values. In drug discovery, for example, the real measure of a good model is one that (1) identifies a molecule that is synthesizable, safe, and effective; (2) reduces experimental cost (false positives); and (3) comes packaged with a justification convincing enough for medicinal chemists, pharmaceutical companies, and the community at large to invest large sums of money. The justification owed to the disciplines that control the gate to downstream drug design is often lost in pure benchmark machine learning tasks.

One exemplary project that has taken interdisciplinarity seriously, at least internally to its development, is AlphaFold2 [2]. AlphaFold2's protein folding capabilities have astounded the structural biology community, and it is deeply interdisciplinary, especially compared with other approaches. Rather than following a traditional machine learning paradigm of constructing a deep neural network that predicts some measure from a set of features, AlphaFold2 builds on top of multiple sequence alignment (MSA) in an iterative algorithm. MSA is a technique for aligning protein sequences by similarity in order to determine common structural motifs or to understand phylogeny or homology, and it has a strong basis in the structural biology community [3]. This engagement with a rich tradition of structural biology, integrated with artificial intelligence, has transformed structural biology and brought the two communities exceptionally close together. By following such deep engagement with disciplinary knowledge and practice, AI4Science projects can open new discourse and terrains for discovery. The toy alignment below illustrates the core computation behind MSA.
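To make the alignment idea concrete, here is a toy, pure-Python Needleman-Wunsch global alignment of two short peptide fragments. Pairwise alignment is the building block that MSA generalizes to many sequences; real MSAs (including those AlphaFold2 consumes) come from dedicated tools, and the sequences and scoring values here are illustrative assumptions of mine.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

aligned_a, aligned_b = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(aligned_a)
print(aligned_b)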
The second challenge for a new scientific paradigm is democratization. Democratization of science requires accepting the values of inclusivity, decision-making equality, and deliberation, the last of which demands more than open science alone. Computer science has, in some respects, made great strides toward open science through open-source software: regardless of status or physical location, contributors are welcome to fix bugs, add features, or improve efficiency. Furthermore, the data science ethos of FAIR data principles [4], reproducibility, and publicly available computing resources, such as Google Colaboratory, has produced global communities of computational and data scientists. There is still progress to be made, especially regarding HPC systems, which cost upward of millions of dollars. Experimental scientific disciplines, on the other hand, face more complex obstacles to democratizing their capabilities: experimental space, equipment, and materials require highly specialized technicians and resources, which exclude many academics and universities around the world from whole classes of experiments.

Some might object that democratization is a mere virtue that can conflict with the urgency of certain discoveries or with other competing interests. There are many ways to respond to this perceived conflict of values. My response is that democratization is necessary to achieve the aims of discovery, and a commitment to sharing is already internal to scientific success. Because a group of diverse thinkers is more successful and creative than a few narrow-minded ones [5], science done by many is likely to be more successful. On a long timescale, one can view the scientific community as a (somewhat) diverse and inclusive group that collaborates on scientific theories through peer review and conferences. Many philosophers of science attribute the disciplinary success of science to this community-driven ethos [6]. Democratization is essential, and it is already happening. If science's success arises from diverse thought collectives, why slow the process down by relegating deliberation, contestation, and collective thinking to an afterthought? If AI4Science aims to accelerate discovery, it needs to accelerate the development of a thought collective through inclusion and democratization as well.

This may seem like a tall order: include as many people as possible in the discovery process, overcome the resource inequality excluding many scientists, and go faster than the traditional process. Yet the COVID-19 Moonshot Consortium has proved the paradigm possible, and successful [7]. The consortium created an open-source drug discovery platform to find an inhibitor of the SARS-CoV-2 3CL main protease. It solicited compound submissions from anyone with internet access who could draw one. From there, it ran a series of standard computational tests on all submitted compounds to determine basic chemical properties, compound availability, synthesis feasibility, and potential molecular pose. These initial standard tests determined which compounds to consider for purchase, assaying, and crystallography, a kind of initial filtering (a sketch follows below). Compounds from the community that met the cutoffs were prioritized through free energy calculations and complex molecular dynamics simulations (funded by computer time donated through Folding@home). From start to finish, the entire process was open, contestable, and transparent through data releases and a web portal. The web forum showed lively conversation that improved the assay conditions and computational screening benchmarks. The project illustrates that even without a massive reshaping of the scientific economy and the global funding environment, the inclusion of scientists globally is not only possible but highly successful. While the primary techniques of the Moonshot Consortium did not involve AI and machine learning practice, computation and informatics underpinned its infrastructure. Because the consortium took algorithmic thinking head on from the start, the scientific process was democratized effectively.
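A hedged sketch of that initial filtering step in Python with RDKit: parse each community-submitted SMILES string, compute basic properties, and keep compounds within cutoffs. The cutoffs and example submissions below are placeholders of mine, not the consortium's actual criteria, and the real pipeline also assessed availability, synthetic feasibility, and docked poses.

from rdkit import Chem
from rdkit.Chem import Descriptors

def passes_initial_filter(smiles: str) -> bool:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:          # reject submissions that do not parse
        return False
    mw = Descriptors.MolWt(mol)        # molecular weight
    logp = Descriptors.MolLogP(mol)    # lipophilicity estimate
    hbd = Descriptors.NumHDonors(mol)  # hydrogen-bond donors
    return mw <= 500 and logp <= 5 and hbd <= 5  # placeholder cutoffs

submissions = ["CC(=O)Nc1ccc(O)cc1",  # paracetamol, should pass
               "not-a-smiles",        # unparseable, rejected
               "c1ccccc1" * 8]        # absurdly large polyphenyl, fails cutoffs
shortlist = [s for s in submissions if passes_initial_filter(s)]
print(len(shortlist), "of", len(submissions), "pass initial filtering")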
The AI community is well positioned to assist interdisciplinary projects in pursuing this style of work. First, it already democratizes benchmarks and submissions for significant scientific challenges. Second, the AI community, which grew out of the open-source community, already has an ethos of sharing, acceptance, and portability. Third, developing AI models for further filtering and screening of the extensive batches of suggestions and ideas arriving from the global community will allow more submissions, increasing the feasibility of inclusivity throughout the process; a sketch of such a triage model follows below. While this third area has the potential for bias and exclusion, the AI community has a large-scale effort underway to detect and mitigate learned biases. Computational techniques are well positioned to democratize science through standard submissions to challenge problems, deliberative forums for assessing and tweaking problematic experimental conditions, open analysis and research sharing, and the separation of resource ownership from capability access.
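What might such a triage model look like? Here is a sketch assuming RDKit for featurization and scikit-learn for modeling: featurize each molecule as a Morgan fingerprint and rank a new batch of submissions with a classifier trained on prior results. The training molecules and activity labels are synthetic placeholders invented for illustration, not data from any real assay.

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def featurize(smiles_list):
    # Morgan (circular) fingerprints: a standard bit-vector featurization.
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=1024)
           for s in smiles_list]
    return np.array(fps)

# Placeholder training set standing in for prior assay results (1 = active).
train_smiles = ["CCO", "CCN", "c1ccccc1O", "CC(=O)O", "CCCCCCCC", "c1ccncc1"]
train_labels = [0, 0, 1, 0, 0, 1]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(featurize(train_smiles), train_labels)

# Rank a new batch of community submissions by predicted probability of activity.
batch = ["c1ccc(O)cc1C", "CCCC", "Cc1ccncc1"]
scores = model.predict_proba(featurize(batch))[:, 1]
for smi, p in sorted(zip(batch, scores), key=lambda t: -t[1]):
    print(f"{p:.2f}  {smi}")

The point is not the particular model but the workflow: a cheap learned ranking lets a consortium accept far more submissions than expert review alone could handle, provided the learned biases flagged above are audited.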
AI for our sake

Justification with opaque AI will be a great challenge for the AI4Science campaign. On the surface, scientists must contend with computer science aims, such as computational scaling or algorithmic innovation, while simultaneously appealing to domain justification, such as experimental validation. At a larger scale, justification must then happen for the community and the public at large for adoption to take place. Justification runs across the scientific spectrum, from public appeals for funding large scientific projects to convincing the scientific community of results. The univocal justification of traditional scientific writing must be challenged, as interdisciplinarity requires justification to many audiences at different levels. Computer scientists in drug discovery may not be able to assess the assay conditions or the existence of confounders behind a perceived hit in a screen. Conversely, chemists are unlikely to be moved by the factors that affect computational scaling, which in the long run determine overall computational capability. Unless computer scientists contextualize a workflow achievement in terms of the chemistry audience's goals, and chemists address the presuppositions of the assay conditions that computer scientists may not be thinking about when assessing experimental results, the fields will talk past each other.

Furthermore, the ability of citizens to engage in the scientific project from outside the scientific community has immense potential for addressing issues of trust in science. From high school students working on cheminformatics problems to data scientists analyzing data in their spare time, citizens can develop a feeling of authorship toward science as a global and inclusive project. Political philosopher Thomas Christiano writes that alienation from decision making "is like playing a game whose rules do not make any sense to one" [8]. For citizens first to make sense of, and second to identify with, the rules and practices of science, science needs to orient itself toward mutual justification. Citizen identification with science is part of the long road of developing trust in science, which is ultimately paramount to the political adoption of any global-scale scientific recommendations that emerge from AI4Science's campaigns. Therefore, the role of mutual justification needs to be taken seriously, not just as a solution to the current conflict but as a sustainable end as we pursue novel technologies for the next generation of global crises.

AI4Science is a promising endeavor not only for its premise of accelerated discovery and a new scientific revolution. It has even greater potential to expand the collective intelligence of scientists through tightly integrated interdisciplinary research, inclusion of the global community through a community-driven ethos, and a reorientation of justification and cogency. AI4Science is accelerating the very factor to which philosophers of science attribute the core success of science: thought collectives. Drug discovery during the COVID-19 pandemic has offered a glimpse into the future of tightly integrating computation and science through various collective and consortium projects, which ought to serve as a model for future grand challenges. With new technological paradigms on the horizon, we should step back and look at the ways these technologies can not only accelerate discoveries in a vacuum but ultimately create and foster the social conditions necessary for the success of science, the attainment of global inclusion, and mutual justification to all. Without these conditions at the forefront, a myopic focus on results themselves will not achieve the science we desperately need.

Declaration of interests
The author has no competing interests to declare.

References
1. AI for Science: Report on the Department of Energy (DOE) Town Halls on Artificial Intelligence (AI) for Science.
2. Highly accurate protein structure prediction with AlphaFold.
3. GeneSilico protein structure prediction metaserver.
4. The FAIR Guiding Principles for scientific data management and stewardship.
5. Evidence for a collective intelligence factor in the performance of human groups.
6. Science as Social Knowledge: Values and Objectivity in Scientific Inquiry.
7. COVID Moonshot: open science discovery of SARS-CoV-2 main protease inhibitors by combining crowdsourcing, high-throughput experiments, computational simulations, and machine learning.
8. The Constitution of Equality: Democratic Authority and its Limits.

About the author
Austin Clyde is a PhD candidate at the University of Chicago and an assistant computational scientist at Argonne National Laboratory. His research focuses on scaling drug discovery techniques to exascale computing infrastructure using artificial intelligence. He is a visiting research fellow at the Harvard Kennedy School's Program on Science, Technology and Society.