key: cord-0773994-lxfc4gsg
authors: Kobayashi, Rika
title: Technological Advances in Remote Collaborations
date: 2021-10-15
journal: Top Curr Chem (Cham)
DOI: 10.1007/s41061-021-00354-6
sha: 64cc77932740ed44d74bd6e33c3e781ec912c843
doc_id: 773994
cord_uid: lxfc4gsg

Sustainable scientific software needs a strong collaboration framework to ensure continuity by passing on the tools, skills and knowledge needed to the next generation. The COVID-19 pandemic triggered the unexpected effect of accelerating the development of remote platforms and tools to open up collaborations to a wider global community. In this article we outline the elements needed for such a framework, such as education, tools and community building, and discuss the current advances in technology with a nod to the future.

Probably the foremost issue for the continued development of scientific software is the shortage of programmers. This is recognised worldwide, as reflected in the many articles decrying the shortage of software engineers; for example, Lee [12, 13] estimates that the shortfall will reach 1 million by 2024. This problem is perceived as being one of education, spawning many government initiatives. In Australia, the government introduced the Coding Across the Curriculum initiative with the aim of promoting the teaching of digital technologies, including coding, across the different year levels in Australian schools [14]. In 2016, President Obama proposed a CS For All initiative with a US$4 billion budget for computer science education in the United States, which was not approved [15]. The European Commission produced a digital education report for Europe in 2019 [16], and a general worldwide overview can be found in a 2019 UNESCO report [17]. Most of these initiatives concern the tech and IT sector and target languages such as JavaScript, Java and Python [18]. Few address the shortage of scientific programmers, except in the rapidly growing field of data science.
However, when it comes to software in the applied sciences, certainly in high-performance computing (HPC), the majority of programs are written in Fortran and C/C++ [19]. From a survey in 2015, Rouson (Rouson, personal communication) reported on the programming languages used at NERSC (National Energy Research Scientific Computing Center): Fortran accounted for close to 60%, followed by C++ and C at about 35% and 31%, respectively. Fortran was found to be the primary language for 23 of the 36 top codes, yet it is a language that is no longer widely taught. See also the entertaining talk by Roland Lindh in this series [20].

The technical skills shortage has been partly addressed by the rise in online learning [21], a natural extension of distance education that is particularly suited to programming and computer-related courses because of its basis in a digital environment. Worldwide government cuts in funding for education have made distance education more appealing as an easy source of revenue, leading to a highly competitive, and growing, education industry. At the turn of the century, the market for higher education through distance learning was estimated at US$300 billion. In 2019, in pre-COVID times, Renub Research estimated that the online education market would reach US$350 billion by 2025 [22]. Renub's report highlighted the online course providers Coursera [23] and Udacity [24], and indeed their most popular offerings are computer science related: programming, machine learning/artificial intelligence (ML/AI) and data science. The majority of courses, as mentioned above, do not target scientific programmers. To date, the only online Fortran programming courses that could be found were provided through Udemy [25] and Tutorialspoint [26], though there is further material available online in the form of university lecture notes and tutorial handouts. In fact, many online courses just provide lecture material, often in the form of videos and handouts.
The more sophisticated online programming courses leverage the digital environment by embedding interactive exercises (as with the Tutorialspoint Fortran course), assessment and a discussion forum. Until this skills shortage is redressed, as has been done for the IT sector, passing on these advanced programming skills is in the hands of the scientific community.

The global COVID-19 lockdowns found many technical platforms and tools coming into being or reinventing themselves to adapt to the remote working landscape, as can be seen in Fig. 1 (a composite adapted and updated from several sources [27] [28] [29]). Video conferencing platforms became schools, fitness centers and places of worship; social apps, such as those for chat and gaming, turned into tools for remote work.

From a programming perspective, the relevant tools would without a doubt start with code hosting platforms or version control repositories. These have been in use for many years, starting with early version control systems [30], such as CVS [31] and Subversion [32], which were essentially a mechanism for tracking code revisions. However, as software development expanded and became more complex, often involving many authors working concurrently on different parts of the code, there was a need for some form of coordination. This gave rise to distributed revision control, led by Git, created by Linus Torvalds in 2005 for development of the Linux kernel [33]. The majority of software packages are probably now hosted on such a code repository, the main ones in the scientific community being GitHub [34] and GitLab [35]. This is often coupled with a continuous integration system. The idea behind the workflow is to keep a master copy of the code on the repository, which individual developers can download and work on. Changes from the various developers' working copies are then merged back, with the platform providing a mechanism for tracking and resolving dependencies and conflicts.
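The merge step described above can be illustrated with a minimal Python sketch of a three-way merge. It is a simplification under stated assumptions, not how any particular platform implements merging: a code base is modelled as a dictionary mapping routine names to source text, whereas real tools such as Git compare ranges of lines. The names `Snapshot` and `three_way_merge`, and the toy routines, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model: routine name -> source text of that routine.
Snapshot = Dict[str, str]


def three_way_merge(
    base: Snapshot, ours: Snapshot, theirs: Snapshot
) -> Tuple[Snapshot, List[str]]:
    """Merge two edited copies against their common ancestor `base`.

    Returns the merged snapshot and the names of routines in conflict.
    """
    merged: Snapshot = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            chosen = o  # both copies agree (this includes both deleting)
        elif o == b:
            chosen = t  # only "theirs" changed this routine
        elif t == b:
            chosen = o  # only "ours" changed this routine
        else:
            conflicts.append(name)  # both changed it, in different ways
            chosen = b  # keep the ancestor version as a placeholder
        if chosen is not None:
            merged[name] = chosen
    return merged, conflicts


# Toy example: two developers edit different routines, so the merge is clean.
base = {"energy": "e = t + v", "grad": "g = dv"}
alice = {"energy": "e = t + v + vnn", "grad": "g = dv"}
bob = {"energy": "e = t + v", "grad": "g = dv + dt", "hess": "h = d2v"}
merged, conflicts = three_way_merge(base, alice, bob)

# A third developer edits the same routine as alice: this is a conflict.
carol = {"energy": "e = t - v", "grad": "g = dv"}
_, conflicts2 = three_way_merge(base, alice, carol)
```

In this sketch the first merge takes `energy` from one developer and `grad` and `hess` from the other, while the second merge reports `energy` as a conflict. The conflict case is the one that a developer has to resolve by hand.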
This is the most problematic part of collaborative software development, where independent developers can introduce changes incompatible with each other, e.g. reusing the same variable or changing the underlying structure. One way to mitigate this is to check out the code regularly to keep as close to the master copy as possible, the ideal being to spend less time merging a change than making it and so avoid "integration hell". Continuous integration automates this merge on a frequent basis, at least daily, together with running a set of unit tests. For the extensive, complicated software suites that make up the bulk of computational chemistry software, new developments can be a long time in the making, rendering "integration hell" unavoidable.

Recent times, perhaps in mitigation of this or in conjunction with the popularity of "pair programming", have seen the springing up of real-time collaborative coding platforms where developers work on the same piece of code. This has already been in place for a while for document sharing, with tools such as Google Docs [36] and Overleaf [37]. There will probably still be a need for a set of privileged developers to approve code changes, but it is believed that the main advantage will be the ability to see concurrent development as it happens, making potential conflicts more noticeable. Again, of the most popular collaborative coding tools [38], few target Fortran, and it is still early days, so it is not clear whether this approach can scale to a complicated software suite with a number of modules and developers.

The biggest concern during the COVID-19 global shutdown was arguably what could be termed the human factor, described variously as a lack of connectedness or the need for "real" interactions; it is the main argument for a "return to the office" (see next section).
There was already a movement towards the formation of global research communities through the rise of collaborative hubs, also known as virtual research environments or science gateways [39]. The idea behind these is to provide a technological infrastructure of shared computational resources (software, data, tools, workflows, HPC access) through a web portal or apps, thus opening up research to a broader scientific community. There now exist many well-established communities, as can be seen in the Special Issue on International Science Gateways 2017 [39], and these have proved successful in some disciplines, notably the Galaxy Project [40], started in 2005, which, according to their latest report [41], had "served hundreds of thousands of users, been used in >5700 scientific publications, and provided 500+ developers with a framework provisioning accessible, transparent and reproducible data analysis". There is already some online community through the various code repository platforms, but a true collaborative hub should have elements of:

• venue: a central website accessible to all and possibly distributed regional hubs to serve a more localized community;
• repositories for collecting and sharing software, tools, data, workflows;
• active community that can communicate synchronously and asynchronously for the exchange of ideas and training;
• (optionally) access to high-end HPC facilities as part of the workflow.

In the computational chemistry world there are existing initiatives, such as nanoHUB [42], and nascent hubs, such as AiiDA [43] and Edison [44]. nanoHUB, having been established in 2002, is the oldest of these and describes itself as a science gateway supporting a global Network for Computational Nanotechnology community within a cloud environment. It provides a wealth of resources, including training courses, discussion forums, simulation and modeling tools, and a computing environment in which to run them [45].
Its main usage and success appear, however, to be in delivering courses and enabling simulations, that is, in supporting an application community rather than a programming and development one. On the other hand, AiiDA started from the Quantum Espresso [46] developer community, primarily for building an infrastructure providing tools for designing, deploying and analysing materials science simulations, integrated into HPC environments. However, parallel efforts in education and collaboration have since expanded its range into Materials Cloud, a web platform for computational materials scientists to "share their work and promote open science" [47]. Furthermore, the Materials Cloud community have begun taking steps towards defining standards, such as file formats and metadata, to facilitate interoperability. The concept of interoperability, the ability for different groups to exchange data consistently, is recognized as an important part of software sustainability, but as yet has been addressed comparatively little in computational chemistry.

A final quick mention should be given to the talk that initiated and possibly inspired this symposium series, "A Web Platform for Scientific Collaborations", given by Cheol Ho Choi [48]. Leveraging the principles of modular environments, they have created a web platform based on sharing and running computational chemistry modules and workflows via a graphical pipeline, and have opened it up to the wider quantum chemistry community in the hope of establishing a scientific software ecosystem.

However, these collaboration hubs are still rooted in two-dimensional screens and do not authentically fill the gap of the lamented missing "real" human interactions: the body language, corridor conversations and serendipitous interactions.
Continued TFOM activities have allowed us to explore these social aspects further, especially whether extended reality (XR) and immersive technology can help add a social human factor to our virtual offices and conferences. To this end, we have held a variety of events in virtual reality (VR) platforms such as Altspace [49], NEOS [50] and Glue [51]. These were engaging and fully immersive, but the immaturity of the platforms and their technological requirements do not make them practical today. We were able to explore this aspect further by being given the opportunity to discuss "The Future of Meetings: Working in XR?" with the XR developer community through a Birds of a Feather session at SIGGRAPH 2021 [52]. As part of the session, the attendees were polled informally on a variety of questions concerning the state of XR (for the complete set see Ref. [53]). The poll was not a rigorously conducted exercise, so not too much can be read into it, but it indicated that the sample, albeit small, felt that XR was able to substitute for "real" human interactions and that the industry believed we will be seeing meaningful change in the near future (Fig. 2).

In the early days of the COVID-19 pandemic an article appeared in Nature headed "A year without conferences?" [54], discussing the impact on researchers and raising the prospect of a need to rethink the concept of meetings. This article was very quickly countered by examples of successful virtual conferences from around the world, spanning many disciplines, notably one from the Virtual Winter School on Computational Chemistry, which has been running annually since 2015 [55]. Traditionally, conferences have been a means for scientific communities to meet and share knowledge, but they stem from a time when communication was slow and travel not so easy. Technological advances such as those described here have lessened the need to travel to achieve these outcomes.
The pros and cons of virtual conferences were explored in depth in the aforementioned Future of Meetings symposium [10, 11] and continue to be explored through various initiatives by the TFOM community [56]. The definite "pro" of virtual conferences has, without doubt, been their accessibility, inclusivity and sustainability. On the whole, virtual attendance figures have been much higher than their in-person equivalents, as the cost of travel is no longer a barrier to attendance. Junior researchers have reported increased confidence and a feeling of safety in the virtual environment, and there is significantly less harm to the environment. TFOM calculated that the symposium produced 1420 kg of CO2 equivalent, about 0.5% of the 280,000 kg of CO2 estimated for an in-person equivalent. The biggest disadvantage, certainly for Australia, has been juggling time zones, coupled with the difficulty in separating conference and domestic duties, as few international conferences overlap with normal working hours. For developing countries, technological accessibility is the biggest problem, and a recent poll of 900 Nature readers [57] cited "poor networking opportunities" as the biggest drawback.

With the effects of the COVID-19 pandemic still being felt around the world today, it is probable that there will be long-lasting changes to the way we work and meet. Major companies, especially the tech giants, are reducing their office space, with some going fully remote [58], following the lead of Twitter and Facebook, who announced in May 2020 that they would give staff the option to work remotely permanently [59]. Together with the rise of the globally competitive distance education industry, online learning is becoming more accepted. The COVID-19 pandemic provided momentum for overcoming the potential barrier to adoption of remote teaching practices that had been considered niche, such as flipped teaching.
Such practices have become mainstream, especially as more educators are recognising their effectiveness in this digital age. The perception that online degrees are not "real" degrees is diminishing now that many students have been able to experience direct comparisons. Similarly, the Virtual Winter School, which had been motivated by the desire to give a wider audience access to experts in the field whom they would not normally be able to hear live or interact with, was attended regularly by participants from less advantaged countries. The 2021 School had noticeably higher attendance, typically reaching about 200 for most sessions, more than double the usual figure, with the increase coming especially from the more established community. The level of engagement demonstrated that the Virtual School is a viable format for fruitful scientific exchange, and hopefully this participation level will continue. There is evidence from the Nature poll [57] that scientists want virtual meetings to stay after the COVID pandemic, and this is reflected in our own survey from TFOM shown in Fig. 3. The concept of "hybrid" is gaining in popularity and it could be that the future will be some mixture of in-person and virtual. However, to be done well, virtual collaborations, whether conferences, meetings or teaching, need more effort, from planning to delivery. TFOM activities have shown how virtual conferences can be effective, and subsequently we have been continually called on to advise on best practice for a variety of virtual initiatives. The obvious benefits of accessibility, inclusivity and sustainability, as highlighted in our conference experiences, are still competing with the drawbacks of time zones, technological accessibility and the human factor, though XR may soon be able to provide a solution for the last of these. TFOM is seeing disheartening signs of people wanting to take what they think is the easy option, i.e. to go back to the way things were before.
The future will be determined by the people who can see what virtual can do versus the people who see what it can't.

Sustainable scientific software needs a strong collaboration framework. With the lifetime of computational chemistry software packages exceeding that of the developers who began them, there needs to be a mechanism to pass down the generations the software, tools, skills and knowledge to maintain continuity. The COVID-19 pandemic opened up the world to a potentially global community of developers, whether through the imaginative creation of remote collaboration platforms and tools or just by forcing people to take the digital plunge. It is still early days to know what work in the post-pandemic world will look like, whether we just go back to how we used to do things or whether these remote innovations will be embraced and developed further. There is now a wealth of tools out there to help us meet and work, and possibly even develop software, better virtually. Through TFOM, we have demonstrated how these tools can be used to return to a better normal. Now is the time to keep the momentum going and make use of them to build the foundation of a solid and global software development community.

New developments in molecular orbital theory
Polyatom: a general computer program for ab initio calculations
ATMOL
Expanding the limits of computational chemistry
A scientist's perspective on sustainable scientific software
Fundamentals of software sustainability
How do scientists develop and use scientific software?
MolSSI - The Molecular Sciences Software Institute
The future of meetings: outcomes and recommendations
Forging a path to a better normal for conferences and collaboration
How to close the tech skills gap
Act: The App Association 14
The United States of coding
European Commission/EACEA/Eurydice (2019) Digital education at school in Europe
Coding, programming and the changing curriculum for computing in schools
These are the programming languages most in-demand with companies hiring
Why are climate models written in programming languages from 1950? PARTEE
Strategies for the OpenMolcas legacy codes - fifty shades of Fortran
The rise of online learning
Online education market & global forecast, by end user, learning mode (self-paced, instructor led), technology, country
Remote work (WFH) tech landscape pauaventures
65+ startups helping you work from home
The source code control system
RCS - a system for version control
Version control with subversion, 1st edn
Pro git
The top 5 online IDEs for in-browser development developer
International science gateways 2017 special issue
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses
AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance
nanoHUB.org: cloud-based services for nanoscale modeling, simulation, and education
QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials
Materials cloud, a platform for open computational science
A web platform for scientific collaborations
The future of meetings: working in XR?
A year without conferences? How the coronavirus pandemic could change research
Online conferences - towards a new (virtual) reality
Scientists want virtual meetings to stay after the COVID pandemic
The future of offices and workspaces, post-pandemic
Twitter, Square announce work from home forever option: what are the risks

Acknowledgements: Thanks to the TFOM community, especially Vanessa Moss, Glen Rees and Patrice Rey for accompanying me on this journey, and Aidan Hotan, Chenoa Tremblay, Claire Trenham, Roger Amos and "Earthmark" for cheering us on (and providing invaluable material). Thanks also to Damian Rouson of Berkeley Lab for his excellent "Why Fortran persists" presentation, and the Australian Government Department of Education, Skills and Employment for providing a copy of their Coding Across the Curriculum report.

The author has no conflicts of interest or relevant financial interests to declare.