key: cord-0018872-rd4elom8
authors: Duvaud, Séverine; Gabella, Chiara; Lisacek, Frédérique; Stockinger, Heinz; Ioannidis, Vassilios; Durinx, Christine
title: Expasy, the Swiss Bioinformatics Resource Portal, as designed by its users
date: 2021-04-13
journal: Nucleic Acids Res
DOI: 10.1093/nar/gkab225
sha: 94b7e486f8041aed7cfc094172a22da05bfb7d67
doc_id: 18872
cord_uid: rd4elom8

The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss) creates, maintains and disseminates a portfolio of reliable and state-of-the-art bioinformatics services and resources for the storage, analysis and interpretation of biological data. Through Expasy (https://www.expasy.org), the Swiss Bioinformatics Resource Portal, the scientific community worldwide freely accesses more than 160 SIB resources supporting a wide range of life science and biomedical research areas. In 2020, Expasy was redesigned through a user-centric approach, known as User-Centred Design (UCD), whose aim is to create user interfaces that are easy to use, efficient and targeted at the intended community. This approach, widely used in other fields such as marketing, e-commerce and the design of mobile applications, is still scarcely explored in bioinformatics. In total, around 50 people were actively involved, including internal stakeholders and end-users. In addition to an optimised interface that meets users' needs and expectations, the new version of Expasy provides an up-to-date and accurate description of high-quality resources based on a standardised ontology, which makes it possible to connect functionally related resources.

Databases, software tools and on-line services are essential resources in the daily work of most life scientists. They are key to the long-term preservation of scientific data and the reproducibility of life science studies. As on-line bioinformatics usage spread and diversified, portals offering single access points were created (1, 2) and, later on, structured catalogues were proposed (3, 4).
In this landscape, the ExPASy portal was a pioneer. It was created in 1993 as 'the Expert Protein Analysis System', with a primary focus on protein knowledge (5). It was the first life science website and among the 150 very first websites in the world. In the past 27 years, the portal has evolved. It was redesigned several times and the penultimate version was released in 2011, when it became the 'ExPASy SIB Bioinformatics Resource Portal', a catalogue of bioinformatics resources (6).

As detailed in (7), the SIB Swiss Institute of Bioinformatics federates a Swiss bioinformatics community now close to 800 scientists. In this context, hundreds of cutting-edge resources are developed and made available to the life science community. Among those, twelve key resources, the so-called SIB resources, receive specific support from the Institute and form the SIB Resource portfolio, which ranges from specialised knowledge bases such as UniProtKB/Swiss-Prot (8) (part of the UniProt consortium), neXtProt (9), STRING (10) and Bgee (11), to online tools such as SWISS-MODEL (12) and SwissDrugDesign (13).

Table 1. Examples of user-types (personas) and their use cases

In this article, we describe the latest version of Expasy together with its redesign process, which relied on the rules of User-Centred Design (UCD). UCD combines User Research (UR), which addresses the question 'Who is the target audience?', with User Experience (UX), which addresses the question 'How does the user interact with the application?'. Originating from the computer industry, UCD is emerging in science, as its value has been acknowledged (14, 15). Expasy was improved to maintain access to SIB's high-quality scientific resources and to offer up-to-date and accurate information on each of them, with a more efficient and easier-to-use search engine.
Additionally, a networked organisation of resource descriptions was implemented, to broaden the scope of the initial search and explore the diversity of the SIB resources. The last new aspect of the portal is the involvement of resource providers in the content update of Expasy, a crucial step toward guaranteeing quality information. With this approach, the new version focuses specifically on resources developed by or in collaboration with SIB groups, in contrast with previous editions.

Expasy is an extensible and integrative portal including >160 databases and software tools developed by SIB groups. It covers a wide range of fields in life sciences and biomedical research, spanning genomics, proteomics, structural biology, evolution, phylogeny, systems biology and medicinal chemistry. Expasy is an essential tool for a large variety of users, from beginners interested in discovering bioinformatics to advanced scientists looking for specific biological answers. Table 1 illustrates three examples of typical users of the portal (so-called personas, see below) and their corresponding use cases.

Given its usefulness for the life science community, Expasy underwent a major overhaul in 2020. The new user interface is the result of in-depth work with users to meet their needs and expectations. Its content reflects a long-term commitment of the resource provider community to describe each resource accurately and in a standardised manner.

Figure 1 shows the homepage of the new Expasy portal, including its new logo. A search bar is directly accessible from the top of each page. A filter panel is provided on the left side of the page. It reflects the categories defined to filter the resources (see definition below). All resources are listed as cards with the resource's name, a short description, type (database and/or tool) and categories.

The portal provides a single search bar that simultaneously runs two types of search: a 'regular search' and a 'cross-resource search'.
The results are displayed jointly on one page.

Regular search: browsing the resources. Expasy allows users to search for resources by name, keyword, category or description. When a user, for example, types 'virus' in the search bar, this results in a list of virus-related databases and software tools, such as V-pipe (16), ViralZone (17), OpenFlu (18) or COVID-19 Scenarios (19). We will refer to 'regular search' in the rest of the article to designate this type of search. All the resource-specific information is stored in a relational database back-end and queried using Elasticsearch (https://www.elastic.co/elasticsearch/, (20)), with fuzzy search (approximate string matching) enabled. Resources are linked to one another through ontology-based terms. As a result, they form a network of seamlessly connected resources (more details below).

In addition to the regular search, Expasy offers a 'cross-resource search', which queries a subset of web-accessible databases in parallel. Although most databases offer their own search functionality, the simultaneous query of a set of databases from a single hub is a convenient option. In this way, search results may include resources unknown to the user, yet relevant and potentially useful. This functionality, already introduced in the previous version of ExPASy (6), has been revised with the help of the respective web service providers. It now includes 19 SIB databases, such as ENZYME (21), MyHits (22), STRING (10), UniProtKB (23), ViralZone, PROSITE (24) and SWISS-MODEL Repository (see Table 2). In addition to the default behaviour (i.e. full-text search), the search engine automatically recognises certain types of formatted data, such as UniProtKB accession numbers, PDB IDs or Ensembl IDs. As a result, it sends the query only to resources that support the specified query type. This not only optimises search time, but also provides more relevant results.
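The recognition-and-routing behaviour described above can be illustrated with a minimal Python sketch. It is not the Expasy implementation: the regular expressions are common approximations of the three identifier formats, and the `SUPPORTED_TYPES` table, the function names and the query types assigned to each database are hypothetical choices made only for this example.

```python
import re

# Assumed patterns, for illustration only; the formats that Expasy
# actually recognises (and how) are not specified in the article.
QUERY_PATTERNS = {
    "uniprot_accession": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "ensembl_id": re.compile(r"ENS[A-Z]*[EGPRT][0-9]{11}"),
    "pdb_id": re.compile(r"[0-9][A-Za-z0-9]{3}"),
}

# Hypothetical capability table: which query types each database accepts.
SUPPORTED_TYPES = {
    "UniProtKB": {"full_text", "uniprot_accession", "ensembl_id", "pdb_id"},
    "SWISS-MODEL Repository": {"full_text", "uniprot_accession"},
    "ViralZone": {"full_text"},
}


def classify_query(query: str) -> str:
    """Return the first identifier type whose pattern matches the whole query."""
    text = query.strip()
    for query_type, pattern in QUERY_PATTERNS.items():
        if pattern.fullmatch(text):
            return query_type
    return "full_text"  # default behaviour: plain full-text search


def route_query(query: str) -> list[str]:
    """List the databases that support the detected query type."""
    query_type = classify_query(query)
    return sorted(
        name for name, types in SUPPORTED_TYPES.items() if query_type in types
    )
```

With this table, `route_query("COVID-19")` falls back to full-text search and reaches all three databases, whereas `route_query("P05067")` is only sent to the two that accept UniProtKB accessions. Pattern-based detection is inherently ambiguous for short inputs (a four-character string such as a year also looks like a PDB ID), so a production system would likely need additional checks.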
Figure 2 shows the result page obtained with 'COVID-19' as input in the search box. This page consists of two parts:
• Top: the cross-resource search result (full-text mode)
• Bottom: the regular search result (resource names, keywords, categories and descriptions).

In the above example, the cross-resource search indicates that 'COVID-19' was found in two entries in Cellosaurus (25) and 3078 entries in UniProtKB. By clicking on the number of hits, the user is redirected to the query results in the respective databases. If no result is returned (e.g. the format is not supported, or a resource server is temporarily down), an error icon replaces the number of results (see Figure 2, top). The regular search indicates that 19 resources relate to COVID-19, such as PROSITE, SWISS-MODEL Repository, ViralZone, Nextstrain (26), GlyConnect (27) and the SIB COVID-19 Integrated Knowledgebase (https://covid-19-sparql.expasy.org/), to name a few.

Clicking on a card on the home page or on the result page shows the detailed view of the corresponding resource.

Nucleic Acids Research, 2021, Vol. 49, Web Server issue W219

Figure 3 shows a detailed view of ViralZone, a web resource for all virus genera and families. The two grey boxes display the EDAM (28) terms related to Operations [1] and to Topics, Data and Formats [2]. The bottom of the page [3] lists the suggestions of resources that share at least one term of the Topic type with ViralZone. Overall, the following information describes each resource:
• Resources of interest for the user shown as cards

Clicking on a keyword in the right box triggers a new query matching that keyword. Keywords in the old ExPASy were not standardised and did not follow any commonly used ontology (those keywords will be designated as 'in-house keywords' in the rest of the article). In the new version, keywords comply with EDAM (28), a comprehensive ontology of well-established concepts in bioinformatics and computational biology.
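The suggestion rule mentioned for the detailed view (resources that share at least one EDAM term of the Topic type) amounts to a set intersection. The Python sketch below shows one way to express it; the `Resource` class, the ranking by number of shared topics and the topic annotations in `CATALOGUE` are assumptions made for illustration and do not reproduce the actual Expasy data or code.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    topics: set[str] = field(default_factory=set)  # EDAM 'Topic' labels


def related_resources(
    target: Resource, catalogue: list[Resource]
) -> list[tuple[str, list[str]]]:
    """Return (name, shared topics) for resources sharing >= 1 topic with target."""
    related = []
    for other in catalogue:
        if other.name == target.name:
            continue  # a resource is not suggested to itself
        shared = target.topics & other.topics
        if shared:
            related.append((other.name, sorted(shared)))
    # Assumed ordering: most shared topics first, then alphabetical by name.
    related.sort(key=lambda item: (-len(item[1]), item[0]))
    return related


# Illustrative annotations only, not the real Expasy resource descriptions.
CATALOGUE = [
    Resource("ViralZone", {"Virology", "Taxonomy"}),
    Resource("V-pipe", {"Virology", "Genetic variation"}),
    Resource("OpenFlu", {"Virology", "Taxonomy"}),
    Resource("SWISS-MODEL", {"Protein structure analysis"}),
]
```

Calling `related_resources(CATALOGUE[0], CATALOGUE)` with these made-up annotations suggests OpenFlu and V-pipe for ViralZone and leaves out SWISS-MODEL, which shares no topic with it.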
The applicability of EDAM to searching, categorising and automatic handling of resources has been validated by implementations in eSysbio (https://nels.bioinfo.no/), Bio-jETI (29) and EMBOSS (30), demonstrating its relevance to resource catalogues. The ontology is actively developed, maintained and supported by a team of specialists. The four sub-ontologies of EDAM are used: Operation, Topic, Data and Format. In EDAM, 'Operation' is 'A function that processes a set of inputs and results in a set of outputs'. Thus, all EDAM terms of the 'Operation' type are displayed in the box 'What you can do with this resource?'. By clicking on one of the EDAM terms in this box, the user can retrieve all resources that perform the same task. 'Topic', 'Data' and 'Format' define, respectively, the domain of application, the type of information and the data format used or output by the resource. Those terms are shown in the box 'Browse these keywords in Expasy'. The list of resources that share at least one EDAM term of the 'Topic' type is indicated in the 'You might also be interested in' section, which allows the user to explore SIB resources (see Figure 3, bottom, for details).

The redesign process started in 2019. Its aim was to improve the portal's user-friendliness, visual identity and content, as well as its responsiveness on mobile devices. Additionally, we seized the opportunity to change the casing of the word ExPASy to Expasy, to distinguish it from the original purpose of the website (the study of proteins) while keeping a world-renowned brand in bioinformatics.

We followed a UCD approach for the redesign, anticipating that optimised user interfaces and efficient organisation of information are more conducive to scientific discoveries. Yet professionals often wrongly assume that end-users share the same view and behave in a similar way in a given situation.
In reality, developing resources of high scientific quality that are used to their full potential and perform adequately requires including end-users in the design process from the very beginning. With this in mind, the redesign of Expasy was carried out through four developmental phases that are described below (see Figure 4 for an illustration of the process).

In User Experience (UX), the preliminary phase is essential for making informed decisions. To this end, we first assessed the old ExPASy in terms of usability, usage statistics of the website and user behaviour. We also evaluated a series of well-established portals in the world of life sciences, the so-called 'competitors'. This preliminary phase allowed us to gain significant insight into best practices and user experience improvement according to the following four criteria: usability, usage, user behaviour and benchmarking. The outcome of our evaluation is summarised in Table 3.

Usability. Overall, we identified two major usability issues in the old ExPASy website (Figure 5). Firstly, users did not distinguish search types (Figure 5A). Prior to launching a search, the user was expected to select the type of search (regular or cross-resource) from a drop-down menu. Our analyses and user tests showed that very few users were aware of this option. As a consequence, users typically ran the default search (cross-resource search in 77% of cases, according to the server logs). Secondly, the website was not responsive, that is, not optimised for mobile devices (Figure 5B). Not only did this lead to very few users accessing the platform through mobile devices (around 6%), it also penalised indexing by the major search engines.

Usage. The study of the old ExPASy usage figures was based on Google Analytics data spanning the year 2018. Overall, we observed a bounce rate of 50%, which is considered good. A bounce is when the user opens a single page on a website and exits without triggering any other action.
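As a worked illustration of the bounce definition above, the bounce rate is the fraction of sessions that consist of a single interaction. The short Python sketch below assumes that each session is summarised by its number of recorded interactions (page views or other actions); the function name and the input format are hypothetical and the session data are invented.

```python
def bounce_rate(interactions_per_session: list[int]) -> float:
    """Fraction of sessions with exactly one interaction (a single page opened)."""
    if not interactions_per_session:
        raise ValueError("at least one session is required")
    bounces = sum(1 for count in interactions_per_session if count == 1)
    return bounces / len(interactions_per_session)


# Invented example: four sessions, two of which stop after the first page.
example_sessions = [1, 3, 1, 5]
example_rate = bounce_rate(example_sessions)  # 2 bounces out of 4 sessions
```

Here two of the four sessions are bounces, so the rate is 0.5, which corresponds to the 50% figure quoted for the old ExPASy only by construction of the example.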
There was an almost exclusive use of the website through desktops (94%), a huge number of acquisitions by direct access (23%), an average number of acquisitions by organic (search engine) searches (56%), and a slightly lower acquisition by referral from other, external websites (21%). Usage figures are comprehensively provided in Supplementary File S1.

User behaviour. A close look at the user behaviour flows revealed a low level of navigation between resources. From this observation, we identified the need for further development to support navigation and exploration. Supplementary File S1 also highlights this trend.

Competitor benchmarking. Benchmarking is an efficient way to analyse user interfaces and community standards. In this work, we targeted well-established websites to (i) evaluate ongoing practices in the field of bioinformatics, (ii) benefit from the familiarity factor, (iii) validate or disprove assumptions and (iv) assess the added value of Expasy compared to other bioinformatics portals. For this study, we focused on four portals; the details of this benchmarking can be found in Supplementary File S2. Examples can be found in Table 3.

The analysis of the old ExPASy usage numbers and of the user behaviour flows led to the creation of personas, that is, fictional characters that stand in for the different Expasy user types, with their characteristics, needs and expectations. Creating personas helps software developers to (i) assess the extent of different needs and expectations, and (ii) optimise the user experience (31). Three personas were created: a bioinformatician, a junior biologist and a senior biologist (Supplementary File S3 presents the personas and illustrates to what extent the outcome of the study supports the features of each persona).

Finally, the new Expasy was drafted as a wireframe, as shown in Figure 6. A wireframe is a visual guide that represents the skeleton of a website.
It depicts the layout of the user interface elements and their interactions. It lacks typographical style, colour or graphics, as the main focus is functionality, behaviour and content. In other words, it focuses on what a screen does, not on its appearance. A wireframe is employed in user tests, allowing for early feedback on products. Usually, a wireframe is iteratively enhanced at each user test to produce an optimised prototype used by developers for implementation.

The first wireframe consisted of (i) a single search bar, performing the two types of search in parallel (regular and cross-resource search) with no prior selection from a drop-down list, (ii) a homepage showing all resources listed as small cards, with a short description and additional information (website, contact, type, etc.), (iii) different types of filters (categories, resource type, etc.) to explore SIB resources according to various criteria, (iv) a detailed view of each resource providing the user with more information, and a 'You might also be interested in...' section making it possible to broaden the scope to other SIB resources.

At this stage, the pending issues were:
• What information should the cards contain?
• What information should the detailed views contain?
• How to ensure links between resources with common features?
• Which filters should be applied?
• Which resource categories should be used to cover the diversity of SIB resources?
• How to monitor revision and long-term update of resource descriptions?

For the elaboration phase, we organised an interactive workshop, which brought together 17 members of the SIB community, including PIs of major SIB resources, software developers, scientists, and communication and training specialists. Indeed, the SIB community brings together both the developers or providers of the resources and the Expasy end-users.
Furthermore, the workshop aimed to ease the adoption of the future version of Expasy, which is crucial for the long-term revision of resource descriptions and, more generally, for sustainable content quality. Various exercises were carried out during the workshop, and their outcome was processed in order to refine the wireframe. More specifically, the audience unanimously confirmed the following prerequisites:
• A unique search box performing the two search types (regular and cross-resource search),
• The inclusion of citations, contact and website icon in the resource detailed view, and a clear description of the resource purpose,
• The replacement of in-house keywords by EDAM terms,
• The only indispensable filter is by category.

The field of bioinformatics evolves rapidly; therefore, through a 'card-sorting' exercise, the categories and subcategories were updated, as shown in Table 4. At the end of the workshop, a new version of the wireframe was released and validated by the participants. This wireframe, resulting from a consensus of the SIB community, was then proposed to a set of external users.

The iterative phase is about meeting the users, showing them the wireframe and improving it based on their feedback. Five users were recruited either through a mailing campaign targeting the authors of scientific articles citing Expasy or in person at various scientific meetings. It was established that testing a wireframe on no more than five users is sufficient to identify the major usability issues (65-85%) (32). In practice, the redundancy of user feedback is such that all important issues are brought out with the first five users and very few new ones arise when more users are consulted. The 'law of diminishing returns' (33) applies. Conversely, testing a wireframe on a single user is risky: a single person may perform actions by accident or in an unrepresentative manner and this may skew the test results. During the tests, [...]

At the end of the iterative phase, the outcome that led to the final wireframe version (Figure 7) was the following:
• Overall, the users liked the unified search box and confirmed their unawareness of the regular search in the old implementation.
• The added value of Expasy lies in the classification of the resources by category. Besides, users preferred the panel on the left over the initially proposed drop-down menu at the top.
• The result page, with its two types of results (Figure 7C and D), was well understood. More specifically, in the cross-resource search results, the users appreciated the classification of results by category and the short description of each queried database.

The implementation phase spans stages from the revision of the pre-existing ExPASy content to the development of the new product before going live. To update the pre-existing data, we started by removing decommissioned resources as well as non-SIB resources. In the old ExPASy, in-house keywords were used to describe each resource's main features. Those keywords were manually mapped to EDAM terms, and the mapping was applied to each resource. The mapping result was then sent to the resource providers along with additional information such as the long and short descriptions, the contact email or the URL. Each of the resource descriptions was reviewed and enriched by the resource providers. A total of 154 resources were re-evaluated before the new Expasy implementation went live.
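The keyword revision described above was performed manually, but applying a curated mapping to each resource can be sketched in a few lines of Python. The mapping table below is purely illustrative: its entries, the `(sub-ontology, label)` representation and the function name are assumptions for this example, not the mapping actually used for Expasy.

```python
# Hypothetical curated table: in-house keyword -> EDAM (sub-ontology, label) pairs.
KEYWORD_TO_EDAM = {
    "proteomics": [("Topic", "Proteomics")],
    "virus": [("Topic", "Virology")],
    "sequence alignment": [("Operation", "Sequence alignment")],
    "fasta": [("Format", "FASTA")],
}


def map_keywords(keywords: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Translate in-house keywords to EDAM terms; collect those left unmapped."""
    mapped: list[tuple[str, str]] = []
    unmapped: list[str] = []
    for keyword in keywords:
        terms = KEYWORD_TO_EDAM.get(keyword.strip().lower())
        if terms is None:
            unmapped.append(keyword)  # left for a curator or resource provider
            continue
        for term in terms:
            if term not in mapped:  # avoid duplicate EDAM terms per resource
                mapped.append(term)
    return mapped, unmapped
```

Keeping the unmapped keywords separate mirrors the review step in the text: anything the table cannot translate is handed back for human revision rather than guessed.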
As for the development of the new release, a few key points had to be considered:
• Knowing that about 49% of global internet traffic can be attributed to mobile devices (source: https://gs.statcounter.com/platform-market-share/desktop-mobile-tablet/worldwide, October 2020), Expasy needed to be responsive, not only to attract more mobile device users, but also to improve indexing by the major search engines.

To increase the speed of development and ensure user satisfaction, we opted for an agile methodology (https://agilemanifesto.org). In software development, agile refers to discovering requirements and developing solutions through a high level of engagement of cross-functional teams, enabling early delivery and continuous improvement. Eleven people (developers, business analysts, product owner, SEO expert, user interface designer, graphic designer, system administrators) participated in the implementation phase, and the new Expasy went live on 15 October 2020.

The resource providers were granted access to the Expasy administration interface and provided with a video tutorial. As of January 2021, 15 new resources have been added, and 50 have been updated. Involving resource providers in the continuous content update of Expasy is crucial for ensuring accurate and relevant information. With this procedure, along with the yearly update reminder sent to resource providers, we have optimised access to up-to-date and reliable data.

The new version of Expasy was released in October 2020. Since then, an increase of 15% in the number of daily users has been observed, compared to the same period of the previous year. The traffic from mobile devices has increased from 6 to 10%. In the future, the SIB community will strive to provide more resources to Expasy users while ensuring that pre-existing information remains up to date. The new version of Expasy was intended as an evolving web application with a growing number of high-quality interconnected resources that form a solid ground on which the life science community can stand. The portal is a gateway to bioinformatics, available to expert users, beginner researchers, teachers, students and more. The new version of Expasy aims to create a satisfactory user experience. Our user-centric approach was successfully applied in all phases of development and will remain our strategy in future implementations.

Expasy is available at www.expasy.org. This website is free and open to all users and there is no login requirement.

1. Database resources of the National Center for Biotechnology Information
2. The European Bioinformatics Institute in 2018: Tools, infrastructure and training
3. The bio.tools registry of software tools and data resources for the life sciences
4. BioCatalogue: A universal catalogue of web services for the life sciences
5. A new generation of information retrieval tools for biologists: the example of the ExPASy WWW server
6. ExPASy: SIB bioinformatics resource portal
7. The SIB Swiss Institute of bioinformatics' resources: focus on curated databases
8. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study
9. The neXtProt knowledgebase in 2020: Data, tools and usability improvements
10. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
11. The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals
12. The SWISS-MODEL repository: new features and functionality
13. Application of the SwissDrugDesign online resources in virtual screening
14. Bioinformatics meets user-centred design: a perspective
15. Designing an intuitive web application for drug discovery scientists
16. V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data
17. ViralZone: a knowledge resource to understand virus diversity
18. OpenFluDB, a database for human and animal influenza virus
19. COVID-19 scenarios: an interactive tool to explore the spread and associated morbidity and mortality of SARS-CoV-2
20. Elasticsearch: the definitive guide
21. The ENZYME database in 2000
22. MyHits: improvements to an interactive resource for analyzing protein sequences
23. UniProt: a worldwide hub of protein knowledge
24. New and continuing developments at PROSITE
25. The cellosaurus, a cell-line knowledge resource
26. Nextstrain: real-time tracking of pathogen evolution
27. GlyConnect: glycoproteomics goes visual, interactive, and analytical
28. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats
29. Bio-jETI: a framework for semantics-based service composition
30. EMBOSS: the European Molecular Biology Open Software Suite
31. Applying User Experience (UX) methodologies to science: the example of the BioSODA user interface
32. A Mathematical model of the finding of usability problems
33. The law of diminishing returns
34. A literature review of the anchoring effect

We are particularly grateful to the SIB co-founders Ron Appel and Amos Bairoch, who created the first version of ExPASy almost three decades ago. Expasy would not exist without the resource providers: we take this opportunity to thank them for their immense effort in developing, maintaining and revising the resources. We also want to thank all the people who contributed to the design, implementation and operation of this new version of Expasy. In particular, we thank the participants of the workshop in 2019 and the internal groups at SIB (Core-IT, LTTO and Communications and Scientific Events departments). Lastly, we would like to warmly thank the users who participated in the user testing sessions and those who have been faithful for decades.

Supplementary Data are available at NAR Online.