ARTICLE

Solving SEO Issues in DSpace-based Digital Repositories: A Case Study and Assessment of Worldwide Repositories

Matúš Formanek

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2021
https://doi.org/10.6017/ital.v40i1.12529

Matúš Formanek (matus.formanek@fhv.uniza.sk) is Assistant Professor in the Department of Mediamatics and Cultural Heritage, Faculty of Humanities, University of Zilina, Slovakia. © 2021.

ABSTRACT

This paper discusses the importance of search engine optimization (SEO) for digital repositories. We first describe the importance of SEO in the academic environment. Online systems, such as institutional digital repositories, are established and used to disseminate scientific information. Next, we present a case study of our own institution’s DSpace repository, performing several SEO tests and identifying potential SEO issues with a set of three independent audit tools. In this case study, we attempt to resolve most of the SEO problems that appeared within our research and propose solutions to them. After making the necessary adjustments, we were able to improve the quality of SEO variables by more than 59% compared to the non-optimized state (a fresh installation of DSpace). Finally, we apply the same software audit tools to a sample of global institutional repositories also based on DSpace. In the discussion, we compare the SEO results of the sample with the average score of the semi-optimized DSpace repository (from the case study) and draw conclusions.

INTRODUCTION AND STATE OF THE ART

Search engine optimization (SEO) is a crucial part of the academic electronic environment. Its users must process large amounts of information and need to retrieve it quickly and effectively. Making academic information findable is essential.
Digital institutional repository systems, used to disseminate scientific information, must present their content in ways that make it easy for researchers elsewhere to find. In this paper, we describe work conducted in the Department of Mediamatics and Cultural Heritage at the Faculty of Humanities, University of Zilina, to improve the discoverability of materials contained within its DSpace institutional repository. In the literature review, we examine definitions of website quality and discuss audit tools. Then, beginning our case study, we describe the tools applied at our institution. We next describe how we selected a suitable set of testing tools, focused on optimizing the SEO variables of the selected institutional repository running DSpace software, that will be applied later in the case study.

The remainder of the article focuses on the identification and resolution of potential SEO issues using the three independent online tools we selected. We aim to resolve as many problems as possible and compare the level of improvement achieved with the default installation of DSpace 6.3, the software on which our digital repository is based. The primary goal is not only to improve the SEO parameters of the discussed system but also to increase the searchability of scientific website content disseminated by DSpace-based digital repositories.

Next, we offer insights into worldwide DSpace-based repositories. We will show that DSpace is currently one of the most widely used software packages to support and run digital repositories. Unfortunately, these repositories have many major SEO issues, which will be discussed later. The secondary objective of this paper is to use the same set of tools to evaluate the current state of a sample of worldwide digital repositories also based on DSpace. We provide a report based on our own findings.
In the discussion, the SEO score of the optimized DSpace (from the case study) will be compared with the results of the current state of SEO parameters from the worldwide DSpace repositories. Finally, our work also applies several relatively innovative approaches related to digital repositories that have not yet been extensively discussed in the literature.

LITERATURE REVIEW

To achieve our goal, we started with a review of existing academic papers. Drawing from those papers, we describe the current state of academic institutions’ presentation through the Internet and search engines. In this sense, we focus on website optimization. The Internet, as a medium, is still rapidly expanding. A massive amount of data is communicated, shared, and available online, as noted by Christos Ziakos:

As a result, billions of websites were created, which made it hard for the average (or even advanced) user to extract useful information from the web efficiently for a specific search. The need for an easier, more efficient way to search for information led to the development of search engines. Gradually, search engines began to assess the relevance of every website on their indexes compared to the queries provided to them by the users. They took into consideration several website characteristics and metrics and calculated the value of each website using complex algorithms. The enormous number of websites being indexed from search engines, along with the increasing competition for the first search results, led to studying and implementing various techniques in order for websites to appear more valuable in search engines.1

That description applies to academic websites as much as to commercial ones.
A review of relevant literature suggests that it is very important for academic institutions to carefully consider and apply website optimization. There were around 28,000 universities worldwide in 2010, according to one study that monitored research in the field of worldwide academic webometrics.2 The number of universities seems to have been very similar in 2020. Baka and Leyni affirm in their working paper that the success or failure of an academic institution depends on its website: “The work of each university exists only when it encounters and interacts with society. Their popularity with the public is steadily growing.” This is directly connected with the institution’s presence on the World Wide Web.3

Many authors define the term search engine optimization (SEO) as a series of processes that are conducted systematically to improve the volume and quality of traffic from search engines to a specific site by utilizing the working mechanism or algorithm of the search engine. It is a technique of optimizing a website’s structure and content to achieve a higher position in search results. The aim is to increase the website’s ranking in web search results.4

After an extensive search of the relevant literature, we can conclude that although SEO is currently a widely discussed topic, there is very little accessible scientific literature related to SEO applications in the field of digital repositories in general, and none at all in the particular subset of DSpace-based repositories.

Website Quality

Many authors generally affirm that there is a positive correlation between academic excellence and the complex web presence of an institution.
This confirms that website quality is a factor that may have a predictive or causal relationship with SEO performance.5 Numerous tools can be employed to measure the quality of websites, test them closely, and produce an SEO performance ranking that reflects a website’s ability to properly promote its content through search engines. For example, the Academic Ranking of World Universities (the Shanghai Ranking, http://www.shanghairanking.com) has been established for the top 1,000 universities in the world. The authors consider website quality to be the quality of an institution’s online presence, its ability to properly promote digital content in search engines, and finally, in combination, its overall web presence. According to the Shanghai Ranking list, this is a factor for some “prospective students to decide on whether they will enroll in a specific institute or not.”6

A number of recent studies have also attempted to examine the online presence of academic institutions from various points of view. One of the older studies mentioned that the quality of academic websites is very important for students in the process of enrollment.7 Other key aspects are optimized website performance, SEO, and website security.8

Audit Tools

If we want to perform any optimization, we need an appropriate software tool to check a website’s current ranking. According to G2, the world’s largest technology online marketplace, SEO software is designed to improve the ranking of websites in search engine results pages without paying the search engine provider for placement. These tools provide SEO insights to companies through a variety of different features, helping identify the best strategies to improve a website’s search relevance.9 SEO audit software can be used by SEO specialists as well as by system administrators.
Audit software performs one or more of the following functions in relation to SEO: content optimization, keyword research, rank tracking, link building, or backlink monitoring. The software then provides reports on the optimization-related metrics.10 Many authors stress the importance of a holistic approach to SEO factors (24 factors were tested), although the result depends mostly on the most effective ones: for example, the quantity and quality of the backlinks, the SSL certificate, and so on, which will be described later in this paper.11

The quality of academic websites is very important for researchers, too. They need to disseminate scientific information and communicate it in effective ways. According to some authors, the topic of academic SEO (ASEO) has been gaining attention in recent years.12 ASEO applies SEO principles to the search for academic documents in academic search engines such as Google Scholar and Microsoft Academic. Another scientific paper considers ASEO very similar to traditional SEO, where institutions want to make good use of SEO to promote digital scientific content on the Internet. Beel, Gipp, and Wilde emphasize how important it is for researchers to ensure that their publications receive a high rank on academic search engines.13 By making good use of ASEO, researchers have a higher chance of improving the visibility of their publications and having their work read and cited by more researchers.

In recent years, digital institutional repositories (as academic systems) have been used as modern ways of promoting and disseminating digital scientific objects through the Internet. Digital objects need to reach a wider audience. Digital repositories take the form of a website interface, interact with students, teachers, and researchers on a daily basis, and make use of the number of citations, articles, theses, or other research objects.
Institutional repositories are affected by search engines too, so some improvements to repositories’ SEO parameters are needed. These factors contribute to a system’s rankings. SEO for institutional repositories is not an entirely new scientific topic. Kelly stressed eight years ago that Google is critical in driving traffic to repositories. He analyzed a survey summarizing SEO findings for 24 institutional repositories in the United Kingdom. The survey results showed that referring platforms were primarily responsible for driving traffic to those institutional repositories, thanks to the many hypertext links in referring domains.14 Since then, SEO analyses of digital repositories have not been widely discussed in the literature. Discussing SEO for a specific type of digital repository software is relatively unusual, even for DSpace, the most used and popular software for running digital libraries and repositories.15 Consequently, this paper focuses on that topic, since a DSpace-based digital repository is a complex online computer system in which some SEO parameters can be adjusted. SEO audit tools help to identify potential adjustments to those website properties that could help produce higher rankings in search engines (and improve the visibility of the whole system).

AUDIT TOOLS SELECTION PROCESS

Website variables that affect SEO can be tested using specialized online software tools. This topic is discussed in detail on a semi-professional level on specialized websites that provide a number of recommendations regarding the use of specific tools as well as evaluations of the tools.16 These tools can keep track of changes in many SEO variables. We want to use this approach in our study. However, first we need to choose an appropriate set of these tools.
We have found that many SEO audit tools mentioned in professional online sources are narrowly specialized.17 For example, they may be focused only on keyword analysis, backlink analysis (for example, Ahrefs’ Free Backlink Checker), and so on. In our study, we intend to monitor a larger number of SEO parameters rather than emphasize only a few selected ones. We also need tools that are fully available online for free. Based on these criteria, we immediately excluded several tools from the selection because they provide only sparse, simple, or restricted information. Many tools were excluded because they were limited to a single test and required registration or an email address. A number of testing tools were also available only in paid versions. We wanted a set of tools that focus on several aspects of SEO analysis and evaluate the quality of websites’ SEO variables comprehensively. It is important to add that the results of the selected tools must be comparable, too.

After careful consideration of all possibilities, we finally decided to choose three independent SEO audit tools in order to make the approach more transparent. The selected tools met most of the criteria mentioned above. However, it is very important to note that many other software tools surely meet the criteria and could also be suitable for testing purposes. Based on the scientific literature review, we were not able to identify specific recommendations in this regard; therefore, we were guided by the advice offered on the previously mentioned websites and blogs that focus primarily on SEO. Our selection of tools is as follows (listed in alphabetical order):

1. SEO Checker (https://suite.seotesteronline.com/seo-checker) is part of a complex audit software suite called SEO Tester Online Suite. SEO Checker provides tests in the following categories: base, content, speed, and connections to social media.
It tracks, among many other parameters, title coherence, text/code ratio, accessibility of microdata, OpenGraph metadata, social plugins, in-page and off-page links, quality of links, mobile friendliness of the page, and many other SEO and technical website attributes. Regarding restrictions, only two sites can be tested within a 24-hour period. The limit increases to four sites per day after free registration with a valid email address. Moreover, there is a 14-day trial period during which all hidden functionalities work. In the free version that we used, a complete report can only be viewed, not downloaded or saved.

2. SEO Site Checkup (https://seositecheckup.com/) was selected based on many positive recommendations from the technically oriented expert website Traffic Radius.18 SEO Site Checkup is described as “a great SEO tool that offers more than 40 checks in 6 different categories (common SEO issues like missing metadata, keywords, issues related with absence of connections to social media, semantic web, etc.) to serve up a comprehensive report that you can use to improve results and the website’s organic traffic. It also gives recommendations to fix critical issues in just a few minutes. As a tool, it is very fast and provides in-depth information about the various SEO opportunities and accurate results.”19 SEO Site Checkup is ranked number one among the audit tools reviewed by the Geekflare website.20 Another reason we selected this tool for our testing scenario is that the Google search engine offers a link to this tool as the first result after entering the search query “seo testing tool” (excluding paid links). SEO Site Checkup is also the fastest of the selected audit tools, which could be considered another advantage.
Its disadvantages include the ability to test only one website within 24 hours from one public IP address.

3. WooRank (https://woorank.com) is recommended by Traffic Radius: “WooRank offers an in-depth analysis that covers the performance of existing SEO strategies, social media and more. The comprehensive report analysis is classified into eight sections for improved readability quotient, and you may also download the report as branded PDF.”21 WooRank has obtained the third position among the recommended software tools. TrustRadius gives it a score of 9.2 out of 10, and users rate it 4.67 out of 5 stars based on 51 reviews.22 On the one hand, some results are hidden in the free version, but the final score is shown. On the other hand, WooRank has no limit on the number of websites tested per day, but it is the slowest of the selected testing tools.

We selected these three SEO audit tools because they work independently, their results are comparable to each other, and they offer a quick way to get comprehensive SEO analysis results for a tested site. It should be noted that the results of some tests are hidden, but there is general guidance on how to fix some issues. However, the solution always depends on the specific site and the technology used. Using three different tools adds objectivity because we do not rely on just one tool and a one-sided view of the SEO issue.

The three selected testers all display results in the same way: test results are always shown as a summarized score in the range of 0 to 100 points (100 represents the best result). A very large set of SEO parameters and technical website properties is evaluated in all three cases. These tests are usually divided into several categories (for example, common SEO issues, performance, security issues, and social media integration).
Although similar parameters are assessed in all three audit tools, there are still some differences between them. Each of the testing tools is unique in a certain area because it tests a parameter that the others do not deal with or evaluates a website by a different methodology. Still, the fact remains that the evaluated SEO parameters overlap between the tools.

We will not overload this paper with detailed information and technical details of the individual partial tests, because they can easily be found on the websites of the given test tools (SEO Site Checkup, SEO Checker Online, WooRank). We will just mention the common core of main tests: CSS Minification test, Favicon test, Google Search Results Preview test, Google Analytics test, H1 Heading Tags test, HTML Page Size test, Image Alt test, JavaScript Minification test, JavaScript Error test, Keywords Usage test, Meta Description test, Meta Title test, SEO Friendly URL test, Sitemap test, Social Media test, Robots.txt test, URL Canonicalization test, and URL Redirects test.

Another specific group consists of tests related to a particular audit tool. Thanks to them, we can get a more comprehensive view of the tested area of a website’s SEO characteristics. For example, SEO Checker features the following specific tests: Title Coherence test, Unique Key Words test, H1 Coherence test, H2 Heading Tags test, and Facebook Popularity test. WooRank, as the second tool, extends the basic set of tests with the following: Title Tag Length test, In-page Links test, Off-page Links test, Language test, Twitter Account test, Instagram Account test, Traffic Estimations, and Traffic Rank. Of course, some tests are part of two audit tools but absent from the third, which specializes in another area.
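Several of the common tests listed above (Meta Title, Meta Description, H1 Heading Tags, Image Alt) amount to simple inspections of a page's HTML. The following is a minimal sketch of what such checks might look like, written with Python's standard `html.parser` module. It is not how any of the three audit tools is actually implemented: the names `SeoBasicsParser` and `audit_html` are hypothetical, and the thresholds (a 20-character minimum title length, exactly one H1 per page) are assumptions chosen only for illustration.

```python
from html.parser import HTMLParser


class SeoBasicsParser(HTMLParser):
    """Collects the few page properties used by the illustrative checks below."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.h1_count = 0
        self.images_without_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "description":
            self.meta_description = attr_map.get("content") or ""
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not (attr_map.get("alt") or "").strip():
            self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_html(html, min_title_length=20):
    """Return pass/fail results for a handful of basic on-page checks.

    min_title_length and the single-H1 rule are assumed thresholds,
    not values taken from any of the audit tools.
    """
    parser = SeoBasicsParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    return {
        "meta_title": len(title) >= min_title_length,
        "meta_description": bool(parser.meta_description),
        "single_h1": parser.h1_count == 1,
        "image_alt": parser.images_without_alt == 0,
    }


# A made-up page resembling an unconfigured home page with a very short title.
sample_page = """<html><head><title>DSpace Home</title></head>
<body><h1>Communities</h1><h1>Recent submissions</h1>
<img src="logo.png"></body></html>"""

report = audit_html(sample_page)
print(report)
```

The real tools evaluate many more properties and combine them into a 0 to 100 score using weightings that are not published here, so a sketch like this is useful only for understanding what an individual test inspects.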
As we have mentioned, the tools offer a list of suggestions for potential improvement of SEO characteristics. The user is informed about an issue, but no instructions or solutions are provided on how to resolve it. The main benefit of this paper lies in its objective of solving specific SEO issues. This work may improve the visibility and searchability of DSpace-based institutional repositories.

The set of three audit tools described above will be used in the following section. We attempt to identify possible SEO issues of the selected institutional repository in the form of a case study. Then we aim to fix the identified SEO issues and improve the quality of its SEO parameters, as well as demonstrate the potential impact of the repairs on website traffic. All traffic measurements will be based on Google Analytics data.

THE INSTITUTIONAL REPOSITORY OF THE DEPARTMENT OF MEDIAMATICS AND CULTURAL HERITAGE (SEO CASE STUDY)

Background Information

An older version of our digital repository (based on DSpace v5.5) was launched by the Department of Mediamatics and Cultural Heritage in April 2017. Now, in 2021, the repository makes over 180 digital objects available online, most of them open access under Creative Commons licenses. The first attempts to create and establish a similar virtual space for digital objects started long ago. Several software solutions had been tested for this purpose, for example, Invenio and Eprints, along with DSpace. According to OpenDOAR’s statistics, Eprints and DSpace have always been the most popular tools for running digital repositories.23 A few years ago, DSpace was chosen as the primary software for running the digital repository. Since then, the use of open-source software has been rising.
For example, Ubuntu Server LTS (long-term support) is used as the operating system, Tomcat 8 is used as the web server, PostgreSQL serves as the database system, and so on. All of those software components are part of a complex digital system and are orchestrated in a virtual environment built on an open-source virtualization solution called XCP-ng (version 8.2). Some software components have been switched for others during the development period.

Based on our experience, the digital repository’s regular visitors were mostly staff and students of the department. We initially did not feel a need to improve the visibility of this system to search engines, an oversight that turned out to be a mistake in the long run. We did not perform any search engine optimization on this repository until November 2019, when we coincidentally discovered several scientific articles dealing with SEO in the academic environment. After studying the theoretical background, we initiated the practical application process. We applied the theory and our experience with DSpace software to an SEO troubleshooting process within our local repository. Most of the optimizing actions related to solving the major SEO issues were performed before November 10, 2019. We will describe the SEO adjustments we made and derive a list of recommendations for other institutions based on our own experience.

Initial Testing of a Clean DSpace 6 Installation

In order to formulate any recommendations related to SEO and the administration of DSpace digital repositories, it is important to determine and test a starting point. For this purpose, we chose a clean instance of DSpace v6.3 with an XML user interface (XMLUI), the latest commonly available stable version. This is the same version that we use in this case study and in our production environment.
(A newer version, DSpace 7 Beta 4, was released by Atmire on October 13, 2020.)24 No other customization edits were made except a base configuration and the necessary URL settings. This installation of DSpace v6.3 was tested with the same set of tools mentioned previously.

The tests we performed are summarized in table 1, where they are divided into four main SEO sections in the first column: common SEO issues, social, speed, and security. The test name is shown in the second column. The third column is marked “Default installation” and displays the test results for our clean DSpace 6.3 installation. If the tested instance met the criteria of the given test, a green tick pictogram is shown. When a particular test fails, a red cross is used. The improved state is shown in the fourth column, marked “Semi-optimized.” It results from many important technical changes and from the process of resolving SEO issues. This process will be discussed and described later in this paper; however, a short note about the issue in question is displayed in each row. These notes were taken from the tools’ result reports. We have used the prefix semi- in the last column because we were not able to resolve all detected SEO issues, only most of them. All related reasons will be described briefly in the discussion section. Where a change between states led to an improvement, we changed the status pictogram (from the red cross to the green tick) and set the row color to yellow. The changes leading to improvement (i.e., the yellow rows) will be discussed in detail later, too. Recall that we have no need to overload the main text of this paper with detailed technical information about partial tests, because it can easily be found on the websites of the given test tools.

Table 1 compares the results for the non-optimized and semi-optimized states of the DSpace repository.
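Because each of the three tools reports a single score between 0 and 100, the comparison built on table 1 relies on a plain arithmetic mean of the three values. The sketch below reproduces that averaging for the default-installation scores reported for table 1 (58, 50.1, and 32 points). The function names are hypothetical, and `relative_improvement` is only an assumption about how a percentage gain over the default state could be computed; the source does not spell out its exact formula.

```python
from statistics import mean

# Default-installation scores reported for the three audit tools.
default_scores = {
    "SEO Site Checkup": 58.0,
    "SEO Checker": 50.1,
    "WooRank": 32.0,
}


def average_score(scores):
    """Arithmetic mean of the per-tool scores, rounded to one decimal place."""
    return round(mean(scores.values()), 1)


def relative_improvement(before, after):
    """Percentage gain of `after` over `before` (assumed formula, for illustration)."""
    return (after - before) / before * 100.0


default_average = average_score(default_scores)
print(default_average)
```

Averaging gives each tool equal weight, which matches the stated motivation for using three independent testers rather than relying on a single one.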
Based on table 1, the default instance of DSpace with basic HTTP and other default settings received only 58 points out of 100 in SEO Site Checkup, 50.1 points in SEO Checker, and 32 points in WooRank. The average final score is 46.7 points out of 100. Although this score could be considered low, the DSpace default instance still meets certain basic SEO criteria. In addition, many repository administrators do not rely only on a default installation but make at least some configuration changes immediately after the initial installation. Among other things, the first steps should include implementing the HTTPS protocol, adding a connection to Google Analytics services, and so on.

The improved state is shown in the last column of table 1. Whenever we solved an issue, the overall score rose. The semi-optimized repository obtained a higher score than the default installation in the previous column. The last column represents the final (albeit semi-optimized) state of the technical and SEO attributes that we were able to reach at this moment. As shown, many SEO issues have been solved. We highlighted them in yellow. On the one hand, some issues remain unsolved. On the other hand, the overall SEO improvement is more than noticeable, although the final average score has not reached the maximum value (100 points).

Table 1. Comparison of results between the non-optimized and semi-optimized states of the DSpace repository.

Test name | State: Default installation (before optimization) | State: Semi-optimized (after a few optimization steps)
Meta Title test, Title tag length | The title tag is set, but the meta title of the webpage (DSpace Home) has a length of 11 characters. It is too short. | The title tag has been set to “Digitálny repozitár Katedry mediamatiky a kultúrneho dedičstva” (note: in Slovak language).
Title coherence test | The keywords in the title tag are included in the body of the page. | The title of the page seems optimized.
Meta Description test | No meta-description tag is set. | Meta-description tag has been set (121 characters).
Google Search Results Preview test | “DSpace Home” is too general. | The title of the page has been changed.
Keywords Usage test | The keywords are not included in Title and Meta-description tags. | A set of appropriate keywords has been added.
Unique key words test | The textual content is not optimized on the page. | There is an excellent concentration of keywords in the page. This page includes 382 words of which 58 are unique.
H1 Heading Tags test | 8 H1 tags, 6 H2 tags | The H1 tags of the page seem not to be optimized. There are too many H1 tags.
H1 Coherence test | The keywords present in the tag h1 are included in the body of the page. | Some of the keywords of the tag h1 are not included in the body of the page.
H2 Heading Tags test | The keywords present in the tag