Solving SEO Issues in DSpace-based Digital Repositories: A Case Study and Assessment of Worldwide Repositories


ARTICLE 

Matúš Formanek 

 

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2021  
https://doi.org/10.6017/ital.v40i1.12529 

Matúš Formanek (matus.formanek@fhv.uniza.sk) is Assistant Professor in the Department of 
Mediamatics and Cultural Heritage, Faculty of Humanities, University of Zilina, Slovakia. © 2021. 

ABSTRACT 

This paper discusses the importance of search engine optimization (SEO) for digital repositories. We 
first describe the importance of SEO in the academic environment. Online systems, such as 
institutional digital repositories, are established and used to disseminate scientific information. Next, 
we present a case study of our own institution’s DSpace repository, performing several SEO tests and 
identifying the potential SEO issues through a group of three independent audit tools. In this case 
study, we attempt to resolve most of the SEO problems that appeared within our research and 
propose solutions to them. After making the necessary adjustments, we were able to improve the 
quality of SEO variables by more than 59% compared to the non-optimized state (a fresh installation 
of DSpace). Finally, we apply the same software audit tools to a sample of global institutional 
repositories also based on DSpace. In the discussion, we compare the SEO sample results with the 
average score of the semi-optimized DSpace repository (from the case study) and draw conclusions. 

INTRODUCTION AND STATE OF ART 

Search engine optimization (SEO) is a crucial part of the academic electronic environment. Users of 
that environment face more information than they can process and need to retrieve it quickly and 
effectively. Making academic information findable is essential. Digital institutional repository 
systems, used to disseminate scientific information, must present their content in ways that make 
it easy for researchers elsewhere to find. In this paper, we describe work conducted in the 
Department of Mediamatics and Cultural Heritage at the Faculty of Humanities, University of Zilina to 
improve the discoverability of materials contained within its DSpace institutional repository.  

In the literature review, we examine definitions of website quality and discuss audit tools. Then, 
beginning our case study, we describe the tools applied at our institution. We next describe the 
selection process of a suitable set of testing tools, focused on the optimization of SEO variables of 
the selected institutional repository running with DSpace software, that will be applied later in the 
case study. The remainder of the article focuses on the identification and resolution of potential 
SEO issues using the three independent online tools we selected. We aim to resolve as many 
problems as possible and compare the level of achieved improvement with the default installation 
of DSpace 6.3 software, on which our digital repository is based. The primary goal is not only to 
improve the SEO parameters of the discussed system but also to increase the searchability of 
scientific website content disseminated by DSpace-based digital repositories. 

Next, we offer insights into worldwide DSpace-based repositories. We will show that DSpace is 
currently one of the most widely used software packages to support and run digital repositories. 
Unfortunately, there are many major SEO issues that will be discussed later. The secondary 
objective of this paper is to use the same set of tools to evaluate the current state of the sample of 
worldwide digital repositories also based on DSpace. We will provide a report based on our own 
findings. In the discussion, the SEO score of the optimized DSpace (from the case study) will be 
compared with the results of the current state of SEO parameters from the worldwide DSpace 
repositories. 

Finally, our work applies several relatively novel approaches to digital repositories that have 
not yet been extensively discussed in the literature. 

LITERATURE REVIEW 

To achieve our goal, we started with a review of existing academic papers. Drawing from those 
papers we describe the current state of academic institutions’ presentation through the Internet 
and search engines. In this sense, we focus on website optimization. 

The Internet, as a medium, is still rapidly expanding. A massive amount of data is communicated, 
shared, and available online, as noted by Christos Ziakos: 

As a result, billions of websites were created, which made it hard for the average (or 
even advanced) user to extract useful information from the web efficiently for a 
specific search. The need for an easier, more efficient way to search for information 
led to the development of search engines. Gradually, search engines began to assess 
the relevance of every website on their indexes compared to the queries provided to 
them by the users. They took into consideration several website characteristics and 
metrics and calculated the value of each website using complex algorithms. The 
enormous number of websites being indexed from search engines, along with the 
increasing competition for the first search results, led to studying and implementing 
various techniques in order for websites to appear more valuable in search engines.1 

That description applies equally to academic websites as well as commercial ones. A review of 
relevant literature suggests that it is very important for academic institutions to carefully consider 
and apply website optimization. There were around 28,000 universities worldwide in 2010, 
according to one study that monitored research in the field of worldwide academic webometrics.2 
The actual number of universities seems to be very similar in 2020. 

Baka and Leyni affirm in their working paper that the success or failure of an academic institution 
depends on its website: “The work of each university exists only when it encounters and interacts 
with society. Their popularity with the public is steadily growing.” This is directly connected with 
the institution’s presence on the World Wide Web.3  

Many authors define the term search engine optimization (SEO) as a series of processes that are 
conducted systematically to improve the volume and quality of traffic from search engines to a 
specific site by utilizing the working mechanism or algorithm of the search engine. It is a 
technique of optimizing a website’s structure and content to achieve a higher position in search 
results. The aim is to increase the website’s ranking in web search results.4  

After an extensive search of the relevant literature, we can conclude that although 
SEO is currently a widely discussed topic, there is very little accessible scientific literature related 
to SEO applications in the field of digital repositories in general, and none at all in the particular 
subset of DSpace-based repositories. 




Website Quality 
Many authors generally affirm that there is a positive correlation between academic excellence 
and the complex web presence of an institution. This suggests that website quality is a factor that 
may have a predictive or causal relationship with SEO performance.5 Numerous tools can be 
employed to measure the quality of websites, test them closely, and produce an SEO performance 
ranking of websites’ ability to properly promote their content through the search engines. For 
example, the Academic Ranking of World Universities (The Shanghai Ranking, 
http://www.shanghairanking.com) has been established for the top 1,000 universities in the 
world. The authors consider website quality to be the quality of an institution’s online 
presence, its ability to properly promote digital content in search engines and finally, in 
combination, its overall web presence. According to the Shanghai Ranking list, this is a factor for 
some “prospective students to decide on whether they will enroll in a specific institute or not.”6 A 
number of recent studies have also attempted to examine the online presence of academic 
institutions from various points of view. One of the older studies mentioned that the quality of 
academic websites is very important for students in the process of enrollment.7 Another key 
aspect is the optimized website performance as well as SEO and website security.8 

Audit Tools 
If we want to perform any optimization, we need an appropriate software tool to check a 
website’s current ranking. According to G2, the world’s largest technology online marketplace, SEO 
software is designed to improve the ranking of websites in search engine results pages without 
paying the search engine provider for placement. These tools provide SEO insights to companies 
through a variety of different features, helping identify the best strategies to improve a website’s 
search relevance.9 SEO audit software could be used by SEO specialists or system administrators, 
as well. Audit software performs one or more of the following functions in relation to SEO: content 
optimization, keyword research, rank tracking, link building, or backlink monitoring. The software 
then provides reports on the optimization-related metrics.10 Many authors stress the importance 
of a holistic approach to SEO factors (24 factors were tested), while noting that results depend 
mostly on the most effective ones: for example, the quantity and quality of the backlinks, the SSL 
certificate and so on, which will be described later in this paper.11 

The quality of academic websites is very important for researchers, too. They need to disseminate 
scientific information and communicate it in effective ways. According to some authors, the topic 
of academic SEO (ASEO) has been gaining attention in recent years.12 ASEO applies SEO principles 
to the search for academic documents in academic search engines such as Google Scholar and 
Microsoft Academic. In another scientific paper, ASEO is considered very similar to traditional 
SEO, where institutions want to make good use of SEO to promote digital scientific content on 
the Internet. Beel, Gipp, and Wilde emphasize the importance for researchers to ensure that their 
publications will receive a high rank on academic search engines.13 By making good use of ASEO, 
researchers will have a higher chance of improving the visibility of their publications and have 
their work read and cited by more researchers. 

In recent years, digital institutional repositories (as academic systems) have been used as 
a modern means of promoting and disseminating digital scientific objects through the Internet.  

Digital objects need to reach a wider audience. Digital repositories take the form of a website 
interface, interact with students, teachers, or researchers on a daily basis, and handle a number of 
citations, articles, theses, or other research objects. Institutional repositories are affected by search 
engines too, so some improvements on repositories’ SEO parameters are needed. These factors 
contribute to a system’s rankings. 

SEO on institutional repositories is not considered an absolutely new scientific topic. Kelly 
stressed eight years ago that Google is critical in driving traffic to repositories. He analyzed results 
from a survey summarizing SEO findings for 24 institutional repositories in 
the United Kingdom. The survey results showed that referring platforms were primarily 
responsible for driving traffic to those institutional repositories, thanks to many hypertext links 
in referring domains.14 Since then, SEO analyses of digital repositories have not been a widely 
discussed topic in the literature. Discussion of SEO for one specific type of digital repository 
software is rarer still, even for DSpace, the most used and popular software for running digital 
libraries and repositories.15 Consequently, this paper focuses on that topic since the DSpace-based 
digital repository is a complex online computer system where some SEO parameters could be 
adjusted. SEO audit tools help to identify areas of potential adjustments of those website 
properties that could help produce higher rankings in search engines (and improve the whole 
system visibility). 

AUDIT TOOLS SELECTION PROCESS 

Website variables that affect SEO can be tested using specialized online software tools. This topic 
is discussed in detail on a semi-professional level on specialized websites that provide a number 
of recommendations regarding the use of specific tools as well as evaluations of the tools.16 These 
tools can keep track of changes in many SEO variables. We want to use this approach in our study. 
However, first we need to choose the appropriate set of these tools. 

We have found that many SEO audit tools mentioned in professional online sources are narrowly 
specialized.17 For example, they may be focused only on keyword analysis, backlink analysis (for 
example, Ahrefs’ Free Backlink Checker), and so on. In our study, we intend to describe a greater 
number of SEO parameters to monitor rather than emphasize only a few selected ones. We also 
need tools that are fully available online for free. Based on these criteria, we immediately excluded 
several tools from the selection, because they provide only austere, simple, or restricted 
information. Many tools were excluded because they were limited to a single test with the 
requirement of registration or provision of an email address. A number of testing tools were also 
available only in paid versions. We wanted a set of tools that focus on several aspects of SEO 
analyses and evaluate the quality of websites’ SEO variables comprehensively. It is important to 
add that the selected tools’ results must be comparable, too. 

After careful consideration of all possibilities, we finally decided to choose three independent SEO 
audit tools in order to make the approach more transparent. The selected tools met most of the 
criteria mentioned above. However, it is very important to note that many other software tools 
surely meet the criteria and could also be suitable for testing purposes. Based on the scientific 
literature review, we were not able to identify specific recommendations in this regard; therefore, 
we have been inspired by the advice offered in the websites and blogs previously mentioned that 
are focused primarily on SEO. Our tool selection is as follows (listed in alphabetical order): 

1. SEO Checker (https://suite.seotesteronline.com/seo-checker) is part of a complex audit 
software suite called SEO Tester Online Suite. SEO Checker provides tests in the following 
categories: base, content, speed, and connections to social media. It tracks, among many 
other parameters, title coherence, text/code ratio, accessibility of microdata, OpenGraph 
metadata, social plugins, in-page and off-page links, quality of links, mobile friendliness of 
the page and many other SEO and technical website attributes.  
 
Regarding restrictions, only two sites can be tested within a 24-hour period. The limit 
increases to four sites per day after free registration with a valid email address. Moreover, 
there is a 14-day trial period during which all hidden functionalities work. In the free 
version that we used, a complete report can only be viewed, not downloaded or saved.  

2. SEO Site Checkup (https://seositecheckup.com/) was selected based on many positive 
recommendations from the technically oriented expert website Traffic Radius.18 SEO Site 
Checkup is described as “a great SEO tool that offers more than 40 checks in 6 different 
categories (common SEO issues like missing metadata, keywords, issues related with 
absence of connections to social media, semantic web, etc.) to serve up a comprehensive 
report that you can use to improve results and the website’s organic traffic. It also gives 
recommendations to fix critical issues in just a few minutes. As a tool, it is very fast and 
provides in-depth information about the various SEO opportunities and accurate results.”19 
 
SEO Site Checkup is appreciated and recognized as number one among other audit tools 
ranked by the Geekflare website.20 Another reason we selected this tool for our testing 
scenario is that the Google search engine returns a link to this tool as the first 
result for the search query “seo testing tool” (excluding paid links). SEO Site Checkup is 
also the fastest of the selected audit tools, which could be considered another advantage. 
Its main disadvantage is that only one website can be tested within 24 hours from one 
public IP address. 

3. WooRank (https://woorank.com) is recommended by Traffic Radius: “WooRank offers an 
in-depth analysis that covers the performance of existing SEO strategies, social media and 
more. The comprehensive report analysis is classified into eight sections for improved 
readability quotient, and you may also download the report as branded PDF.”21 WooRank 
has obtained the third position among the recommended software tools. TrustRadius gives 
it a score of 9.2 out of 10 and users rate it 4.67 out of 5 stars based on 51 reviews.22 
 
On the one hand, some results are hidden in the free version, but the final score will be 
shown. On the other hand, WooRank has no limit to the number of websites tested per day, 
but it is the slowest of the selected testing tools. 

We selected these three SEO audit tools because they work independently, their results are 
comparable to each other, and they offer a quick way to get comprehensive SEO analysis results 
for a tested site. It should be noted that results of some performed tests are hidden, but there is 
general guidance on how to fix some issues. However, the solution always depends on the specific 
site and the technology used. 

Using three different tools adds objectivity because we do not rely on just one tool and a one-sided 
view of the SEO issue. The three selected testers all display results in the same way—test results 
are always shown as a summarized score in the range of 0 to 100 points (100 represents the best 
result). A very large set of SEO parameters and technical website properties is evaluated in all 
three cases. These tests are usually divided into several categories (for example, common SEO 
issues, performance, security issues, and social media integration). Although similar parameters 
are assessed in all three audit tools, there are still some differences between them. Each of the 
testing tools is unique in a certain area because it also tests a parameter that the others do not 
deal with or evaluates a website by a different methodology. Still, the fact remains that the 
evaluated SEO parameters overlap between the tools. 

We will not overload this paper with detailed information and technical details of individual 
partial tests, because they can be easily found on the websites of the given test tools (SEO Site 
Checkup, SEO Checker, WooRank). We will just mention the common core of main tests: 
CSS Minification test, Favicon test, Google Search Results Preview test, Google Analytics test, H1 
Heading Tags test, HTML Page Size test, Image Alt test, JavaScript Minification test, JavaScript 
Error test, Keywords Usage test, Meta Description test, Meta Title test, SEO friendly URL test, 
Sitemap test, Social Media test, Robots.txt test, URL Canonicalization test, and URL Redirects test. 
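
To illustrate what several of these core tests inspect, the following minimal Python sketch checks a few on-page signals (title length, meta description, H1 count, declared language, and images without alt text) in a snippet of HTML. It is not how any of the three audit tools is implemented. The names OnPageAudit and audit and the sample markup are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class OnPageAudit(HTMLParser):
    """Collects a few on-page signals similar to those an SEO audit reports."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.h1_count = 0
        self.declared_lang = None
        self.images_without_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.declared_lang = attrs.get("lang")
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not attrs.get("alt"):
            # A missing or empty alt attribute is counted as a problem here.
            self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html):
    """Return a small dictionary of on-page findings for one HTML document."""
    parser = OnPageAudit()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    return {
        "title": title,
        "title_length": len(title),
        "has_meta_description": parser.meta_description is not None,
        "h1_count": parser.h1_count,
        "declared_lang": parser.declared_lang,
        "images_without_alt": parser.images_without_alt,
    }


# Hypothetical markup loosely resembling an unoptimized home page.
sample = """<html><head><title>DSpace Home</title></head>
<body><h1>Communities</h1><h1>Recent submissions</h1></body></html>"""

report = audit(sample)
print(report)  # title_length is 11 for "DSpace Home"
```

Each audit tool applies its own thresholds to such signals, so a real report will weigh them differently than this sketch, which only collects the raw values.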

Another specific group consists of tests related to a particular audit tool. Thanks to them we can 
get a more comprehensive view of the tested area of a website’s SEO characteristics. For example, 
SEO Checker features the following specific tests: Title Coherence test, Unique Key Words test, H1 
Coherence test, H2 Heading Tags test and Facebook Popularity test. WooRank as the second tool 
extends the basic set of tests with the following: Title tag length test, In-page links test, Off-page 
links test, Language test, Twitter account test, Instagram account test, Traffic estimations and 
Traffic rank. Of course, there is also a set of tests that are parts of two audit tools, but the third one 
does not deal with them since it is specialized in another area. 

As we have mentioned, the tools offer a list of suggestions for potential improvement of SEO 
characteristics. The user is informed about an issue, but no instructions or solutions are provided 
on how to resolve it. The main benefit of this paper lies with its objective to solve specific SEO 
issues. This work may improve the visibility and searchability of DSpace-based institutional 
repositories. 

A set of the three audit tools described above will be used in the following section. We attempt to 
identify possible SEO issues of the selected institutional repository in the form of a case study. 
Then we aim to fix the identified SEO issues and improve the quality of its SEO parameters, as well as 
demonstrate the potential impact of the performed repairs on website traffic. All traffic 
measurements will be based on Google Analytics data. 

THE INSTITUTIONAL REPOSITORY OF THE DEPARTMENT OF MEDIAMATICS AND CULTURAL 
HERITAGE (SEO CASE STUDY)  

Background Information 
An older version of our digital repository (based on DSpace v5.5) was launched by the Department 
of Mediamatics and Cultural Heritage in April 2017. Now, in 2021, the repository makes available 
online over 180 digital objects, most of them open access under Creative Commons licenses. The 
first attempts to create and establish a similar virtual space for digital objects started long ago. 
Several software solutions had been tested for this purpose—for example, Invenio and Eprints, 
along with DSpace. According to OpenDOAR’s statistics, Eprints and DSpace have always been the 
most popular tools for running digital repositories.23  

A few years ago, DSpace was chosen as the primary software for running a digital repository. Since 
then, the usage of open-source software has been rising. For example, Ubuntu Server LTS (long 
term support) is used as an operating system, Tomcat 8 is used as a web server, PostgreSQL 
assumes the role of a database system, etc. All of those software components are part of a complex 
digital system and are orchestrated in a virtual environment that is built on an open-source 
virtualization solution called XCP-ng (in version 8.2). Some software components have been 
switched for others during the development period. 

Based on our experience, the digital repository’s regular visitors were mostly from the staff and 
students of the department. We initially did not feel a need to improve the visibility of this system 
to search engines, an oversight that turned out to be a mistake in the long run. We did not perform 
any search engine optimization on this repository until November 2019, when we coincidentally 
discovered several scientific articles dealing with SEO in the academic environment. After 
studying the theoretical background, we initiated the practical application process. We applied 
theory and our experience with DSpace software into an SEO troubleshooting process within our 
local repository. Most of the optimizing actions related to solving the major SEO issues were 
performed before November 10, 2019. We will describe the SEO adjustments we made and derive 
a list of recommendations for other institutions based on our own experience.  

Initial Testing of a Clean DSpace 6 Installation  
In order to formulate any recommendations related to SEO and the administration of DSpace 
digital repositories, it is important to determine and test a starting point. For this purpose, we 
chose a clean instance of DSpace v6.3 with an XML user interface (XMLUI)—the latest commonly 
available stable version. This is the same version that we use in this case study and in our 
production environment. (A newer version, DSpace 7 Beta 4, was released by Atmire on October 
13, 2020).24 No other customization edits were made except a base configuration and necessary 
URL settings. This installation of DSpace v6.3 has been tested by the same set of tools mentioned 
previously. 

The tests we performed are summarized in table 1, where they are divided into four main SEO 
sections in the first column: common SEO issues, social, speed and security. A test name is shown 
in the second column. The third column is marked as “Default installation,” where we display the 
test results on our clean DSpace 6.3 installation. If the tested instance met the criteria of the given 
test, a green pictogram is shown. When a particular test fails, a red cross is used.  

The improved state is shown in the fourth column marked as semi-optimized. It is the result 
of many important technical changes and of the process of resolving SEO issues. These changes will be 
discussed and described later in this paper; however, a short note about the issue in question is 
displayed in each row. These notes were taken from the tools’ result reports. 

We have used the prefix semi- in the last column because we were not able to resolve all detected 
SEO issues, only most of them. All related reasons will be described briefly in the discussion 
section. Where a change improved the state of a test, we have changed the status 
pictogram (from the red cross to the green tick) and set the row color to yellow. The 
changes leading to improvement (i.e., the yellow rows) will be discussed in detail later, too. 
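
The highlighting rule described above can be sketched in a few lines of Python. This is only an illustration: the class name AuditRow is hypothetical, and the pass/fail values for the four sample rows are inferred from the textual notes in table 1 (for example, "No sitemap has been found" is read as a failed test), not taken from the tools' raw output.

```python
from dataclasses import dataclass


@dataclass
class AuditRow:
    """One row of a before/after comparison (hypothetical structure)."""
    name: str
    passed_default: bool          # status on the clean installation
    passed_semi_optimized: bool   # status after the optimization steps

    @property
    def improved(self):
        # A row is highlighted only when a failed test now passes.
        return (not self.passed_default) and self.passed_semi_optimized


# Statuses below are assumptions inferred from the notes in table 1.
rows = [
    AuditRow("Robots.txt test", False, True),
    AuditRow("Sitemap test", False, True),
    AuditRow("Google Analytics (GA) test", False, True),
    AuditRow("Page Cache test", False, False),
]

highlighted = [row.name for row in rows if row.improved]
print(highlighted)
```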

Recall that we have no need to overload the main text of this paper with detailed technical 
information about partial tests, because it can be easily found on the websites of the given test 
tools. 

Table 1 shows the compared results between the non-optimized and semi-optimized states of the 
DSpace repository. Based on table 1, the default instance of DSpace with basic HTTP and other 
default settings received only 58 points out of 100 in SEO Site Checkup, 50.1 points in SEO Checker 
and 32 points in WooRank. The average final score is 46.7 points out of 100. Although this 
score could be considered low, the DSpace default instance still meets certain basic criteria of 
SEO. In addition, many repository administrators usually do not rely only on a default installation, 
but they make at least some changes in configuration immediately after the initial installation. 
Inter alia, the first steps should be implementing the HTTPS protocol, adding a 
connection with Google Analytics services, and so on. 
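
The average score quoted above can be reproduced with a short calculation. The sketch below assumes a simple unweighted mean of the three tool scores, which matches the reported value; the variable names are illustrative only.

```python
# Scores of the default DSpace 6.3 installation, as reported in the text.
default_scores = {
    "SEO Site Checkup": 58.0,
    "SEO Checker": 50.1,
    "WooRank": 32.0,
}

# Unweighted mean across the three audit tools.
average_default = sum(default_scores.values()) / len(default_scores)
print(round(average_default, 1))  # 46.7
```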

The improved state is shown in the last column of table 1. Whenever we solved an issue, the 
overall score rose. The semi-optimized repository has obtained a higher score compared to the 
previous column (default installation). The last column represents the final (albeit 
semi-optimized) state of technical and SEO attributes which we were able to reach at this moment. As 
shown, many SEO issues have been solved. We highlighted them in yellow. Some 
issues remain unsolved. Nevertheless, the overall SEO improvement is clearly noticeable, 
although the final average score has not reached the maximum value (100 points). 

  




Table 1. Comparison of results between the non-optimized and semi-optimized states of DSpace 
repository. 

Test name State 
 Default installation (before 

optimization) 
Semi-optimized (after a few 
optimization steps) 

Meta Title test, Title 
tag length 

 
The title tag is set, but the meta 
title of the webpage (DSpace 
Home) has a length of 11 
characters. It is too low. 

 
The title tag has been set to 
“Digitálny repozitár Katedry 
mediamatiky a kultúrneho 
dedičstva” (note: in Slovak 
language). 

Title coherence test 
 

The keywords in the title tag are 
included in the body of the page 

 
The title of the page seems 
optimized. 

Meta Description test 
 

No Meta-description tag is set. 
 

Meta-description tag has been set. 
(121 characters) 

Google Search Results 
Preview test 

 
“DSpace Home” is too general. 

 
The title of the page has been 
changed. 

Keywords Usage test 
 

The keywords are not included in 
Title and Meta-description tags. 

 
A set of appropriate keywords has 
been added. 

Unique key words test 
 

The textual content is not 
optimized on the page. 

 
There is an excellent concentration 
of keywords in the page. 
 
This page includes 382 words of 
which 58 are unique. 

H1 Heading Tags test 
 

8 H1 tags, 6 H2 tags 
 

The H1 tags of the page seem not 
to be optimized. There are too 
many H1 tags. 

H1 Coherence test 
 

The keywords present in the tag 
h1 are included in the body of the 
page. 

 
Some of the keywords of the tag h1 
are not included in the body of the 
page. 

H2 Heading Tags Test 
 

The keywords present in the tag 
<h1> are included in the body of 
page. 

 



INFORMATION TECHNOLOGY AND LIBRARIES  MARCH 2021 

SOLVING SEO ISSUES IN DSPACE-BASED DIGITAL REPOSITORIES | FORMANEK 10 

Test name State 
 Default installation (before 

optimization) 
Semi-optimized (after a few 
optimization steps) 

Language test 
 

Detected: Slovak 
Declared: Missing 

 
A missed language tag has been 
implemented. 

Robots.txt test 
 

No “robots.txt” file has been found. 
 

“Robots.txt” file has been enabled. 
Sitemap test 

 
No sitemap has been found. 

 
Sitemap has been enabled. 

SEO friendly URL test 
 

Webpage contains URLs that are 
not SEO friendly! 

 
Webpage contains URLs that are 
not SEO friendly. 

Image Alt test 
 

The webpage does not use “img” 
tags. It is optimized. 

 

Inline CSS test 
 

The webpage uses inline CSS 
styles. 

 
The webpage uses inline CSS 
styles. 

Deprecated HTML 
Tags test 

 
The webpage does not use HTML 
deprecated tags. 

 

Google Analytics (GA) 
test 

 
GA is not in use. 

 
GA has been implemented. 

Favicon test 
 

Default DSpace favicon is used. 
 

The favicon has been customized. 
JS Error test 

 
No severe JavaScript errors were 
detected. 

 
No severe JavaScript errors were 
detected. 

Social Media test 
 

No connection with social media 
has been detected. 

 
The website is successfully 
connected with social media (using 
Facebook). 

Facebook account test 
 

 
 

Information about Facebook page 
has been added by schema.org 
metadata. 

Facebook popularity 
 

(low) 
 

The webpage is promoted enough 
on Facebook. 



INFORMATION TECHNOLOGY AND LIBRARIES  MARCH 2021 

SOLVING SEO ISSUES IN DSPACE-BASED DIGITAL REPOSITORIES | FORMANEK 11 

Test name | Default installation (before optimization) | Semi-optimized (after a few optimization steps)
Twitter account test | No connection with Twitter has been detected. | Information about Twitter account has been added by schema.org metadata.
Twittercard test | No twittercard is implemented. | Metainformation about twittercard has been added by OpenGraph metadata.
Instagram account test | No connection with Instagram has been detected. | Information about Instagram account has been added by schema.org metadata.
Microdata (OpenGraph, Schema.org) test | There is no microdata or OpenGraph/schema.org metadata on the website. | Some OpenGraph and schema.org metadata has been added.
HTML Page Size test | The size of the page is excellent. (23.65 KB) | The size of the page is excellent. (28.84 KB)
Text/code ratio test | 10.71% (excellent) | 15.45% (excellent)
HTML Compression/GZIP | (no compression is enabled) The size of HTML could be reduced up to 79%. | The webpage is successfully compressed using gzip compression. The HTML is compressed with 78% size savings.
Site Loading Speed test | Loading time is around 1.86s | Loading time is around 2.39s
Page Objects test | The webpage has fewer than 20 http requests. | The webpage has fewer than 20 http requests.
Page Cache test (server-side caching) | The pages are not cached. | The pages are not cached.
Flash test | Website does not include flash objects. |




Test name | Default installation (before optimization) | Semi-optimized (after a few optimization steps)
CDN Usage test | Your webpage is not serving all resources (images, javascript and css) from CDNs. | Your webpage is not serving all resources (images, javascript and css) from CDNs.
Image, Javascript, CSS Caching tests | Data are not cached. | Data are not cached.
Javascript Minification test | Javascripts are not minified. | JavaScript files’ minification has been enabled in Tomcat configuration.
CSS Minification test | Some of your webpage’s CSS resources are not minified. | Some of your webpage’s CSS resources are not minified.
Nested Tables test | The webpage does not use nested tables. |
Frameset test | The webpage does not use frames. |
Doctype test | The website has a valid doctype declaration. |
URL redirects test | 1 URL redirect has been detected. It is acceptable. |
URL Canonicalization test | The webpage URLs are not canonized. | https://repozitar.kmkd.uniza.sk/xmlui and https://www.repozitar.kmkd.uniza.sk/xmlui should resolve to the same URL, but currently do not.
Canonical Tag test | No canonical tag has been detected. | The webpage is using a canonical link tag.
HTTPS test | Website is not SSL secured. | HTTPS has been implemented.




Test name | Default installation (before optimization) | Semi-optimized (after a few optimization steps)
Safe Browsing test | No malware or phishing activity found. |
Server signature test | Server self-signature for HTTPS is off. |
Directory Browsing test | Server has disabled directory browsing. |
Plaintext Emails test | The webpage does not include email addresses in plain text. |
Mobile friendliness (includes tap targets, no plugin content, font size legibility, mobile viewport) | The webpage is optimized for mobile visitors. |
SEO Site Checkup final score | 58/100 | 81/100
SEO Checker online final score | 50.1/100 | 78.0/100
WooRank final score | 32/100 | 65/100
Average final score | 46.7/100 | 74.66/100

 

Resolving Major SEO Issues 

This section describes how we resolved the major SEO issues that the tools detected. It is the key
technical part of the paper because it documents the solutions to most of the issues highlighted in
table 1. The following technical and SEO adjustments were implemented and tested; together they
improved the average final score by 59.87% (by 27.96 points, from 46.7 to 74.66 points) when the
fresh installation of DSpace is compared against the semi-optimized one.
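
The relative improvement quoted above can be reproduced from the two average scores in table 1. The following Python lines are only an illustrative sketch; the function name is our own choice for this example.

```python
def relative_improvement(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Average final scores reported in table 1.
default_score = 46.7
optimized_score = 74.66

gain_points = optimized_score - default_score
gain_percent = relative_improvement(default_score, optimized_score)
print(f"{gain_points:.2f} points, {gain_percent:.2f}%")  # 27.96 points, 59.87%
```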

All the following procedures are based on our own experience, experiments, and research in the
area of digital repositories and their optimization. We follow the order of issues stated in table 1
and describe each solution in more detail for the DSpace v6.3 environment with the XML user
interface (XMLUI). The procedures may differ slightly if you are using a different version of DSpace
or another user interface (for example, JSPUI). Examples of code are given in monospaced font.

Title, description, and keywords tags in a website header 

This criterion requires filling in specific metadata fields (meta tags) in the page’s HTML code.
Search engines process them automatically to find out what the website is about.




To solve these SEO issues, first change the website title (“DSpace Home” by default), which is
stored in the language translation file
/dspace/webapps/xmlui/i18n/messages_en.xml.
Find the appropriate key and change its value. All content in this file is fully customizable.

Next, edit DSpace’s page structure file
(/themes/Mirage/lib/xsl/core/page-structure.xsl)
in order to add the following metadata:

• a meta-description tag
• a keywords tag
• an author tag

Each tag should have carefully selected content of an appropriate length and should be placed just
below the opening <head> tag, as shown in the example:

<head>
<meta name="description" content="The Digital Repository receives, stores, indexes, preserves and disseminates the digital content created by the Department of Media and Cultural Heritage" />
<meta name="keywords" content="digital repository, The Department of Mediamatics and Cultural heritage, KMKD, MKD, Faculty of humanities, digital archive, DSpace" />
<meta name="author" content="Matúš Formanek" />

Note: Do not forget the closing characters />. The keywords should also appear in the title and
meta description tags.

These steps also affect several other SEO parameters, for example the Google Search Results
Preview test, the Keywords Usage test, the Unique Keywords test, and the Keywords Concentration
test.
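
The recommendation that keywords should reappear in the description can be checked automatically. The following Python sketch is a simplified illustration, not the method used by any of the audit tools: it looks only at the meta description (not the title), and the class name, function name, and sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") and attributes.get("content"):
                self.meta[attributes["name"].lower()] = attributes["content"]


def missing_keywords(html: str) -> list:
    """Return the declared keywords that do not occur in the meta description."""
    collector = MetaCollector()
    collector.feed(html)
    description = collector.meta.get("description", "").lower()
    keywords = [k.strip() for k in collector.meta.get("keywords", "").split(",")]
    return [k for k in keywords if k and k.lower() not in description]


# Shortened, hypothetical page header used only for this demonstration.
sample = """<head>
<meta name="description" content="The Digital Repository stores digital content." />
<meta name="keywords" content="digital repository, DSpace" />
</head>"""

print(missing_keywords(sample))  # ['DSpace']
```

A keyword reported by this check is a candidate for inclusion in the description text or for removal from the keywords tag.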

Language declaration 
The language declaration helps search engines identify the primary language of the website
content.

If the language declaration is missing, you can add it to the page-structure.xsl file (the process
is similar to adding the keywords and description tags explained above). Edit the
page-structure.xsl file (with VIM or another text editor, for example) and add a statement like the
following above the main <head> tag:

<html xml:lang="sk" xmlns="http://www.w3.org/1999/xhtml"></html>

Note: “sk” is the ISO 639-1 language code for Slovak. The xml:lang attribute is defined in the XML
specification, available at https://www.w3.org/TR/xml/.

Google Analytics, robots.txt and sitemap implementation 

Connecting a website to Google Analytics enables the service to track users’ behavior and to show
how they interact with the site. It is the basis of web analysis. The “robots.txt” and
“sitemap.xml” files are simple text files that search engines read to learn the website structure
and additional information about it.


To enable Google Analytics, insert the UA identifier (a string obtained from Google Analytics) into
the “dspace.cfg” configuration file located in the DSpace home folder. In that file, find the row
with the key “xmlui.google.analytics.key=” and insert the UA identifier there. Next, uncomment the
row with the key “xmlui.controlpanel.activity.max = 250” in the same “dspace.cfg” file.

Finally, uncomment the following row in the “xmlui.xconf” file located in
/DSpace/config/ and restart the Tomcat service:

<aspect name="StatisticsGoogleAnalytics" path="resource://aspects/StatisticsGoogleAnalytics/" />

The “robots.txt” file is commonly used and enabled in DSpace, but many SEO audit tools cannot
detect it because it is located in a path other than the expected default one. To make the file
detectable, copy /DSpace/webapps/xmlui/static/robots.txt to the root web application folder of
Tomcat (usually /var/lib/tomcat8/webapps/ROOT). Finally, restart the Tomcat web service.

The sitemap location of a running DSpace instance is declared in the “robots.txt” file mentioned
above. Edit this file and set the appropriate URL for the sitemap location.
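
To verify that a “robots.txt” file really declares a sitemap, the file can be parsed with the standard Python library. This is only a sketch: the host name and the rules in the example text are placeholders, not the content of our repository's file, and the site_maps() method requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list:
    """Return all sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Placeholder content; replace the URL with the address of your own repository.
example = """User-agent: *
Disallow: /xmlui/discover
Sitemap: https://repository.example.org/xmlui/sitemap
"""

print(sitemap_urls(example))  # ['https://repository.example.org/xmlui/sitemap']
```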

Enabling connections with social media 

This criterion detects a hyperlink (or other metadata) connection between a website and popular
social media, such as Facebook or Twitter. The primary goal is to promote the digital content.

This subsection deals with connections between social media and a DSpace-based repository.
Creating a profile or page for the repository on a social network is an essential example of good
practice. However, an appropriate connection between the two sites must be created, too. Further
promotion of the repository through social networks is another key step.

Nowadays, every SEO audit tool performs social media tests. A detected connection with social
media can have a big impact on the site’s popularity, as well as on the final SEO score. There are
several ways to establish these connections:

Connection with Facebook, Instagram, and Twitter through a direct link from the homepage. For
example, to add a link to a Facebook page, edit the page-structure file
(/DSpace/webapps/xmlui/themes/Mirage/lib/xsl/core/page-structure.xsl)
just below the div tag with the ID “ds-footer-wrapper”:

<div id="ds-footer-wrapper">
  <div id="ds-footer">
    <div id="ds-footer-left">
      <a href="https://www.facebook.com/digitalnyrepozitar/" target="_blank">Facebook page</a>
    </div>

A direct link to other social media can be added in the same way. After this procedure, the test
for social media (e.g., Facebook) passes. However, the connection should also be expressed through
microdata or other structured metadata inserted into the HTML code, so do not forget to add the
appropriate links to the schema.org metadata, too (see below). It is advisable to promote the
repository periodically through posts on Facebook and other social media.


Twitter optimization through a Twittercard insertion. Some audit tools perform a specific test
focused on the presence of so-called Twittercard HTML markup on the website, which optimizes
future Tweets. Only one card type per page is supported. Here is an example of a Twittercard
consisting of four parts (card, title, image, description).

After customizing the following code to reflect your institution’s information, insert it just
below the <head> tag in the page-structure.xsl file:

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Digital repository of The Department of Mediamatics and cultural heritage"/>
<meta name="twitter:image" content="https://opensenselabs.com/sites/default/files/inline-images/Screenshot%202018-07-02%2019.25.45.png"/>
<meta name="twitter:description" content="The Digital Repository receives, stores, indexes, preserves and disseminates the digital content created by the Department of Media and Cultural Heritage"/>

Note: The value of twitter:card names the card type (for example, summary). The URL of the
twitter:image must be absolute.

Placing a link to the repository on other (especially high-ranked) websites is also highly
recommended. Both the score and the website traffic are likely to rise as a result.

OpenGraph protocol integration 

This criterion refers to the presence of specific metadata objects (OpenGraph elements) in the
website. They are very important for the visibility of website objects in social networks. Many
website objects can be described through OpenGraph protocol tags.25

The WooRank audit tool verifies whether OpenGraph tags are present on your webpage. Adopting the
OpenGraph protocol is a way to integrate any website with social media or other platforms. With
metadata stored in OpenGraph tags, you can control how your pages (i.e., their links) are presented
when they are shared across social media (all documentation is available at https://ogp.me/).

To adopt the main OpenGraph elements, insert the following code (updating it for your
institution) just below the <head> tag in the page-structure.xsl file:

<meta property="og:title" content="Digital repository of The Department of mediamatics and cultural heritage"/>
<meta property="og:url" content="https://repozitar.kmkd.uniza.sk/xmlui"/>
<meta property="og:description" content="The Digital Repository receives, stores, indexes, preserves and disseminates the digital content created by the Department of Media and Cultural Heritage"/>
<meta property="og:locale" content="sk_SK"/>
<meta property="og:locale:alternate" content="en_US"/>
<meta property="og:type" content="website"/>
<meta property="og:image" content="https://opensenselabs.com/sites/default/files/inline-images/Screenshot%202018-07-02%2019.25.45.png"/>
<meta property="og:image:type" content="image/png"/>

Note: The URL of the tag “og:image” must be absolute. 
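
A page can be checked for the basic OpenGraph properties before it is submitted to an audit tool. The following Python sketch assumes the four properties that the OpenGraph documentation lists as required (og:title, og:type, og:image, og:url) and tests whether the image URL is absolute; the class and function names and the relative image path in the sample are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphCollector(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.properties[prop] = attributes.get("content") or ""


def is_absolute(url: str) -> bool:
    """True if the URL has an http(s) scheme and a host name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_open_graph(html: str):
    """Return (list of missing required properties, whether og:image is absolute)."""
    collector = OpenGraphCollector()
    collector.feed(html)
    missing = [p for p in REQUIRED if p not in collector.properties]
    return missing, is_absolute(collector.properties.get("og:image", ""))


# Hypothetical header with a relative image path, which the check should flag.
sample = """<head>
<meta property="og:title" content="Digital repository"/>
<meta property="og:type" content="website"/>
<meta property="og:url" content="https://repozitar.kmkd.uniza.sk/xmlui"/>
<meta property="og:image" content="/themes/Mirage/images/logo.png"/>
</head>"""

print(check_open_graph(sample))  # ([], False)
```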

Structured data integration (schema.org) 
This criterion analogously deals with the presence of objects described by another standard for 
structured metadata, schema.org:  


Schema.org is a collaborative, community activity with a mission to create, maintain, and
promote schemas for structured data on the Internet. Schema.org uses vocabularies that
can be used with many different encodings, including RDFa, Microdata and JSON-LD.26

Google, Microsoft, and others already use these vocabularies to power rich, extensible experiences
with a shared collection of schemas.27 Much of the information included in a DSpace repository can
be described by the schema.org vocabulary. There are three main ways to do that: through JSON-LD,
RDFa, and Microdata.

JSON-LD is a JavaScript notation embedded in a <script> tag in the page head or body. The 
markup is not interleaved with the user-visible text, which makes nested data items easier to 
express, such as the Country of a PostalAddress of a MusicVenue of an Event. Also, Google 
can read JSON-LD data when it is dynamically injected into the page's contents. 

RDFa is an HTML5 extension that supports linked data by introducing HTML tag attributes 
that correspond to the user-visible content that you want to describe for search engines. 
RDFa is commonly used in both the head and body sections of the HTML page. 

Microdata is an open-community HTML specification used to nest structured data within 
HTML content. Like RDFa, it uses HTML tag attributes to name the properties you want to 
expose as structured data. It is typically used in the page body, but can be used in the head.28

Google recommends using JSON-LD for structured data whenever possible.29 

We also recommend using JSON-LD (a JSON-based format for linked data) to express additional
information about entities related to a DSpace repository, for example information about the
organization.

To add schema.org elements expressed in JSON-LD, and thereby increase the visibility of a DSpace
instance in search engines, insert a script like the following into the page-structure.xsl file
(/DSpace/webapps/xmlui/themes/Mirage/lib/xsl/core/page-structure.xsl)
just below the div tag with the ID “ds-footer-wrapper”:

<script type="application/ld+json">
{
  "@context" : "http://schema.org",
  "@type" : "EducationalOrganization",
  "name" : "Digital repository of The Department of mediamatics and cultural heritage",
  "department" : "The Department of mediamatics and cultural heritage",
  "url" : "https://repozitar.kmkd.uniza.sk/xmlui",
  "sameAs" : [
    "https://www.facebook.com/digitalnyrepozitar/",
    "https://www.instagram.com/katedramkd/",
    "https://twitter.com/MediamatikaKD"
  ],
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Univerzitna 8215/1",
    "addressRegion": "Zilina",
    "postalCode": "01026",
    "addressCountry": "SK"
  }
}
</script>
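
Because JSON-LD is strict JSON, a single missing comma or a typographic quotation mark makes the whole block unreadable for search engines. Before deploying the script, its content can be checked with a few lines of Python. This is a minimal sketch; the function name and the two shortened test strings are our own illustration.

```python
import json


def parse_json_ld(text: str):
    """Return the parsed JSON-LD object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        print(f"Invalid JSON-LD at line {error.lineno}: {error.msg}")
        return None


# Shortened examples: the second one lacks a comma between the two URLs.
valid = (
    '{"@context": "http://schema.org", "@type": "EducationalOrganization", '
    '"sameAs": ["https://www.facebook.com/digitalnyrepozitar/", '
    '"https://twitter.com/MediamatikaKD"]}'
)
broken = (
    '{"sameAs": ["https://www.facebook.com/digitalnyrepozitar/" '
    '"https://twitter.com/MediamatikaKD"]}'
)

print(parse_json_ld(valid)["@type"])  # EducationalOrganization
print(parse_json_ld(broken))          # reports the error, then prints None
```

This check confirms only that the syntax is valid; whether the schema.org types and properties are used correctly must still be verified with a structured data testing tool.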


Many more elements like the ones above can be added. See the documentation available at
https://schema.org/. A free online tool for testing structured data is available from Google at
https://search.google.com/structured-data/testing-tool?hl=en.

Reducing repository website size during transfer 
The primary goal of this criterion is to measure how much the size of the website is reduced by
compressing parts of its code.

This reduction can be achieved by enabling compression of HTML and other file formats when they
are transmitted from a server to a client. The Tomcat webserver (an essential component of DSpace
repositories) allows turning on GZIP compression; so-called JavaScript minification can be enabled
as well.

To enable GZIP on the Tomcat webserver, edit Tomcat’s configuration file “server.xml” located in
its home directory. Under the tag “<Service name="Catalina">”, edit the corresponding connector
tag so that it looks like the following example. The attributes relevant to compression are
compression, compressionMinSize, noCompressionUserAgents, and compressableMimeType:

<Connector port="8080" maxHttpHeaderSize="8192"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           connectionTimeout="20000" disableUploadTimeout="true"
           compression="on"
           compressionMinSize="1024"
           noCompressionUserAgents="gozilla, traviata"
           compressableMimeType="text/html,text/xml,text/plain,text/javascript,text/css,application/x-javascript,application/javascript"/>

The compressableMimeType attribute lists the MIME types you want to compress.

Important note: If you use HTTPS (on port 443 instead of 8080), you must also add the options
stated above to the corresponding HTTPS connector. Otherwise, compression will be enabled only for
plain HTTP (running on port 8080).
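
The size savings that audit tools report for GZIP can be estimated offline with Python's standard gzip module. The sketch below uses synthetic, highly repetitive markup that we invented for the demonstration, so the printed percentage is not a measurement of a real DSpace page; real pages are less repetitive and compress less.

```python
import gzip


def gzip_savings(payload: bytes) -> float:
    """Return the size reduction achieved by gzip, in percent of the original size."""
    compressed = gzip.compress(payload)
    return (1 - len(compressed) / len(payload)) * 100


# Synthetic markup standing in for an item list (not taken from a real page).
row = "<div class='artifact-title'><a href='/xmlui/handle/123'>Item</a></div>\n"
page = (row * 300).encode("utf-8")

print(f"{len(page)} bytes, {gzip_savings(page):.1f}% smaller after gzip")
```

Responses smaller than the configured compressionMinSize (1,024 bytes in the connector example) are sent uncompressed, because the saving would not justify the processing cost.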

JavaScript minification can be enabled in the “dspace.cfg” configuration file, usually located in
the DSpace configuration directory (/DSpace/config/). Change the value from false to true in the
following rows:

xmlui.theme.enableMinification = true
xmlui.theme.enableConcatenation = true

Setting a canonical link 
This requirement deals with the presence of a canonical link used by search engines. “A canonical 
link is included in the HTML code of a webpage to indicate the original source of content. This 
markup is used to address SEO problems with duplicate content which arise when different pages 
with different URLs contain identical or nearly identical content.”30 The problem with duplicated 
content can arise, for example, when a webpage is accessible with or without a www prefix in its 
URL or a webpage is accessible via HTTP and HTTPS protocols. “For SEO purposes, the canonical 
link shows Google and other search engines which URL corresponds to the original source of 
content and should be shown in search results. It is added as a meta tag to every URL version of a 
given webpage and indicates the canonical URL.”31  


After the necessary customization, insert the following row just below the <head> tag in DSpace’s
page-structure.xsl file
(/DSpace/webapps/xmlui/themes/Mirage/lib/xsl/core/page-structure.xsl):

<link rel="canonical" href="https://repozitar.kmkd.uniza.sk/xmlui/" /> 
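
The idea behind the canonical link is that all URL variants of a page (with or without the www prefix, via HTTP or HTTPS) point to one preferred address. The following Python sketch illustrates this mapping under simplifying assumptions of our own: the canonical form uses HTTPS, drops the www prefix, ends with a slash, and ignores query strings. It is an illustration, not part of the DSpace configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map a URL variant to one canonical form (https, no www, trailing slash)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") + "/"
    # Query string and fragment are dropped in this simplified sketch.
    return urlunsplit(("https", host, path, "", ""))


variants = [
    "http://repozitar.kmkd.uniza.sk/xmlui",
    "https://www.repozitar.kmkd.uniza.sk/xmlui",
    "https://repozitar.kmkd.uniza.sk/xmlui/",
]
print({canonical_url(v) for v in variants})
# {'https://repozitar.kmkd.uniza.sk/xmlui/'}
```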

HTTPS adoption 
The adoption of HTTPS is required for secure data transfer. This criterion inspects whether HTTPS
is enabled and how well it is configured.

HTTPS is an essential component of website security for sites available via the Internet. We
pointed out the importance of HTTPS adoption in a DSpace repository interface in our previous
research papers.32

First, prepare a Certificate Signing Request (CSR) file, which the Certificate Authority of your
choice will use to generate the SSL certificate. The process of HTTPS configuration on the Tomcat
webserver (used natively in DSpace repositories) is widely described online (for example, at
https://www.mulesoft.com/tcat/tomcat-ssl). Second, configure the corresponding connector for HTTPS
(port 443) in Tomcat’s configuration file. We strongly recommend following those instructions and
running a DSpace instance only over HTTPS: besides the major security risks, plain HTTP has a very
negative impact on the final SEO score. Google and other search engines strongly prefer websites
with HTTPS enabled.

Discussion about the SEO Issues Solving Process 
In the previous subsections, we offered solutions to selected major SEO issues that can be
resolved relatively easily in systems based on DSpace and its web technologies. In practice,
however, it is unrealistic to expect a 100% optimization level and final solutions for all detected
problems. Therefore, we intentionally did not mark the second state of the system (shown in
table 1) as fully optimized but only as semi-optimized. Some of the issues we detected remain
unsolved despite all our efforts, for several reasons. One of the most important is that DSpace
software, like many complex systems, cannot be easily modified without programming experience, so
resolving some complicated issues is beyond the scope of this article. Another significant reason
is that we lacked knowledge about some issues at the time of writing this paper and therefore
could not solve them. This situation creates an opportunity for further research and for proposed
solutions to the unsolved issues in this specific area, which the professional community would
certainly welcome.

Taken together, the changes we made objectively increased the
average SEO score by more than 59 percent compared to the default installation. All the successfully
performed actions improved the search results of our repository and rapidly increased its.  

We assumed that the SEO actions would affect website traffic. Most major issues discussed in this
case study were resolved before November 10, 2019. Therefore, we analyzed the repository traffic in
the 30-day periods before and after this date (the first from October 11 until the change, the
second from the change until December 10, 2019) to determine the impact of the SEO actions on
website traffic. The results are satisfactory because the number of sessions rose significantly.
Traffic from organic search (through Google, for example) increased by 47.67% (from 86 to 127
sessions). The number of so-called referral sessions (sessions initiated from social media and
other referral sites) increased by 193.75% (from 16 to 47 sessions). Users also spent much more
time on the website and viewed more pages on average (an increase of up to 159%).

We view the significant traffic increase as evidence that the SEO changes we implemented helped to
promote use of the digital repository’s content.

In the next section, we apply the same set of tests to a sample of worldwide DSpace-based
repositories. This allows us to compare the quantitative improvement of SEO parameters achieved in
the local case study with the current state of DSpace repositories worldwide.

TESTING SEO PARAMETERS OF WORLDWIDE DSPACE-BASED REPOSITORIES 

There are several thousand digital repositories around the world. The largest share of them
(41.1% according to ROAR registry data and over 39% according to the OpenDOAR registry) is based
on DSpace software.33 Therefore, this part of our research also focuses exclusively on
DSpace-based repositories.

As we pointed out in the methodology, the second objective of this paper is to briefly describe
the current state of SEO parameters of worldwide DSpace-based digital repositories. We then
compare the results obtained from the case study with those from the exploration of worldwide
repositories.

Methodology 
Given the facts stated above, we wanted to learn more about the quality of SEO parameters of
worldwide repositories running DSpace. We decided to use one of the two most authoritative
registries of digital repositories: the Registry of Open Access Repositories (ROAR) and the
Directory of Open Access Repositories (OpenDOAR). ROAR is hosted at the University of Southampton
in the United Kingdom and is available online at http://roar.eprints.org/. OpenDOAR is available
at https://v2.sherpa.ac.uk/opendoar/. Both are quality-assured global directories of academic open
access repositories. They “enable the identification, browsing and search for repositories, based
on a range of features, such as location, software or type of material held.”34 We chose the ROAR
registry as the source for the sample list because it allows filtering systems by specific
criteria. We applied three filters on March 11, 2020: any country, any repository type, and DSpace
software. We downloaded the raw data as a text/CSV file with 1,977 records, one row per
repository. Each row has a sequence number and many columns of additional information. Only the
columns marked “title” and “home_page” were necessary for our purpose; the other columns were
removed. All changes in the list were performed using Microsoft Excel.

For further evaluation, we selected a random sample from this file. We used an online sample size
calculator (available at https://www.calculator.net/sample-size-calculator.html) with the
following values of the statistical parameters:

  


• Population size: 1,977 (the total count of DSpace repositories in ROAR) 
• Confidence level: 95%  
• Margin of error: 10% 

For these values, the calculator returned a sample size of 92. Next, we used the random number
function integrated in Excel (randbetween(1,1977)) to generate 92 random numbers from this range.
Each randomly generated number corresponds to a row number in the table of repositories downloaded
from ROAR. In this way, we chose 92 DSpace repositories for testing and ensured objectivity in the
selection of the research sample.
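
The sample size of 92 can be reproduced with the standard formula for estimating a proportion, including the finite population correction. We assume here that the online calculator applies this formula with the most conservative proportion of 0.5; the function below is an illustrative sketch, not the calculator's actual code.

```python
import math
from statistics import NormalDist


def sample_size(population: int, confidence: float, margin: float,
                proportion: float = 0.5) -> int:
    """Sample size for estimating a proportion in a finite population."""
    # Two-sided critical value of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an unlimited population.
    n_infinite = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Finite population correction.
    n_finite = n_infinite / (1 + (n_infinite - 1) / population)
    return math.ceil(n_finite)


print(sample_size(population=1977, confidence=0.95, margin=0.10))  # 92
```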

We also checked the sample for duplicate entries to ensure that no repository was selected twice,
because the random function does not guarantee that only unique integer values are generated.
Figure 1 shows the distribution histogram of the randomly generated values from 1 to 1,977.
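
The duplicate check becomes unnecessary when the sample is drawn without replacement. As an alternative to the Excel function we used, the following Python sketch draws 92 unique row numbers in a single step; the function name and the seed value are arbitrary choices that only make the example reproducible.

```python
import random


def draw_sample(population_size: int, sample_size: int, seed: int) -> list:
    """Draw unique row numbers from 1..population_size without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), k=sample_size))


rows = draw_sample(population_size=1977, sample_size=92, seed=2020)
print(len(rows), len(set(rows)))  # 92 92
```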

 

Figure 1. Distribution histogram of randomly generated values. 

Then, we attempted to test each of 92 selected repositories with three audit tools. The results are 
discussed in the next section. 

  




Test Results 
Table 2 shows part of the results table. To ensure anonymity, the table does not contain any URLs
or titles; however, we can provide this information upon request. Each row shows only the first-
and second-level domain name and the corresponding scores gained in the tests. The maximum value
is 100 points in every case. The rows are sorted by the calculated average score from high to low.
Many rows are omitted because of the table length (one row for each of the 92 repositories). The
last tested repository has the sequence number 65. Repositories with a higher sequence number have
no score (N/A) because they were inaccessible.

Table 2. Test results

Repository sequence number | First- and second-level domain name | SEO Site Checkup | SEO Checker | WooRank | Average
1 | econstor.eu | 76 | 65.9 | 69 | 70.30
2 | datadryad.org | 73 | 54.9 | 54 | 60.63
3 | edu.ar | 66 | 54.5 | 61 | 60.50
4 | cuni.cz | 60 | 52.8 | 65 | 59.27
5 | edu.co | 65 | 55.5 | 56 | 58.83
. . . | . . . | . . . | . . . | . . . | . . .
65 | ac.cn | 36 | 21.7 | 33 | 33.23
66 | Scholarportal.info | N/A | N/A | N/A | N/A
. . . | . . . | . . . | . . . | . . . | . . .
89 | org.br | N/A | N/A | N/A | N/A
90 | mapfig.com | N/A | N/A | N/A | N/A
91 | edu.ec | N/A | N/A | N/A | N/A
92 | edu.co | N/A | N/A | N/A | N/A
Average score gained from particular tests | | 53.47 | 48.08 | 49.22 | 50.26
Standard deviation | | 9.31 | 9.29 | 10.27 | 9.62
Median | | 54 | 46.7 | 52 | 50.90
Mode | | 52 | 40 | 54 | 48.67
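
The Average column is the arithmetic mean of the three tool scores. As an illustration, the following Python sketch recomputes it for the five best-ranked repositories published in table 2; the variable names are our own.

```python
from statistics import fmean

# Scores from table 2: (SEO Site Checkup, SEO Checker, WooRank).
scores = {
    "econstor.eu": (76, 65.9, 69),
    "datadryad.org": (73, 54.9, 54),
    "edu.ar": (66, 54.5, 61),
    "cuni.cz": (60, 52.8, 65),
    "edu.co": (65, 55.5, 56),
}

averages = {name: round(fmean(values), 2) for name, values in scores.items()}
for name, average in averages.items():
    print(f"{name}: {average:.2f}")
```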

The testing process started on March 11, 2020, and finished on April 6, 2020. It took a long time
because we were limited by the usage restrictions (described above) of the audit tools’ free
accounts. These restrictions meant that only a few tests could be performed daily, even though we
used several public IP addresses to speed up the overall testing process.

Among other things, we identified a startling problem with nonfunctional repository URLs.
Thirty-one of the 92 selected repositories were unavailable between March and April 2020. On
April 6, 2020, at the end of the testing period, we tried to test the unavailable systems once
again. Four of them had become functional, so the final number of repositories actually tested
rose to 65 (out of 92). The remaining 27 repositories (29.35 percent of the total) were still
offline or unavailable (in table 2, they are shown with N/A status). It is possible that the URLs
stated in ROAR’s records are out of date. N/A values were ignored in all calculations and had no
impact on the final average score or other statistical parameters. Only the 65 fully functional
DSpace-based repositories were used for testing.

To visualize the partial and summarized results, we use a graph instead of a table. Figure 2 shows 
the results of the 65 repositories sorted by average score (from highest to lowest), calculated 
from the three partial scores obtained in the SEO Site Checkup, SEO Checker, and WooRank testing 
tools. Each repository therefore has three partial discrete values (colored dots) in figure 2, and 
its calculated average score is marked in red. The red dotted line of average scores is the most 
informative result in this section. 

 

Figure 2. Results of 65 involved repositories. 

The repositories that gained a higher score (i.e., better SEO results) are situated on the left 
side of figure 2; systems with lower scores are on the right side. Nonfunctional systems (N/A) 
are not displayed at all. 

The frequency distribution of the average score (the red dotted line in figure 2) is shown in 
figure 3. 




 

Figure 3. Underlying frequency distribution graph of average score 

Based on the results shown in the previous figures, we can draw the following conclusions with a 
relatively high degree of reliability: 

A large share of the DSpace-based repositories registered in ROAR (over 29%) were unavailable 
during the testing period. This is alarming, because ROAR is still considered an authoritative 
registry for open access repositories and should not contain invalid data. 

The average score of functional repositories obtained during the testing period is very similar 
across the audit tools: 53.47 points in SEO Site Checkup, 48.08 points in SEO Checker, and 49.22 
points in WooRank. The population standard deviations are comparable, too. Finally, the largest 
group of tested repositories (19) gained a score in the interval from 55 to 60, as shown in 
figure 3; however, the average SEO score of all tested DSpace-based repositories was only 50.26 
points out of 100 (data from March/April 2020), which indicates a relatively low level of search 
engine optimization of those systems. 
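
The per-repository averages and the summary statistics can be reproduced along the following lines. This is a minimal sketch that uses only the five top-ranked rows and one N/A row visible in table 2, so its summary values will not match the published figures for all 65 repositories. The dictionary layout and variable names are assumptions made for illustration.

```python
from statistics import mean, median, pstdev

# Partial scores per repository: (SEO Site Checkup, SEO Checker, WooRank).
# None marks a repository that was unavailable (N/A in table 2).
scores = {
    "econstor.eu": (76, 65.9, 69),
    "datadryad.org": (73, 54.9, 54),
    "edu.ar": (66, 54.5, 61),
    "cuni.cz": (60, 52.8, 65),
    "edu.co": (65, 55.5, 56),
    "scholarportal.info": None,
}

# N/A entries are ignored; each remaining repository gets the mean of its three scores.
averages = {
    name: round(mean(parts), 2)
    for name, parts in scores.items()
    if parts is not None
}

# Sort from highest to lowest average, as in figure 2.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)

values = list(averages.values())
summary = {
    "mean": round(mean(values), 2),
    "population_std": round(pstdev(values), 2),  # population, not sample, deviation
    "median": median(values),
}

for name, avg in ranked:
    print(f"{name:20s} {avg:.2f}")
print(summary)
```

The population standard deviation (`pstdev`) is used because the text refers to standard deviations of the population rather than of a sample.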

Results and Discussion 
The previous section gave an overview of the SEO parameters of DSpace-based digital repositories 
worldwide. We can now compare these data with the results of the case study described above. 
Table 3 summarizes the comparison. 

  




Table 3. Comparison of fresh installation, semi-optimized installation and average worldwide score 

                                            SEO Site Checkup  SEO Checker  WooRank  Total average score  Calculated improvement (%)

Fresh DSpace installation                   58                50.1         32       46.7                 100 (a reference point)

Semi-optimized state of the Institutional
Repository of the Department of
Mediamatics and Cultural Heritage           81                78           59       74.66                +59.87

The average score of worldwide
DSpace-based repositories                   53.47             48.08        49.22    50.26                +7.62

 

Table 3 shows that the fresh, non-optimized DSpace obtained a slightly worse score than the 
worldwide average. Although a few SEO issues remain in our semi-optimized DSpace instance, its 
SEO parameters are much better than those in either of the other cases. Taking the fresh DSpace 
installation as the reference point (100 percent), the last column of table 3 gives the level of 
improvement. The semi-optimized DSpace improves on the fresh (non-optimized) DSpace installation 
by 59.87%. The worldwide average repository differs from the non-optimized DSpace instance by 
only 7.62%, so their SEO quality is very similar. 
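
The improvement column of table 3 is a relative change against the fresh installation. A minimal sketch of that calculation, using the total average scores as reported in the table, is given here; the function name `relative_improvement` is illustrative.

```python
def relative_improvement(score: float, reference: float) -> float:
    """Return the percentage change of `score` relative to `reference`."""
    return 100 * (score - reference) / reference


# Total average scores as reported in table 3.
fresh = 46.7            # fresh DSpace installation (reference point)
semi_optimized = 74.66  # semi-optimized institutional repository
worldwide = 50.26       # average of worldwide DSpace-based repositories

print(f"semi-optimized vs. fresh: {relative_improvement(semi_optimized, fresh):+.2f}%")
print(f"worldwide vs. fresh:      {relative_improvement(worldwide, fresh):+.2f}%")
# prints +59.87% and +7.62%
```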

As we mentioned at the beginning of our paper, a higher test score is not the primary objective. 
The main goal is to improve the visibility and content searchability of digital repositories, as 
well as their security and their promotion through social and new media. 

CONCLUSION 

This study presented research on digital repositories running DSpace software, the most popular 
tool for this purpose. We have shown that a significant SEO improvement of more than 59% can be 
achieved with a few simple modifications to the DSpace configuration and the associated 
application layers (Tomcat web server, etc.). Some of those technical optimization steps are 
relatively simple to perform using the procedures and theoretical background described above. We 
have publicly presented the reports and explained solutions to the most common and major SEO 
problems that DSpace repositories usually face. This paper is one of the first academic studies 
to deal with SEO issues related to digital repositories, especially those running DSpace software. 
We realize that we have not been able to solve all of the identified problems completely. The 
following SEO issues remain unresolved: 

  




• H1 Heading Tags test 
• H1 Coherence test 
• SEO friendly URL test 
• Inline CSS test 
• Page Cache test (Server-side Caching) 
• CDN Usage test 
• Image, Javascript, CSS Caching tests 
• CSS Minification test 
• URL Canonicalization test 

Some of these could probably be solved more easily than others; the system URLs, however, cannot 
easily be changed into an SEO-friendly form. 
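
Two of the unresolved items, the H1 heading and inline CSS tests, concern markup that can be inspected locally before re-running an audit tool. The following is a minimal sketch of such a check. The audit tools' actual rules are not documented here, so the criteria (exactly one `<h1>`, no `style` attributes) are assumptions, and `OnPageAudit`, `audit`, and `SAMPLE_PAGE` are hypothetical names.

```python
from html.parser import HTMLParser


class OnPageAudit(HTMLParser):
    """Counts <h1> elements and elements that carry an inline style attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.h1_count = 0
        self.inline_style_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook.
        if tag == "h1":
            self.h1_count += 1
        if any(name == "style" for name, _ in attrs):
            self.inline_style_count += 1


def audit(html: str) -> dict:
    """Parse an HTML string and report the assumed pass/fail criteria."""
    parser = OnPageAudit()
    parser.feed(html)
    parser.close()
    return {
        "h1_count": parser.h1_count,
        "single_h1": parser.h1_count == 1,
        "inline_style_count": parser.inline_style_count,
    }


# Hypothetical page fragment with two headings and one inline style.
SAMPLE_PAGE = """
<html lang="en">
  <head><title>Sample item page</title></head>
  <body>
    <h1>Repository home</h1>
    <h1>Item title</h1>
    <div style="margin: 0">Abstract text</div>
  </body>
</html>
"""

report = audit(SAMPLE_PAGE)
print(report)
```

A check like this only flags candidate problems in the rendered theme templates; the score itself still has to be confirmed with the audit tools.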

All of the above presents a great opportunity for further discussion and research in this field. 
As the test results show, the current state of SEO parameters in DSpace repositories is 
unsatisfactory. Our results indicate only a small difference in SEO quality between the average 
of DSpace repositories worldwide and the non-optimized installation of DSpace v6.3 (about 7.6% in 
the worldwide repositories’ favor). It seems that most of these systems are not currently 
optimized in terms of SEO and other technical website parameters. The second major finding is 
that the metadata records stored in ROAR are not always accurate and may be incorrect or 
obsolete. To put this finding in context, we must note that ROAR’s storage had a major failure, 
which could have caused the harvesting service to fail. (More information about the failure is 
available at http://roar.eprints.org/.) 

Finally, we recommend periodically re-testing the level of search engine optimization of digital 
repositories, because “search engine algorithms tend to change often, and new factors are added 
while outdated or not effective factors are excluded. This is why web developers must check the 
algorithm changes and adjust their websites in order to not only achieve but also maintain high 
ranking in search engines.”35  

We believe that our work will also help initiate cooperation with other experts to resolve the 
remaining SEO problems. Ultimately, we hope that the efforts and recommendations presented here 
will help repository administrators, users, scientists, researchers, teachers, students, and 
other members of the general public to find what they need in virtual spaces such as digital 
repositories more quickly and efficiently. 

ENDNOTES 
 

1 Christos Ziakis et al., “Important Factors for Improving Google Search Rank,” Future Internet 11, 
no. 2 (January 2019): 2–3, https://doi.org/10.3390/fi11020032. 

2 Isidro F. Aguillo et al., “Comparing University Rankings,” Scientometrics 85 (February 2010): 
243–56, https://doi.org/10.1007/s11192-010-0190-z. 

3 Ahmad Bakeri Abu Baka and Nur Leyni, “Webometric Study of World Class Universities 
Websites,” Qualitative and Quantitative Methods in Libraries (July 2017): 105–15, 
http://qqml-journal.net/index.php/qqml/article/view/367; Andreas Giannakoulopoulos et al., 
“Academic Excellence, Website Quality, SEO Performance: Is there a Correlation?” Future Internet 
11, no. 11 (November 2019): 242, https://doi.org/10.3390/fi11110242. 

4 Dwi Budi Santoso, “Pemanfaatan Teknologi Search Engine Optimazion sebagai Media untuk 
Meningkatkan Popularitas Blog WordPress,” Dinamik 14, no. 2 (2009): 12–33, 
https://www.unisbank.ac.id/ojs/index.php/fti1/article/view/100; M. Iskandar and D. 
Komara, “Application Marketing Strategy Search Engine Optimization (SEO),” IOP Conference 
Series: Materials Science and Engineering 407 (2018), 
https://iopscience.iop.org/article/10.1088/1757-899X/407/1/012011/pdf. 

5 Giannakoulopoulos et al., “Academic Excellence, Website Quality, SEO Performance.” 

6 Giannakoulopoulos et al., “Academic Excellence, Website Quality, SEO Performance.” 

7 Thomas Abrahamson, “Life and Death on the Internet: To Web or Not to Web is No Longer a 
Question,” Journal of College Admission 168 (2000): 6–11. 

8 Sukhpuneet Kaur, Kulwant Kaur, and Parminder Kaur, “An Empirical Performance Evaluation of 
Universities Website,” International Journal of Computer Applications 146, no. 15 (July 2016): 
10–16, https://doi.org/10.5120/ijca2016910922. 

9 “Best SEO Software,” G2, last modified 2020, https://www.g2.com/categories/seo. 

10 “Best SEO Software.” 

11 Ziakis et al., “Important Factors for Improving Google Search Rank.” 

12 Giannakoulopoulos et al., “Academic Excellence, Website Quality, SEO Performance.” 

13 Joeran Beel, Bela Gipp, and Eric Wilde, “Academic Search Engine Optimization (ASEO): 
Optimizing Scholarly Literature for Google Scholar & Co.,” Journal of Scholarly Publishing 41, 
no. 2 (January 2010): 176–90, http://dx.doi.org/10.3138/jsp.41.2.176. 

14 Brian Kelly, “MajesticSEO Analysis of Russell Group University Repositories,” UK Web Focus 
(blog), August 29, 2012, 
http://ukwebfocus.wordpress.com/2012/08/29/majesticseo-analysis-of-russell-group-university-repositories/. 

15 “OpenDOAR Statistics,” Jisc, last modified September 2020, 
https://v2.sherpa.ac.uk/view/repository_visualisations/1.html. 

16 Si Ong Quan, “44 Best Free SEO Tools (Tried & Tested),” last modified May 28, 2020, 
https://ahrefs.com/blog/free-seo-tools/; Navneet Kaushal, “Top 15 Most Recommended SEO 
Tools,” last modified September 2020, 
https://www.pagetraffic.com/blog/top-15-most-recommended-seo-tools/. 

17 Quan, “44 Best Free SEO Tools (Tried & Tested)”; Kaushal, “Top 15 Most Recommended SEO 
Tools.” 

 


 

18 “28 Top SEO Site Checkup Tools,” Traffic Radius, accessed March 29, 2020, 
https://trafficradius.com.au/seo-site-checkup-tools/. 

19 “28 Top SEO Site Checkup Tools,” Traffic Radius. 

20 Chandan Kumar, “13 Online Tools to Analyse Website SEO for Better Search Ranking,” last 
modified April 11, 2020, https://geekflare.com/online-tool-to-analyze-seo/#SEO-Tester-Online. 

21 “28 Top SEO Site Checkup Tools,” Traffic Radius. 

22 Kumar, “13 Online Tools to Analyse Website SEO for Better Search Ranking.” 

23 “OpenDOAR Statistics,” Jisc. 

24 “DSpace 7.0 Beta 4 Release Announcement,” Lyrasis, October 13, 2020, 
https://duraspace.org/dspace-7-0-beta-4-release-announcement/. 

25 “The Open Graph protocol,” OGP, accessed January 25, 2021, https://ogp.me/.  

26 “Welcome to Schema.org,” Schema, accessed May 1, 2020, https://schema.org/. 

27 “Welcome to Schema.org.” 

28 “Understand how structured data works,” Google, accessed May 2, 2020, 
https://developers.google.com/search/docs/guides/intro-structured-data. 

29 “Understand how structured data works,” Google. 

30 “Canonical Tag,” Seobility, accessed March 20, 2020, 
https://www.seobility.net/en/wiki/Canonical_Tag. 

31 “Canonical Tag,” Seobility.  

32 Matus Formanek and Martin Zaborsky, “Web Interface Security Vulnerabilities of European 
Academic Repositories,” LIBER Quarterly 27, no. 1 (February 2017): 45–57, 
http://doi.org/10.18352/lq.10178; Matus Formanek, Vladimir Filip, and Erika Sustekova, “The 
Progress of Web Security Level Related to European Open Access LIS Repositories between 
2016 and 2018,” JLIS.it 10, no. 2 (May 2019): 107–15, http://dx.doi.org/10.4403/jlis.it-12545. 

33 “OpenDOAR Statistics,” Jisc. 

34 “About OpenDOAR,” Jisc, last modified September 2020, https://www.jisc.ac.uk/opendoar. 

35 Ziakis et al., “Important Factors for Improving Google Search Rank,” 2. 

