ARTICLE

Balancing Community and Local Needs: Releasing, Maintaining, and Rearchitecting the Institutional Repository

Daniel Coughlin

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2022
https://doi.org/10.6017/ital.v41i1.14073

Daniel Coughlin (dmc186@psu.edu) is Head, Libraries Strategic Technologies, Penn State University. © 2022.

ABSTRACT

This paper examines the decision points over ten years of development of an institutional repository. Specifically, the focus is on the impact and influence of the open-source community, the needs of the local institution, the role that team dynamics plays, and the chosen platform. Frequently, such discussions revolve around the technology stack and its limitations and capabilities. Inherently, any technology will have features and limitations, and these are important in determining a solution that will work for your institution. However, the people running the system and developing the software, and their enthusiasm to continue working within the existing software environment to provide features for your campus and the larger open-source community, will play a bigger role than the technical platform. These lenses are analyzed at three points in time: the initial rollout of our institutional repository, its long-term operation and maintenance, and eventual new development, along with why we made the decisions we did at each of those points.

THE INSTITUTIONAL REPOSITORY (IR)

A university institutional repository (IR) provides long-term access to the scholarship and research outputs of an institution.1 The outputs can take the form of scholarly publications, data sets that support publications or other research, electronic theses and dissertations, and other digital assets that the university has an interest in preserving and the research community and beyond has an interest in disseminating.
There is additional value in keeping these otherwise scattered resources collected in a single repository to showcase the scholarly accomplishments of an institution.2 Collecting and disseminating the scholarly outputs of the university helps the institution understand its strengths, promote its research to outside audiences, attract new faculty, and provide opportunities for new faculty in fields that may be emergent or lack an institutional presence. Furthermore, there is value to the research community in being able to find peer research without having to pay publisher access fees.

Reducing the burden on faculty to meet various policy demands from federal, publisher, and institutional perspectives provides another motivation for IRs. Federal policies can require making anonymized research data and scholarship publicly available because the work is publicly funded through tax dollars; publishers can require authors to provide access to the data that supports the research being published.3 In the United States, a growing number of academic institutions adopted open-access policies between 2005 and 2021 that require researchers to provide a copy of any published scholarly article in a publicly accessible repository. The institutional repository is a way for a university to meet this increasing demand from research organizations and funding institutions.4

As a campus grows in disciplines, it inherently grows in complexity and in the diversity of digital needs and use cases from its researchers. For example, high-resolution images or atmospheric data are likely to create higher storage demands than a discipline that relies largely on text.
Performance-based research may require multimedia resources and streaming capabilities, while other large files can be shared in a more asynchronous manner. This diversity of needs contributes to the complexity of finding an institutional repository solution that meets all, or many, of the needs on a campus from file storage, discovery, and access perspectives.

This paper broadly addresses Penn State University's development of its IR at three distinct points in time: (1) choosing a platform for our IR and its initial release; (2) maintaining the IR; and finally, (3) our current solution nearly ten years later. At each point in time, we analyze our decision process through four lenses, which provided a thorough framework for deciding how to proceed: community needs and the potential tension with local needs, our team dynamics, and finally the platform we built our software on and the infrastructure required to maintain it. We discuss why we made the decisions we did through these four lenses, the benefits and drawbacks, and what we have learned along the way.

Penn State is the state of Pennsylvania's land-grant university in the United States. The University has 24 campuses physically located in the Commonwealth of Pennsylvania, the online World Campus, two law schools, and a medical school. In the fall of 2021, Penn State had 73,476 enrolled undergraduate students and 13,873 graduate students, with research expenditures totaling over $850 million for the last four years.5 Penn State is a large, public research institution with a diverse set of needs. This is significant because a large system such as an institutional repository must meet the needs of a broad set of disciplines and domains. We are fortunate to have software development and system administration resources that smaller institutions may not have.
This provides a bit of context for our considerations for an institutional repository.

SELECTING A REPOSITORY

In January 2012, Penn State University Libraries and Penn State's central Information Technology department collaborated on developing an institutional repository for the University's growing data management needs. The University Libraries was interested in becoming more involved in open-source software community development efforts. At that point, many universities we had spoken with had an existing IR solution in place, and we had a lot of freedom to choose a platform without the burden of data migration.

We considered (1) off-the-shelf, turnkey solutions such as DSpace; (2) a prototype we had just built called Curation Architecture Prototype Services (CAPS) using a microservices approach; or (3) building on top of an existing platform. Ultimately, we decided to build on top of an existing platform, Samvera (named Hydra at the time).6 We did not want a turnkey solution because we felt we had distinct needs that would require a level of customization these solutions could not offer. Based on discussions with others, we decided to develop something of our own while leveraging the experience of others in the repository development domain. The microservices approach at the time was more a conceptual approach to development than an existing software solution. Building on an existing platform was a happy middle ground for us, and we evaluated this decision through several lenses that led us to our selection at that time.

Community Involvement

We did not want to develop a solution in a vacuum and thought a group with a (relatively) common set of problems would be helpful for problem solving.
The Samvera community was a small but growing community working toward repository solutions like what we were trying to achieve. Members of the community were both managerial and technical. This was valuable to us for understanding the strategic direction of the community and for the ability to collaborate and problem solve on technical implementations. Some of the key partners for our early work were the University of Hull (UK), Stanford University, the University of Virginia, and Notre Dame. There was communication throughout the year over community email, chat platforms, and phone calls; however, the quarterly partner meetings were the most valuable time for collaboration. These quarterly meetings were a couple of days in length, typically held at a partner institution's campus and attended in person by managers and software/systems developers. They provided the ability to work together on specific problems, showcase our work, and get to know each other more closely at lunch and after-hours meetups.

Working within the community would also get our team increased exposure and help with recruiting future colleagues. Working in the open-source software community has been seen to benefit both candidates and employers in future job recruitment.7 We were excited by the promise of working with and contributing toward a larger community. Our team had apprehension about building this alone, and we were happy to be working with the support of a community and within its set of processes.

Local Needs

Early in our requirements work for the repository, we created a MoSCoW chart of our "must have," "should have," "could have," and "won't have" features.8 The platform we were choosing was going to provide us with a significant set of these features for our repository with very little work on our end.
These built-in features included search, discovery, and basic file-editing functionality. Essentially, we were going to meet the needs of our stakeholders quickly by using this software. This was important for a couple of reasons. First, providing features to our stakeholders quickly gave them ample time to provide feedback so that we could make the necessary customizations for their specific needs. A less quantitative benefit was gaining the trust of colleagues at the start of a new project and a new initiative. Rather than continually suggesting "that feature will be done next week," we were able to deliver results quickly and get feedback. For example, our repository integrated with our campus authentication system to restrict access. We were able to deliver these features and get feedback on both the functionality and the terminology, improving usability. In particular, the way our developers described permissions was initially too confusing for our users, and we were able to make the necessary adjustments prior to a production release of the IR.

Team Dynamics

We believed it was a significant professional development opportunity for many people on the team to work with a larger community and to learn from and with those in the open-source community. The team working on the IR consisted of three full-time, or near full-time, developers (one joining after we started the project) and a systems administrator. This was our first large project that included both a project manager invested in Agile project management methodologies and a systems administrator in place from the beginning of the project.

Platform and Infrastructure Stability

There was a desire to arrive at a common solution that would make it easy to set up other repositories for various needs within the Libraries, and we hoped there would be an ability to plug and play various components or features.
The three common components of this system were Fedora Commons to store both metadata and our digital assets; Solr as an index for fast search; and Blacklight as a web interface that sits on top of Solr. One of the primary components, active-fedora, would sync content between the Fedora and Solr persistence layers. Our hope was that with this model we would be able to write code that could be used in other repositories, and that we could use the code other institutions had written for our repository needs to build other applications more quickly.

The Samvera community was initially called Hydra because of the relationship with the mythical creature that has several heads (see figure 1). We were considering the potential of running a core storage infrastructure and discovery infrastructure while developing several heads for our various applications. We knew this was a lofty expectation but thought it was a good design principle for us to advance. Additionally, the pilot we developed on microservices (CAPS) seemed to have a relatively large storage service, and we could not determine how to get away from that. Although this was a bit of a shift in our philosophy, it was less of a shift based on our practical experience.

Figure 1. Aspirational intentions of running many applications on one access and discovery system.

Initial Release

The initial release of our IR, ScholarSphere, was for research data, scholarly articles, and presentations. We considered the repository file agnostic and left the definitions of scholarly materials up to the depositor. The self-deposit process made very few assumptions in order to limit barriers to deposit; there were few mandated fields for deposit in ScholarSphere. The initial rollout of ScholarSphere met the "must have" and many of the "should have" needs that we had defined initially in our development requirements.
The list of "must haves" included uploading files via the web, creating and assigning metadata to the uploaded files, setting three access levels on files, searching for files, displaying files, etc. The list of "should haves" included faceted browse, faceted search results, sharing files with a group, etc. Working on a community-developed platform provided some of these features for us (search, faceted browse) and gave us the flexibility to customize where necessary. For example, we had our own data model of metadata to assign to files based on our users' needs, and we were able to update the metadata that was provided out of the box to accommodate it. This was a tremendous win in leveraging community-provided solutions for local needs. Additionally, the platform provided a search index with Solr. This gave our infrastructure a common solution with community support on configuration questions. Using the Blacklight UI on top of Solr created another opportunity for us to customize where desired and eased our development efforts.

Community: Following the initial release, we worked with other members of the community to pull some of the core functionality into a separate Ruby library. This library (Sufia) could then be leveraged as a default set of repository features by other developers. The release of a new IR, and of this library, provided us with a lot of positive exposure at various community events.

Local Needs: Locally, we used this library to develop a repository for our digital archives. It had previously taken two to three developers nearly nine months to develop ScholarSphere; using the Sufia module, a single developer rolled out a separate repository in six months. This was another successful production rollout and a successful use of a product created by and for the community.
Team Dynamics: We had a successful release and were getting support to hire new developers. We continued to move more of our projects toward an Agile approach and to permanently embed systems administrators in our development projects.

Infrastructure: The new archives system was not released on exactly the same system that ScholarSphere was developed on, but we were happy that our projects ran on relatively homogeneous technology stacks that were familiar to operate.

MAINTAINING THE IR

Over the next several years we released three major updates to ScholarSphere:

1. Migrating the data object store to a major version
2. Overhauling the user interface
3. Migrating our data model to the Portland Common Data Model (PCDM)

Simultaneously, the Sufia library that we developed had grown in usage by other institutions and in contributions from other developers. We were excited to have additional contributors, and with that came an understandable sense of competing priorities within our community's development roadmap. We were building ScholarSphere features and functionality to meet the needs of our local institution while managing the tension between community direction and local needs. Again, we look through these lenses to evaluate the period of maintenance, upgrades, and feature additions.

Community Involvement

Two of the major releases mentioned above were largely community driven. In one case, migrating the data object store, we were one of the first repositories within our open-source community to migrate our data storage system. We anticipated that doing this work early would prevent us from having to rewrite code that relied on the data storage layer. Ultimately, this may have been a bit early for us, because we were never able to create the momentum for others in the community to make the same migration. This created a bit of a divergence, but at this layer in our technology it did not prevent us from continuing to work closely with the community.
We were able to add locally developed features for managing files and uploads, along with community components that allowed for controlled vocabularies and cloud provider uploads.9 In all, from 2012 to 2019 we were an active member of the community: we provided technical contributions, we were asked to present at community events, and our developers were frequently asked to help at workshops. The community provided many opportunities for professional development, and code from the community provided new features to our users.

We felt this work was successful. We had three major releases. One was something that our local users were able to experience directly. Two of our upgrades were largely on the back end and, while there is no argument about their importance, it can be a challenge to illustrate the significance of largely opaque technology upgrades to users.

Concurrently, we were coming up against other challenges that were proving difficult to solve in a sustainable and scalable way. Large file sizes (larger than 1 GB) for uploads and downloads remained an issue that researchers seemed to encounter more frequently. Our mechanisms for getting around some of these obstacles led us to look at an API for administrators and other applications to integrate with. For example, if the web browser upload was not working, perhaps we could physically get the file from a user and upload it to the system ourselves. If we could do that, maybe we could use an API to do the upload, but we did not have an API.

When developing new features, we would question whether code should be contributed back to the community or kept for our (Penn State) needs. Frequently, the devil is in the details and, while several institutions might be interested in a feature based on a conversation, implementation could be much more detailed and it was difficult to find common ground.
This complexity could lead to longer timelines and more difficult planning for local development features.

Team Dynamics

Over this time period we advanced our team by adding several highly skilled developers (some of whom have since moved on to other positions and remain highly respected within the community) and enriched the collective skill set of the group. The team was enriched by this experience overall. The balance between community involvement and local needs became a frequent conversation point for our team. We spent a lot of effort on initiatives that had not solved some of the bigger problems our users were experiencing locally; our community disengagement was likely a combination of common reasons, for example, our lack of time to make meaningful contributions.10

In the spring of 2019, the development team that worked on ScholarSphere shrank from three developers to one. We had a strong number of developers within the Samvera community to collaborate with; however, we had difficulty bringing on new members at the time because the complexity of the ScholarSphere system created a steep learning curve that was not necessarily transferable to other technology stacks.

At the end of the summer of 2019 we were given 25 GB of video files to upload into ScholarSphere and make accessible. The parameters of the request were outside what we could support from our web interface, and we had no API that would let a product owner develop against the system and work with the researcher to meet the request. After approximately one month of working with the data and our system, we successfully ingested the files into ScholarSphere. At the end of this month, we decided that we needed to evaluate our path forward more urgently, because we could not have our lone developer spending this amount of time on single-user requests.
Platform and Infrastructure Stability

Each of the major versions released between 2012 and 2019 had several patches and feature releases to enhance the system, the interface, and/or our processes for change management within the software system. For example, we went from a typed script containing a series of commands to Chef (a configuration management tool that automates software deployment); we upgraded core infrastructure components (Fedora, Solr, Travis, RedHat, etc.); and we added infrastructure to keep up with system demands. In terms of adding infrastructure, we both enhanced the virtual capabilities (CPU and RAM) of our systems and offloaded tasks to other systems. We did not want the systems our users interfaced with to be responsible for all the heavy lifting. These offloaded tasks included characterization, indexing metadata for search, creating thumbnails, etc. (see figure 2).

Figure 2. Systems and services with the basic workflow process for uploading a file to ScholarSphere, including the background jobs that ran on file upload. [The diagram shows web (Apache, Rails, Passenger, ClamAV), repository (Tomcat, Fedora, Jetty, Solr, Redis), jobs (Rails, Resque, FITS), Isilon (NFS), and MySQL (MariaDB) nodes; the jobs run on upload are characterization, thumbnail creation, text extraction for Solr, and derivatives.]

Adding these components improved the user experience but made our infrastructure difficult to manage.
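The division of labor shown in figure 2, where the web tier records the upload and background workers do the heavy lifting, can be sketched in plain Ruby. This is a schematic only: the production system ran these jobs as Resque workers backed by Redis, and the class and job names below are illustrative, not taken from the ScholarSphere codebase.

```ruby
# Schematic of the upload-time job fan-out shown in figure 2.
# A plain-Ruby queue stands in for Resque/Redis; job names are illustrative.

JOBS_ON_UPLOAD = %i[characterization thumbnail_creation text_extraction derivatives]

class JobQueue
  def initialize
    @queue = Queue.new # thread-safe FIFO from Ruby's core library
  end

  def enqueue(job, file_id)
    @queue << [job, file_id]
  end

  # Drain the queue, returning the list of completed jobs.
  def drain
    completed = []
    until @queue.empty?
      job, file_id = @queue.pop
      completed << "#{job}:#{file_id}" # a real worker would do the work here
    end
    completed
  end
end

def upload(file_id, queue)
  # The web request only records the file, then enqueues the heavy
  # lifting so the user-facing system is not responsible for it.
  JOBS_ON_UPLOAD.each { |job| queue.enqueue(job, file_id) }
  file_id
end

queue = JobQueue.new
upload("thesis.pdf", queue)
queue.drain.length # => 4
```

The point of the design is visible in `upload`: the user-facing request does no processing itself, which keeps response times flat while the worker tier absorbs the expensive tasks.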
We were continually trying to push our systems to reflect the best practices of the twelve-factor app.11 However, over time, we developed certain "infrastructure smells," essentially anti-patterns of these best practices or symptoms of a bigger problem.12 These anti-patterns included:

• Storage coupled closely to the application
• Lack of flexibility to scale storage
• Inability to spin up a new ScholarSphere instance
• Taking days to set up a development environment
• Lack of flexibility to decouple small tasks that may require increased resources (e.g., creating derivatives)

Evaluating Next Steps

Although we were coming up against some struggles and continued maintenance with ScholarSphere, it was a successful software project with several things we liked (and likely took for granted). It was important for us to recognize which features and characteristics of ScholarSphere were part of this list. ScholarSphere's data model was flexible enough to support several current use cases and future needs and was developed with a significant amount of community input. Other development teams within our organization were also developing new applications in Ruby, so the language continued to be relevant within our larger group, as did Ruby on Rails, Blacklight, and Solr. Some of the libraries developed with these frameworks were causing us struggles, and we knew that tools and infrastructure could be barriers to newcomers' onboarding and orientation.13 However, the languages themselves were still flexible enough for us to continue our work.
We had three permission levels for access to the full text of an uploaded file: (1) public, (2) Penn State only, and (3) private, and we did not want to develop anything more complex than that around access permissions. Fedora provided us with versioning of our objects, and we thought this was something not only to continue but potentially to enhance. We also had strong support from the Samvera community for ScholarSphere. Many people had worked on the code that provided its functionality, and we could collaborate within that community when problems arose.

At that point we largely decided to continue developing needed features for ScholarSphere while the community pushed forward. In part, we were hoping that our divergent paths would converge within a year (give or take). The month following the relatively manual process of ingesting the 25 GB of video files into ScholarSphere was spent making important updates to the system and fixing low-hanging fruit. In October 2019 we decided to start from scratch, spend about two months developing a new solution, and evaluate our path forward after that.

CURRENT SOLUTION

We turned to the same four established lenses when evaluating our needs in 2019. However, it is worth noting that organizationally we were in a much different position than when we started in 2012. The software development and infrastructure team that managed the service had moved organizationally from central IT to the Libraries, where the service and product owner reside. Being in the same building and having the same priorities improved communications. Also, people within the teams had changed, and our leadership had changed, which changed how we approached some of our decisions. We had more experience in technical skills, specifically in repository development; we were more refined in our implementation of Agile methodologies; and, having run a service for years, we had a better sense of our users' needs.
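To give a concrete sense of one feature we chose to carry forward, the three access levels on files (public, Penn State only, private) amount to a small authorization check. A minimal sketch in plain Ruby follows; the method and field names are ours for illustration, not ScholarSphere's actual code.

```ruby
# Minimal sketch of the three-level visibility model carried forward
# into the new system. Names are illustrative only.

# Returns true if a user may read the full text of a work.
def readable?(work_visibility, user)
  case work_visibility
  when "public"   then true
  when "psu_only" then user[:authenticated]     # any Penn State login
  when "private"  then user[:depositor] == true # only the owner/editors
  else raise ArgumentError, "unknown visibility: #{work_visibility}"
  end
end

guest     = { authenticated: false, depositor: false }
psu_user  = { authenticated: true,  depositor: false }
depositor = { authenticated: true,  depositor: true }

readable?("public", guest)      # => true
readable?("psu_only", guest)    # => false
readable?("private", psu_user)  # => false
readable?("private", depositor) # => true
```

Keeping the model this small was deliberate: a three-way check is easy to reimplement in a rewrite, whereas a full ACL system would have been a migration burden of its own.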
Community Involvement

The community saw a tremendously successful period of growth during this time in adoption of the software, exposure from funded grants, and number of partners. There was renewed excitement about multiple solutions, including turnkey repository solutions, hosted solutions, the merging of two highly regarded software libraries for performance, and improvements in developer friendliness. The latter stripped some of the design patterns that developers struggled with down to something more familiar and made it easier to onboard new developers.

Local Needs

The pressure to meet our local needs amid competing priorities for the community-based software became a sticking point for us. We needed a more scalable backend, and we were not sure when our needs and the priorities of the community would merge. We had also fallen behind on several dependencies, and the lift to get back up to date, before being able to add anything new, was considerable. This situation led us to create a prototype for evaluation. Our initial goal was to see how difficult it would be to build a system that could handle uploading the video files that ScholarSphere currently could not. We had confidence we could develop features, but this area was a consistent challenge and we considered it the primary hurdle for us to clear.

Team Dynamics

Our development team consisted of a single developer. However, we had an infrastructure developer who was able to help with systems configuration, automation, and containerization. Our developer thoroughly understood ScholarSphere and its underlying codebase and architecture, and we had the resources to hire a consultant to help with our efforts.
We had considerable work performed by a local software development company on other repositories (our Electronic Theses & Dissertations system, a digital cultural heritage repository, and our Researcher Metadata Database). We valued this partnership and wanted to continue to utilize it while our staff numbers were down. We needed to be able to onboard others more quickly than we had in the past. If three relatively new members of the team could contribute to this progress, it would show that we had chosen a technology stack in which others outside our development team could make a more immediate impact.

Platform and Infrastructure Stability

As with many systems that are actively developed for years, our current system had several dependencies that had grown organically over time and become burdensome to assemble when setting up a development environment. Additionally, a local development environment was not an exact replica of the production environment, because production used networked storage while our development systems used local storage. We also took this opportunity to test Amazon S3 as our production storage system. We chose this alternative to see whether it increased the reliability of our storage, to see how well we could manage data in S3, and to run a production service on it so we could gauge the annual operating cost of using a cloud vendor. We were able to simplify our rollout a bit and modernize the technologies used to run our systems (i.e., Docker containers and a Kubernetes cluster) (see figure 3).

Development

We had three general goals: (1) to improve stability and scalability for local needs; (2) to make it simpler to get an environment up for developers; and (3) to be able to onboard new developers more quickly.
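Containerization was central to the second goal. As a rough illustration only (these are not our actual manifests; the service names, image versions, and credentials are assumptions), a development stack for a repository application like ours might be declared as:

```yaml
# Hypothetical docker-compose sketch of a local repository stack.
# Service names, images, and credentials are illustrative only.
services:
  app:
    build: .
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: postgres://scholar:secret@db:5432/scholarsphere
      SOLR_URL: http://solr:8983/solr/scholarsphere
    depends_on: [db, solr]
  db:
    image: postgres:13
    environment:
      POSTGRES_USER: scholar
      POSTGRES_PASSWORD: secret
  solr:
    image: solr:8
```

Declaring the whole stack this way is what collapses environment setup from days of hand-assembling dependencies to a single command, and it keeps development and production closer to the twelve-factor ideal of environment parity.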
Shortly after our prototype test proved we could meet local needs in scalability, we were able to test our second goal: getting a ScholarSphere environment set up easily. The process of setting up a development environment went from days to hours. We had reached two of our three goals with these tests, and we believed that our development team (by then two to three new developers) contributing to the first two goals was proof that we could onboard new developers quickly (our third goal). After several months of development in early 2020, we had removed several of the obstacles that had been in our way in recent years but were nowhere near feature parity with ScholarSphere.

Figure 3. Current infrastructure for ScholarSphere, released in November of 2021.

We had a rich feature set to transfer from the existing ScholarSphere and did not want to run two systems simultaneously until we achieved some level of feature parity. We wanted to get to a minimum viable product (MVP) for our new prototype, migrate data, release our new version, and retire our existing system. Our product owner had been working directly with ScholarSphere users and was able to help us determine priorities for the features we needed in order to have an MVP. The following were some of those features:

• an API, at the very least an internal one, for
  o our migration script
  o other home-developed applications
  o internal Library employees
• versioning and the ability to view versions
• updated status (pending, published)
• an updated user interface
• URLs that were harvestable
• maintaining our data model for continued support of concepts such as collections
• enhanced support for DOIs

We also identified some features that had been developed over the years to either simplify or eliminate.
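Returning to the first MVP item, the internal API: a migration script driving such an API might construct a deposit payload like the following. The endpoint path and field names here are hypothetical illustrations, not documentation of ScholarSphere's actual API.

```ruby
require "json"

# Sketch of what a migration script might send to an internal deposit API.
# The endpoint path and field names are hypothetical.

API_ENDPOINT = "/api/internal/v1/works"

def build_deposit_payload(work)
  {
    work: {
      title: work.fetch(:title),
      depositor: work.fetch(:depositor),
      visibility: work.fetch(:visibility, "open"),
      files: work.fetch(:files, []).map { |f| { name: f } }
    }
  }.to_json
end

payload = build_deposit_payload(
  title: "Atmospheric Data Set",
  depositor: "abc123",
  files: ["readings.csv"]
)
# A migration script would POST `payload` to API_ENDPOINT with Net::HTTP.
```

The value of even a minimal internal API is that a product owner can script deposits like this one instead of routing every oversized or unusual request through a developer.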
The profiles within ScholarSphere were not heavily used, and over the years the University had developed more mature systems for that purpose. Similarly, finding a featured researcher for the home page seemed to create more work than it was worth, and our social media integrations were not going to be a priority. We also thought a user's dashboard, the default page after logging in, could be greatly simplified based on the most prominent actions our researchers wanted to perform.

CONCLUSION

After a little over a year of development, in November 2020, we released our new version of ScholarSphere. We used our own internal API, as planned, for data migration from our existing Fedora Commons storage system into the new one in Amazon S3. Over the past seven months we have done nine feature releases, including collections and an enhanced API to support Penn State's Open Access initiative. We learned some lessons along the way within all of these lenses. We have also more than doubled the physical storage size of our repository since releasing in November 2020. Over the summer, we were able to meet a faculty member's request to upload 30 to 40 videos of 300 to 400 GB, a request we never would have been able to meet with our prior solution.

Community & Local Needs

Working with the Samvera community has provided countless opportunities for our entire team. We sharpened our technical skills, led workshops, organized community development sprints, and collaborated on a plan for a community roadmap (to name a few). Our entire team benefitted in several ways from involvement in the community: our software knowledge is greater, our problem-solving skills are more creative, and our outside professional opportunities have expanded. Ultimately, our paths diverged in a way that made it difficult to justify the time and resources required to merge back.
There are several benefits to community-based software: more eyes looking at potential security issues in code, more voices to let you know when a dependency of your code has become vulnerable, shared ideas for addressing development issues, and shared solutions for common problems. The cost of all these benefits comes with increased complexity in organizing a solution (you need to take multiple institutions into account), workflows for development (your local workflow may not be the same as the community-approved workflow), competing priorities within the community, and competing priorities between the community and local roadmaps. Open-source communities are largely online; these groups typically have a more shared, informal leadership structure, and that lack of formal leadership can make it difficult to find solutions to these complexities.14

Team Dynamics, and Platform and Infrastructure Stability

Rewriting a system can be a daunting task, and several prominent developers would argue against it.15 Reasons we believe we were successful are that (1) we did not change our data model, (2) although we changed our architecture, we did not change our coding conventions or our agile development process, and (3) the benefits of our changes were multidimensional. We were meeting users' needs with our development work while our infrastructure changes enhanced our capabilities and made the work of our developers easier and less frustrating. Our deployment process has improved to the point that we can perform a release easily and without downtime.

Our technology is no longer based on Samvera and is now, largely, a more generic Ruby on Rails application. We migrated from using Fedora as both a metadata and object store (retrieving objects on our central Isilon system through Fedora) to using Postgres as a metadata store and Amazon's S3 storage service for our files.
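A minimal sketch of that split between metadata store and object store follows. Every class, column, and key name here is invented for illustration and is not ScholarSphere's actual schema: the point is simply that a deposited file becomes an ordinary relational row plus a content-addressed object key.

```ruby
require "digest"

# Hypothetical sketch of splitting a deposited file into a metadata row
# (the kind of record Postgres would hold) and a content-addressed key
# (where the bytes would live in an S3 bucket).
class FileResource
  attr_reader :filename, :bytes

  def initialize(filename, bytes)
    @filename = filename
    @bytes = bytes
  end

  # Content-addressed object key: identical bytes map to the same key,
  # so accidental re-uploads do not duplicate storage.
  def storage_key
    "files/#{Digest::SHA256.hexdigest(bytes)}"
  end

  # The row the metadata store would persist for this file.
  def metadata_row
    {
      filename: filename,
      byte_size: bytes.bytesize,
      checksum_sha256: Digest::SHA256.hexdigest(bytes),
      storage_key: storage_key
    }
  end
end
```

In a real system the bytes would be written to the bucket with an S3 client while the row is inserted into the database; keeping the checksum in both places allows a later fixity check to compare the two.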
We migrated our background jobs processing from Resque to Sidekiq. We continue to use the Blacklight discovery and search interface, with Solr as our search platform. Many of these technical decisions were made because of the change in dynamics of our team, and perhaps the single biggest change was around experience and the confidence that comes with it. Selecting a platform and the infrastructure to support that platform is daunting. It is particularly difficult when you have so many open questions about how the system will be used, the demand it may be under, the need to scale, how to deploy new features and update dependencies, and so on. Our decisions in 2019 were made with much more experience and understanding of what was required of our system as well as what was desired by our users. This gave us the confidence to branch off slightly from the shared technical path while recognizing all the value (beyond technical solutions) of remaining members of the community, albeit in a modified capacity.

ACKNOWLEDGEMENTS

Many people put tremendous time, effort, skill, thought, and enthusiasm into ScholarSphere over the years. We want to acknowledge all those who have contributed to the development and advancement of the system and express appreciation for their work: Carolyn Cole, Hector Correa, Michael Tribone, Michael J. Giarlo, Adam Wead, Ryan Schenk, Jeff Minnelli, Dann Bohn, Justin Patterson, Joni Barnoff, Seth Erickson, Kieran Etienne, Calvin Morooney, Jim Campbell, Paul Crum, Chet Swalina, Matt Zumwalt, Justin Coyne, Elizabeth Sadler, Valerie Maher, Jamie Little, Brian Maddy, Kevin Clair, Patricia Hswe, and Beth Hayes.

ENDNOTES

1 Helen Hockx‐Yu, “Digital Preservation in the Context of Institutional Repositories,” Program 40, no. 3 (2006): 232–43, https://doi.org/10.1108/00330330610681312.

2 Raymond Okon, Ebele Leticia Eleberi, and Kanayo Kizito Uka, “A Web Based Digital Repository for Scholarly Publication,” Journal of Software Engineering and Applications 13, no. 4 (2020), https://doi.org/10.4236/jsea.2020.134005.

3 Research Data Access and Preservation, “Browse Data Sharing Requirements by Federal Agency,” SPARC, September 29, 2020, http://researchsharing.sparcopen.org/compare?ids=18&compare=data; “Publisher Data Availability Policies Index,” CHORUS, October 8, 2021, https://www.chorusaccess.org/resources/chorus-for-publishers/publisher-data-availability-policies-index/.

4 “Registry of Open Access Repository Mandates and Policies,” ROARMAP, http://roarmap.eprints.org/view/country/840.html.

5 “Student Enrollment – Fall 2021,” The Pennsylvania State University Data Digest 2021, https://datadigest.psu.edu/student-enrollment/.

6 Stephen Abrams, John Kunze, and David Loy, “An Emergent Micro-Services Approach to Digital Curation Infrastructure,” The International Journal of Digital Curation 5, no. 1 (2010): 172–86, https://doi.org/10.2218/ijdc.v5i1.151.

7 Jennifer Marlow and Laura Dabbish, “Activity Traces and Signals in Software Developer Recruitment and Hiring,” in CSCW ’13: Proceedings (ACM, 2013): 145–56, https://doi.org/10.1145/2441776.2441794.

8 Dai Clegg and Richard Barker, CASE Method Fast-Track: A RAD Approach (Reading: Addison-Wesley, 1994).

9 “Questioning Authority,” GitHub, accessed September 2021, https://github.com/samvera/questioning_authority; “Browse-Everything,” GitHub, accessed September 5, 2021, https://github.com/samvera/browse-everything.
10 Huilian Sophie Qiu et al., “Going Farther Together: The Impact of Social Capital on Sustained Participation in Open Source,” 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (2019): 688–99, https://doi.org/10.1109/ICSE.2019.00078.

11 Adam Wiggins, “The Twelve-Factor App,” accessed September 2021, http://12factor.net.

12 Akond Rahman, Chris Parnin, and Laurie Williams, “The Seven Sins: Security Smells in Infrastructure as Code Scripts,” 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (2019): 164–75, https://doi.org/10.1109/ICSE.2019.00033.

13 Christopher Mendez et al., “Open Source Barriers to Entry, Revisited: A Sociotechnical Perspective,” in Proceedings of the 40th International Conference on Software Engineering (May 2018): 1004–15, https://doi.org/10.1145/3180155.3180241.

14 Lindsay Larson and Leslie A. DeChurch, “Leading Teams in the Digital Age: Four Perspectives on Technology and What They Mean for Leading Teams,” Leadership Quarterly 31, no. 1 (2020), https://doi.org/10.1016/j.leaqua.2019.101377.

15 Frederick P. Brooks Jr., The Mythical Man-Month: Essays on Software Engineering (Reading, Mass.: Addison-Wesley, 1982), https://search.library.wisc.edu/catalog/999550146602121; Joel Spolsky, “Things You Should Never Do, Part I,” Joel on Software, April 6, 2000, https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/.