key: cord-0686459-sj5qdi9q
authors: Matzke, Lise A; Watson, Peter H
title: Biobanking for Cancer Biomarker Research: Issues and Solutions
date: 2020-10-19
journal: Biomark Insights
DOI: 10.1177/1177271920965522
sha: f39e221e4fff1bae3d516d8d48305946b9bf517c
doc_id: 686459
cord_uid: sj5qdi9q

Biomarkers are critical tools that underpin precision medicine. However, progress in biomarker development has been slow and failure frequent. The root causes are multifactorial. Here, we focus on the need for fast, efficient, and reliable access to quality biospecimens as a critical area that impacts biomarker development. We discuss the history of biobanking and the evolution of biobanking processes, using cancer biomarker development as an example, and describe some solutions that can improve this area, thus potentially accelerating biomarker research.

Biomarkers have been defined broadly as "A defined characteristic that is measured as an indicator of normal or pathogenic biological processes, or the biological responses to an exposure or intervention." 1 The importance of biomarkers lies in their central role in the implementation of "Precision Medicine" to guide decisions and management strategies. 2, 3 There are many types of biomarkers and these can be delineated broadly as either molecular, histologic, physiologic or in some cases radiographic characteristics. 4 Nevertheless, access to human biospecimens and accompanying data is essential to enable the discovery, translation, validation and implementation of most gene based biomarkers. 5, 6 

Cancer gene biomarkers can be classified into those that delineate risk of disease, define diagnosis, or provide prognostic, predictive, or monitoring information to guide decisions around therapy and management. But despite their importance and potential in cancer care, the number of biomarkers currently deployed routinely for the management of common cancers remains small. For example, breast cancer management remains founded mostly on a mix of a few historical prognostic features (eg, tumor size and grade) and a very limited number of prognostic gene expression panel biomarkers (eg, Oncotype Dx) and predictive gene biomarkers (eg, ER and Her2 7).

The development of the now well established Her2 biomarker is illustrative of the problem. Even if you discount the decade or more of basic research that created knowledge of these genes and tools to interrogate them, a period of ~15 years elapsed from initial human tumor studies to implementation of a robust assay and scoring approach. 8 For the more recently adopted Oncotype Dx gene panel assay, one can also discount the decade or more required to compile the first cohorts of biospecimens and data used in establishing the prognostic value of individual component genes as candidates to be considered for the final panel. But a period of over 10 years elapsed from then to the point of recognition and clinical approval to deploy the final assay. 9 One could argue that several other genes that were once considered to be promising biomarkers and did not make it on their own are part of current prognostic gene panels (eg, ki67, c-myc within the Oncotype DX panel), but for the most part the progress in developing new gene biomarkers remains well short of the promise conveyed by the literature.

There are many established factors and reasons for slow progress and frequent failure in the biomarker development pipeline that have been discussed by others. [10] [11] [12] But there are also other issues that have not been widely considered. One issue that remains to be proven is the interesting recent suggestion that the potential or promise of early research findings is systematically exaggerated, and that the gap between high promise and actual outcomes in the scientific literature may be attributable to gender imbalance, specifically an overrepresentation of men among the main authors of papers. 13 Another issue that has not been widely discussed, and that we will focus on in this review, is the challenge of obtaining fast, efficient, and reliable access to the right quality biospecimens and biospecimen cohorts. We will first discuss the current landscape of research biobanking, and then the most important factors around obtaining biospecimens for current research. Finally, we present some solutions that facilitate better access to biospecimens and data for research.

Biospecimens and annotating data are compiled for biomarker research through a process called biobanking that is coordinated by an entity called a biobank. There are many types of biobanks. Some prefer to restrict the term "biobank" to mean only a dedicated entity that compiles a biospecimen collection with the intention of making this collection available to qualified researchers for future research. But the term "biobank" is best considered to encompass all types of research collections irrespective of size, type or intended use because their origins and intended purposes are the same. 14 In the early 1990s biobanks were rare, having developed as a research tool mostly associated with a small emerging sector known as "translational research." 15 Since then the use of human biospecimens and data has expanded dramatically, to the point that biospecimens contribute to the data in ~40% of all cancer research papers, and biobanks of all types have proliferated and expanded to serve research across all sectors from basic discovery to clinical validation research. 16

The process of biobanking has also changed significantly over the past 3 decades, driven largely by the need for increased scale and by increased emphasis on quality, as appreciation of the potential impact of preanalytical variables has slowly developed. 15, 17, 18 This evolution has been accompanied by maturation of best practices that have served as the basis for the development of national and international standards and, most recently, external quality assurance programs for biobanks. [19] [20] [21] [22] Another change in biobanking since the early 1990s was the gradual separation of research biobanking from clinical biobanking. 23 This served research fairly well until the 2010s, when several new factors emerged that have begun to drive consideration of reintegrating research and clinical biobanking. Paramount amongst these new factors is the increased research appetite for access to the clinical pathology FFPE archives. 24, 25

Biobanks and challenges

These changes in models and processes have contributed to some acute challenges that now limit biomarker research and threaten the realization of our current investment in developing Precision Medicine. One issue is that the research biobanking system has become inefficient and hard to sustain. 26 Further systemic issues exist: too many biobanks are hidden or hard to find, were created with inadequate quality considerations, or comprise biospecimens and data of an unknown standard; and difficult legacy decisions around which collections to continue to store and which to discard are becoming commonplace. 27 Furthermore, information about the source of biospecimens and the quality standards associated with them is not systematically reported in published research and is rarely considered in peer review, making the research findings harder to reproduce. Reporting standards published by the translational research (REMARK) 28 and biobank (BRISQ) communities have attempted to address this. 29

For individual researchers the challenge of obtaining the right biospecimens for a biomarker research study is very familiar. This challenge has in part been ameliorated by the increasing availability of high quality "digital" biospecimen derived datasets comprising research data generated from biospecimen cohorts that is often included as supplemental data in papers on individual studies, 30 or is available from large national and international "omics" initiatives that continue to be improved and expanded (eg, TCGA and Human Protein Atlas 31, 32 ). With the recent disruption to biobanking caused by the COVID-19 pandemic these existing digital resources have become even more important. However, the utility of these datasets for cancer biomarker studies has limitations, 33, 34 and new biospecimen cohorts continue to be needed. Biospecimens can be obtained by locating an existing collection, by collecting the necessary biospecimens specifically for the study, or by contracting with another entity to provide some or all of the necessary biobanking services to collect them for the study. In any given research situation, the relevance of these 3 options, and the best choice among them, vary depending on many factors. Sometimes there is 1 factor that is decisive or only 1 option is feasible. But more often several factors are relevant to differing degrees, and each has a graded influence that needs to be weighed in order to make an informed decision as to the most appropriate option. We will discuss some of the most important factors in determining the best option, relating to (1) the study design; (2) the biospecimen and data quality needed to address the research question; (3) the representation required of the biospecimen; and (4) the overall cost and effort that can be justified and supported.

Important features of research study design involving biospecimens have been summarized by Simon et al. 5 These include the observational versus intentional characteristics of the study, statistical design considerations, and biospecimen factors such as the degree to which the biospecimens and data are/were collected with the focus of a specific study research question in mind, and the degree to which control is/was exerted over the details of the biospecimen and data collection process. These biospecimen factors can essentially be distilled to the retrospective vs prospective nature of the collection process and this strongly influences the strength or "level of evidence" 12,35 that the results of a study are considered to be capable of generating and the extent of validation needed.

For biomarker discovery research, the demand for sets of biospecimens is often based on tissue level criteria. These are selected on the basis of factors such as representative pathology in the tissue and on the basis of preservation formats or ability to obtain live cells. In the subsequent "translation" and "validation" phases, the research focus moves to disease and then participant levels and more specifically defined cohorts and then establishing potential clinical relevance in preclinical studies. These latter phases, in particular, are dependent on biospecimen quality achieved through meticulous adherence to standards and documentation of all processes including the biobanking component. 14

Quality is usually considered in terms of the intrinsic features of a biospecimen and its annotating data that determine its fitness for research purpose. Intrinsic qualities include the composition, condition, and preservation of the biospecimen, the data concerning the collection and processing, and the data concerning the source that can include patient demographic and clinical data at the time of collection. 36 Verification of these aspects of quality and standardization of the processes associated with collection and storage of biospecimens and data are increasingly recognized to be important. 37 Quality control (QC) approaches to determine the quality of extracted derivatives such as DNA, RNA, and proteins are well recognized in the laboratory. In addition, several organizations have created internationally recognized biobanking standards, including the recent development of the ISO 20387 standard, and associated external quality assurance (QA) programs for biobanks that provide assurance of a known standard and level of quality around the broader "biobanking processes." 38

There are also other aspects or extrinsic quality features that determine the degree to which a biospecimen is valued for an intended research purpose. 39 These extrinsic or "complex" qualities include spatial and temporal relationships that link a biospecimen to other biospecimens from the same participant (eg, primary tumor resection and subsequent biopsy of a later metastasis), information and annotation generated by additional analysis of the biospecimen during handling by the biobank, and the degree to which the biospecimen is representative of the population under study. 36 This last challenge, obtaining biospecimens that are truly representative of the population or acknowledging the extent to which a research cohort under study is representative of the population, may be a significant factor in failure of biomarker validation. However, this is often not addressed in the cancer biomarker literature. 40 Many translational cancer research studies are based on retrospective case cohorts with 5 to 10-year follow-up data that are provided by tumor biobanks. However, even a biobank that accrues cases across a geographical population can experience significant changes over time in the proportion of incident cases that it is able to collect and therefore changes in the degree to which cases are representative. In the example biobank shown in Figure 1, major factors in the decline in accrual were the diminishing tumor size and increased clinical requirements for sampling, precluding collection of fresh tumor tissue from earlier stage tumors for research. Even if known factors such as tumor size are controlled for in the selection of research cohorts from a biobank, other important selection biases can persist. For example, the widespread use of Tissue Microarrays (TMAs) to study cancer biomarkers is subject to bias in the construction of the TMAs. TMAs are typically based on coring central regions of tumor blocks and so invasive margins are less well represented. Therefore, biomarkers and other features that have a regional distribution within tumors, such as those associated with tertiary lymphoid structures, are underrepresented in most TMAs. To illustrate further potential bias in cohorts used for biomarker studies, Table 1 shows how different cohort compositions can be observed with respect to breast cancer molecular subtypes between examples of translational studies 41,42 based on selected cohorts as compared to population based studies 43,44 based on population/registry based data from the same regions.

The overall cost and the level of evidence that the research is intended to generate are often among the most important factors influencing the investigator's decision about acquiring biospecimens. Acquiring biospecimens for research involves harnessing or deploying resources such as capital, operating costs, personnel time, and effort. These expenses may not be incurred directly by the researcher collecting the biospecimen but they will be incurred by someone in the process of acquisition. These costs are typically highest if the researcher creates their own biobank, intermediate if an existing biobank can be contracted to obtain the necessary biospecimens, and lowest if biospecimens are already available in an existing biobank. 14 However, the level of evidence generated when biospecimens are collected prospectively for a specific study (as in creating a new biobank directly or through a contract) is higher than when biospecimens are obtained from an existing biobank.

Figure 1. Graph showing biobank collection profile over time. The graph shows the decline in the proportion of all incident breast cancer cases from which a frozen sample for research was collected by a provincial tumor biobank in Canada over a 10-year period.

Cancer research is dynamic and constantly evolving. This means that the types and details of biospecimen requirements also evolve and that the main pathways for researchers to obtain annotated biospecimens should be expected to change.

While the emphasis differs in different areas of cancer research, the dominant route for research focused on biomarkers for guiding management of disease has, until recently, driven the widespread adoption and use of the "classic" biobank operating model, which is to collect biospecimens and annotating data in order to store them and generate a large stock collection from which specific biospecimen cohorts could be selected for a given study. 45 Other biospecimen pathways (direct from person or patient and indirect via a clinical archive) are also important in cancer gene biomarker research (see Figure 2). However, in the mid to late 2010s new methodologies (eg, high throughput proteomics applied to formalin-fixed-paraffin-embedded "FFPE" materials), technologies (eg, detection of ctDNA in blood plasma by sequencing), research concepts (eg, a new focus on the dynamic balance between tumor and immune system and appreciation of tumor heterogeneity and clonal evolution), and the expanding scale of biomarker research (eg, cancer genome atlas, GWAS studies, etc.) have all driven a new appetite for biospecimens. In particular, in research focused on biomarkers for guiding management of disease there is a significant switch to seeking access to biospecimens obtained directly from patients (eg, serial blood samples) or indirectly via pathology (eg, FFPE blocks 46 ) (see Figure 2). This means that the dominant biospecimen pathway supporting this area of research is now changing rapidly. 18 Key solutions that improve access to these changing pathways include (1) transforming biobank services; (2) utilizing biobank and biospecimen locators; and (3) implementing programs that facilitate the research clinical access pathway. These are discussed further in the next sections.

Tumor biobanks can best serve changes in research needs by refocusing their operating models and the investment in their expertise in biobanking. Many disease focused biobanks, such as tumor biobanks, operate principally on a "classic" 45 or "stock" 47 based operating model. But their existing stocks of preassembled frozen cases representing primary tumors are now less in demand. At the same time these biobanks have distinct expertise in assembling cases and often well-established interactions with pathology departments. Instead of prioritizing and expanding their own preassembled biospecimen collections, biobanks should shift their operations to prioritize providing brokerage services to facilitate research access to clinical pathology archives and biobanking services to assist with consenting and collection of bespoke cohorts for individual research studies on a contract service basis. 48

Finding a biobank can be a challenge for researchers. Many biobanks and networks of biobanks are represented through locators, which are online displays of contact and requirements details, 49 or simple lists of biobanks and categories of materials available (eg, fresh and/or FFPE specimens 36 ), or more complex displays of "aggregated" data, and are often combined with query tools. 50 The most extensive locator, developed by the BBMRI-ERIC, is based on methodical data sharing terminology 51 and augmented by the addition of a "negotiator" function to improve the ability to follow through a query to secure approval for access. 52 However, all locators have to overcome perceptions of barriers to specimen access and a preference for local known sample sources. 53, 54 Many of these locators can be found online by using relevant search terms, as shown in Table 2. However, choosing the right terms can still be challenging and many locators are unable to provide sufficient biospecimen level detail on existing stocks or to identify biobanks that can provide specific bespoke services. Nevertheless, use of locators that allow researchers to query across groups of dedicated biobanks can be a very efficient way for a researcher to obtain the necessary biospecimens. This is because dedicated biobanks will have (a) a governance structure and operational capacity that makes it possible to apply for biospecimens, (b) an open access policy to the materials determined only by research quality criteria and not restricted by the condition of establishing a research collaboration, (c) appropriate consent and research ethics approval already in place to allow rapid distribution of biospecimens and data, and (d) expertise and advice available to refine the best selection of cases to match the research question and assays proposed.

Researchers can access some important types of biospecimens (eg, blood samples) directly from normal individuals and patients. But, as noted above, to access biospecimens representing the full range of pathological processes in tissues, researchers increasingly seek access to FFPE biospecimens in clinical archives. These FFPE biospecimens have been obtained in the course of clinical diagnosis and treatment and are stored as part of the patient record and to support future care and testing. Unlike many biobank biospecimens these materials hold several distinct advantages: they contain most categories of pathology (that in some cases are very rarely available in a research biobank), they are preserved and processed in the same clinical format used for clinical testing, and the scope of these biospecimens is population based.

Accessing archival FFPE materials is contingent on a complex process that the clinical laboratory communicates and operates for the purpose of retrieving materials for clinical purposes, as well as adapting to support research requests. While no one uniform standard for the research access process exists, it typically involves an application by the researcher, an evaluation of regulatory documentation (such as proof of ethics review, consent of the patient, and proof of study funding), an evaluation by the laboratory of whether materials can be provided without eroding the capability to support possible future assessments, and then a release of materials accordingly. In many cases the added task of determining the most suitable FFPE block for research from review of H&E stained clinical slides cut from the many sample blocks associated with each pathology case adds a significant extra step. Finally, the process is keenly dependent on an ability for the block material to be recalled efficiently from short- or long-term storage that in many instances involves another service provider and competing workload priorities with clinical requests, and the entire process needs to be appropriately tracked and recorded to ensure integrity of the clinical archive.

As a result, accessing clinical archives can be a lengthy process, and clinical priorities or lack of a clear process can add further delays for researchers accessing the materials that are critical to the research pipeline. As listed in Table 3, there are many discrete steps, for the researcher and for pathology, involved in requesting and obtaining FFPE blocks. Perhaps not surprisingly, many interface issues related to this process have been identified by both pathology departments and researchers in the Canadian landscape (see Figure 3).

Strategies to enhance this access process have been proposed in the Netherlands, 55 Denmark, 56 and Canada, 57 including concepts that have been designed to create platforms that embed staff in pathology departments with the sole focus of receiving and processing research requests. These platforms also aim to provide researchers with better communication on how to apply, and a mechanism to catalog and track research requests, retrieve and ship materials to researchers, and monitor demand. A common goal of shortening turnaround times and ensuring a sound process that meets the needs of clinical pathology and researchers is a hallmark of all these platforms. However, embedding an understanding of research needs and a standard process for response to researchers at the level of individual pathology laboratories is also important. To address this, the Canadian Tissue Repository Network (CTRNet) has developed a program called the Pathology Research Support Certificate Program 58 to provide standards and education for personnel within clinical pathology laboratories who are involved in the decisions or the processes of providing support for research involving biobanking. The program is accessible online and self-paced, and provides information on the key issues for consideration by a diagnostic pathology department with respect to supporting research involving biobanking and a national standard that delineates a standardized approach and the practices to be implemented to provide research access and support (see Figure 4). The education and standards integral to this program have undergone review and input from leaders in Pathology and Biobanking across Canada and Australia and the program is accredited by University of British Columbia (Continuing Professional Development accreditation for 3.0 Maintenance of Competence Section 3 Self-assessment hours).

The inability to efficiently access the right biospecimens contributes to failures in biomarker development. In particular, the high demand for clinical archival materials and the multiple challenges in accessing them are specific factors that ought to be recognized as significant issues to address. Improved mechanisms that enable appropriate and efficient access by researchers, whilst maintaining integrity of the clinical archives and adhering to the requirements set out by the needs of clinicians, are critical. Biobanks can play an important role in addressing these issues by redirecting their expertise to brokering access to clinical specimens as well as focusing on services that provide researchers with bespoke models of collecting and processing biospecimens that are right for their biomarker research. The current evolution of biobanks from the existing prevalent classic model to prospective and services-based models, coupled with development of tools and programs aimed at improving the ways researchers can find biobank resources and disseminating common standards for access to clinical archives, will ultimately improve biomarker discovery.

US Food and Drug Administration. The BEST resource: harmonizing biomarker terminology

What are biomarkers?

Implementation of biomarker-driven cancer therapy: existing tools and remaining gaps

National Institutes of Health. BEST (Biomarkers, EndpointS, and other Tools) Resource. Bethesda, MD: National Institutes of Health

Use of archived specimens in evaluation of prognostic and predictive biomarkers

The importance of human tissue bioresources in advancing biomedical research

Prognostic and predictive factors in early-stage breast cancer

Development of the 21-gene assay and its application in clinical practice and clinical trials

Waste, leaks, and failures in the biomarker pipeline

Breaking a vicious cycle

Biomarker validation and testing

Papers with male authors are more self-promoting. The Economist

Research perspective on utilizing and valuing tumor biobanks

Biobanking 3.0: evidence based and customer focused biobanking

Biospecimen use in cancer research over two decades

In search of an evidence-based strategy for quality assessment of human tissue samples: report of the tissue Biospecimen Research Working Group of the Spanish Biobank Network

Biobanking in health care: evolution and future directions

Certification for biobanks: the program developed by the Canadian Tumour Repository Network (CTRNet)

The evolution of biobanking best practices

The college of American Pathologists Biorepository Accreditation Program: results from the first 5 years

Standardization and innovation in paving a path to a better future: an update of activities in ISO/TC276/WG2 biobanks and bioresources

Precision medicine: driving the evolution of biobanking quality

The factors that drive the increasing use of FFPE tissue in basic and translational cancer research

Building a 'Repository of Science': The importance of integrating biobanks within molecular pathology programmes

A framework for biobank sustainability

Fundamental considerations for biobank legacy planning

Reporting recommendations for tumor marker prognostic studies (REMARK): explanation and elaboration

Biospecimen reporting for improved study quality (BRISQ)

Overcoming the translational roadblocks: a cancer care and research model

Before and after: comparison of legacy and harmonized TCGA genomic data commons' data

The human protein atlas: a spatial map of the human proteome

An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics

Pan-cancer analysis of genomic sequencing among the elderly

Rules of evidence and clinical recommendations on the use of antithrombotic agents

Biospecimen complexity and the evolution of biobanks

Biobanks for life sciences and personalized medicine: importance of standardization, biosafety, biosecurity, and data management

Comparison and analysis of two internationally recognized biobanking Standards

Biospecimen complexity - the next challenge for cancer research biobanks?

Biobanking in the twenty-first century: driving population metrics into biobanking quality

The single-cell pathology landscape of breast cancer

CD103 and intratumoral immune response in breast cancer

Risk of locoregional recurrence and distant metastases of patients with invasive breast cancer up to ten years after diagnosis - results from a registry-based study from Germany

US incidence of breast cancer subtypes defined by joint hormone receptor and HER2 status

Commentary on improving biospecimen utilization by classic biobanks: Identifying past and minimizing future mistakes

Factors that drive the increasing use of FFPE tissue in basic and translational cancer research

A model to estimate frozen tissue collection targets in biobanks to support cancer research

The utilization of biospecimens: impact of the choice of biobanking model

The Australian and New Zealand children's haematology/oncology group biobanking network

BBMRI-ERIC directory: 515 biobanks with over 60 million biological Samples. Biopreserv Biobank

A minimum data set for sharing biobank samples, information, and data: MIABIS

Extending the minimum information about biobank data sharing terminology to describe samples, sample donors, and events

A decentralized IT architecture for locating and negotiating access to biobank samples

The barriers and motivators to using human tissues for research: the views of UK-based biomedical researchers

Pathology databanking and biobanking in The Netherlands, a central role for PALGA, the nationwide histopathology and cytopathology data network and archive

Evolutionary concepts in biobanking - the BC BioLibrary

Pathology Research Support Certificate Program

We gratefully acknowledge support for this work by the Biobanking and Biospecimen Research Program at BC Cancer.