key: cord-0810151-p7e3dhb8
authors: Rodriguez, Princess D.; Mariani, Michael; Gay, Jamie; Hogan, Tyler C.; Amiel, Eyal; Deming, Paula B.; Frietze, Seth
title: A guided-inquiry investigation of genetic variants using Oxford nanopore sequencing for an undergraduate molecular biology laboratory course
date: 2021-05-03
journal: Biochem Mol Biol Educ
DOI: 10.1002/bmb.21514
sha: 4af74085b53c42bc341235f81bb0fcd62f8822d0
doc_id: 810151
cord_uid: p7e3dhb8

Next Generation Sequencing (NGS) has become an important tool in the biological sciences and has a growing number of applications across medical fields. Currently, few undergraduate programs provide training in the design and implementation of NGS applications. Here, we describe an inquiry-based laboratory exercise for a college-level molecular biology laboratory course that uses real-time MinION deep sequencing and bioinformatics to investigate characteristic genetic variants found in cancer cell-lines. The overall goal for students was to identify non-small cell lung cancer (NSCLC) cell-lines based on their unique genomic profiles. The units described in this laboratory highlight core principles in multiplex PCR primer design, real-time deep sequencing, and bioinformatics analysis for genetic variants. We found that the MinION device is an appropriate, feasible tool that provides a comprehensive, hands-on NGS experience for undergraduates. Student evaluations demonstrated increased confidence in using molecular techniques and enhanced understanding of NGS concepts. Overall, this exercise provides a pedagogical tool for incorporating NGS approaches in the teaching laboratory as a way of enhancing students' comprehension of genomic sequence analysis. Further, this NGS lab module can easily be added to a variety of lab-based courses to help undergraduate students learn current DNA sequencing methods with limited effort and cost.
The first complete draft of the human genome was finished in 2003, initiating a period of rapid advancement in our knowledge of the role of genomic variation in human disease. Developments in next generation sequencing (NGS) technologies have laid the framework for expanding genetic research and have opened up an entire industry of new diagnostic applications. Notable new diagnostic tests are now widely used in the clinical setting and are contributing to our understanding of diverse diseases. In particular, various NGS-based applications are aiding in the understanding of the pathogenesis and progression of cancer. Genomic sequence analysis can inform distinctive responses to drugs (pharmacogenomics) and the various mechanisms underlying different types of cancer. Genomic knowledge therefore permits more personalized health care, and NGS-based analytical approaches are becoming increasingly relevant to all medical and biomedical professionals. [1] [2] [3]

Educating the next generation of physicians, pathologists, biomedical, and laboratory scientists will be instrumental in the workforce development necessary for the continued delivery and advancement of genomic medicine. With reduced costs in real-time sequencing, new opportunities exist for the application of NGS in the teaching laboratory. However, implementing cost-effective contemporary diagnostic tools in undergraduate teaching laboratories remains a challenge. The introduction of NGS to undergraduates interested in pursuing careers in the biomedical sciences presents an opportunity to integrate active 'wet-lab' experiences with bioinformatics analysis, promoting the analytical skills required to interrogate the data output from the sequencer. Strong evidence suggests that immersing students in a hands-on and investigative learning experience leads to better student outcomes.
4 While there are several examples of integrating genomic data, gene expression and Sanger sequencing data in the classroom, to date there are few guidelines available for laboratory instructors interested in bringing NGS technology into the classroom. There are also several considerations that must be accounted for prior to establishing an NGS-friendly module in the teaching laboratory, including (1) the time required to generate and sequence libraries, (2) the analysis and interpretation of bioinformatics-heavy sequencing data, and (3) the high cost associated with running next generation sequencing machines. Finally, the 2019 coronavirus pandemic has introduced a new hurdle for laboratory instructors: executing a college-level molecular biology module off-site to support distance learning. For the purpose of introducing NGS concepts in molecular pathology, we created a 4-week laboratory module that highlights variant detection in cancer cells using an affordable, applicable NGS technique, which can be fulfilled with a mix of in-person and remote/online experiences.

The MinION sequencing platform, introduced by Oxford Nanopore Technologies in 2014, offers many advantages over second-generation sequencers, making it an optimal teaching tool. 5 The MinION operates by using a nanometer-wide protein pore that allows single DNA or RNA molecules to pass through. As these molecules cross, the unique current measurement for each nucleotide is recorded and translated into sequence information in real-time. 6, 7 The MinION device is portable, easily interfaces with a laptop, and has a startup investment estimated at $1000. 8 In addition, the MinION has been used for clinical applications such as summarizing the gut metagenome of head and neck cancer patients, detecting structural variants in pancreatic cancer cell-lines, and assessing the TP53 mutational status in chronic lymphocytic leukemia patients.
[9] [10] [11] The reasonable cost and ease of use, together with a wide range of applicability, make the MinION sequencing device an ideal platform for the teaching laboratory.

Here, we present a molecular laboratory course exercise that combines both wet-lab experimentation and variant interpretation after sequencing on the MinION platform. The overall goal for student groups was to correctly identify an unknown non-small cell lung cancer (NSCLC) cell-line based on its unique genomic profile. Students isolated and then PCR amplified genomic DNA from NSCLC cells using primers they created to amplify known variants in KRAS, EGFR, and TP53. [12] [13] [14] [15] [16] Each forward and reverse primer was tagged at the 5′ end with a unique 12-nucleotide molecular index, and the primers were combined into a single reaction to develop a multiplex PCR genotyping assay that can be interpreted through bioinformatics analysis. These unique index tags, or barcodes, allowed each group to combine their reaction products into a single sample that was then sequenced in real-time on the MinION device during the laboratory session. The data were then processed by demultiplexing (the step in which the barcode information is used to generate a separate sequence file for each group from reads that were sequenced together), and each file was aligned to a reference genome sequence. Students were then able to interpret variants via annotation using the freely available Integrative Genomics Viewer (IGV) software. 17 We show that our students were able to accurately detect characteristic variants in three NSCLC cell-lines. Further, we provide pre-laboratory assignments (see Appendix S1) for each unit that can be easily modified by instructors as needed. Overall, this 4-week module can be implemented at low cost, with minimal bioinformatic background, using equipment available in a molecular biology laboratory.
The course, "Applied Molecular Biology Laboratory", was open to senior undergraduates or graduate students enrolled in the Medical Laboratory Science program at the University of Vermont. Twenty undergraduates and four graduate students were registered for the course. The course met once weekly for 3 h per session. For the duration of the 4-week module, students worked in groups of three and were asked to take turns during hands-on experimentation.

HCC827 (CRL-2868), H1975 (CRL-5908), and H441 (HTB-174) lung cancer cell-lines were purchased from the American Type Culture Collection (ATCC, Manassas, VA). Cell-lines were cultured in RPMI-1640 medium with 10% fetal bovine serum and 1% penicillin-streptomycin. All cells were cultured at 37 °C and 5% CO2. Cultured cells were counted and resuspended to a final concentration of one million cells per milliliter. One million cells were aliquoted into 1.5 ml tubes, washed once with PBS, and the cell pellets were stored at −80 °C. These steps were not performed by the students but were a part of the pre-laboratory preparation.

Genomic DNA was isolated by students from thawed cell pellets using the GeneJet Genomic DNA Purification Kit (Thermo Fisher Scientific, USA) as instructed by the manufacturer for adherent cells. All DNA samples were eluted in 200 μl of distilled water. The concentration of the DNA was determined by students using a NanoDrop-2000 (Thermo Fisher Scientific, USA).

PCR was performed by students against two regions of the human EGFR gene and one region each of the human KRAS, TP53, and GAPDH genes. Primers were designed to amplify regions containing known variants in NSCLC cell-lines. [12] [13] [14] [15] [16] Forward and reverse primers were tagged at the 5′ end with a unique 12-nucleotide sequence designed using Barcode Generator. 18 The minimum number of mismatches between tags was set to 6 nucleotides. 19 Each group was given their own unique combination of tagged forward and reverse primers for each target.
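The tag design constraint described above (12-nucleotide tags with at least 6 mismatches between any two tags) can be checked with a few lines of Python. This is a minimal sketch, not the Barcode Generator tool cited in the text; the function names and the example tags are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags) -> int:
    """Smallest number of mismatches between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def check_tags(tags, length: int = 12, min_mismatches: int = 6) -> bool:
    """True if every tag has the expected length and every pair of tags
    differs at `min_mismatches` or more positions."""
    if any(len(t) != length for t in tags):
        return False
    return min_pairwise_distance(tags) >= min_mismatches


# Hypothetical toy tags (NOT the tags used in the course).
example_tags = [
    "AAAAAAAAAAAA",
    "AAAAAACCCCCC",
    "CCCCCCAAAAAA",
    "GGGGGGGGGGGG",
]

if __name__ == "__main__":
    print(min_pairwise_distance(example_tags))
    print(check_tags(example_tags))
```

A practical consequence of a minimum distance of 6 is that a read whose tag carries up to two substitution errors is still closer to its true tag than to any other tag in the set. Real tag sets would also avoid homopolymer runs such as those in the toy example.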
The primers and the expected amplicon sizes are shown in Table 1. The 50 μl PCR reactions were performed using AccuPrime Pfx DNA polymerase (Thermo Fisher Scientific, USA) with two-step cycling conditions. Initial denaturation was performed for one cycle at 95 °C for 2 min, followed by 25 cycles at 95 °C for 15 s and 68 °C for 1 min. Students then ran 12 μl of PCR-amplified DNA on a 1.5% agarose gel. As two or more groups received the same starting DNA material, representative samples from each cell-line were chosen for sequence analysis to ensure the read depth required for variant detection. Thus, the remaining 38 μl of each of a set of 12 amplicons (four gene regions for each cell-line) was selected for pooling. Fragments below 150 bp were removed using a 0.9X bead cleanup with Ampure XP beads (Beckman Coulter, USA), and the pool was eluted in 20 μl of water. The final sequencing pool submitted for Oxford Nanopore 1D ligation long-read sequencing was quantified at 222.8 ng/μl.

Raw sequencing data in the form of a FASTQ file was retrieved from the MinION sequencing run and transferred to the UVM Vermont Advanced Computing Core (VACC) for bioinformatic analysis. The NanoPack software tools were used to calculate the sequencing run metrics (Table 2). 20 Raw sequence reads in the FASTQ files were aligned against the GRCh38 human reference genome using BWA-MEM. 21 Aligned reads were then sorted and indexed using SAMtools 22 and visualized with IGV. 17

TABLE 1 Target specific primer sequences used for PCR amplification

3 | RESULTS

A new inquiry-based curriculum was designed and implemented for a molecular biology laboratory course at the University of Vermont with the overall goal of helping students learn and apply practical NGS sequence analysis. In this project, senior level students generated and analyzed DNA sequences of known gene polymorphisms from unknown NSCLC cell-line samples.
To support the learning process outside of the laboratory, we developed a set of core teaching units to provide students with an introduction to nucleic acid analysis and variant detection using NGS technologies and bioinformatics sequence analysis (Figure 1). The four units included: (1) primer design for multiplex targets, (2) PCR amplification and purification of amplicon targets, (3) MinION sequencing, and (4) data analysis. Each unit was accompanied by a corresponding pre-laboratory exercise, and the project culminated in an oral and written report where students were asked to identify unknown NSCLC samples based on genotype information.

The primary goal of the introductory unit was to offer students the opportunity to learn and apply PCR primer design that would be used to amplify target genomic loci of interest. Accordingly, students were provided with a table of genes and variants of interest as part of a prelab tutorial (see Appendix S1) on how to design PCR primers using the freely available Benchling software suite. 23 The Benchling platform enables database searching to retrieve gene sequence and annotation, as well as use of the Primer3 software to design primers that capture each target of interest. 24 Specifically, our students were asked to design target primers at specific genetic loci according to NSCLC genotypes (i.e., EGFR, TP53, KRAS) (see Table S1). The primers designed by students amplified approximately 1000 bp of genomic DNA containing the genomic variants of interest (Table 1). Each group was then assigned a 12-nucleotide group-specific DNA sequence as a barcode to incorporate into dual-index primer pairs. A list of the 12-nucleotide molecular barcodes to be added to the 5′ end of each primer is provided (see Appendix S1). Each index primer averaged 37 nucleotides in length and the average cost per primer was $7.50.
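As an illustration of how an index primer is assembled, the sketch below prepends a 12-nucleotide barcode to the 5′ end of a target-specific primer and estimates the cost of an order from the per-primer average quoted above. The sequences are hypothetical placeholders (they are not the primers in Table 1 or the barcodes in Appendix S1), and the function names are assumptions made for this example.

```python
TAG_LENGTH = 12  # barcode length stated in the text


def tag_primer(barcode: str, target_primer: str) -> str:
    """Return the index primer: barcode at the 5' end, then the
    target-specific primer sequence (both written 5' to 3')."""
    barcode = barcode.upper()
    target_primer = target_primer.upper()
    if len(barcode) != TAG_LENGTH:
        raise ValueError(f"barcode must be {TAG_LENGTH} nucleotides long")
    if set(barcode + target_primer) - set("ACGT"):
        raise ValueError("sequences may only contain A, C, G, T")
    return barcode + target_primer


def order_cost(n_primers: int, cost_per_primer: float = 7.50) -> float:
    """Estimated order cost using the average per-primer price in the text."""
    return n_primers * cost_per_primer


# Hypothetical sequences for illustration only.
barcode = "ACGTACGTACGT"                # 12 nt
target = "GATTACAGATTACAGATTACAGATT"    # 25 nt
indexed = tag_primer(barcode, target)

if __name__ == "__main__":
    print(indexed, len(indexed))
    # Assumption: one forward and one reverse primer for four tagged targets.
    print(order_cost(8))
```

With a 25-nucleotide target-specific primer, the tagged primer is 37 nucleotides long, which matches the average length reported above. Under the stated assumption of eight tagged primers per group, the estimate comes to $60 per group.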
The second unit of this module focuses on the PCR method, including the extraction and quantification of genomic DNA as a PCR template, preparation of working primer dilutions, and setting up individual or multiplex PCR reactions. The prelab material provided for this unit included a brief tutorial on the methods and principles of DNA extraction and quantification, as well as the protocol used to perform primer dilutions for PCR (see Appendix S1). In the laboratory, students extracted genomic DNA from frozen cell pellets from NSCLC cell-lines labeled 'A', 'B' or 'C' (students were blinded to sample identities). Students then quantified the total genomic DNA yield of each sample via the NanoDrop-2000 spectrophotometer. The typical yield of genomic DNA isolated by students was 20 ng/μl with a 260/280 absorbance ratio ranging from 1.8 to 2.5 (see Figure S1). Each group then made primer dilutions and used the genomic DNA sample as a template for both single target and multiplex PCR reactions (Figure 2(a)). A positive control primer set, provided to the entire class, targeted the GAPDH gene and routinely produced a distinctive 296 bp PCR amplicon. A negative control reaction, which lacked genomic DNA, was also included by each student group. Each student group had a distinct, nonredundant index sequence added to the 5′ end of the PCR primers used (Figure 2(b)). This dual indexing strategy was used to identify student group samples during the downstream bioinformatic analyses.

The third unit focuses on the generation of the pooled amplicon library, derived from the class's multiplex PCR products, to be sequenced on the MinION platform in real-time. The prelab for this unit provided background information and instructions on how to perform DNA gel electrophoresis, PCR purification, and amplicon quantitation for pooling (see Appendix S1).
Following interpretation of the DNA agarose gel, students performed PCR purification with magnetic beads and DNA quantification with the Qubit fluorometer. Group amplicons were pooled with all other samples and used for loading a single-use MinION Flongle flowcell as a live in-class demonstration (Figure 2(c)). To maximize student engagement, student groups were asked to submit questions to be answered during the question and answer period of the demonstration.

The final unit began with an overview and detailed explanation of the bioinformatic analysis process and file conversion steps performed by instructors after receiving sequencing data (Figure 2(d)). In total, the MinION sequencing run produced 310,500 sequencing reads covering 232,217,962 bases (Table 2). Following demultiplexing of sequence reads based on dual-indexes, we found that the amplicons were represented in roughly equal proportions in the sequencing run (Figure 3(a)). The read length distribution showed reads spanning from 350 to 850 bp, with an average read length of 747.9 base pairs and an average Phred quality score of 8.5 (Figure 3(b)). All amplicons mapped to their expected regions. The scripts used for file conversions, alignments, and quality control are provided in Appendix S1. Each student group was provided with the sequencing results in the form of BAM and BAI index files, which were then used to perform variant analysis with the freely available Integrative Genomics Viewer (IGV) software. 17 The distributed Unit 4 prelab contained a tutorial on how to load the human genome and BAM files for visualization in IGV (see Appendix S1). Students were asked to determine and annotate whether mutations occurred at the specified genomic locations when compared to the hg38 reference sequence. Upon examining gene alignments, student groups identified their assigned NSCLC cell-line based on the genomic profile.
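The idea behind demultiplexing by dual index can be sketched in Python as follows. This is a simplified illustration and not the scripts in Appendix S1: it assumes the forward tag sits at the very start of the read and the reverse tag (reverse-complemented) at the very end, and it compares tags by substitution count only, whereas real nanopore reads also carry adapters and insertion/deletion errors. The tag sequences, group names, and the `max_mismatches=2` tolerance are hypothetical choices for this example.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def closest_tag(observed: str, tags: dict, max_mismatches: int = 2):
    """Name of the single tag within `max_mismatches` of `observed`, else None."""
    hits = [name for name, seq in tags.items()
            if len(seq) == len(observed) and hamming(observed, seq) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def assign_read(read: str, fwd_tags: dict, rev_tags: dict,
                tag_len: int = 12, max_mismatches: int = 2):
    """Return (forward_tag_name, reverse_tag_name) for a read, or None.

    Both strand orientations are tried, because a read may start from
    either end of the amplicon.
    """
    if len(read) < 2 * tag_len:
        return None
    for oriented in (read, revcomp(read)):
        fwd = closest_tag(oriented[:tag_len], fwd_tags, max_mismatches)
        rev = closest_tag(revcomp(oriented[-tag_len:]), rev_tags, max_mismatches)
        if fwd is not None and rev is not None:
            return fwd, rev
    return None


# Hypothetical tags and insert, for illustration only.
fwd_tags = {"group1": "ACGTACGTACGT", "group2": "TTGGCCAATTGG"}
rev_tags = {"group1": "GATCGATCGATC", "group2": "CCAATTGGCCAA"}
insert = "AAAACCCCGGGGTTTT"
read = fwd_tags["group1"] + insert + revcomp(rev_tags["group1"])

if __name__ == "__main__":
    print(assign_read(read, fwd_tags, rev_tags))
    print(assign_read(revcomp(read), fwd_tags, rev_tags))
    print(assign_read("A" * 40, fwd_tags, rev_tags))
```

Reads that return `None` would be left unassigned. Tolerating a small number of mismatches is useful here because an average Phred score of 8.5 corresponds to a per-base error probability of roughly 10^(-0.85), or about 14%.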
Groups that began with HCC827 cells, for example, were able to visualize a characteristic 15-bp deletion found in exon 19 of the EGFR gene (Figure 4(a)), while groups that analyzed samples derived from H1975 cells were able to visualize the R273H mutation in the TP53 gene (Figure 4(b)).

3.6 | Assessment of student work

3.6.1 | Learning assessment

As individuals working in the biomedical laboratory fields require a strong background in scientific communication, we chose to evaluate students using both oral and written assessments. Therefore, in addition to the pre-laboratory assignments that accompanied each unit, this 4-week module culminated in a final project accounting for 25% of the course grade. The project continued to foster group work through a group oral presentation delivered to the entire class. In addition, individual student learning was evaluated through a 3-page written summary. For the oral portion, each student group was allotted a total of 15 min (12 min for the presentation and 3 min for questions). All student groups were instructed to incorporate the following in their presentations:

1. Give a broad overview of the analysis and data used for genotyping the NSCLC samples.
2. Describe the normal function of the genes that were mutated and why these specific genes are considered diagnostic in cancer.
3. Identify and describe whether these two genes are altered in other cancers.
4. Read and describe a peer-reviewed primary journal article that used NSCLC cell-lines to study one of the genetic alterations identified.

For the 3-page written summary, in addition to answering the questions above, students were asked to include the purpose of the research undertaken in the journal article selected and to summarize the results and major findings of the study. Individual student participation was assessed by instructors as follows:

1. Each student was asked to submit a peer evaluation form for each group presentation.
2. Question(s) asked by the student during the question portion of the group presentations.

The rubric used to evaluate the group presentations and a summary of select written feedback provided by students are provided (see Table S3). Finally, to accommodate students with documented learning disabilities, students were permitted to pre-record their 12-min presentation to be viewed by the entire class, as well as to submit questions to be answered at a later time by writing them on an index card and handing it to the instructor after the session ended.

During the final week of classes, students were given a 5-question survey to determine student perception after completing this module. Out of the 24 students enrolled in the course, 18 completed the evaluation. In total, 83% and 94% of students reported this was their first time learning about the MinION sequencing platform and IGV, respectively (Table 3). After participating in the laboratory, a total of 38% and 60% of students reported feeling more comfortable than not preparing samples for MinION and independently using IGV for the purpose of genetic variant analysis, respectively. Finally, 50% of students reported improved knowledge of NGS technologies after participating in this 4-week module. Some students also provided general comments about their experience participating in this 4-week module. Students commented that it was an overall 'positive lab experience' and that they 'loved that the module had a purpose'. However, some students did comment that at times the instruction felt 'rushed'. As a suggestion, a couple of students recommended spending more time, before starting 'hands-on' work, on the background or with the assignment tutorials provided.

In this article, we present an accurate, specific and sensitive multiplex PCR reaction that can be sequenced in real-time on the MinION platform to provide students with inquiry-based hands-on learning of DNA sequence-based diagnostic tests.
In the past, instructors who sought to incorporate DNA sequencing into the teaching laboratory curriculum had many barriers to overcome. With the advent of third generation sequencing platforms, DNA sequencing has become more accessible at a greatly reduced cost. The results reported here serve as proof-of-concept that MinION sequencing can be implemented in the teaching laboratory for the purpose of cultivating scientific inquiry and providing experience with methods used in contemporary molecular biology. We provide a 4-week module with a week-to-week guide, including assignments that cover primer design, multiplex PCR, sample preparation, and entry-level bioinformatics, to be used in the molecular biology laboratory class setting. Using MinION sequencing, all student groups were able to correctly identify the NSCLC sample they were assigned, based on its unique genomic profile. The MinION system was able to produce the data in the time frame of a single class. The small size of the MinION and its ability to connect easily to a laptop allowed us to feature an in-class demonstration where students watched sequencing occur live. Students then used these data to detect both deletions and single nucleotide polymorphisms characteristic of lung cancer samples. To adapt to distance-learning, the MinION demonstration could easily be broadcast and streamed online by instructors. Importantly, the raw data is available on NCBI, and therefore a majority of this lab exercise could be adapted for an entirely remote laboratory experience.

In conclusion, the MinION system allowed for the detection of genetic variants in non-small cell lung cancer cell-lines. By implementing this 4-week module, students were able to take ownership of the design and sample preparation required for NGS. This module serves as a useful tool for instructors and could be formatted to accommodate a non-BSL2 (Biosafety Level 2) laboratory and modified for off-site learning.
Furthermore, many outcomes were achieved as a result of this laboratory module. First, student independence and proficiency were enhanced by allowing them to design and test their own PCR primers. Second, students were introduced to advanced molecular assays, including multiplex PCR and NGS sample preparation, and gained computational experience in a practical, introductory manner. Finally, this module can be effectively incorporated into a molecular biology curriculum, as it could function to enhance the teaching of genetic variant analysis and core concepts in molecular pathology and oncology. This NGS-based module is an easily implementable way to introduce a wide range of students to contemporary molecular biology techniques and bioinformatics, and also serves as a springboard for interested students to pursue further training in research and medical diagnostics.

Molecular pathology curriculum for medical laboratory scientists: a report of the association for molecular pathology training and education committee
Effect of comprehensive oncogenetics training interventions for general practitioners, evaluated at multiple performance levels
Preparing medical specialists for genomic medicine: continuing education should include opportunities for experiential learning
Active learning increases student performance in science, engineering, and mathematics
The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community
Decoding long nanopore sequencing reads of natural DNA
Characterization of individual polynucleotide molecules using a membrane channel
A first look at the Oxford Nanopore MinION sequencer
An assessment of Oxford Nanopore sequencing for human gut metagenome profiling: a pilot study of head and neck cancer patients
Nanopore sequencing detects structural variants in cancer
TP53 gene mutation analysis in chronic lymphocytic leukemia by nanopore MinION sequencing
Aberrant epidermal growth factor receptor signaling and enhanced sensitivity to EGFR inhibitors in lung cancer
Rapid identification of somatic mutations in colorectal and breast cancer tissues using mismatch repair detection (MRD)
Therapeutic effects of statins against lung adenocarcinoma via p53 mutant-mediated apoptosis
Prevalence of human papillomavirus 16/18/33 infection and p53 mutation in lung adenocarcinoma
Lack of direct association between EGFR mutations and ER beta expression in lung cancer
Integrative genomics viewer
A MinION™-based pipeline for fast and cost-effective DNA barcoding
NanoPack: visualizing and processing long-read sequencing data
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
The sequence alignment/map format and SAMtools
Primer3-new capabilities and interfaces

Special thanks to the students of the Applied Molecular Biology Laboratory, BHSC 282, fall of 2019 for their participation in this 4-week module. We would also like to thank Scott Tighe and Pheobe Kehoe of the Vermont Integrative Genomics Resource (VIGR). Materials used in the BHSC 282 course were funded by the University of Vermont, Department of Biomedical and Health Sciences. We would also like to acknowledge the following funding sources: NSF OIA-1736097 (SF), T32AI055402 (PDR) and U54GM115516 from the National Institutes of Health for the Northern New England Clinical and Translational Research network.

The sequence data from this study have been submitted to the NCBI Short Read Archive (https://www.ncbi.nlm.nih.gov/sra/) under accession number PRJNA668750.

ORCID
Seth Frietze https://orcid.org/0000-0003-4058-3661

Additional supporting information may be found online in the Supporting Information section at the end of this article.