key: cord-0947393-ssgh61w6
authors: Robinson, Serina L.; Biernath, Troy; Rosenthal, Caleb; Young, Dean; Wackett, Lawrence P.; Martinez-Vaz, Betsy M.
title: Development of the Organonitrogen Biodegradation Database: Teaching Bioinformatics and Collaborative Skills to Undergraduates during a Pandemic
date: 2021-03-31
journal: J Microbiol Biol Educ
DOI: 10.1128/jmbe.v22i1.2351
sha: 6d2ef403fea7dfb06f9f62f61b2f00a6b1701f0c
doc_id: 947393
cord_uid: ssgh61w6

Physical distancing and inaccessibility to laboratory facilities created an opportunity to transition undergraduate research experiences to remote, digital platforms, adding another level of pedagogy to their training. Basic bioinformatics skills together with critical analysis of scientific literature are essential for addressing research questions in modern biology. The work presented here describes a fully online, collaborative research experience created to allow undergraduate students to learn those skills. The research experience was focused on the development and implementation of the Organonitrogen Biodegradation Database (ONDB, z.umn.edu/ondb). The ONDB was developed to catalog information about the cost, chemical properties, and biodegradation potential of commonly used organonitrogen compounds. A cross-institutional team of undergraduate researchers worked in collaboration with two faculty members and a postdoctoral fellow to develop the database. Students carried out extensive online literature searches and used a biodegradation prediction website to research and represent the microbial catabolism of different organonitrogen compounds. Participants employed computational tools such as R, Shiny, and flexdashboard to construct the database pages and interactive web interface for the ONDB. Worksheets and forms were created to encourage other students and researchers to gather information about organonitrogen compounds and expand the database.
Student progress was evaluated through biweekly project meetings, presentations, and a final reflection. The ONDB undergraduate research experience provided a platform for students to learn bioinformatics skills while simultaneously developing a teaching and research tool for others. Research experiences help students develop critical thinking and problem-solving skills while actively engaging with current scientific challenges (1) (2) (3) . During the COVID-19 pandemic, physical distancing and inaccessibility to laboratories led to the cancellation of numerous undergraduate research programs globally (4, 5) . These circumstances motivated educators to develop innovative ways to provide research experiences. Analysis of large data sets, surveys, citizen science, and literature reviews were proposed as options to transition from bench to virtual research (6, 7) . Projects involving database searches and development of computational biology tools were underrepresented despite the significance of these skills in current biological studies. Bioinformatics skills are increasingly relevant to academic research and postgraduate job opportunities in biology (8) (9) (10) (11) . The Network for Integrating Bioinformatics into Life Sciences Education (NIBLSE) conducted a survey of biology faculty across the United States which identified nine core bioinformatic competences for undergraduate life scientists (10, 11) . These included: (i) understanding the role of data mining and computation in the life sciences, (ii) describing the relevance and applications of computational concepts to the life sciences, (iii) finding, retrieving, and organizing various types of biological data, (iv) employing bioinformatics to examine biological problems, and (v) writing scripts and utilizing command-line programs (11) . 
The bioinformatics skills described in the NIBLSE survey strongly support the incorporation of projects involving data mining and database development into undergraduate research experiences and courses, including laboratory exercises that entail large data sets and database searches (12) (13) (14) (15) (16) . Organonitrogen compounds are heavily used societally, as pharmaceuticals and, in greatest volume, as agricultural chemicals such as fertilizers, pesticides, and soil amendments. Excessive use of these chemicals leads to nitrogen runoff from soils, pollution of rivers, and increased biomass in large bodies of water leading to hypoxic "dead zones" (17) (18) (19) . It is important to compile information about these compounds: chemical properties, environmental fate, transformation products, and microbial degradation. To help meet both educational and informational needs, we created an online tool, the Organonitrogen Biodegradation Database (ONDB, or the Database), which provides information about the cost, common uses, and biodegradation of agriculturally relevant organonitrogen compounds. The ONDB was developed by undergraduate students as part of a remote collaborative research experience. The research experience, presented here, allowed students to engage with effective literature searching and data analysis while acquiring basic bioinformatics and programming skills. The students also practiced how scientific communication and interdisciplinary collaborations work in science. Participation in the ONDB team addressed many of the bioinformatics competences listed in the NIBLSE survey (Table 1) . The ONDB research team was composed of three students and a professor from primarily undergraduate institutions, a postdoctoral fellow, and a faculty member from a large research university. Participants had completed sophomore-level coursework in biology and chemistry and were recruited based on their interests in bioinformatics and microbial biochemistry. 
The project was conducted over 10 weeks. Students spent the first 4 weeks of the program learning computational tools and practicing data retrieval from databases (Appendix 1). They used the EAWAG-BBD Pathway Prediction System (20) to assess the biodegradation potential of their compounds. They employed chemical databases (ChemSpider, PubChem) to investigate the uses and chemical properties of the target compounds (21, 22). The chemicals chosen for the project were selected based on their widespread use as fertilizers and agricultural additives. Once data collection and analysis were completed, students spent the remaining weeks working on the design and implementation of the ONDB. All research activities and meetings were carried out remotely using teleconferencing tools such as Zoom and Google Meet. The students were assigned their roles in the ONDB team based on their primary interests. They were all exposed to the programming tutorials and bioinformatics exercises during the first 4 weeks of the summer research experience. Subsequently, they were given the choice to focus on the programming and database web interface design or researching and curating scientific content.

Table 1

C1. Explain the role of computation and data mining in addressing hypothesis-driven and hypothesis-generating questions within the life sciences.
• Formulate hypotheses regarding the biodegradation potential of different organonitrogen compounds.
• Examine current scientific literature to identify bacteria and proteins associated with the biodegradation of organonitrogen compounds.
• Make predictions about the microbial utilization of organonitrogen compounds using a computational biodegradation pathway prediction tool.

C2. Summarize key computational concepts, such as algorithms and relational databases, and their applications in the life sciences.
• Describe how to create and organize database pages using R, Shiny, and flexdashboard.
• Design and implement a database web tool to catalog information about the degradation of organonitrogen compounds.

C3. Write simple computer scripts and utilize command-line bioinformatics programs.
• Use R, Shiny, and flexdashboard scripts and command-line tools to build interactive pages and a web interface.

C4. Use bioinformatics tools to examine complex biological problems in evolution, information flow, and other important areas of biology.
• Use bioinformatics tools such as MetaCyc, the Biocatalysis and Biodegradation database, and pathway prediction tools to investigate the microbial degradation of organonitrogen compounds.

C5. Find, retrieve, and organize various types of biological data.
• Use the appropriate NCBI databases to find and retrieve research papers, and DNA and protein sequences.
• Retrieve information and data from various public databases such as ChemSpider, PubChem, and the National Pesticide Information Center.
• Organize chemical structures, research papers, and metabolic pathway information in formats suitable for addition to the ONDB database pages.

Note: The organonitrogen database research experience addressed five of the nine core bioinformatics competences listed in the NIBLSE survey.

Table 2

• I learned how to turn a website from technically functioning to truly useful.
• Programming with R is extremely variable and learning to apply the simple things I had learned before, along with some HTML, was challenging but useful for future research I may do.
• Our team had the opportunity to improve upon our bioinformatics skills by requiring us to understand how other databases of similar content work. We utilized other bioinformatics tools such as UNIPROT, EAWAG's Pathway Prediction System, and BLAST.
• I expanded my abilities with standard tools such as BLAST, but I found my favorite tool to be the Pathway Prediction System (PPS) provided by Eawag.
• As I learned to use other databases, such as Uniprot and BLAST, I unlocked new ways to utilize the information provided from the PPS to make meaningful connections.

How did this project help improve your literature and database searching skills?
• I improved my ability to quickly and accurately scan the literature before me and assess whether or not it will be helpful.
• Creating the database fostered ever-increasing proficiency in locating articles of importance. This proficiency manifested itself both in specific techniques (keyword operators, results filters) as well as a more intuitive knowledge of key phrases. Additionally, as I learned new terms such as "syntrophic degradation."
• By synthesizing the current literature knowledge of biodegradation - of enzymes, bacteria, catalysts, and so on - I acquired a multifaceted lens through which to view microbial metabolism, one that could not be gained from any article alone.
• I did not know much about database searching myself but organizing the databases and resources into distinct categories took some critical thinking and collaboration in the group that was very informative.
• Working as a team and in collaboration is a benefit because when we each brought diverse ideas to the table, the clearer and more beneficial the database turned out. Diversity can be positively correlated with growth.
• We sought to solve everything from computer programming problems, to unifying the way in which we drew chemical structures using a drawing tool. Much of the work we accomplished required us to work as a team.
• It was very useful to work as a team of students who could make our own decisions on details and preference issues, while still getting overall direction and feedback from senior researchers.
• It was a unique experience during COVID-19 to find ways to communicate effectively. I learned how to summarize my work in a way that could be quickly explained to the group as a whole in order to get constructive feedback. There are many potential platforms for communication, but the most useful ones seemed to be the ones that provided face-to-face interaction.
• It was useful to work together simultaneously, although remotely, so that we could prevent extra work due to miscommunication.
• The database project also served as an exercise in team communication. We were working in the initial stages of development, there was no fixed template for our entries. Teamwork was of paramount importance to ensure formatting was consistent across our entries.
• The most valuable part of working in a team environment and with senior researchers would be the troubleshooting and idea generation. Collaboration amongst the group would generate so many new ways to solve an issue or problem that might come up.
• It is helpful to watch and listen to how the scientific community works and breathes through the researchers who have more experience.
• Writing scientifically was necessary for several parts of the database, therefore it was crucial that I learned from my experienced colleagues how to do so. It was important that I consider the audience that would be utilizing the database. My collaborating peers and senior researchers worked with me to tailor my thinking and writing for the audience that the ONDB.

To teach data mining and literature research skills, we created a set of worksheets to report relevant information from databases and scientific papers (Appendices 2 to 5). The research team held biweekly meetings to discuss the main findings of current studies and to decide on relevant information for inclusion in the ONDB pages. The literature searching worksheet focused on how to read a research paper and highlighted skills such as identifying the main hypotheses, interpreting figures, and summarizing key findings.
The data mining worksheets aimed to help students retrieve and organize information from public repositories. When completing these forms, students had to state why they chose a particular data repository, the type of data obtained from it, and its relevance to the ONDB project. The ONDB research experience emphasized effective communication and teamwork, with each undergraduate student functioning as a "domain expert" for different components of the ONDB. This arrangement mirrored the modern work environment, where individuals with different backgrounds (e.g., scientists, programmers, and designers) collaborate to create a common product. One student focused on the programming and web application development while other students focused on scientific content, ONDB design, and user experience. To encourage daily team interactions, we used Slack, a workplace tool for real-time communication. Database researchers attended weekly laboratory meetings with graduate students and postdoctoral fellows to present their work and learn about ongoing research at large research universities (Table 2). Additionally, they shared their findings with a broader audience by presenting a poster at a virtual interdisciplinary research symposium (Appendices 9 and 10). A GitHub repository containing the code for the ONDB as well as a curated list of R resources, tutorials, and packages can be accessed using the following link: https://github.com/serina-robinson/ondb. A set of self-paced tutorials used during the ONDB research experience is included in Appendix 11. The ONDB is an interactive web application built using R Shiny and flexdashboard.
The prototype of the ONDB consists of four main pages: (i) a page stating the mission and goal of the database, (ii) a compound page containing the structures, formulas, and cost per ton of the chemical of interest, (iii) a reaction page with clickable boxes linked to the microbial degradation steps of each chemical, and (iv) a resource page with links to other relevant resources (Appendix 7). The ONDB (https://z.umn.edu/ondb) is freely available to anyone interested in learning about organonitrogen compounds and biodegradation. The students involved in this project had no prior programming experience. Self-paced tutorials consisting of daily coding exercises were designed for the students (Appendix 11). All coding exercises were shared daily via a GitHub repository to give students the experience of working collaboratively using the Git version control system. Each coding exercise focused on learning one or more skills in R while working with real datasets related to the ONDB project. Topics covered included data structures, data wrangling, and building interactive visualizations. One way to scale up and continue developing the ONDB is to adapt its research activities as teaching modules that incorporate bioinformatics into biology courses. For example, the ONDB activities focused on literature searches and scientific content curation can be incorporated in the laboratory component of various undergraduate courses, including microbiology, biochemistry, and bioinformatics. Instructors can select compounds of interest and assign them to a given class. Alternatively, students can select an organonitrogen compound of their choice and investigate its relevance, environmental fate, and biodegradation potential. Once students have completed the research on a compound of interest, the information needed to generate database pages can be shared with the ONDB team.
Additionally, instructors can use the worksheets and tutorials as stand-alone activities to teach database searching, R programming, and critical analysis of research papers. Multiweek projects mirroring the ONDB activities offer a practical way to do research remotely while allowing students to use computational tools to address biological questions. These projects can be offered as independent research experiences when access to laboratories is restricted (Appendix 8).

There were no safety issues associated with this activity. This research was designated exempt by the Hamline University IRB committee as defined by federal regulations (Final Common Rule, 45 CFR §46.104) under normal educational research.

References
Vision and Change in Undergraduate Biology Education. A call to action: a summary of recommendations made at a national conference organized by the
A call to develop course-based undergraduate research experiences (CUREs) for nonmajors courses
A New Biology for the 21st Century
How COVID-19 is affecting undergraduate research experiences
COVID-19 shakes up summer internship and research opportunities: how companies, research experiences, and undergraduate students are adapting
Transitioning undergraduate research from wet lab to the virtual in the wake of a pandemic
Alternative summer experiences for undergraduate students during COVID-19
The genome solver project: faculty training and student performance gains in bioinformatics
The genome solver website: a virtual space fostering high impact practices for undergraduate biology
NIBLSE: a network for integrating bioinformatics into life sciences education
Bioinformatics core competencies for undergraduate life sciences education
Using an international p53 mutation database as a foundation for an online laboratory in an upper-level undergraduate biology class
Involving undergraduates in the annotation and analysis of global gene expression studies: creation of a maize shoot apical meristem expression database
Studying gene expression: database searches and promoter fusions to investigate transcriptional regulation in bacteria
Use of the University of Minnesota biocatalysis/biodegradation database for study of microbial degradation
Novel biocatalysis by database mining
Spreading dead zones and consequences for marine ecosystems
Nutrient imbalances in agricultural development
The University of Minnesota pathway prediction system: predicting metabolic logic
ChemSpider - building a foundation for the semantic web by hosting a crowd-sourced databasing platform for chemistry
PubChem 2019 update: improved access to chemical data

The development of the ONDB provided a valuable remote research experience that allowed participants to develop bioinformatics competences while practicing collaboration and scientific communication skills. Within college-level courses, the ONDB web tool could be used to illustrate the roles of microbes in the breakdown of environmental contaminants (Appendices 6 and 7). The web tool can be expanded for additional educational training and to provide a useful resource for the scientific community (Appendix 8).

A potential limitation of the ONDB project is the lack of research information on some organonitrogen compounds of interest. Another common challenge is formatting the images and content needed to create database pages. Occasionally, data formatting issues can be a source of frustration amongst undergraduate researchers. Lastly, the choice was made for undergraduates to construct the ONDB using R since it is a versatile and widely used language in biology. Given the relatively small size of the ONDB, R is able to quickly query and display database results. However, with a much larger dataset, we would recommend the use of a relational database management system and SQL. Due to our limited timeframe, students involved in this project were not taught database semantics, architecture, or implementation with SQL.
Despite this, we felt the utility of R and its associated packages for data wrangling, graphing, and statistical analysis provided transferable skills for students at an early stage in their scientific careers.
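To make the relational alternative mentioned above more concrete, the following is a minimal sketch of how compound and reaction-step information of the kind the ONDB catalogs might be stored and queried with SQL. It is not the ONDB's implementation, which is written in R; Python's standard sqlite3 module is used here only as a convenient way to run SQL. The table names, column names, and the helper functions build_demo_db and degradation_steps are hypothetical, and the two compounds with their textbook degradation steps are illustrative entries, not records taken from the ONDB.

```python
import sqlite3

# Hypothetical two-table schema: one row per compound, and one row per
# degradation step, linked to its compound by a foreign key.
SCHEMA = """
CREATE TABLE compound (
    compound_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    formula     TEXT NOT NULL
);
CREATE TABLE reaction_step (
    step_id     INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
    step_order  INTEGER NOT NULL,
    description TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO compound (compound_id, name, formula) VALUES (?, ?, ?)",
        [(1, "urea", "CH4N2O"), (2, "melamine", "C3H6N6")],
    )
    # Steps are inserted out of order on purpose; the query sorts them.
    conn.executemany(
        "INSERT INTO reaction_step (compound_id, step_order, description) "
        "VALUES (?, ?, ?)",
        [
            (2, 2, "ammeline -> ammelide"),
            (2, 1, "melamine -> ammeline"),
            (2, 3, "ammelide -> cyanuric acid"),
            (1, 1, "urea -> ammonia + carbon dioxide"),
        ],
    )
    conn.commit()
    return conn


def degradation_steps(conn, compound_name):
    """Return (step_order, description) tuples for one compound, in order."""
    return conn.execute(
        """
        SELECT r.step_order, r.description
        FROM reaction_step AS r
        JOIN compound AS c ON c.compound_id = r.compound_id
        WHERE c.name = ?
        ORDER BY r.step_order
        """,
        (compound_name,),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for order, description in degradation_steps(demo, "melamine"):
        print(order, description)
```

Keeping reaction steps in their own table, rather than in a column of the compound table, is what would let a larger collection add or reorder steps without duplicating compound records; the join in degradation_steps plays the role of the link from a compound page to its reaction page.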