key: cord-0057749-e3zqpk5q
authors: Kessler, Ingmar; Perzylo, Alexander; Rickert, Markus
title: Ontology-Based Decision Support System for the Nitrogen Fertilization of Winter Wheat
date: 2021-02-22
journal: Metadata and Semantic Research
DOI: 10.1007/978-3-030-71903-6_24
sha: e4998ba499ce6ee0853a12751996e44d990bf88d
doc_id: 57749
cord_uid: e3zqpk5q

Digital technologies are already used in several aspects of agriculture. However, decision-making in crop production is still often a manual process that relies on various heterogeneous data sources. Small-scale farmers and their local consultants are particularly burdened by increasingly complex requirements. Regional circumstances and regulations play an essential role and need to be considered. This paper presents an ontology-based decision support system for the nitrogen fertilization of winter wheat in Bavaria, Germany. Semantic Web and Linked Data technologies were employed to both reuse and model new common semantic structures for interrelated knowledge. Many relevant general and regional data sources from multiple domains were not yet available in RDF. Hence, we used several tools to transform relevant data into corresponding OWL ontologies and combined them in a central knowledge base. The GUI application of the decision support system queries it to parameterize requests to external web services and to show relevant information in an integrated view. It further uses SPARQL queries to automatically generate recommendations for farmers and their consultants.

The digital transformation of agriculture is an ongoing process whose progress differs for various regions, organizations, and activities. The regional circumstances in Bavaria for instance include the following factors [3]: It has both the most farm holdings and the smallest area per holding compared to the national average in Germany. 106 718 farm holdings in Bavaria are sole proprietorships and account for 87% of the agricultural area in the state. 
61% of them are part-time holdings and 81% of their personnel are family members. Winter wheat has the largest share of the cereal production by area with 47%.

Farmers typically aim at optimizing their crop yield while remaining within the limits set by agricultural best practices and local legal regulations. Small agricultural holdings in particular may still often rely on paper documents or homemade spreadsheets to manage their fields. Their decision-making depends on finding, combining, and interpreting heterogeneous data sources that are published by various organizations. These data sources often have different formats such as paper documents, PDF documents, websites, and spreadsheets, which contain informal identifiers and implicit definitions. As the number and complexity of regulations increase, e.g., regarding the environment, water protection, pesticides, and fertilizers, this becomes increasingly difficult for small-scale farmers in particular. Many farmers need to call a small number of local consultants about similar routine concerns, which is time-consuming and cost-intensive. Digital technologies may become necessary to support farmers in their knowledge management and decision-making.

The presented system uses Semantic Web and Linked Data technologies such as OWL ontologies to integrate heterogeneous data sources and create a decision support system (DSS). Due to its local significance, our initial use case focuses on the automatic generation of recommendations for the nitrogen (N) fertilization of winter wheat. Farmers can already use software tools such as a website called Bodenportal by an association for Bavarian farmers (LKP) to calculate the legal upper limit for the total required N fertilization (N_req) for a given field, crop, and year. This total is usually not applied to the field all at once, but rather split into three separate applications to optimize the results. 
Farmers and their consultants must decide on how much fertilizer to apply and when, while complying with this legal upper limit. The presented DSS uses this N_req value and other parameters from various data sources to automatically recommend an application time and rate (kg N ha⁻¹).

The rest of the paper is structured as follows. Section 2 discusses related work including existing semantic resources and other ontology-based DSS. Section 3 describes the overall concept and architecture of the presented system and gives an overview of regional data sources. Section 4 presents an evaluation of the DSS for a particular N fertilization recommendation. Section 5 concludes the paper.

The Semantic Web and Linked Data ecosystem includes ontologies and vocabularies to encourage common semantic structures, data sources using these structures, and applications consuming these data sources. Unit ontologies [9] such as QUDT (Quantities, Units, Dimensions and Types) [7] specify how to semantically describe measurements. The GeoSPARQL [16] standard consists of a vocabulary and SPARQL extensions to describe and work with geographic information. EU Vocabularies [19] provides several authority tables including one for administrative territorial units (ATU). GeoNames [21] contains semantic geospatial features and, e.g., their names, coordinates, and hierarchical relations. Wikidata [23] is an open knowledge base about a wide variety of topics. In GovData [11], the open data portal of the German government, the most common formats are PDF, HTML, and CSV, whereas RDF is very rare for actual data. However, metadata is often provided in RDF according to DCAT-AP.de [20], the German adaptation of the Data Catalog Vocabulary Application Profile (DCAT-AP) for data portals in Europe.

An overview and a survey of Semantic Web technologies in agriculture are provided in [2] and [6]. 
AGROVOC [5] is a multilingual SKOS-XL thesaurus that covers many topics such as environment, agriculture, food, and nutrition. AgroPortal [8] is a repository for hosting, searching, versioning, and aligning agricultural ontologies and vocabularies. The Crop Ontology [13] is a community-based platform for creating OBO ontologies and vocabularies in RDF to provide common semantic structures for phenotypes, breeding, germplasms, and traits.

Agricultural data sources, including German and Bavarian ones, are often not available in RDF, but there have been efforts to change that. The SPARQL endpoint in [12] provides reference data to estimate the costs of machine use. Similarly, the SPARQL endpoint in [1] provides requirements regarding water protection based on the database of the Federal Office of Consumer Protection and Food Safety (BVL) on authorized plant protection products. This database uses EPPO codes as plant and pest identifiers.

The ontology-based DSS in [14] and [22] use SWRL rules to generate recommendations for wheat production in Syria and for home gardens in Ecuador. In [17], existing semantic resources are reused and new ontologies are modeled to integrate heterogeneous data sources, so that SPARQL queries can calculate answers to questions of farmers in Nepal. The DSS presented here, which is based on our previous work [15], is similar to [17] to some extent, but focuses on a different region with different data sources and questions. The data sources also include GeoTIFF, relational databases, and web services. GeoSPARQL is used extensively to model and query geospatial data to show relevant information in a GUI application and to automatically generate recommendations.

The aim of the presented DSS is to assist farmers and consultants in a GUI application by retrieving and showing relevant information and automatically generating fertilization recommendations. 
This requires the integration of various heterogeneous data sources with different formats (CSV, PDF, SHP, SQL, etc.) and structures from multiple organizations, as well as of human expert knowledge. In order to provide a solid foundation for the system and potential future applications, common semantic structures were reused or modeled as OWL ontologies based on regional data sources. Relevant data was transformed into additional ontologies and combined in a central knowledge base. This way, SPARQL queries provide unified access to interrelated knowledge from multiple data sources.

Several agricultural experts collaborated with us on this work by gathering and preparing relevant data sources and posing competency questions that the system should be able to answer. These data sources were not available in RDF and ranged from general agricultural, geospatial, and weather information to data, definitions, and regulations specific to Bavaria. The experts also interviewed two local consultants about their decision-making regarding N fertilization to create corresponding decision trees.

In our approach (Fig. 1), various tools are used to integrate non-RDF data sources and existing semantic resources by creating corresponding OWL ontologies that contain common semantic structures or actual data (Sect. 3.2). The created ontologies are consistent with OWL 2 DL. As a result, they are compatible with DL reasoners and in principle other DL ontologies. The ontologies are stored in corresponding named graphs in a GraphDB triplestore, which features OWL 2 RL inference and acts as the central knowledge base of the system. The competency questions and decision trees were turned into parameterizable SPARQL queries (Sect. 3.3). A web-based GUI application was built for farmers to intuitively interact with the DSS (Sect. 3.4). 
It parameterizes predefined SPARQL queries to the knowledge base with both automatically derived values and user input to show relevant information to the user, to parameterize requests to external web services, and to automatically generate recommendations.

The OWL ontologies in the knowledge base can be grouped into different categories and usually import several higher-level ontologies. Because not all relevant existing semantic resources provide SPARQL endpoints, and in order to improve performance and maintainability, relevant subsets were extracted from them using SPARQL updates and saved as Linked Open Data (LOD) ontologies. This includes crops from AGROVOC, ATU individuals from EU Vocabularies, ATU interrelations from GeoNames, and additional ATU labels as well as German regional keys from Wikidata. The structures in the non-RDF data sources were modeled as common semantic structures in the form of OWL entities, i.e., classes, properties, and individuals, in one upper and several domain ontologies, while also taking the LOD ontologies into account. The upper ontology consists of top-level classes, common properties, and individuals that are shared among several domains. The domain ontologies group together entities related to topics such as weather, soil, crops, seed varieties, field records, fertilizers, or pesticides. Each dataset ontology is usually created from a single data source using the tools shown in Fig. 1 as described in Sect. 3.2. Finally, the data of farm holdings are saved in separate firm ontologies.

The core entities (Fig. 2) in the OWL ontologies include fex:Firm, fex:Field, and fex:FieldRecord, i.e., a firm has a record on the cultivation of a crop on a field. Accordingly, a field record corresponds to a period of time that usually starts with soil preparation and sowing activities and ends with a harvesting activity. 
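The relationship between the three core entities described above can be pictured with plain data classes. The following is only a minimal sketch: the Python attribute names (`records`, `fields`, `start`, `end`) and the way the entities are linked are assumptions made for illustration and do not reproduce the actual OWL properties of the ontologies.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class FieldRecord:
    """Cultivation of one crop on a field (loosely mirrors fex:FieldRecord)."""
    crop: str
    start: date  # assumed: date of soil preparation or sowing
    end: date    # assumed: date of the harvesting activity

    def covers(self, day: date) -> bool:
        """True if the given day lies within the cultivation period."""
        return self.start <= day <= self.end


@dataclass
class Field:
    """A field and its field records (loosely mirrors fex:Field)."""
    name: str
    records: List[FieldRecord] = field(default_factory=list)

    def records_on(self, day: date) -> List[FieldRecord]:
        """All field records whose period contains the given day."""
        return [r for r in self.records if r.covers(day)]


@dataclass
class Firm:
    """A farm holding and its fields (loosely mirrors fex:Firm)."""
    name: str
    fields: List[Field] = field(default_factory=list)


# Hypothetical example data.
wheat = FieldRecord("winter wheat", date(2019, 10, 10), date(2020, 8, 1))
plot = Field("Field A", [wheat])
firm = Firm("Example holding", [plot])
print([r.crop for r in plot.records_on(date(2020, 3, 1))])
```

In the knowledge base itself, the same lookup would be expressed as a SPARQL query over the firm ontologies rather than as method calls.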
In the ontologies, sowing activities, for instance, are characterized by properties such as their date, seed density, and seed variety, which grows into a certain kind of crop. If multiple crops are grown simultaneously in different parts of a field, it has multiple fex:FieldRecord individuals for a given year and each may have its own geospatial polygon. If multiple crops are grown sequentially as main, second, or catch crops in a given year, this is indicated by the fex:hasCropCategory property of each fex:FieldRecord individual. Each fex:Field individual is linked not only to fex:FieldRecord individuals, but also to additional entities such as soil measurements (soil:SoilMeasurement) from laboratory results, which are characterized by, e.g., their date, soil texture, humus class, and pH value.

The geospatial polygon (sf:Polygon) of a field provides its location and boundary and may change over the years. Polygons are not just available for fields, but also for rural and urban districts as well as areas with fertilization restrictions (Fig. 3). Therefore, SPARQL queries can determine in which district a field is located (Listing 1), which affects for instance reference crop yields (Listing 2). The IRIs of districts and other administrative territorial units (euvoc:Atu) were reused from EU Vocabularies and aligned with other semantic resources. This can simplify information exchange and queries to additional semantic resources that share the same IRIs.

The soil textures in the soil ontology are one of the new common semantic structures that have been modeled in the various domain ontologies for our initial use case based on regional data sources. While there already are other ontologies that describe soil textures, they do so in general or for other countries. However, data sources and regulations in Germany and in Bavaria use two different definitions of soil textures. 
The soil texture (soil:SoilTextureAlkis) of a polygon in the soil assessment map has the same definition as the ones in ALKIS (Authoritative Real Estate Cadastre Information System) and is a parameter of the automatic generation of N fertilization recommendations. The soil texture (soil:SoilTextureBavaria) of a soil measurement in Bavaria is different and affects many fertilization requirements other than N such as CaO and K2O.

Similarly, local farmers and data sources use regional terms and definitions for crops that focus more on the regional crop usage than on botanical definitions. For example, crop:WinterSoftWheatQualityE indicates in simplified terms that a farmer intends to achieve a protein content of more than 14%, while crop:WinterSoftWheatWCS indicates that the farmer intends to produce silage from the harvest. The distinction is important, as they each have different N requirements and reference values in the regional data sources. Additionally, while both crops are still the same botanical species Triticum aestivum, certain seed varieties are often more suitable for different purposes than others. The class hierarchy (Fig. 4) in the crop ontology was modeled based on tables of reference values called Basisdaten by the Bavarian State Research Center for Agriculture (LfL), the code list for grant applications (FNN) by the StMELF, the database on authorized plant protection products by the BVL, and the Descriptive Variety Lists by the Federal Plant Variety Office to represent how the various concepts relate to each other. This way, a dataset ontology can unambiguously assert to which specific concept each datum from a data source refers. This is important, as data referring to a concept applies to all of its subclasses as well.

The following non-RDF data sources are relevant to agriculture in Bavaria in general and to our initial use case in particular. 
Figure 1 shows the tools that were used to create corresponding OWL ontologies, which were made available to the DSS via a central knowledge base.

A CSV file from the German Meteorological Service (DWD) includes 490 local weather stations that provide soil temperature and moisture profiles at various depths below the ground over time as well as station attributes such as their geographic coordinates. A second CSV file from an agrometeorological service in Bavaria (AMB) includes 150 stations with weather forecasts that provide agriculturally relevant data such as the air temperature 5 cm above the ground instead of the usual 2 m and the soil temperature 10 cm below the ground. Both CSV files were loaded into GraphDB OntoRefine to create virtual SPARQL endpoints and then transformed by SPARQL updates into OWL ontologies.

A shapefile (SHP) from the StMELF contains polygons of 1947 red areas and 3207 white areas, which were defined by the Bavarian State Office for the Environment (LfU) and specify certain fertilization restrictions for water protection. It was loaded into QGIS, exported as a CSV file containing the polygons in the well-known text (WKT) format as well as their attributes, and then loaded into GraphDB OntoRefine. Similarly, a shapefile from the Bavarian State Office for Digitization, Broadband and Surveying (LDBV) contains 376 196 polygons from the official soil assessment map and their attributes soil texture, field value, and soil value. Additional shapefiles from the LDBV contain polygons of the borders of the Bavarian state, its 7 governmental districts, and its 71 rural and 25 urban districts. We used SPARQL updates to match the German regional keys from the shapefiles to the ones from Wikidata to link the polygons to the ATUs from EU Vocabularies. 
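The key-based linking step can be illustrated outside the triplestore as a simple join. The sketch below is not the SPARQL update used in the system; it only shows the idea of matching polygons to ATU IRIs via a shared regional key. The column names `regional_key` and `wkt`, the helper names, the zero-padding rule, and all sample values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def normalize_key(raw: str) -> str:
    """Strip whitespace and restore leading zeros lost in CSV exports.

    Assumption: district-level regional keys are five digits long.
    """
    return raw.strip().zfill(5)


def link_polygons_to_atus(
    shape_rows: List[Dict[str, str]],
    key_to_atu: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (ATU IRI -> WKT polygon, regional keys without a match)."""
    lookup = {normalize_key(k): iri for k, iri in key_to_atu.items()}
    links: Dict[str, str] = {}
    unmatched: List[str] = []
    for row in shape_rows:
        key = normalize_key(row["regional_key"])
        iri = lookup.get(key)
        if iri is None:
            unmatched.append(key)  # keep track of polygons that stay unlinked
        else:
            links[iri] = row["wkt"]
    return links, unmatched


# Illustrative values only; keys, IRIs, and geometries are made up.
shape_rows = [
    {"regional_key": "9162", "wkt": "POLYGON((0 0, 1 0, 1 1, 0 0))"},
    {"regional_key": "09999", "wkt": "POLYGON((2 2, 3 2, 3 3, 2 2))"},
]
key_to_atu = {"09162": "http://example.org/atu/district-a"}

links, unmatched = link_polygons_to_atus(shape_rows, key_to_atu)
print(links)
print(unmatched)
```

Reporting unmatched keys separately makes it easy to spot polygons that would otherwise silently remain without an ATU.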
The SPARQL query in Listing 1 returns the rural or urban district in which a field is located and covers the edge case where a field may intersect with two or more districts by comparing the sizes of the intersections. GeoSPARQL itself does not specify a function to calculate the area of a polygon, but several triplestores provide extension functions such as ext:area.

Listing 1: SPARQL query returning district with largest intersection with field.

GeoTIFF files from the DWD contain raster maps for the monthly average temperature and precipitation in Bavaria. Since GeoSPARQL is not well suited for working with raster maps, they and the field polygons from the knowledge base were loaded into PyQGIS. The average pixel value for each polygon was calculated in each map and added to the OWL ontology containing the polygon.

PDF documents on the website of the LfL such as the Basisdaten contain, e.g., tables of reference values. This includes the crop yield in the rural and urban districts of Bavaria, the mineral N available in the soil (N_min) for various crops in the governmental districts, and the nutrient contents of various organic fertilizers. Tables from PDF documents were first converted into CSV files and then loaded into GraphDB OntoRefine. The SPARQL query in Listing 2 returns the reference crop yield for the district returned by the query in Listing 1.

The relational database of the Bodenportal by the LKP contains data about agricultural firms in Bavaria. This includes for instance the name, official ID, and location of a firm; the names, official IDs, sizes, and polygons of its fields; laboratory results of the fields' soil; and the calculated N_req value as well as its various input parameters. Ontop [4] supports R2RML Direct Mapping [18] as well as custom mappings to create a virtual SPARQL endpoint to access a relational database. 
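Returning to the district lookup of Listing 1, the selection rule (pick the district whose intersection with the field has the largest area) can be sketched in a few lines. This is not the SPARQL query of the system and does not use GeoSPARQL or ext:area. It assumes planar coordinates and convex district polygons with counter-clockwise vertices, so that Sutherland-Hodgman clipping and the shoelace formula suffice; real district boundaries are not convex and would need the triplestore's functions or a geometry library. All names and coordinates are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in counter-clockwise order, not closed


def area(poly: Polygon) -> float:
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip(subject: Polygon, clipper: Polygon) -> Polygon:
    """Sutherland-Hodgman: clip `subject` against a convex, CCW `clipper`."""

    def inside(p: Point, a: Point, b: Point) -> bool:
        # p lies left of or on the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # intersection of the line through p, q with the line through a, b
        x1, y1 = p
        x2, y2 = q
        x3, y3 = a
        x4, y4 = b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        current, output = output, []
        for j in range(len(current)):
            cur, prev = current[j], current[j - 1]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
        if not output:
            break  # no overlap left
    return output


def district_with_largest_overlap(
    field_poly: Polygon, districts: Dict[str, Polygon]
) -> Optional[str]:
    """Name of the district sharing the largest area with the field, if any."""
    best, best_area = None, 0.0
    for name, district_poly in districts.items():
        overlap = area(clip(field_poly, district_poly))
        if overlap > best_area:
            best, best_area = name, overlap
    return best


# Hypothetical field straddling the border between two districts.
field_poly = [(2.0, 1.0), (6.0, 1.0), (6.0, 3.0), (2.0, 3.0)]
districts = {
    "district A": [(0.0, 0.0), (3.0, 0.0), (3.0, 10.0), (0.0, 10.0)],
    "district B": [(3.0, 0.0), (10.0, 0.0), (10.0, 10.0), (3.0, 10.0)],
}
print(district_with_largest_overlap(field_poly, districts))
```

Comparing intersection areas rather than testing mere intersection is what resolves the edge case of a field that touches more than one district.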
The data of several firms, which agreed to participate in the evaluation of the DSS, were exported as separate firm ontologies. During the export, PyQGIS was used to convert field polygons from the Gauss-Krüger coordinate system to WGS 84, as the latter usually has better GeoSPARQL support in triplestores. Additionally, SPARQL updates were used to categorize numeric laboratory results of various nutrients into qualitative soil content levels. In doing so, the updates used knowledge that had been modeled in the OWL ontologies based on agricultural literature such as the LfL's guide for the fertilization of arable and grassland. One aspect of our work was the formalization of relevant knowledge of human experts. This way, the DSS can use not only institutional data sources (Sect. 3.2) but also the implicit knowledge of local consultants gained through years of experience. For this purpose, the agricultural experts collaborating with us interviewed two Bavarian consultants about the decision-making processes underlying their recommendations to farmers about the appropriate N application time and rate for winter wheat. This showed that their recommendations differ in regard to both the required parameters and the results for each of the three separate N applications. Therefore, the agricultural experts created for each of the two consultants one set of three decision trees. We then turned these decision trees into parameterizable SPARQL queries and related knowledge in the OWL ontologies, so that the DSS can automatically calculate the time and rate for each N application. Farmers may choose their preferred consultant in the GUI application when using the DSS, which determines what recommendations and corresponding queries are used. The GUI application of the DSS was implemented as a web application using the Angular framework. 
It sends SPARQL queries to the knowledge base to retrieve agricultural information and the data of several firms, which agreed to participate in the evaluation of the DSS. After a user has selected a firm, one of its fields, and one of its field records, the GUI displays the view depicted in Fig. 5 (left). It shows a map of the field and its surroundings, information about the field, its vegetation, and its soil. This includes the BBCH code, which indicates the current growth stage of the crop, and its description from the BBCH monograph. To do so, the GUI application queries the knowledge base for the current crop, the sowing date and the geographic coordinates of the field. It then sends them to the external SIMONTO web service, which simulates and returns the BBCH code. Four expandable panels at the bottom contain the field's photo gallery as well as embedded external diagrams of the soil temperature profile, the soil moisture profile, and the weather forecast at the closest weather station to the selected field. The input mask for the automatic N fertilization recommendation changes depending on which consultant and which of the three N applications have been selected. There, the user can inspect and modify automatically derived parameters, fill in any remaining ones, and trigger the calculation. Accordingly, SPARQL queries (Sect. 3.3) are parameterized and sent to the knowledge base to calculate the recommendation results, which replace the input mask. The presented DSS and its GUI application were tested during their development by agricultural experts, Bavarian farmers, and their consultants, so that their feedback could be taken into account. Feature requests included, e.g., the weather forecast, the soil temperature profile, the SIMONTO web service, the photo gallery, and displaying city names in the map view. 
Their suggestions also provided an agricultural perspective to make the phrasing of the labels in the GUI more familiar and understandable to farmers and consultants. The agricultural experts checked that the automatically generated recommendations match the results of their decision trees, which they updated based on field tests. The following qualitative evaluation shows the parameterization and the results for the first N application for the field depicted in Fig. 5.

Many parameters of the input mask for the N fertilization recommendation are automatically filled in by the GUI using knowledge that has been asserted or inferred in the knowledge base. This includes quantitative parameters such as the N_min value as well as qualitative parameters such as whether a field is at a warm or cool location. As part of the work described in Sect. 3.3, this qualitative parameter was defined quantitatively. The agricultural experts interviewed consultants and determined that a specific threshold for the average temperature in a certain month at the field's location determines whether it is warm or cool. Hence, the GUI queries the field's average value from one of the DWD raster maps to automatically fill in the corresponding parameter. If multiple sources are available for a parameter, data of the firm is preferred to generic reference values. Table 1 shows the parameters and their values, which include the N_req and N_min values, the seed variety, whether the field's location is warm or cool, and the crop cultivated in the previous year. Some parameters cannot be automatically derived from the currently available data sources and still need to be manually entered by the user. This includes the current stand density of wheat shoots and the optional use of an organic fertilizer, i.e., its type, application rate, and relative time. The recommendation results are shown in Fig. 
5 (right) and consist of the parameter values, their effects, the application time and rate, and the remainder of the N_req value. The DSS has calculated an organic N rate of 15.8 kg ha⁻¹ due to 2.0 t ha⁻¹ chicken manure and recommends for the first N application a mineral N rate of 52.5 kg ha⁻¹ at the beginning of the vegetation period.

In this work, common semantic structures were reused or newly modeled in OWL ontologies for various regional data sources. These structures were used to transform heterogeneous data into OWL ontologies that were stored in a central knowledge base. The presented DSS and its GUI application use SPARQL queries to display information relevant to crop production and to automatically generate recommendations. In this way, farmers may become less reliant on external help. The agricultural experts plan to present the results of a UEQ-based [10] user study with farmers and consultants in a subsequent paper.

The scope of the system could be extended, since its approach and much of its knowledge are not strictly limited to its initial use case, i.e., the N fertilization of winter wheat in Bavaria. Potential candidates include a DSS for pesticide applications, a GUI application to semantically manage field activities, and supporting other regions. All of them would entail integrating additional agricultural data sources. This work and many others could be simplified if more organizations were to provide their data as semantic resources. 
[1] Linked open data im Pflanzenschutz
[2] Landscaping the use of semantics to enhance the interoperability of agricultural data
[3] Bavarian State Ministry of Food, Agriculture and Forestry: Bavarian agricultural report
[4] Ontop: answering SPARQL queries over relational databases
[6] A survey of semantic web technology for agriculture
[7] Quantities, units, dimensions and types (QUDT)
[8] AgroPortal: a vocabulary and ontology repository for agronomy
[9] Comparison and evaluation of ontologies for units of measurement
[10] Construction and evaluation of a user experience questionnaire
[11] Metadata aggregation at GovData.de: an experience report
[12] Webservices auf heterogenen Datenbeständen - Methoden der Umsetzung am Beispiel der KTBL-Planungsdaten
[13] Crop ontology: vocabulary for crop-related concepts
[14] An ontology-driven decision support system for wheat production
[15] Wissensbasierte digitale Unterstützung in der Pflanzenbauberatung
[16] GeoSPARQL - a geographic query language for RDF data
[17] Ontology based data access and integration for improving the effectiveness of farming in Nepal
[18] A direct mapping of relational data to RDF. W3C Recommendation
[19] Publications Office of the European Union: EU vocabularies
[20] DCAT-AP.de Spezifikation: Deutsche Adaption des "Data Catalogue Application Profile
[21] GeoNames ontology
[22] An ontology-based decision support system for the management of home gardens
[23] Wikidata: a free collaborative knowledgebase

Acknowledgments. The research leading to these results has been funded by the Bavarian State Ministry of Food, Agriculture and Forestry (StMELF) under grant agreement no. D/17/02 in the project FarmExpert.