title: Using METS to Express Digital Provenance for Complex Digital Objects
authors: Chapman, Christy; Parker, Seth; Parsons, Stephen; Seales, W. Brent
date: 2021-02-22
journal: Metadata and Semantic Research
DOI: 10.1007/978-3-030-71903-6_15

Today's digital libraries consist of much more than simple 2D images of manuscript pages or paintings. Advanced imaging techniques – 3D modeling, spectral photography, and volumetric x-ray, for example – can be applied to all types of cultural objects and can be combined to create complex digital representations comprising many disparate parts. In addition, emergent technologies like virtual unwrapping and artificial intelligence (AI) make it possible to create "born digital" versions of unseen features, such as text and brush strokes, that are "hidden" by damage and therefore lack verifiable analog counterparts. Thus, transparent metadata that describes and depicts the set of algorithmic steps and file combinations used to create such complicated digital representations is crucial. At EduceLab, we create various types of complex digital objects, from virtually unwrapped manuscripts that rely on machine learning tools to create born-digital versions of unseen text, to 3D models that consist of 2D photos, multi- and hyperspectral images, drawings, and 3D meshes. In exploring ways to document the digital provenance chain for these complicated digital representations and then support the dissemination of the metadata in a clear, concise, and organized way, we settled on the use of the Metadata Encoding and Transmission Standard (METS). This paper outlines our design to exploit the flexibility and comprehensiveness of METS, particularly its behaviorSec, to meet emerging digital provenance metadata needs.

As GLAM (Galleries, Libraries, Archives, and Museums) institutions increasingly move their collections online, and as technological advances make it easier, safer, and cheaper to create digital versions of practically any type of cultural heritage object, the metadata needs surrounding digital libraries (DLs) are increasingly complex. Today's DL objects consist of computation- and data-intensive components that are combined and manipulated in a multitude of ways to create enhanced digital representations of heritage materials. Examples range from complex 3D models built from x-ray scans or photogrammetry point clouds to layered images of objects captured with spectral photography under different wavelengths of light that reveal features invisible to the naked eye [1]. Other emergent technologies like virtual unwrapping [2, 3] and inpainting [4] use AI to create "born digital" versions of texts and paintings, thereby exposing features "hidden" by damage but lacking analog counterparts that can be used to verify the integrity of the digitally produced rendition. While it is tempting to gather the various files and outputs from these technologies and simply provide them as-is to scholars and patrons, most archivists agree that a digital object should be more durable, designed, and structured than the somewhat accidental collection of files produced by software. Given the "behind-the-scenes" nature of the computations that generate these representations, how can scholars, reviewers, and GLAM patrons trust that such complex digital objects faithfully render what they claim to render and can serve as objects worthy of scholarly study?
Provenance metadata should provide the answer, but traditional structures that simply outline descriptive details about an object and provide basic information about its digital capture are no longer sufficient. Instead, we need durable, robust mechanisms capable of depicting all of the algorithmic steps and file combinations used to create complicated digital representations. As one of its digital restoration projects, EduceLab is creating 3D compilations of papyri fragments from the collection of opened Herculaneum scrolls carbonized by the eruption of Mount Vesuvius in 79 AD. In 2017, we piloted the project using the historical images of P.Herc.118, a set of scroll fragments housed in 12 "pezzi" or frames (Pezzo 1–Pezzo 12, see Fig. 1) at the University of Oxford's Bodleian Library [1]. Our 3D compilations of P.Herc.118 comprise various versions of different types of files generated over the years using five different imaging modalities: multispectral and hyperspectral photography under numerous wavelengths of light; digitization of color analog photographs; digitization of hand-drawn sketches; and 3D photogrammetry models built using hundreds of 2D photos. These data undergo various computational processes, such as segmentation, stitching, image registration, and machine learning-based contrast enhancement, during the process of creating the compilation. One of the primary goals of the project is to develop a transparent method for depicting and disseminating the digital provenance of these born-digital objects in a clear, concise, and organized way. In this paper, we describe our conceptual model for using METS, in particular its behaviorSec, to accomplish this goal.

Described as "the one metadata schema to rule them all" [5], METS (the Metadata Encoding and Transmission Standard, http://www.loc.gov/standards/mets/) serves as a comprehensive container that can enumerate various types of files; show how they are interconnected and work together to create a complex digital object; and provide all of the relevant metadata for each file, either embedded within the document itself or linked to a location outside of the document. A key benefit of METS is that it allows users to combine elements from different schemata [6], an important characteristic when various types of files are required to construct a complex digital object like that of P.Herc.118.

Unrolled by Italian scholars in 1883, P.Herc.118 has been the subject of various attempts over the years to create a visually accessible facsimile. In addition to the hand-drawn "disegni" sketches of the visible text made by artists as the scroll was first unfurled, the image record also includes a 1998 set of high-resolution color digital photographs, as well as a series of multispectral images captured using infrared lighting in 2005 [7, 8]. In 2017, our team added a hyperspectral stack of some 370 images per pezzo that range from near ultraviolet to infrared lighting. We also captured 3D scans of each of the 12 pezzi using photogrammetry. Our project compiled all of these images into one unified data set, so that the best visual representations from each facsimile could be combined and viewed at the same time.

It has been documented that METS is less useful for interchanging digital objects among institutions than it is for packaging the information for digital objects [9], and that may be particularly true when it comes to the complicated provenance of complex digital objects.
But by offering a balance between expressiveness, which allows complex representations of various types resulting from complicated processes such as volumetric x-ray and AI tools, and prescriptive structure, which simplifies and leads to predictable, canonical sections, METS is a useful packaging tool for achieving our EduceData goals for P.Herc.118 and other digital restoration projects.

• First, METS allows all of the files that go into creating a complex digital object to be referenced in a single METS document, and it depicts how the files work together. Seeing at a glance the list of files and their uses reveals much about the digital provenance of an object.

• Second, because METS allows the user to combine elements from different schemata, all of the descriptive, administrative, and preservation metadata for a digital object can be collocated in one METS document. This attribute is extremely useful when several different file types combine to create a particular digital object. The metadata schema chosen for a 3D mesh (e.g., CARARE) is likely to be different from that of a 2D image (e.g., NISO MIX), yet both are included in the 3D version of an artifact. The multiple-schemata option is also powerful because it provides a way to include descriptive metadata for both the original cultural heritage object and the digital version(s) of that object. Catalogers often struggle to disambiguate descriptive details about the two conceptual entities (physical artifact versus digital representation), but the flexibility of METS successfully addresses this issue. In our profile, for example, the dmdSec (descriptive metadata section) for a digital object will be used to describe the physical heritage object itself using the Dublin Core standard. Basic descriptive and technical metadata for the image files, on the other hand, will be included in the amdSec (administrative metadata section) as techMD using the NISO MIX and PREMIS standards.

• Third, because METS allows one to point externally to files, we can develop tools that automatically collect and compile technical metadata as computational operations are performed. Most of the technical metadata generated by our software tools will be collected and stored as JSON files, which are not compatible with XML-based METS. To avoid having to crosswalk all of the JSON metadata to XML, most of the technical metadata (other than basic descriptive details such as date/time captured) will reside outside the METS document but will be referenced using the mdRef element in the amdSec as techMD, allowing users to peruse the easier-to-read JSON files or crosswalk them to XML if they so desire.

• Finally, by using the behaviorSec, we can delineate, describe, and enable the execution of specific computational actions to both create and recreate a final digital object. The behaviorSec has generally been limited to such uses as indicating how a digital object should be displayed – the order of pages with page-turning behaviors, for example. Our METS profile goes a step further, however, and uses the behaviorSec to make explicit for analysis and review the algorithmic steps imposed upon the referenced files. These computational processes are depicted through visualizations that are enabled through executable code referenced in the mechanism element of the behaviorSec (a sketch of this layout follows this list).

This final point is the most powerful, and perhaps most unconventional, use of METS in our profile.
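To make the layout described above concrete, the following is a minimal, hypothetical sketch: Dublin Core for the physical pezzo in the dmdSec, software-generated technical metadata left as external JSON and referenced via mdRef in the amdSec, and the remaining sections stubbed out. All identifiers, paths, and values are placeholders for illustration, not our final profile.

    <mets xmlns="http://www.loc.gov/METS/"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xmlns:dc="http://purl.org/dc/elements/1.1/">

      <!-- Descriptive metadata: the physical heritage object, in Dublin Core -->
      <dmdSec ID="dmd-pherc118-pezzo4">
        <mdWrap MDTYPE="DC">
          <xmlData>
            <dc:title>P.Herc. 118, Pezzo 4</dc:title>
            <dc:description>Carbonized papyrus fragments opened in 1883</dc:description>
          </xmlData>
        </mdWrap>
      </dmdSec>

      <!-- Administrative metadata: technical details for the digital files; the bulk
           of the software-generated metadata stays in JSON and is referenced here
           rather than crosswalked to XML -->
      <amdSec ID="amd-pezzo4">
        <techMD ID="tech-ir2005-stitch">
          <mdRef LOCTYPE="URL" MDTYPE="OTHER" OTHERMDTYPE="JSON"
                 LABEL="Stitching pipeline metadata"
                 xlink:href="metadata/pezzo4_ir2005_stitch.json"/>
        </techMD>
      </amdSec>

      <!-- fileSec, structMap, and behaviorSec follow; see the later sketches -->
    </mets>

In this arrangement, basic image-level metadata (NISO MIX, PREMIS) would sit in additional techMD elements wrapped with mdWrap; only the richer, pipeline-specific JSON remains external.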
Several thorough overviews of METS extol its flexibility and simplicity [10, 11]. However, little research has been conducted on best-practice usage of the METS behaviorSec itself. One exception is that of Gartner [12], who suggested using METS as an "Intermediary XML Schema" for digital objects that result from experiments in live cell protein studies. In these studies, a variety of biological nanoimaging experiments generate a series of raw images that must be combined and processed with various software tools before being delivered to the biology research team. Gartner proposes using METS documents as "mediating encoding mechanisms" from which data or metadata for archival or delivery can be generated by Extensible Stylesheet Language Transformations (XSLT). According to Gartner, "the intermediary schema technique may be used to define templates, similar to a content model, from which the final METS files to be delivered can be constructed" [12]. This technique is similar to the one we describe, in that it uses the behavior section "to invoke the XSLT transformation by which a METS template file is to be processed and to define the software necessary to co-process the raw image experiment files for delivery" [12].

To demonstrate EduceLab's planned usage of METS, consider a small part of the P.Herc.118 3D model compilation process – the creation of a 3D version of the 2005 infrared (IR) photographs of Pezzo 4. Creating this DL object involved the source files shown in Fig. 2. Pipelines and processes applied to these files to combine them into one composite image include Stitching, ArtecStudioReconstruct, ArtecReorderTexture, and 2D-3DRegistration. Although a complete list of files and transformations could be considered a kind of comprehensive catalogue of an object's digital provenance, the METS behaviorSec creates an opportunity for designing a kind of structural guide to the many pieces of data, metadata, and data transformations at play. This additional design opportunity goes well beyond what often passes for packaged digital objects. Rather than merely listing inside a single METS document all of the composite files, along with all of the intermediate files and data transformations that occurred "behind the scenes" to create the composite 3D model, the METS behaviorSec can serve as a powerful design tool for structuring all kinds of interpretive operations, such as displaying and visualizing the digital provenance chain of any given digital object. We employ various working pieces of software internally to create ad hoc visualizations of our processes and their outputs (to view an example using the P.Herc.118 data, see http://infoforest.cs.uky.edu/pherc118/). We refer to these behaviors as EduceData Visualizations, and we are in the process of transforming these tools into the METS behaviors format. Applying the METS structure to our tools will afford them greater portability, scalability, and searchability so that they and the digital objects we create can achieve widespread distribution and use.
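Before turning to the behaviors themselves, the inventory of source and derived files for the Pezzo 4 example would be enumerated in a METS fileSec. The following truncated sketch is hypothetical: the file names, group labels, and IDs are illustrative, not the actual project layout.

    <fileSec>
      <!-- 2005 infrared source frames (inputs to the Stitching pipeline) -->
      <fileGrp USE="source: 2005 IR">
        <file ID="f-ir2005-001" MIMETYPE="image/tiff" ADMID="tech-ir2005-stitch">
          <FLocat LOCTYPE="URL" xlink:href="images/pezzo4/ir2005/frame_001.tif"/>
        </file>
        <!-- ... remaining IR frames ... -->
      </fileGrp>

      <!-- 3D capture from the Artec Spider (input to ArtecStudioReconstruct) -->
      <fileGrp USE="source: Artec 3D">
        <file ID="f-artec-mesh" MIMETYPE="model/obj">
          <FLocat LOCTYPE="URL" xlink:href="meshes/pezzo4/artec/pezzo4.obj"/>
        </file>
      </fileGrp>

      <!-- Outputs of Stitching and 2D-3DRegistration -->
      <fileGrp USE="derived">
        <file ID="f-ir2005-stitched" MIMETYPE="image/tiff">
          <FLocat LOCTYPE="URL" xlink:href="derived/pezzo4_ir2005_stitched.tif"/>
        </file>
        <file ID="f-ir2005-3d" MIMETYPE="model/obj">
          <FLocat LOCTYPE="URL" xlink:href="derived/pezzo4_ir2005_registered.obj"/>
        </file>
      </fileGrp>
    </fileSec>

Grouping files by their role in the pipeline keeps the provenance of each derived object legible at the file level; the behaviorSec then makes the transformations between groups explicit.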
According to the METS primer [13], the METS behavior section "provides a means to link digital content with applications or computer programming code that can be used in conjunction with the other information in the METS document to render or display the digital object, or to transform one or more of its component content files." For our digital objects, the METS behaviorSec will include a set of behavior mechanism xlink:href pointers to code that, when invoked, will assemble all of the files and interface information necessary to execute the behavior. The various behavior mechanisms will launch visualizations that (1) explain how input files were manipulated at various iterations, (2) enable the replication of such manipulations, and (3) depict all of the processes and associated metadata.

Various users of the P.Herc.118 compilations will benefit from these EduceData Visualizations. For example, allowing one to transparently "see" the parameter settings and human judgements that play a role in the creation of a digital object will help engender the researcher's trust in the object as a resource for scholarly study. The casual museum patron might like to see if they can improve upon the visibility of a text by making their own contrast or color adjustments. Or, a peer reviewer of research produced using the born-digital object may want to inspect the quality of the registration process to assess the validity of resulting scholarly claims. For example, the following behaviors will produce EduceData Visualizations for examining and reproducing the creation of the 2005 IR 3D version of P.Herc.118 Pezzo 4 (a sketch of one such behaviorSec entry follows this list):

• VisualizeImage: Allows a user to view any 2D image with adjustments, such as color and contrast.

• VisualizeMesh: Allows one to view a particular 3D model. Behaviors in this example would enable simple viewing of the 3D version of P.Herc.118 Pezzo 4's 1998 color image or the 3D version of the 2005 infrared image.

• VisualizeMetadata: Aggregates all metadata for a particular digital object from the various sections of the METS file, such as the dmdSec (descriptive metadata) and amdSec (techMD, digiprovMD, rightsMD, etc.), and presents it in graphical form.

• VisualizePipeline: Shows connections inside a single pipeline or process. The referenced code will use the software-generated metadata.json files to visualize a graph produced by a processing pipeline, such as those for Stitching and 2D-3D Registration.

• VisualizeHistory: Uses all of the components of a digital object to depict the entire digital object's history in graphical form. This is an extension of the VisualizeMetadata and VisualizePipeline behaviors, in that it will render the connections between pipelines, such as between Stitching and Registration, or between ArtecReconstruction and Registration, in a fashion similar to that of Fig. 2.

• Register2Dto2D, Register3Dto3D, and Register2Dto3D (three different behaviors): Replicate the specified registration process by using the intermediate files along with the fixed and moving images stored in a .regpkg project file to recreate a registered digital object, such as the 2D 1998 color photo registered to the Artec 3D mesh. In the future, these behaviors will give users the ability to perform a new registration of any 2D or 3D object referenced in the METS document to any other 2D or 3D object also so referenced, generating a new transform file that stores the chosen registration parameters.

• ExploreRegistration: Allows one to inspect the intermediate results of all steps in a selected registration pipeline.

• StitchImage: Draws from a Photoshop metadata file to replicate the stitching process, such as that of the 35 2005 IR images, which composes a single image from multiple images that have overlapping fields of view.

• ArtecReconstruct or PGReconstruct: ArtecReconstruct uses the files from the Artec Studio project directory created by the Artec Spider capture event to replicate the rendering of the 3D meshes used to create the 3D models. PGReconstruct is the same behavior, but one associated with our 3D modeling pipeline going forward. A new custom-designed photogrammetry kit will enable a simplified dataflow and processing pipeline, which will in turn create a cleaner, more straightforward METS file and behavior mechanism.
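As a hypothetical sketch of how one of these behaviors might be encoded, consider Register2Dto3D: the behavior's mechanism element points via xlink:href at the executable code, and interfaceDef at its abstract interface description. The identifiers, paths, and BTYPE value below are placeholders.

    <behaviorSec>
      <behavior ID="beh-register2Dto3D" STRUCTID="div-pezzo4-ir2005-3d"
                BTYPE="educedata-visualization"
                LABEL="Register2Dto3D: replicate the 2D-to-3D registration">
        <!-- Abstract definition of the behavior's interface -->
        <interfaceDef LABEL="Registration replication interface" LOCTYPE="URL"
                      xlink:href="behaviors/interfaces/register2Dto3D.html"/>
        <!-- Executable code that assembles the .regpkg project file and the fixed
             and moving images referenced in the fileSec, then re-runs the registration -->
        <mechanism LABEL="Registration replication tool" LOCTYPE="URL"
                   xlink:href="behaviors/mechanisms/register2Dto3D"/>
      </behavior>
    </behaviorSec>

The STRUCTID attribute ties the behavior to the structMap division representing the registered 3D object, so that invoking the behavior pulls in exactly the files belonging to that object.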
A truncated, conceptual draft of a METS file for the 3D version of the 2005 IR image of P.Herc.118 Pezzo 4 is available for viewing at https://tinyurl.com/y57dfonp. Once METS files have been constructed for the compiled image of each of the 12 pezzi, a consolidated METS file can be created by using the mets:fptr element in the structMap section to aggregate each pezzo's representative METS file, thereby creating a complete catalog and digital provenance chain for the entire set of 3D models for P.Herc.118 (one possible layout for this aggregation is sketched below).

Because of the flexibility and extensibility of METS, we can add behaviors that provide different functionalities as needed or desired by various stakeholders. For example, the use of METS behaviors as described opens up possibilities for much more interactivity with the source files used to create complex digital objects. With all of the component files clearly delineated and organized, we can establish groups of behaviors that not only serve as the blueprint for how a digital object was constructed, but also provide the building blocks for future digital objects. New complex digital object visualizations can be created on the fly, eliminating the need for extensive image processing that creates a static digital object. Instead, the user can see the output of a new "virtual" digital object with all of its relevant metadata generated and maintained in a new project file for examination. For example, two additional image sets can be registered to the P.Herc.118 Pezzo 4 3D mesh: the original disegni drawings and the 2017 hyperspectral images. While the process outlined in this paper does not include these images, most scholars will want to view the disegni images registered to the 2005 IR 3D model. Or, a museum patron might like to see how a papyrus fragment has changed since 2005 by registering the stitched 2005 IR image to the 2017 IR version. The visualization behaviors proposed below provide a better option for achieving such goals than creating and storing a new digital object each time a new need arises to view the images or the data in a different way.

• GenerateCompositeImage: For combining one or more spectral images into a single output image. The type of composite is user-defined, and examples might include false color rendering, color mapping, contrast enhancement, etc.

• GenerateRGBImage: Same process as GenerateCompositeImage, but with a predefined operation for generating an RGB image, a popular spectral combination based on red, green, and blue values that results in a full-color rendition.

• VisualizeSegmentation: For P.Herc.118 Pezzo 4, there are nine disegni drawings, which must be isolated from each other digitally through a process called segmentation before they can be registered to any other 2D images or 3D meshes. This behavior enables one to examine the segmentation process applied to the original digital images of the disegni.

Each of these custom visualizations will generate a new project file containing all of the relevant metadata produced by each new process as it is applied to the selected image files. If desired, systems could provide version control, allowing users to save locally the entire project package along with the new digital object for further study and use.
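Returning to the consolidated catalog mentioned above, one hypothetical way to realize the fptr-based aggregation is to register each per-pezzo METS document as a file in the fileSec and point at it from a structMap division. The identifiers and paths are placeholders, and pointing directly at external METS documents with mptr would be a reasonable alternative.

    <fileSec>
      <fileGrp USE="per-pezzo METS">
        <file ID="f-mets-pezzo01" MIMETYPE="text/xml">
          <FLocat LOCTYPE="URL" xlink:href="mets/pherc118_pezzo01.xml"/>
        </file>
        <!-- ... f-mets-pezzo02 through f-mets-pezzo12 ... -->
      </fileGrp>
    </fileSec>

    <structMap TYPE="physical">
      <div TYPE="manuscript" LABEL="P.Herc. 118">
        <div TYPE="pezzo" LABEL="Pezzo 1" ORDER="1">
          <fptr FILEID="f-mets-pezzo01"/>
        </div>
        <!-- ... one div per pezzo, through Pezzo 12 ... -->
      </div>
    </structMap>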
By using the METS document to house all relevant information about a digital object, from descriptive details about the physical source, to technical details about its electronic capture, to digital provenance details about the final digital object's computational construction, users have everything they need at their fingertips to recreate and investigate claims arising from the study of these digital objects, as well as to expand upon them by creating their own new versions.

Image registration is one of the simplest types of digital restoration that EduceLab performs. As noted earlier, other, more complex digital objects are created using our virtual unwrapping software pipeline and by applying new AI techniques. METS behaviors can also be used to describe these computational actions. For example, virtual unwrapping applies a series of algorithmic steps to micro-CT scans of manuscripts that cannot be opened and renders the hidden text within. This process of identifying and isolating the writing surfaces that contain text, flattening them, and then rendering the writing on those layers can be made explicit using EduceData Visualization behaviors similar to those described above for registration. Behaviors could be created, for example, to visualize the volume of slice images generated from a micro-CT scan from which a virtually unwrapped image is constructed, and then to replicate the entire virtual unwrapping process. Another possible use is to visualize aligned objects in the same 3D space, such as examining a virtually unwrapped 3D mesh showing writing with a multispectral photograph registered to it. Another behavior might allow users to inspect the intermediate results of each step in a specific virtual unwrapping pipeline, a visualization that would enable the backtracking of results to the inputs and that could eventually be expanded to allow one to see how results change when different parameters are chosen or steps are implemented.

AI also enables new complex digital analysis tasks, such as machine-based recognition of artistic styles and recto/verso determinations from x-ray scans, all using large labeled data sets and tools like trained Convolutional Neural Networks (CNNs) or Generative Adversarial Networks (GANs). METS can be used not only to track the provenance of how images are created, but also to track these black-box analyses to ascertain how researchers come to expert scholarly conclusions about these digital objects (i.e., how a particular style determination was made using CNNs). We are currently using machine learning techniques in two areas: contrast enhancement of spectral images and ink identification in micro-CT data. Potential behaviors could analyze an entire spectral suite of images and generate a single image with the ideal contrast or other AI-enabled enhancements. A graphical visualization of all the steps and intermediate results in a chosen spectral enhancement pipeline could be displayed. For ink identification pipelines, a behavior could access the complete training dataset for a specified ink type, along with the specific parameters
(i.e., the model architecture, training algorithm, and hyperparameters) captured automatically and stored in metadata.json files to replicate the training of the ink identification and texturing model. Another behavior could replicate the creation of a textured 3D mesh showing the text by applying a trained ink-ID model to a selected writing surface from a micro-CT volume.

At EduceLab, we want our digital objects to be easy to exchange, validate, and reuse. Today's digital library represents a big data problem in need of careful organization through imposed structure in order for these goals to be achieved. Future work will incorporate the CIDOC CRM ontology in our METS files to improve their semantic interoperability. But as noted by Doerr [14], METS is without a competitor when it comes to providing syntactic interoperability for information exchange. This paper describes our effort at building complex digital objects, which we believe must be designed and packaged with intentionality to support access, all kinds of intended uses, and scholarly analysis. The METS mechanism is expressive enough to capture the complexity of new algorithms and big-data applications, yet it still respects the important standards that have emerged around descriptive, technical, and provenance metadata. We seek to move beyond a process in which almost all the care and design is focused on developing the software – creating tools that generate interesting results, like image enhancements or virtual unwrapping – only to produce a digital object that is merely a wrapped set of file lists and links, likely produced through the ad hoc happenstance of software parameters and outputs. While that could be considered a comprehensive archival record of the process, it misses a very important opportunity. Structuring complex digital objects using METS brings a new level of intentionality and design to the overall process, building a structural guide for the users of those objects and giving the software a meaningful target for what to produce and how those results can and should be combined to support a durable digital object.

References:
The Digital Compilation of P.Herc. 118. Accepted for publication in Manuscript Studies.
From invisibility to readability: recovering the ink of Herculaneum.
From damage to discovery via virtual unwrapping: reading the scroll from En-Gedi.
Towards an inpainting framework for visual cultural heritage.
Providing metadata for compound digital objects: strategic planning for an institution's first use of METS, MODS, and MIX.
Application of astronomical imaging techniques to P.Herc. 118.
817 from facsimiles to MSI: a case for practical verification.
The digital object in context: using CERIF with METS.
METS: the metadata encoding and transmission standard.
New metadata standards for digital resources: MODS and METS.
METS as an intermediary schema for a digital library of complex scientific multimedia.
METS Primer and Reference Manual, Version 1.
METS and the CIDOC CRM – a comparison.