key: cord-0546826-cknao3wk authors: Viswanathan, Vijay; Neubig, Graham; Liu, Pengfei title: CitationIE: Leveraging the Citation Graph for Scientific Information Extraction date: 2021-06-03 journal: nan DOI: nan sha: 21c9c624bc328686cef4bb1f80a786a5027d8886 doc_id: 546826 cord_uid: cknao3wk

Automatically extracting key information from scientific documents has the potential to help scientists work more efficiently and accelerate the pace of scientific progress. Prior work has considered extracting document-level entity clusters and relations end-to-end from raw scientific text, which can improve literature search and help identify methods and materials for a given problem. Despite the importance of this task, most existing works on scientific information extraction (SciIE) consider extraction solely based on the content of an individual paper, without considering the paper's place in the broader literature. In contrast to prior work, we augment our text representations by leveraging a complementary source of document context: the citation graph of referential links between citing and cited papers. On a test set of English-language scientific documents, we show that simple ways of utilizing the structure and content of the citation graph can each lead to significant gains in different scientific information extraction tasks. When these tasks are combined, we observe a sizable improvement in end-to-end information extraction over the state-of-the-art, suggesting the potential for future work along this direction. We release software tools to facilitate citation-aware SciIE development.

The rapid expansion in published scientific knowledge has enormous potential for good, if it can only be harnessed correctly. For example, during the first five months of the global COVID-19 pandemic, at least 11000 papers were published online about the novel disease (Hallenbeck, 2020), with each representing a potential faster end to a global pandemic and saved lives. Despite the value of this quantity of focused research, it is infeasible for the scientific community to read this many papers in a time-critical situation, and make accurate judgements to help separate signal from the noise.

[Figure 1: Example of using the citation graph to improve the task of salient entity classification (Jain et al., 2020). In this task, each entity in the document is classified as salient or not, where a salient entity is defined as being relevant to its paper's main ideas.]

To this end, how can machines help researchers quickly identify relevant papers? One step in this direction is to automatically extract and organize scientific information (e.g. important concepts and their relations) from a collection of research articles, which could help researchers identify new methods or materials for a given task. Scientific information extraction (SciIE) (Gupta and Manning, 2011; Yogatama et al., 2011), which aims to extract structured information from scientific articles, has seen growing interest recently, as reflected in the rapid evolution of systems and datasets (Luan et al., 2018; Gábor et al., 2018; Jain et al., 2020).
Existing work on SciIE revolves around extraction based solely on the content of different parts of an individual paper, such as the abstract or conclusion (Augenstein et al., 2017; Luan et al., 2019). However, scientific papers do not exist in a vacuum: they are part of a larger ecosystem of papers, related to each other through different conceptual relations. In this paper, we claim that a better understanding of a research article relies not only on its content but also on its relations with associated works, using both the content of related papers and the paper's position in the larger citation network.

We use a concrete example to motivate how information from the citation graph helps with SciIE, considering the task of identifying key entities in a long document (known as "salient entity classification") in Figure 1. In this example, we see a paper describing a speech recognition system (Saon et al., 2016). Focusing on two specific entities in the paper ("ImageNet classification challenge" and "Switchboard task"), we are tasked with classifying whether each is critical to the paper. This task requires reasoning about each entity in relation to the central topic of the paper, which is daunting for NLP considering that this paper contains over 3000 words across 11 sections. An existing state-of-the-art model (Jain et al., 2020) mistakenly predicts the non-salient entity "ImageNet classification challenge" as salient due to the limited contextual information. However, this problem becomes more approachable when informed by the structure of the citation graph, which conveys how this paper relates to other research works. Examining this example paper's position in the surrounding citation network suggests it is concerned with speech processing, which makes it unlikely that "ImageNet" is salient.

The clear goal of incorporating inter-article information, however, is hindered by a resource challenge: existing SciIE datasets that annotate papers with rich entity and relation information fail to include their references in a fine-grained, machine-readable way. To overcome this difficulty, we build on top of an existing SciIE dataset and align it with a source of citation graph information, which finally allows us to explore citation-aware SciIE. Architecturally, we adopt the neural multi-task model introduced by Jain et al. (2020), and establish a proof of concept by comparing simple ways of incorporating the network structure and textual content of the citation graph into this model. Experimentally, we rigorously evaluate our methods, which we call CitationIE, on three tasks: mention identification, salient entity classification, and document-level relation extraction. We find that leveraging citation graph information provides significant improvements on the latter two tasks, including a 10-point improvement in F1 score for relation extraction. This leads to a sizable increase in the performance of the end-to-end CitationIE system relative to the current state-of-the-art, Jain et al. (2020). We offer qualitative analysis of why our methods may work in §5.3.

2 Document-level Scientific IE

We consider the task of extracting document-level relations from scientific texts.
Most work on scientific information extraction has used annotated datasets of scientific abstracts, such as those provided for the SemEval 2017 and SemEval 2018 shared tasks (Augenstein et al., 2017; Gábor et al., 2018), the SciERC dataset (Luan et al., 2018), and the BioCreative V Chemical Disease Relation dataset (Wei et al., 2016). We focus on the task of open-domain, document-level relation extraction from long, full-text documents. This is in contrast to the above methods, which only use paper abstracts. Our setting also differs from works that consider a fixed set of candidate relations (Hou et al., 2019; Kardas et al., 2020) or those that only consider IE tasks other than relation extraction, such as entity recognition (Verspoor et al., 2011).

We base our task definition and baseline models on the recently released SciREX dataset (Jain et al., 2020), which contains 438 annotated papers, all related to machine learning research. Each document consists of sections D = {S_1, ..., S_N}, where each section contains a sequence of words S_i = {w_{i,1}, ..., w_{i,N_i}}. Each document comes with annotations of entities, coreference clusters, cluster-level saliency labels, and 4-ary document-level relations. We break down the end-to-end information extraction process into a sequence of four related tasks, with each task taking the output of the preceding tasks as input.

Mention Identification For each span of text within a section, this task aims to recognize whether the span describes a Task, Dataset, Method, or Metric entity, if any.

Coreference This task requires clustering all entity mentions in a document such that, in each cluster, every mention refers to the same entity (Varkel and Globerson, 2020). The SciREX dataset includes coreference annotations for each Task, Dataset, Method, and Metric mention.

Salient Entity Classification Given a cluster of mentions corresponding to the same entity, the model must predict whether the entity is key to the work described in the paper. We follow the definition from the SciREX dataset (Jain et al., 2020), where an entity in a paper is deemed salient if it plays a role in the paper's evaluation.

Relation Extraction The ultimate task in our IE pipeline is relation extraction. We consider relations as 4-ary tuples of typed entities (E_Task, E_Dataset, E_Method, E_Metric), which are required to be salient entities. Given a set of candidate relations, we must determine which relations are contained in the main result of the paper.
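To make this setup concrete, the sketch below shows one possible in-memory representation of a SciREX-style document and its annotations, along with the order in which the four tasks consume them. The field names and types are illustrative only, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (section index, start token, end token); purely illustrative.
Span = Tuple[int, int, int]

@dataclass
class SciIEDocument:
    # sections[i] is the token sequence S_i = [w_{i,1}, ..., w_{i,N_i}]
    sections: List[List[str]]
    # mention span -> type in {"Task", "Dataset", "Method", "Metric"}
    mentions: Dict[Span, str] = field(default_factory=dict)
    # coreference clusters: each cluster is a list of mention spans
    clusters: List[List[Span]] = field(default_factory=list)
    # one saliency label per cluster
    salient: List[bool] = field(default_factory=list)
    # 4-ary relations over cluster indices: (Task, Dataset, Method, Metric)
    relations: List[Tuple[int, int, int, int]] = field(default_factory=list)

# The pipeline runs mention identification -> coreference ->
# salient entity classification -> relation extraction, with each
# stage consuming the previous stage's output.
```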
We base our work on top of the model of Jain et al. (2020), which was introduced as a strong baseline accompanying the SciREX dataset. We refer the reader to their paper for full architectural details and briefly summarize their model here. This multi-task model performs three of our tasks (mention identification, saliency classification, and relation extraction) in sequence, treating coreference resolution as an external black box. While word and span representations are shared across all tasks and updated to minimize a multi-task loss, the model trains each task on gold input. Figure 2 summarizes the baseline model's end-to-end architecture and highlights the places where we propose improvements for our CitationIE model.

The model extracts features from raw text in two stages. First, contextualized word embeddings are obtained for each section by running SciBERT (Beltagy et al., 2019) on that section of text (up to 512 tokens). Then, the embeddings from all words over all sections are passed through a bidirectional LSTM (Graves et al., 2005) to contextualize each word's representation with those from other sections.

Mention Identification The baseline model treats this named entity recognition task as an IOBES sequence tagging problem (Reimers and Gurevych, 2017). The tagger takes the SciBERT-BiLSTM (Beltagy et al., 2019; Graves et al., 2005) word embeddings (as shown in Figure 2), feeds them through two feedforward networks (not shown in Figure 2), and produces tag potentials at each word. These are then passed to a CRF (Lafferty et al., 2001), which predicts discrete tags.

Span Embeddings For a given mention span, its span embedding is produced via additive attention (Bahdanau et al., 2014) over the tokens in the span.

Coreference Using an external model, pairwise coreference predictions are made for all entity mentions, forming coreference clusters.

Salient Entity Classification Saliency is a property of entity clusters, but it is first predicted at the entity mention level. Each entity mention's span embedding is simply passed through two feedforward networks, giving a binary saliency prediction. To turn these mention-level predictions into cluster-level predictions, the predicted saliency scores are max-pooled over all mentions in a coreference cluster to give cluster-level saliency scores.

Relation Extraction The model treats relation extraction as binary classification, taking as input a set of four typed salient entity clusters. For each entity cluster in the relation, per-section entity cluster representations are computed by taking the set of that entity's mentions in a given section and max-pooling over the span embeddings of these mentions. The four entity-section embeddings (one for each entity in the relation) are then concatenated and passed through a feedforward network to produce a relation-section embedding. The relation-section embeddings are then averaged over all sections and passed through another feedforward network, which returns a binary prediction.

Although citation network information has been shown to be effective in other tasks, few works have recently tried using it in SciIE systems. One potential reason is the lack of a suitable dataset. Thus, as a first contribution of this paper, we address this bottleneck by constructing a SciIE dataset that is annotated with citation graph information. Specifically, we combine the rich annotations of SciREX with a source of citation graph information, S2ORC (Lo et al., 2020). For each paper, S2ORC includes parsed metadata about which other papers cite this paper, which other papers are cited by this paper, and locations in the body text where reference markers are embedded.

To merge SciREX with S2ORC, we link records using metadata obtained via the Semantic Scholar API: paper title, DOI string, arXiv ID, and Semantic Scholar Paper ID. For each document in SciREX, we check against all 81M documents in S2ORC for exact matches on any of these identifiers, yielding S2ORC entries for 433 out of 438 documents in SciREX. The final mapping is included in our repository for the community to use. Though our work only used the SciREX dataset, our methods can be readily extended to other SciIE datasets (including those mentioned in §2.1) using our released software.

Statistics Examining the distribution of citations for all documents in the SciREX dataset (Figure 3), we observe a long-tailed distribution of citations per paper and a bell-shaped distribution of references per paper. In addition to the 5 documents we could not match to the S2ORC citation graph, 7 were incorrectly recorded as containing no references and 5 others were incorrectly recorded as having no citations. These errors are due to data issues in the S2ORC dataset, which relies on PDF parsers to extract information (Lo et al., 2020).
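As a rough illustration of the baseline's relation classifier described above (max-pool each entity's mention span embeddings within a section, concatenate the four typed entity representations, and average the relation-section embeddings before the final prediction), a minimal PyTorch sketch might look as follows. The layer sizes and the assumption that every entity appears in every section are simplifications, not the released implementation.

```python
import torch
import torch.nn as nn

class BaselineRelationHead(nn.Module):
    """Sketch of the baseline relation classifier; dimensions are guesses."""

    def __init__(self, span_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.section_ffn = nn.Sequential(nn.Linear(4 * span_dim, hidden), nn.ReLU())
        self.scorer = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, mentions_per_section):
        # mentions_per_section[s][e]: tensor of shape
        # (num mentions of entity e in section s, span_dim);
        # this sketch assumes every entity appears in every section.
        section_reprs = []
        for section in mentions_per_section:
            # max-pool each entity's mention span embeddings within the section
            entity_vecs = [spans.max(dim=0).values for spans in section]
            # concatenate the four typed entities into a relation-section embedding
            section_reprs.append(self.section_ffn(torch.cat(entity_vecs, dim=-1)))
        # average relation-section embeddings over sections, then classify
        doc_repr = torch.stack(section_reprs).mean(dim=0)
        return torch.sigmoid(self.scorer(doc_repr))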
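The SciREX-to-S2ORC alignment can be pictured as a simple exact-match join over the available identifiers. The following sketch assumes hypothetical dictionary-shaped records and field names; it is not the released linking code.

```python
def link_scirex_to_s2orc(scirex_meta, s2orc_records):
    """Match SciREX papers to S2ORC entries by exact identifier match.

    Both inputs are assumed to be collections of records with (possibly
    missing) fields "title", "doi", "arxiv_id", "s2_paper_id"; these names
    are illustrative, not the real S2ORC field names.
    """
    keys = ("title", "doi", "arxiv_id", "s2_paper_id")
    # One lookup table per identifier type over the S2ORC records.
    lookups = {k: {r[k]: r for r in s2orc_records if r.get(k)} for k in keys}
    mapping = {}
    for doc_id, meta in scirex_meta.items():
        for k in keys:
            value = meta.get(k)
            if value and value in lookups[k]:
                mapping[doc_id] = lookups[k][value]
                break  # stop at the first identifier that matches exactly
    return mapping
```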
We now describe our citation-aware scientific IE architecture, which incorporates citation information into mention identification, salient entity classification, and relation extraction. For each task, we consider two types of citation graph information, either separately or together: (1) structural information from the graph network topology and (2) textual information from the content of citing and cited documents.

The structure of the citation graph can contextualize a document within the greater body of work. Prior works in scientific information extraction have predominantly used the citation graph only to analyze the content of citing papers, as in CiteTextRank (Das Gollapalli and Caragea, 2014) and Citation TF-IDF, which is described in detail in §4.2.2. However, the citation graph can also be used to discover relationships between non-adjacent documents, which such prior works struggle to capture. Arnold and Cohen (2009) are, to our knowledge, the only prior work to explicitly use the citation graph's structure for scientific IE. They predict key entities related to a paper via random walks on a combined knowledge-and-citation graph consisting of papers and entities, without considering a document's content. This approach is simple but cannot generalize to new or unseen entities. A rich direction of recent work has studied learned representations of networks, such as social networks (Perozzi et al., 2014) and citation graphs (Sen et al., 2008; Yang et al., 2015; Bui et al., 2018; Khosla et al., 2021). In this paper, we show that citation graph embeddings can improve scientific information extraction.

To construct our citation graph, we found all nodes in the S2ORC citation graph within 2 undirected edges of any document in the SciREX dataset, including all edges between those documents. This process took 10 hours on one machine due to the massive size of the full S2ORC graph, resulting in a graph with ∼1.1M nodes and ∼5M edges.

Network Representation Learning We learn representations for each node (paper) using DeepWalk (Perozzi et al., 2014) via the GraphVite library (Zhu et al., 2019), resulting in a 128-dimensional "graph embedding" for each document in our dataset. For each task, we incorporate the document-level graph embedding into that task's model component by simply concatenating the document's graph embedding with the hidden state in that component. We do not update the graph embedding values during training.

Incorporating Graph Embedding Each task in our CitationIE system culminates in a pair of feedforward networks. Figure 4 describes this general architecture, though the input to these networks varies from task to task (SciBERT-BiLSTM embeddings for mention identification, span embeddings for salient entity classification, and per-section relation embeddings for relation extraction). This architecture gives two options for where to concatenate the graph embedding into the hidden state, Stage 1 or Stage 2, marked with a light blue block in Figure 4. Intuitively, concatenating the graph embedding in a later stage feeds it more directly into the final prediction.
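Collecting the two-hop neighborhood described above amounts to a breadth-first expansion from the SciREX papers over an undirected view of the citation graph. The sketch below assumes the edges fit in memory as a simple list of id pairs; the actual S2ORC releases are large enough that a streaming implementation is needed in practice.

```python
from collections import defaultdict

def two_hop_subgraph(edges, seed_ids):
    """Nodes within 2 undirected edges of any seed, plus edges among them.

    `edges` is an iterable of (citing_id, cited_id) pairs.
    """
    adjacency = defaultdict(set)
    for src, dst in edges:                  # treat citation links as undirected
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    nodes = set(seed_ids)
    frontier = set(seed_ids)
    for _ in range(2):                      # expand twice: 2-hop neighborhood
        frontier = {n for node in frontier for n in adjacency[node]} - nodes
        nodes |= frontier
    sub_edges = [(s, d) for s, d in edges if s in nodes and d in nodes]
    return nodes, sub_edges
```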
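The Stage 1 vs. Stage 2 choice can be summarized with a small task-head sketch: the frozen 128-dimensional graph embedding is concatenated either before the first feedforward layer ("early fusion") or before the second ("late fusion"). Dimensions and layer shapes below are illustrative assumptions, not the released configuration.

```python
import torch
import torch.nn as nn

class GraphFusionHead(nn.Module):
    """Two-feedforward task head with the document graph embedding concatenated
    at Stage 1 ("early fusion") or Stage 2 ("late fusion")."""

    def __init__(self, in_dim: int, graph_dim: int = 128, hidden: int = 128, fuse: str = "early"):
        super().__init__()
        self.fuse = fuse
        stage1_dim = in_dim + (graph_dim if fuse == "early" else 0)
        self.ffn1 = nn.Sequential(nn.Linear(stage1_dim, hidden), nn.ReLU())
        stage2_dim = hidden + (graph_dim if fuse == "late" else 0)
        self.ffn2 = nn.Linear(stage2_dim, 1)

    def forward(self, task_input: torch.Tensor, graph_emb: torch.Tensor) -> torch.Tensor:
        graph_emb = graph_emb.detach()      # graph embeddings are kept frozen
        h = task_input
        if self.fuse == "early":
            h = torch.cat([h, graph_emb], dim=-1)
        h = self.ffn1(h)
        if self.fuse == "late":
            h = torch.cat([h, graph_emb], dim=-1)
        return self.ffn2(h)                 # logit for the task's decision
```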
We find Stage 1 is superior for relation extraction, and both perform comparably for salient entity classification and mention identification. We give details on this experiment in Appendix A.3.

Most prior work using the citation graph for SciIE has focused on using the text of citing papers. We examine how to use two varieties of textual information related to citations.

Citation sentences, also known as "citances" (Nakov et al., 2004), provide an additional source of textual context about a paper. They have seen use in automatic summarization (Yasunaga et al., 2019), but not in neural information extraction. In our work, we augment each document in our training set with its citances, treating each citance as a new section in the document. In this way, we incorporate citances into our CitationIE model through the shared text representations used by each task in our system, as shown in Figure 5. If a document has many citations, we randomly sample 25 to use. For each citing document, we select citances centered on the sentence containing the first reference marker pointing to our document of interest, and include the preceding and subsequent sentences if they are in the same section. We ensure the mention identification step does not predict entities in citance sections, which would lead to false positive entities in downstream tasks.

Citation TF-IDF is a feature representing the TF-IDF value (Jones, 1972) of a given token in its document's citances. We consider a variant of this feature: for each token in a document, we compute the TF-IDF of that token in each citance of the document, and average the per-citance TF-IDF values over all citances. We implemented this feature only for saliency classification, as it explicitly reasons about the significance of a token in citing texts. As a local token-level feature, it also does not apply naturally to relation extraction, which operates on entire clusters of spans.

We lastly consider using graph embeddings and citances together in a single model for each task. We do this naively by including citances with the document's input text when first computing shared text features, and then concatenating graph embeddings into downstream task-specific components.
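A hedged sketch of the citance-extraction step described above: sample up to 25 citing papers, locate the first reference marker pointing at the target paper, and keep a window of neighboring sentences from the same section as a pseudo-section. The document structure and field names (sections, sentences, ref_ids) are assumptions about the parsed S2ORC format, not its actual schema.

```python
import random

def extract_citances(citing_docs, target_id, max_citing=25, rng=random):
    """Collect citance windows from citing papers (illustrative field names)."""
    if len(citing_docs) > max_citing:
        citing_docs = rng.sample(citing_docs, max_citing)
    citances = []
    for doc in citing_docs:
        for section in doc["sections"]:
            sentences = section["sentences"]    # each: {"text": ..., "ref_ids": [...]}
            idx = next((i for i, s in enumerate(sentences)
                        if target_id in s["ref_ids"]), None)
            if idx is None:
                continue
            # keep neighbouring sentences only when they fall in the same section
            window = sentences[max(0, idx - 1): idx + 2]
            citances.append([s["text"] for s in window])
            break                               # only the first reference marker is used
    return citances  # appended as extra "sections"; excluded from mention prediction
```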
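The averaged Citation TF-IDF feature can be computed per token roughly as below; the IDF table is assumed to be precomputed, and the exact normalization used by the original feature may differ.

```python
from collections import Counter

def citation_tfidf(doc_tokens, citances, idf):
    """Average per-citance TF-IDF for each token appearing in the document.

    `citances` is a list of token lists; `idf` maps token -> inverse document
    frequency (assumed precomputed on some reference corpus).
    """
    per_citance_tf = [Counter(tokens) for tokens in citances]
    totals = [max(1, sum(tf.values())) for tf in per_citance_tf]
    scores = {}
    for tok in set(doc_tokens):
        if not per_citance_tf:
            scores[tok] = 0.0
            continue
        vals = [tf[tok] / total * idf.get(tok, 0.0)
                for tf, total in zip(per_citance_tf, totals)]
        scores[tok] = sum(vals) / len(vals)   # average over all citances
    return scores
```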
The ultimate product of our work is an end-to-end document-level relation extraction system, but we also measure each component of our system in isolation, giving end-to-end and per-task metrics. All metrics, except where stated otherwise, are the same as described by Jain et al. (2020).

Mention Identification We evaluate mention identification with the average F1 score of classifying entities of each span type.

Salient Entity Classification Following Jain et al. (2020), we evaluate this task at the mention level and at the cluster level. We evaluate both metrics on gold-standard entity recognition inputs.

Relation Extraction This is the ultimate task in our pipeline. We use its output and metrics to evaluate the end-to-end system, but also evaluate relation extraction separately from upstream components to isolate its performance. We specifically consider two types of metrics: (1) Document-level: For each document, given a set of ground-truth 4-ary relations, we evaluate a set of predicted 4-ary relations as a sequence of binary predictions (where a matching relation is a true positive). We then compute precision, recall, and F1 scores for each document, and average each over all documents. We refer to this metric as the "document-level" relation metric. To compare with Jain et al. (2020), this is the primary metric used to measure the full system. (2) Corpus-level: When evaluating the relation extraction component in isolation, we are also able to use a more standard "corpus-level" binary classification evaluation, where each candidate relation from each document is treated as a separate sample. We also run both of these metrics on a binary relation extraction setup, by flattening each set of 4-ary relations into a set of binary relations and evaluating these predictions as an intermediate metric.

Baselines For each task, we compare against Jain et al. (2020), whose architecture our system is built on. No other model, to our knowledge, performs all the tasks we consider on full documents. For the 4-ary relation extraction task, we also compare against the Doc-TAET model (Hou et al., 2019), which is considered state-of-the-art for full-text scientific relation extraction (Jain et al., 2020; Hou et al., 2019).

Significance To improve the rigor of our evaluation, we run significance tests for each of our proposed methods against its associated baseline, via paired bootstrap sampling (Koehn, 2004). In experiments where we trained multiple models with different seeds, we perform a hierarchical bootstrap procedure where we first sample a seed for each model and then sample a randomized test set.

Implementation We build our proposed CitationIE methods on top of the SciREX repository (Jain et al., 2020) in the AllenNLP framework (Gardner et al., 2018). For each task, we first train that component in isolation from the rest of the system to minimize the task-specific loss. We then take the best-performing modifications and use them to train end-to-end IE models to minimize the sum of losses from all tasks. We train each model on a single GPU with batch size 4 for up to 20 epochs. We include detailed training configuration information in Appendix A.1. For saliency classification and relation extraction, we trained the baseline and the strongest proposed models three times to improve the reliability of our results. For mention identification, we did not retrain models, as the first set of results strongly suggested our proposed methods were not helpful.
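The two relation metrics above differ only in where the averaging happens: per document and then across documents ("document-level"), or over all candidate relations pooled across documents ("corpus-level"). A minimal sketch, with relations represented as hashable tuples:

```python
def prf(pred, gold):
    """Precision/recall/F1 for sets of relation tuples."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def document_level(preds_per_doc, gold_per_doc):
    # score each document separately, then average over documents
    scores = [prf(p, g) for p, g in zip(preds_per_doc, gold_per_doc)]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))

def corpus_level(preds_per_doc, gold_per_doc):
    # pool every (document, relation) pair into one big binary evaluation
    pred = {(d, r) for d, rels in enumerate(preds_per_doc) for r in rels}
    gold = {(d, r) for d, rels in enumerate(gold_per_doc) for r in rels}
    return prf(pred, gold)
```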
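The hierarchical bootstrap can be sketched as follows: each replicate first draws one training seed per system, then resamples the test documents with replacement and compares the two systems on the same resampled set. This is an illustrative reconstruction of the procedure, not the authors' exact script.

```python
import random

def hierarchical_bootstrap(scores_a, scores_b, n_boot=10000, seed=0):
    """One-sided bootstrap p-value for "system A beats system B".

    scores_a / scores_b map a training seed to a list of per-document metric
    values on the same test set (paired by document index).
    """
    rng = random.Random(seed)
    seeds_a, seeds_b = list(scores_a), list(scores_b)
    n_docs = len(next(iter(scores_a.values())))
    not_better = 0
    for _ in range(n_boot):
        sa = scores_a[rng.choice(seeds_a)]          # sample one seed per system
        sb = scores_b[rng.choice(seeds_b)]
        idx = [rng.randrange(n_docs) for _ in range(n_docs)]  # resample the test set
        if sum(sa[i] for i in idx) <= sum(sb[i] for i in idx):
            not_better += 1
    return not_better / n_boot
```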
Mention Identification For mention identification, we observe no major performance difference from using citation graphs, and we include full results in Appendix A.2.

Salient Entity Classification Table 1 shows the results of our CitationIE methods. We observe: (1) Using citation graph embeddings significantly improves the system with respect to the salient mention metric. (2) Graph embeddings do not improve cluster evaluation significantly (at 95%) due to the small test size (66 samples) and inter-model variation. (The limited size of this test set is an area of concern when using the SciREX dataset, and improving statistical power in SciIE evaluation is a crucial area for future work.) (3) Incorporating graph embeddings and citances simultaneously is no better than using either alone. (4) Our reimplemented baseline differs from the results reported by Jain et al. (2020), despite using their published code to train their model. This may be because we use a batch size of 4 (due to compute limits) while they reported a batch size of 50.

Relation Extraction Table 2 shows that using graph embeddings gives an 11.5 point improvement in document-level F1 over the reported baseline, and statistically significant gains on both corpus-level F1 metrics. (The large gap between the reimplemented and reported baselines is likely due to our reproduced results averaging over 3 random seeds; when using the same seed as Jain et al. (2020), the baseline's document-level test F1 score is almost 20 points better than with the two other random seeds.) Despite seemingly large gains on the document-level F1 metric, these are not statistically significant due to significant inter-model variability and the small test set size, despite the graph embedding model performing best at every seed we tried.

From Table 3, we observe: (1) Using graph embeddings appears to have a positive effect on the main task of 4-ary relation extraction. However, these gains are not statistically significant (p = 0.235), despite our proposed method outperforming the baseline at every seed, for the same reasons as mentioned above. (2) On binary relation evaluation, we observe smaller improvements with a lower p-value (p = 0.099) due to lower inter-model variation. (3) Using citances instead of graph embeddings still appears to outperform the baseline (though by a smaller margin than graph embeddings).

We analyzed our experimental results, guided by the following questions.

Do papers with few citations benefit from citation graph information? Our test set contains only two documents with zero citations, so we cannot characterize performance on such documents. However, Figure 6 shows that the gains provided by the proposed CitationIE model with graph embeddings counterintuitively shrink as the number of citations of a paper increases. We also observe this with citances, to a lesser extent. This suggests more work needs to be done to represent citation graph nodes with many edges.

How does citation graph information help relation extraction? For relation extraction, we found that citation graph information provides the strongest gains when classifying relations between distant entities in a document, as seen in Figure 7. For each relation in the test set, we computed the average distance between pairs of entity mentions in that relation, normalized by total document length. We find that models with graph embeddings or citances perform markedly better when these relations span large swaths of text. This is particularly useful since neural models still struggle to model long-range dependencies effectively (Brown et al., 2020).

Does citation graph information help contextualize important terms? Going back to our motivating example of a speech paper referring to ImageNet in passing (§1), we hypothesized that adding context from citations helps deal with terms that are important in general, but not for a given document. To measure this, we grouped all entities in our test dataset by their "global saliency rate" measured on the test set: given a span, what is the probability that this span is salient in any given occurrence? In Figure 8, we observe that most of the improvement from graph embeddings and citances comes on terms that are labeled as salient in at least 20% of their training-set mentions. This suggests that citation graph information helps with reasoning about important terms, without negatively interfering with less-important terms.
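The two analysis quantities above are straightforward to compute. A sketch (with hypothetical input formats) of the normalized mention-distance statistic behind the Figure 7 analysis and the "global saliency rate" behind the Figure 8 analysis:

```python
from collections import defaultdict

def relation_span_distance(mention_positions, doc_length):
    """Average pairwise token distance between a relation's mentions,
    normalized by total document length."""
    pairs = [(a, b) for i, a in enumerate(mention_positions)
             for b in mention_positions[i + 1:]]
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / (len(pairs) * doc_length)

def global_saliency_rate(mention_labels):
    """mention_labels: iterable of (span_text, is_salient) pairs pooled over a corpus.
    Returns span_text -> fraction of its occurrences labeled salient."""
    counts = defaultdict(lambda: [0, 0])
    for text, is_salient in mention_labels:
        counts[text][0] += int(is_salient)
        counts[text][1] += 1
    return {text: salient / total for text, (salient, total) in counts.items()}
```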
We explore the use of citation graph information in neural scientific information extraction with CitationIE, a model that can leverage either the structure of the citation graph or the content of citing or cited documents. We find that this information, combined with document text, leads to particularly strong improvements for salient entity classification and relation extraction, and provides an increase in end-to-end IE system performance over a strong baseline. Our proposed methods reflect some of the simplest ways of incorporating citation graph information into a neural SciIE system. As such, these results can be considered a proof of concept. In the future, we will explore ways to extract richer information from the graph using more sophisticated techniques, hopefully better capturing the interplay between citation graph structure and content. Finally, we evaluated our proof of concept on a single dataset in the machine learning domain. While our methods are not domain-specific, verifying that these methods generalize to other scientific domains is important future work.

A.1 Training Details

We train each model on a single 11GB NVIDIA GeForce RTX 2080 Ti GPU with a batch size of 4. We train for up to 20 epochs and set the patience parameter in AllenNLP to 10; if the validation metric does not improve for 10 consecutive epochs, we stop training early. For each task-specific model, we use a product of the validation loss and the corpus-level binary F1 score on the validation set as the validation metric. For salient entity classification and relation extraction, we choose the best threshold on the validation set using F1 score. In total, training with these configurations takes roughly 2 hours for salient entity classification, 8 hours for mention identification, 18-24 hours for relation extraction, and 24-30 hours for the end-to-end system. Our CitationIE models took roughly as long to train as the baseline SciREX models did. For models that we trained three different times, we use different seeds for each software library (133/1337/13370 is the default seed setting in AllenNLP): for PyTorch, we use seeds 133, [...]

A.2 Mention Identification Results

We include results from using citation graph information for the mention identification task in Table 4. We observe no major improvements in this task. Intuitively, recognizing a named entity in a document may not require global context about the document (e.g., "LSTM" almost always refers to a Method, regardless of the paper where it is used), so the lack of gains in this task is unsurprising.

A.3 Early vs. Late Fusion

Each of our task-specific components in the CitationIE model contains two feedforward networks where we may concatenate graph embedding information. We refer to these two options for where to fuse graph embedding information as "early fusion" and "late fusion", illustrated in Figure 4. Here we show a detailed comparison of early fusion vs. late fusion models on mention identification (Table 5), salient entity classification (Table 6), and relation extraction (Table 7). Based on these results, we used early fusion in our final CitationIE models for mention identification and relation extraction. For saliency classification, the relative performance of early fusion and late fusion differed across our two metrics, making this comparison inconclusive. We used early fusion for saliency classification in the end-to-end model due to strong empirical performance there.

Table 7: Comparing CitationIE models for relation extraction with early graph embedding fusion vs. late fusion. Early fusion models were trained 3 times, late fusion was trained once. † indicates significance at 95% confidence, and the best model in each metric is bolded.
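For the threshold selection mentioned in the training details above, a simple grid search over validation-set F1 suffices; the sketch below is an assumption about the procedure's granularity, not the repository's exact code.

```python
def pick_threshold(val_scores, val_labels, grid=None):
    """Pick the probability threshold maximizing F1 on the validation set."""
    grid = grid if grid is not None else [i / 100 for i in range(1, 100)]

    def f1_at(threshold):
        pred = [score >= threshold for score in val_scores]
        tp = sum(1 for p, y in zip(pred, val_labels) if p and y)
        precision = tp / max(1, sum(pred))
        recall = tp / max(1, sum(val_labels))
        return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

    return max(grid, key=f1_at)
```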
References

Information extraction as link prediction: Using curated citation networks to improve gene detection
ScienceIE - extracting keyphrases and relations from scientific publications
Neural machine translation by jointly learning to align and translate
ACM International Conference on Web Search and Data Mining
Citation-enhanced keyphrase extraction from research papers: A supervised approach
Extracting keyphrases from research papers using citation networks
SemEval-2018 task 7: Semantic relation extraction and classification in scientific papers
AllenNLP: A deep semantic natural language processing platform
Bidirectional LSTM networks for improved phoneme classification and recognition
Analyzing the dynamics of research by extracting key aspects of scientific papers
The COVID-19 deluge: Is it time for a new model of data disclosure? ASBMB Today: The Member Magazine of the American Society for Biochemistry and Molecular Biology
Identification of tasks, datasets, evaluation metrics, and numeric scores for scientific leaderboards construction
SciREX: A challenge dataset for document-level information extraction
A statistical interpretation of term specificity and its application in retrieval
AxCell: Automatic extraction of results from machine learning papers
A comparative study for unsupervised network representation learning
Statistical significance tests for machine translation evaluation
Conditional random fields: Probabilistic models for segmenting and labeling sequence data
S2ORC: The Semantic Scholar Open Research Corpus
Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction
A general framework for information extraction using dynamic span graphs
Citances: Citation sentences for semantic analysis of bioscience text
DeepWalk: Online learning of social representations
Optimal hyperparameters for deep LSTM-networks for sequence labeling tasks
The IBM 2016 English conversational telephone speech recognition system
Collective classification in network data
Pre-training mention representations in coreference models
A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools
Assessing the state of the art in biomedical relation extraction: Overview of the BioCreative V chemical-disease relation (CDR) task. Database: The Journal of Biological Databases and Curation
Network representation learning with rich text information
ScisummNet: A large annotated corpus and content-impact models for scientific paper summarization with citation networks
Predicting a scientific community's response to an article
GraphVite: A high-performance CPU-GPU hybrid system for node embedding

Acknowledgments

The authors thank Sarthak Jain for assisting with reproducing baseline results, Bharadwaj Ramachandran for giving advice on figures, and Siddhant Arora and Rishabh Joshi for providing suggestions on the paper. The authors also thank the anonymous reviewers for their helpful comments. This work was supported by the Air Force Research Laboratory under agreement number FA8750-19-2-0200. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory or the U.S. Government.