Learning Geometrically Disentangled Representations of Protein Folding Simulations

N. Joseph Tatro, Payel Das, Pin-Yu Chen, Vijil Chenthamarakshan, Rongjie Lai
Date: 2022-05-20

Abstract: Massive molecular simulations of drug-target proteins have been used as a tool to understand disease mechanisms and develop therapeutics. This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein, e.g. the SARS-CoV-2 Spike protein, obtained from computationally expensive molecular simulations. Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules, as well as efficient generation of protein conformations that can serve as a complement to a molecular simulation engine. Specifically, we present a geometric autoencoder framework to learn separate latent space encodings of the intrinsic and extrinsic geometries of the protein structure. For this purpose, the proposed Protein Geometric AutoEncoder (ProGAE) model is trained on the protein contact map and the orientations of the backbone bonds of the protein. Using ProGAE latent embeddings, we reconstruct and generate the conformational ensemble of a protein at or near the experimental resolution, while gaining better interpretability and controllability in terms of protein structure generation from the learned latent space. Additionally, ProGAE models are transferable to a different state of the same protein or to a new protein of different size, where only the dense layer decoding from the latent representation needs to be retrained. Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations, charting the path toward scalable and improved approaches for analyzing and enhancing high-cost simulations of drug-target proteins.

Understanding the protein conformational landscape is critical, as protein functions are intimately connected with structural variations. Recently, deep learning based models have made impressive progress in accurately predicting protein structures [1]. There has also been interest in modeling the underlying conformational space of proteins using deep generative models, e.g. [2-5]. This line of work has mainly attempted to respect the domain geometry by using convolutional AEs on features extracted from 3D structures. In parallel, learning directly from 3D structure has recently developed into an exciting and promising application area for deep learning. In this work, we learn a model of the protein conformational space from a set of protein simulations by using geometric deep learning. We also investigate how the geometry of a protein itself can assist learning and improve interpretability of the latent conformational space. Namely, we consider the distinct influence of the intrinsic and extrinsic geometries. The intrinsic geometry captures the global (slow timescale) structural information of a protein, whereas the extrinsic geometry accounts for the local (fast timescale) structural variations. These structural variations can be further induced by interactions with the external environment (a drug molecule in the current case). Intrinsic geometric properties can be thought of as more robust to minor protein conformational changes.
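To make the intrinsic/extrinsic distinction concrete, the following is a minimal sketch (with purely illustrative names; none of this is ProGAE code) showing that pairwise distances of a toy point chain survive a rigid rotation while segment orientations do not:

```python
# Intrinsic quantities (pairwise distances) are invariant to rigid rotation;
# extrinsic quantities (bond orientations) are not. Toy demonstration only.
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))          # toy "backbone" coordinates

# A rotation about the z-axis by 90 degrees.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
rotated = points @ R.T

def pairwise_distances(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def bond_orientations(x):
    bonds = np.diff(x, axis=0)
    return bonds / np.linalg.norm(bonds, axis=-1, keepdims=True)

# Intrinsic: identical up to floating-point error after rotation.
assert np.allclose(pairwise_distances(points), pairwise_distances(rotated))
# Extrinsic: the orientations differ after rotation.
assert not np.allclose(bond_orientations(points), bond_orientations(rotated))
```

This embedding-independence is what makes distance-based signals natural candidates for a description of structure that is robust to pose.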
To this end, we propose a Protein Geometric Autoencoder model, named ProGAE, to separately encode intrinsic and extrinsic protein geometries, and we apply it to understanding the structural landscape of established drug-target proteins, such as COVID-19 target proteins and GPCRs. The main contributions of this work are summarized as follows:

• Inspired by recent unsupervised geometric disentanglement learning works [6-8], we propose a novel geometric autoencoder named ProGAE that directly learns from 3D protein structures by separately encoding intrinsic and extrinsic geometries into disjoint latent spaces used to generate protein structures. We propose a novel formulation in which the intrinsic input to the network is the protein contact map, and the extrinsic input is the set of backbone bond orientations.

• We find that the intrinsic geometric latent space improves the quality of the reconstructed proteins. Experiments confirm that the extrinsic geometry of proteins generated by ProGAE accounts for the structural variations due to specific drug interactions. This allows for better latent space interpretability and controllable generation.

• Analysis shows the learned extrinsic geometric latent space can be used for drug classification and drug property prediction, where the drug is bound to the given protein. We also demonstrate that the learned ProGAE can be transferred to a trajectory of the protein in a different state or to a trajectory of a different protein altogether.

Figure 3: (a) S protein. The projection of the latent space embedding onto the first two canonical vectors between the intrinsic and extrinsic latent spaces. Color indicates the identity of the drug that the protein is bound to in that conformation. Clustering by drug identity is apparent in the extrinsic latent space, but much weaker in the intrinsic latent space, consistent with Table 2.

Recently, a body of work has used deep learning to learn from protein structures [9, 10]. For example, [11] uses geometric deep learning to predict docking sites for protein interactions. [12] leverages the notion of intrinsic and extrinsic geometry to define an architecture for a fold classification task. Additionally, there has been focus on directly learning the temporal aspects of molecular dynamics from simulation trajectories, which is not directly related to the current work; see Appendix A.1 for a detailed discussion. Several recent papers use AE-based approaches for analyzing and/or generating structures from the latent space [2-5], which are most closely related to this work. [3] and [4] aim at learning from and generating protein contact maps, while ProGAE directly deals with 3D structures; therefore a direct comparison of ProGAE with these methods is not possible. [5] uses a VAE with a Gaussian mixture prior for clustering high-dimensional input configurations in the learned latent space. While the method works well on toy models and a standard alanine dipeptide benchmark, its performance drops as the size of the protein system grows to 15 amino acids, which is approximately an order of magnitude smaller than the protein systems studied here. [13] trains a 1D CNN autoencoder on backbone (including beta carbon) coordinates and uses a loss objective comprised of a geometric MSE error and physics-based (bond length, bond angle, nonbonded) errors. Due to the unavailability of code or a pre-trained model, we were unable to perform a direct comparison.
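A Figure-3-style projection can be reproduced with a short sketch. This is a hypothetical illustration: `z_i`, `z_e`, and `drug_ids` are synthetic stand-ins for the paper's learned embeddings, and scikit-learn's CCA is assumed in place of whatever implementation the authors used:

```python
# Project intrinsic and extrinsic latent codes onto the first two canonical
# components of a CCA fit between them, colored by bound-drug identity.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
z_i = rng.normal(size=(500, 8))        # stand-in intrinsic latent codes
z_e = rng.normal(size=(500, 16))       # stand-in extrinsic latent codes
drug_ids = rng.integers(0, 5, size=500)

cca = CCA(n_components=2).fit(z_i, z_e)
zi_c, ze_c = cca.transform(z_i, z_e)   # projections onto canonical vectors

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, proj, name in [(axes[0], zi_c, "intrinsic"), (axes[1], ze_c, "extrinsic")]:
    ax.scatter(proj[:, 0], proj[:, 1], c=drug_ids, s=5, cmap="tab10")
    ax.set_title(f"{name} latent space")
plt.show()
```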
Nevertheless, we run ProGAE on the same MurD protein simulations studied in [13] and compare the reconstruction quality with respect to the value reported in that study as well as to the experimental resolution. None of these works has considered explicitly disentangling intrinsic and extrinsic geometries. To the best of our knowledge, this work is the first to propose an autoencoder for the unsupervised modeling of the geometric disentanglement of the protein conformational space captured in molecular simulations. This representation provides better interpretability of the latent space in terms of physico-chemical and geometric attributes, results in more geometrically accurate protein conformations, and scales and transfers well to larger protein systems.

First, we introduce the input signals for our novel geometric autoencoder, ProGAE. We then discuss how ProGAE utilizes these to generate the disentangled space.

Geometric Features of Protein as Network Input ProGAE separately encodes intrinsic and extrinsic geometry with the goal of achieving better latent space interpretability. We clarify these geometric notions. Mathematically, we can consider a manifold (i.e. surface) independently of its embedding in Euclidean space. Properties that do not depend on an embedding are known as intrinsic geometric properties, with others referred to as extrinsic. As an example, suppose we approximate a protein structure via a graph, G, where certain atoms are connected by edges. Then the lengths of edges in this graph are intrinsic, as they are not explicitly dependent on the 3D embedding of said graph. On the other hand, the orientations of the edges of the embedded graph are extrinsic. As we will train ProGAE to learn the conformational space of a given protein, the protein primary structure is implicit. Then, in treating the protein geometrically, we view it at the level of its backbone, which specifies its shape.

Table 1: Average atom-wise L2 error on the training/test sets, as well as RMSD between reconstructed and true structures, using ProGAE. The Jaccard index of the reconstructed binarized contact map with a cutoff of 8 Å is provided. The RMSDs on the test set are within the resolution of the associated PDB files. We also show the relative error in solvent-accessible surface area (SASA) of the binding pocket of the S protein. [Values recovered from the extraction: 1.76 ± 0.18; 1.00 ± 0.09; 1.00 ± 0.15; SASA relative error: −6.8 ± 0.1%.]

[Table 2 fragment recovered from the extraction: drug classification, with columns for the S protein and hACE2.]

Given the primary structure, reconstructing this backbone is sufficient for reconstructing the detailed protein structure. Of importance in the backbone are the Cα atoms, which are the centers of amino acids in the protein. A coarse characterization of the backbone is the protein contact map, an incomplete distance matrix between all Cα atoms that contains all distances less than a specified threshold, typically between 6.5 Å and 12 Å [14]. As a thresholded distance matrix, the contact map defines a graph structure on a protein conformation that we refer to as the contact graph. For our purposes, the contact graph excludes edges between residues i and i + j for j ≤ 3. We use the backbone and contact graph as domains for defining our signals; both are depicted in Figure 1 with their geometric features as network input. The protein backbone can be viewed as a polygonal chain in Euclidean space, and a polygonal chain is determined up to translation given both the lengths and orientations of its line segments.
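Both facts in this passage, that a chain is fixed up to translation by its segment lengths and orientations, and that contact-graph edge lengths and backbone bond orientations can serve as the two signals, can be sketched in a few lines of NumPy. Function names are illustrative, and the 8 Å cutoff is an assumption (the text only bounds it between 6.5 Å and 12 Å):

```python
import numpy as np

def reconstruct_chain(lengths, orientations, origin=np.zeros(3)):
    """Rebuild a polygonal chain from segment lengths and unit orientations.
    The result is exact up to the choice of `origin`, i.e. up to translation."""
    steps = lengths[:, None] * orientations
    return np.vstack([origin, origin + np.cumsum(steps, axis=0)])

def contact_graph_lengths(ca, cutoff=8.0, min_seq_sep=4):
    """Contact-graph edges between C-alpha atoms and their lengths (intrinsic
    signal). Pairs closer than `min_seq_sep` in sequence are excluded,
    matching the |i - j| > 3 exclusion described in the text."""
    n = len(ca)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    seq_sep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    i, j = np.where((dist < cutoff) & (seq_sep >= min_seq_sep))
    keep = i < j                                   # count each edge once
    edges = np.stack([i[keep], j[keep]], axis=1)
    return edges, dist[edges[:, 0], edges[:, 1]]

def bond_orientations(backbone):
    """Unit orientations of consecutive backbone bonds (extrinsic signal)."""
    bonds = np.diff(backbone, axis=0)
    return bonds / np.linalg.norm(bonds, axis=-1, keepdims=True)

# Round trip: a chain is recovered exactly from its lengths + orientations.
x = np.cumsum(np.random.default_rng(1).normal(size=(6, 3)), axis=0)
seg = np.diff(x, axis=0)
x_rec = reconstruct_chain(np.linalg.norm(seg, axis=-1),
                          bond_orientations(x), origin=x[0])
assert np.allclose(x, x_rec)
```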
Then it follows that the backbone can be determined given the lengths and orientations of its bonds. Here, the lengths of these bonds are intrinsic, while the orientations are extrinsic. Thus, to explicitly decouple the intrinsic and extrinsic geometry, we consider encoding these signals separately. We note that the lengths of covalent bonds undergo very little change during a simulation performed using an empirical force field, like those in this work; indeed, a standard deviation of less than 0.059 Å from target bond lengths is common in PDB structures [15]. Thus we define intrinsic geometry at a coarser level, so the resulting signal has more variability. Specifically, we use the lengths of edges in the contact graph as representative of the intrinsic protein geometry, whereas backbone bond orientations capture the extrinsic geometry. Formally, we model the backbone by the graph $G_b = (V_b, E_b)$ and the contact graph by $G_t = (V_t, E_t)$, where vertices carry the 3D coordinates $x_i$ of the corresponding atoms. Then our intrinsic and extrinsic signals, $\mathrm{Int}: E_t \to \mathbb{R}$ and $\mathrm{Ext}: E_b \to \mathbb{R}^3$, are
$$\mathrm{Int}(e_{ij}) = \|x_i - x_j\|, \qquad \mathrm{Ext}(e_{ij}) = \frac{x_j - x_i}{\|x_j - x_i\|}.$$

Network Architecture With the inputs defined, we discuss the architecture of ProGAE. The core idea is to create an intrinsic latent space, $\mathcal{L}_I \subseteq \mathbb{R}^{n_i}$, and an extrinsic latent space, $\mathcal{L}_E \subseteq \mathbb{R}^{n_e}$, of dimensions $n_i$ and $n_e$ respectively, by separately encoding the intrinsic and extrinsic signals. Consequently, our network contains two encoders, $\mathrm{Enc}_i : \mathrm{Int} \mapsto z_i \in \mathcal{L}_I$ and $\mathrm{Enc}_e : \mathrm{Ext} \mapsto z_e \in \mathcal{L}_E$.

Table 3: Results of linear regression on the extrinsic latent space for predicting physical and chemical properties of the drugs that a protein is bound to. Error is normalized for interpretability. For comparison, the performance of linear regression on the PCA embeddings of the orientations of the backbone bonds is reported. This embedding is restricted to the same dimension as the latent space.

We then jointly decode these latent vectors to recover the coordinates of the atoms in the protein backbone. Thus, we formally define the decoder $\mathrm{Dec} : \mathcal{L}_I \times \mathcal{L}_E \to \mathbb{R}^{|V_b| \times 3}$. This high-level structure of ProGAE is depicted in Figure 1. We provide additional details on the encoders and decoder below; specific details on layer widths and other parameters can be found in Appendix A.3. As the edge-based signals are defined on a geometric domain, it is sensible to learn feature representations using convolutions that respect the geometry of the data. As the intrinsic encoder and the extrinsic encoder operate on graphs, the layers of graph attention networks (GATs) introduced in [16] are a natural tool to use, albeit with some modification. Since the input signal is defined only on the edges of the graph, we first define a signal on the graph vertices as the average value over incident edges; e.g. for the backbone, $f_0(v) = \frac{1}{|\mathcal{N}(v)|}\sum_{e \in \mathcal{N}(v)} \mathrm{Ext}(e)$ for $v \in V_b$, where $\mathcal{N}(v)$ denotes the edges incident to $v$. The first layer of each encoder uses the edge-convolution operator of [17] to map this edge-defined signal to a vertex-defined signal. The following layers of the extrinsic encoder contain successive graph attention layers with sparsity defined by a given neighborhood radius. At each layer, the signal is downsampled by a factor of two based on farthest point sampling. Given $L$ layers, this defines a sequence of graphs, $\{G_{b,l}\}_{l=0}^{L}$, with increasing decimation. Each layer is followed by batch normalization and ReLU. Summarily, for $l = 2, \dots, L$,
$$f_l = \mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{GAT}_l(f_{l-1};\, G_{b,l-1})\big)\big),$$
with the output downsampled onto $G_{b,l}$. The following layers of the intrinsic encoder are analogous, though we forgo downsampling. Global average pooling is applied to the encoder outputs to introduce invariance to the sizes of $V_t$ and $V_b$. Dense layers then map each result to the respective latent spaces, $\mathcal{L}_I$ and $\mathcal{L}_E$. The Tanh function is applied to bound the latent space.
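The encoder head just described (global average pooling, a dense layer, then Tanh) can be sketched in a few lines of PyTorch. This is our illustration, not the paper's code, and the class name is hypothetical:

```python
# Minimal sketch of the pooling + dense + Tanh head that maps per-vertex
# encoder features to a bounded latent code of fixed dimension.
import torch
import torch.nn as nn

class EncoderHead(nn.Module):
    def __init__(self, feat_dim: int, latent_dim: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, latent_dim)

    def forward(self, vertex_features: torch.Tensor) -> torch.Tensor:
        # vertex_features: (num_vertices, feat_dim). Averaging over vertices
        # makes the code invariant to the number of vertices, as in the text.
        pooled = vertex_features.mean(dim=0)
        return torch.tanh(self.fc(pooled))

# Example: z_e = EncoderHead(feat_dim=128, latent_dim=16)(features)
```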
This produces the intrinsic and extrinsic latent codes, $z_i$ and $z_e$. The latent code $z$ is taken as the concatenation of the two latent codes, $[z_i, z_e]$. A dense layer maps $z$ to a signal defined on the most decimated backbone graph, $G_{b,L}$. The structure of the decoder, Dec, mirrors $\mathrm{Enc}_e$, with convolutions mapping to upscaled graphs. The output of Dec is the point cloud $\hat{P}$, the predicted coordinates of the backbone atoms, $\hat{P} \approx P$. The loss function is a basic reconstruction loss, where $P$ and $\hat{P}$ are taken to be the true and predicted coordinates of the protein backbone atoms. Namely, we evaluate their difference using the smooth-$L_1$ loss,
$$\mathcal{L}(P, \hat{P}) = \mathrm{SL}_1(P - \hat{P}), \qquad \mathrm{SL}_1(x) = \begin{cases} 0.5\,x^2, & |x| < 1, \\ |x| - 0.5, & \text{otherwise,} \end{cases}$$
applied element-wise and averaged. This loss is less sensitive to outliers [18].

In this section, we describe the setup of our experiments, which confirm the usefulness of ProGAE in generating the conformational space. For each dataset, we train three models, each from a different random seed, and report both the mean and standard deviation in our results. Our networks are trained on 3 Nvidia GeForce GTX 1080 Ti GPUs.

Figure 4: RMSD of proteins generated along the latent interpolation between two S proteins randomly sampled from different trajectories (see also Figure 7 for hACE2 experiments). The RMSDs are computed with respect to the endpoint proteins, with standard error shown. We see a smooth interpolation between the RMSD errors, as desired. Examples of the structures along the interpolated path can be found in the appendix in Figure 8.

Table 4: Test loss for models trained with and without the intrinsic latent space. Extrinsic only: 9.68 ± 0.09 E-1 / 4.79 ± 0.01 E-1; intrinsic and extrinsic: 9.42 ± 0.08 E-1 / 4.68 ± 0.00 E-1.

Datasets Datasets used in this work are atomistic simulation trajectories of drug-target proteins reported in [19] and [20]. The two main datasets from [19] are simulations of proteins in the presence of FDA-approved or under-investigation molecules, as we aim to test the performance of ProGAE at capturing drug-induced structural variations. These include the S protein of SARS-CoV-2 and the human ACE2 protein, the former being responsible for binding to the latter. To show that a single ProGAE model can be trained on multiple different proteins, we also utilize simulations of 38 different G protein-coupled receptors from the GPCRmd dataset [20]. More information on these datasets is included in Appendix A.4. For comparison with existing work, we run ProGAE on MurD protein simulation data [13]. For transfer learning, we also consider two trajectories of the entire S protein containing 13,455 backbone atoms from [19]. One trajectory is initiated from a closed state, the other from a partially open state.

Structure Reconstruction Figure 2 displays the ability of ProGAE to accurately reconstruct conformations. The backbones are visualized with atom-wise error in Figures 6a and 6b. From the visualized atom-wise L2 reconstruction error, it is clear that our network can capture and reconstruct notable conformational changes of a protein. Figures 9a and 9b in the appendix display these reconstructions with color denoting fragment for clarity. In line with the low RMSD error, reconstructed structures appear consistent with the ground truths, with larger RMSDs observed in the loop and turn regions. Table 1 contains performance metrics of ProGAE. Generalization is measured by the L2 reconstruction error of the backbone atom coordinates, as well as the RMSD (root-mean-square deviation) after alignment. For hACE2, we achieve sub-Angstrom performance on the test set.
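For concreteness, the two quantities used throughout the evaluation can be sketched as follows. The smooth-L1 form matches the standard definition from [18]; for RMSD after alignment we assume the Kabsch algorithm, which the paper does not name explicitly:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 (Huber-style) reconstruction loss; beta=1 matches [18]."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()

def rmsd_after_alignment(p, q):
    """RMSD between two point sets after optimal rigid alignment (Kabsch)."""
    p_c = p - p.mean(axis=0)                  # remove translation
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)     # covariance SVD
    d = np.sign(np.linalg.det(u @ vt))        # guard against reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt     # optimal rotation
    return np.sqrt(((p_c @ rot - q_c) ** 2).sum(axis=-1).mean())
```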
In either case, the RMSD of the reconstruction is within the experimental resolution of the associated PDB files: 6VXX/6VW1 for the S protein and 1R42/1R4L for hACE2. Additionally, the average error in the lengths of the pseudobonds is also sub-Angstrom. Thus, ProGAE is able to reconstruct proteins within meaningful resolution.

Table 5: Transfer learning results. Columns are target trajectories (Closed S / Open S / Protease); rows are source data. Baseline: 1.55 ± 0.00 / 1.76 ± 0.00 / 1.14 ± 0.00. S protein: 1.31 ± 0.01 / 1.41 ± 0.03 / 0.96 ± 0.00. hACE2: 1.30 ± 0.00 / 1.42 ± 0.01 / 0.93 ± 0.00.

The RMSD (on secondary structure elements) and L2 error on the benchmark MurD test data are again lower than or comparable to the experimental resolution, and within the range of what has been reported in the original study, which uses more explicit loss terms (bond, angle, nonbonded) than ProGAE to preserve protein geometry. Additionally, we show in Table 1 that the reconstructed proteins have a large share of their ground-truth contacts recovered. This is measured via the Jaccard index on the binarized contact map using a threshold of 8.0 Å. Given the set of ground-truth contacts, $C_t$, and the set of reconstructed contacts, $C_r$, the Jaccard index is $|C_t \cap C_r| / |C_t \cup C_r|$. Thus there is high overlap between the sets of true and ProGAE-reconstructed contacts.

With the reconstruction capabilities of ProGAE verified, we consider the benefit of separate intrinsic and extrinsic latent spaces. First, we explore the statistical relationship between the learned intrinsic latent space and the extrinsic latent space. Canonical correlation analysis (CCA) is a natural approach to assess whether a linear relationship exists [21] (see Appendix A.2). Table 2 includes the leading correlation between the intrinsic and extrinsic latent spaces for each dataset, showing that the correlation between the intrinsic and extrinsic latent spaces is non-negligible. However, we find that the extrinsic latent space is much more linearly separable with respect to conformational properties reflective of specific drug binding. As stated earlier, each trajectory in the dataset corresponds to the S or hACE2 protein bound to a specific drug. It is then natural to investigate whether this distinct drug information is encoded in the two latent spaces. Table 2 contains the performance of a linear classifier trained via one-shot learning on the different latent spaces to classify the drug present in each frame. It is clear that the drug molecule can be almost perfectly classified in the extrinsic latent space, while such classification is much weaker in the intrinsic latent space. Figures 3a and 5a display the embeddings of the test set in the latent spaces, projected onto the first two canonical components. Color denotes the identity of the drug that the protein is bound to. Even in the 2D projection of the extrinsic latent space, better clustering by drug identity is apparent. To check whether this linear separation is chemically meaningful, we train a linear regression model on the extrinsic latent space to predict physico-chemical properties of the drug binding to the protein. Table 3 displays the performance of the model at predicting molecular weight, hydrogen bond donor count, and topological polar surface area. For comparison with our latent embedding, we train a linear regression model on the first $n_e$ principal component scores of the PCA of the extrinsic signal on each element of the test dataset.
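The regression comparison just described can be sketched as follows, with synthetic stand-ins for the latent codes, the raw extrinsic signal, and the drug property; as in the text, the PCA baseline is restricted to the latent dimension n_e:

```python
# Compare linear regression on extrinsic latent codes vs. on a PCA embedding
# of the flattened extrinsic signal of matching dimension. Toy data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def regression_mse(features, y, train_idx, test_idx):
    reg = LinearRegression().fit(features[train_idx], y[train_idx])
    return mean_squared_error(y[test_idx], reg.predict(features[test_idx]))

rng = np.random.default_rng(0)
z_e = rng.normal(size=(200, 16))    # stand-in extrinsic latent codes (n_e = 16)
ext = rng.normal(size=(200, 300))   # stand-in flattened bond orientations
y = rng.normal(size=200)            # stand-in drug property, e.g. mol. weight
train, test = np.arange(150), np.arange(150, 200)

latent_mse = regression_mse(z_e, y, train, test)
pca_feats = PCA(n_components=z_e.shape[1]).fit_transform(ext)
pca_mse = regression_mse(pca_feats, y, train, test)
```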
The latent regression outperforms that of PCA, indicating that the extrinsic latent embedding captures more physico-chemical information about the bound drug. We restrict ourselves to linear regression to limit overfitting.

We now weigh the benefits of including the intrinsic latent space in the model. We find that the inclusion of the intrinsic latent space improves the performance of the learned network during training. To see this, we trained a model that only encodes the extrinsic signal to reconstruct the protein. While it was comparable in performance with regard to L2 error, we found that this extrinsic-only model resulted in a higher reconstruction loss on the test set. This is shown in Table 4.

Geometry and Steerability of the Disentangled Latent Space For a generative model, it is important to consider whether paths in the latent space are smooth. We evaluate the performance of linear interpolations in the learned latent space. Given two protein conformations from different trajectories (i.e. in the context of two different drugs), we generate a path between them by linearly interpolating their latent codes. This provides a path of structural variation that does not exist in the training data. The results of this interpolation in terms of RMSD are shown in Figure 4. A smooth exchange in the RMSD error of the generated protein structures relative to the two endpoints is evident.

Transfer Learning: Extension to Different Proteins To check the generalization of ProGAE, we investigate transfer learning to simulations of different proteins. We begin with models trained on the S protein comprising the 3 RBDs and on hACE2; these results are summarized in Table 5. We transfer learned ProGAE models to trajectories of the closed and partially open states of the entire S protein, as well as to the SARS-CoV-2 main Protease, which provides insight into the generalization capability of the learned convolutional filters. In total, six transfer learning scenarios, in addition to three random baselines, are reported in Table 5. When transferring the model trained on the 3 RBDs of the S protein to the S protein in the closed state, we are transferring a model learned on a partial structure to the entire protein, which is much larger in size. Model transfer to the S protein in the partially open state deals with a scenario where the conformational state of the protein is notably different (closed vs. partially open). Transferring the model trained on hACE2 to the S protein datasets studies knowledge transfer to an entirely different protein, but one with which hACE2 is known to interact. Finally, transferring both the S protein and hACE2 models to the main Protease simulation allows us to study transfer to a completely different protein without notable interaction with the source proteins. Since performing long time-scale simulations of large protein systems at high resolution is computationally expensive, this transferability is beneficial: ProGAE transfers well even to non-related proteins of larger size. The only incompatible layer is the dense layer mapping from the latent spaces. To investigate transfer learning, we train just this dense layer for 10 epochs. As a baseline, we train the same layer of a randomly initialized model. In all cases, the transferred model performs better than the baseline. Thus the learned filters generalize to trajectories of different protein systems. Results on transferring only intrinsic/extrinsic filters are in the appendix in Table 6.
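A minimal PyTorch sketch of this transfer protocol follows. The `latent_to_graph` attribute name for the dense layer mapping the latent code to the most decimated backbone graph is ours, not the paper's:

```python
# Freeze all learned filters; retrain only the dense layer that decodes the
# latent code onto the target protein's (differently sized) backbone graph.
import torch

def prepare_for_transfer(progae: torch.nn.Module, lr: float = 1e-3):
    for p in progae.parameters():
        p.requires_grad = False                     # freeze learned filters
    for p in progae.latent_to_graph.parameters():   # the only incompatible layer
        p.requires_grad = True
    # Optimize only the retrained layer, e.g. for 10 epochs as in the text.
    return torch.optim.Adam(progae.latent_to_graph.parameters(), lr=lr)
```

The random baseline corresponds to running the same procedure on a freshly initialized model instead of a pretrained one.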
We introduce a novel geometric autoencoder, ProGAE, for learning meaningful disentangled representations of the protein conformational space. Our model accurately reconstructs structures of established drug-target proteins such as the SARS-CoV-2 Spike, human ACE2, and GPCR proteins, as well as the existing MurD benchmark. The autoencoder separately encodes intrinsic and extrinsic geometries to ensure better latent interpretability. The extrinsic latent space can classify structures with respect to their bound drug molecules, as well as predict drug properties. The intrinsic space assists in improving the quality of the reconstructions. The resulting disentangled, smooth latent space enables controllable generation of protein structures in a drug-dependent manner. We also show that the filters learned in training can be successfully transferred to trajectories of different protein systems, irrespective of system size, conformational state, or presence of protein-protein interaction. These results on learning, prediction, and generation suggest that the proposed framework can serve as a step towards bridging geometric deep learning with protein simulations, and can provide an efficient and complementary means for understanding and enhancing the structural landscapes of important drug-target proteins.

Related Work on Modeling Dynamics For completeness, we discuss recent work modeling temporal aspects of protein dynamics here. [22] learns the dynamics between a particular starting conformation and a particular target conformation by training a control Hamiltonian represented by a graph neural network. From the perspective of Hamiltonian systems, [23] introduces symplectic RNNs, which leverage symplectic integration for learning the dynamics of a physical system. [24] learns dynamics via a differentiable simulator using Langevin dynamics. [25] introduces VAMPnets, which map molecular coordinates to Markov states, to capture molecular kinetics. [26] augments molecular dynamics simulation with deep learning to improve the sampling of the folded states of proteins. [27] uses an LSTM to predict protein dynamics. Another line of work involves enhanced sampling with neural-net approximations of slow variables [28-30]. Deep and adversarial learning have also been employed for such enhanced sampling [31, 32]. [33] uses a deep generative neural net to directly sample the equilibrium distribution of a many-body system defined by an energy function, without using molecular simulation.

Given two embeddings, $X \in \mathbb{R}^{n \times m_1}$ and $Y \in \mathbb{R}^{n \times m_2}$, CCA finds two linear transformations, $A \in \mathbb{R}^{m_1 \times m_1}$ and $B \in \mathbb{R}^{m_2 \times m_2}$, such that the correlations between corresponding columns of $XA$ and $YB$ are maximized, with each successive pair of canonical variables constrained to be uncorrelated with the preceding pairs [21].

For training, we use Adam with a learning rate of 1E-3 [34]. The learning rate decays at a rate of 0.995 per epoch. We train models with a weight decay penalty of 5E-5. The models are trained for 100 epochs, which is enough to achieve convergence, with a batch size of 64. Additionally, we set $\lambda_R$ = 5E-1 for the bond length penalty. The neighborhood radius for defining the sparsity of the graph attention layers is set to 2.5 Å in the first layer; this radius is scaled at each layer by the stride of the previous convolution.

Here we include additional details on the datasets used for experiments. The datasets in [19] are: (1) 50 independent trajectories, each simulating the SARS-CoV-2 trimeric spike protein (S protein) in the presence of a distinct drug for 2 µs.
The simulation is limited to 3 receptor binding domains (RBDs) of the protein, as well as a short region needed for the system to maintain a trimer assembly; (2) 75 independent trajectories, each simulating the ectodomain protein of human ACE2 (hACE2) in the presence of a distinct drug for 2 µs. The backbones of the S protein and the hACE2 protein contain 3,690 atoms and 2,386 atoms, respectively. The time resolution is 1,200 ps. We form the training, validation, and test sets by randomly sampling frames with a 70%/10%/20% split. To form our GPCRmd datasets for training and testing our network, we sample 38 simulations of different proteins from this dataset; these proteins are G protein-coupled receptors. Regarding the transfer learning dataset, we use the first 2.5 µs of these 10 µs simulations, corresponding to 2,001 frames with a resolution of 1,200 ps. Additionally, we utilize the first 10 µs of a 100 µs simulation of the main Protease of SARS-CoV-2, a sequence of 10,001 frames with a 1,000 ps resolution.

Figure 7: RMSD of proteins generated along the latent interpolation between two hACE2 proteins randomly sampled from different trajectories. The RMSDs are computed with respect to the endpoint proteins, with standard error shown. We see a smooth interpolation between the RMSD errors, as desired.

References
[1] Highly accurate protein structure prediction with AlphaFold.
[2] Learning protein conformational space by enforcing physics with convolutions and latent interpolations.
[3] Deep clustering of protein folding simulations.
[4] Generating tertiary protein structures via an interpretative variational autoencoder.
[5] Interpretable embeddings from molecular simulations using Gaussian mixture variational autoencoders.
[6] Unsupervised geometric disentanglement for surfaces via CFAN-VAE.
[7] Disentangling content and style via unsupervised geometry distillation.
[8] DSM-Net: Disentangled structured mesh net for controllable generation of fine geometry.
[9] A review of deep learning methods for antibodies.
[10] Learning from protein structure with geometric vector perceptrons.
[11] Deciphering interaction fingerprints from protein molecular surfaces.
[12] Proteinn: Intrinsic-extrinsic convolution and pooling for scalable deep protein analysis.
[13] Learning protein conformational space by enforcing physics with convolutions and latent interpolations.
[14] Recovery of protein structure from contact maps.
[15] Stereochemical restraints revisited: how accurate are refinement targets and how much should protein structures be allowed to deviate from them?
[16] Graph attention networks.
[17] Exploiting edge features for graph neural networks.
[18] Fast R-CNN.
[19] Molecular dynamics simulations related to SARS-CoV-2.
[20] GPCRmd uncovers the dynamics of the 3D-GPCRome.
[21] Canonical correlation analysis: An overview with application to learning methods.
[22] Differentiable molecular simulations for control and learning.
[23] Symplectic recurrent neural networks.
[24] Learning protein structure with a differentiable simulator.
[25] VAMPnets for deep learning of molecular kinetics.
[26] DeepDriveMD: Deep-learning driven adaptive molecular simulations for protein folding.
[27] Learning molecular dynamics with simple language model built upon long short-term memory neural network.
[28] Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration.
[29] Nonlinear discovery of slow molecular modes using state-free reversible VAMPnets.
[30] Reweighted autoencoded variational Bayes for enhanced sampling (RAVE).
[31] Neural networks-based variationally enhanced sampling.
[32] Targeted adversarial learning optimized sampling.
[33] Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning.
[34] Adam: A method for stochastic optimization.

Acknowledgments N. J. Tatro's work was supported by the IBM-RPI AIRC program. R. Lai's work is supported in part by NSF CAREER Award DMS-1752934.