Tian, Hao; Tao, Peng. ivis Dimensionality Reduction Framework for Biomacromolecular Simulations. Journal of Chemical Information and Modeling (2020-04-22). DOI: 10.1021/acs.jcim.0c00485

Molecular dynamics (MD) simulations have been widely applied to study macromolecules, including proteins. However, the high dimensionality of the datasets produced by simulations makes thorough analysis difficult and hinders a deeper understanding of biomacromolecules. To gain more insight into protein structure-function relations, appropriate dimensionality reduction methods are needed to project simulations onto low-dimensional spaces. Linear dimensionality reduction methods, such as principal component analysis (PCA) and time-structure based independent component analysis (t-ICA), cannot preserve sufficient structural information. Though better than linear methods, nonlinear methods, such as t-distributed stochastic neighbor embedding (t-SNE), still suffer from limitations in suppressing system noise and in preserving inter-cluster relations. ivis is a novel deep learning-based dimensionality reduction method originally developed for single-cell datasets. Here we applied this framework to the study of the light, oxygen, and voltage (LOV) domain of the diatom Phaeodactylum tricornutum aureochrome 1a (PtAu1a). Compared with other methods, ivis is shown to be superior in constructing a Markov state model (MSM), in preserving information on both local and global distances, and in maintaining the similarity between the high dimension and the low dimension with the least information loss. Moreover, the ivis framework provides a new perspective for deciphering residue-level protein allostery through the feature weights in the neural network. Overall, ivis is a promising member of the analysis toolbox for proteins.
Molecular dynamics (MD) simulations have been widely used to provide atomic-level mechanistic insights into the functions of biomolecules. 1 For this purpose, long timescales are generally preferred when simulating protein dynamics and functions. With the advent of graphics processing units (GPUs) and their application to biomolecular simulations, MD simulation timescales have extended from nanoseconds to experimentally meaningful microseconds. 2,3 However, simulation data for biomacromolecules such as proteins are high-dimensional and suffer from the curse of dimensionality, 4 which hinders in-depth analyses such as extracting slow-timescale protein motions, 5 identifying representative protein structures, 6 and clustering kinetically similar macrostates. 7 To make these analyses feasible, it is informative to construct a low-dimensional space that characterizes protein dynamics as faithfully as possible. In recent years, new dimensionality reduction algorithms have been developed that can be applied to analyze protein simulations, construct representative distributions in low-dimensional spaces, and extract intrinsic relations between protein structure and functional dynamics. These methods can be broadly categorized into linear and nonlinear methods. 8,9 Linear dimensionality reduction methods produce new variables as linear combinations of the input variables; examples include principal component analysis (PCA) 10 and time-structure based independent component analysis (t-ICA). 11 Nonlinear methods construct variables through nonlinear functions; examples include t-distributed stochastic neighbor embedding (t-SNE) 12 and autoencoders. 13 It has been reported that nonlinear methods are more powerful in reducing dimensionality while preserving representative structures. 14 Information is inevitably lost to a certain degree through the dimensionality reduction process.
15 It is expected that the distances among data points in the low-dimensional space resemble those of the original data in the high-dimensional space. The Markov state model (MSM) is often applied to study the dynamics of biomolecular systems. An MSM is constructed by clustering states in the reduced-dimensional space to capture long-timescale kinetic information. 16 However, many dimensionality reduction methods, such as PCA and t-ICA, fail to keep these similarity characteristics in the low dimension, which can lead to misleading clustering analyses based on the low-dimensional projections. 17 Therefore, more appropriate dimensionality reduction methods are needed to build proper MSMs. A novel framework, ivis, 18 is a recently developed dimensionality reduction method for single-cell datasets. ivis is a nonlinear method based on siamese neural networks (SNNs). 19 The SNN architecture consists of three identical neural networks and ranks similarity among the input data. The loss function used in training is a triplet loss 20 that calculates Euclidean distances among data points and simultaneously minimizes the distances between data points with the same label while maximizing the distances between data points with different labels. Due to this intrinsic property, the ivis framework is capable of preserving both local and global structure in the low-dimensional space. Given its success with single-cell expression data, the ivis framework is promising as a dimensionality reduction method for simulations of biomacromolecules to investigate their functional dynamics, such as allostery. Phaeodactylum tricornutum aureochrome 1a (PtAu1a) is a light, oxygen, or voltage (LOV) protein from a diatom; aureochromes were first discovered in the photosynthetic stramenopile alga Vaucheria frigida. 21 This protein consists of an N-terminal domain, a C-terminal LOV core, and a basic region leucine zipper (bZIP) DNA-binding domain. PtAu1a is a monomer in the native dark state.
The interaction between its LOV core and bZIP prohibits DNA binding. 22 Upon light perturbation, a covalent Cysteinyl-Flavin C4a adduct forms between a conserved cysteine and the flavin mononucleotide (FMN) cofactor. The conformational changes in AuLOV are therefore expected to differ from those in other LOV proteins, raising the question of how the allosteric signal is transmitted in AuLOV. In the current study, the ivis framework, together with other dimensionality reduction methods, is applied to project the AuLOV simulations onto reduced-dimensional spaces. The performance of the selected methods is assessed and compared, validating ivis as a superior framework for the dimensionality reduction of biomacromolecular simulations. The crystal structures of the AuLOV dark and light states were obtained from the Protein Data Bank (PDB) 25 with PDB IDs 5dkk and 5dkl, respectively. The light-structure sequence starts from Gly234, while the dark-structure sequence starts from Phe239 in chain A and Ser240 in chain B. For consistency, residues before Ser240 were removed to keep the same number of residues in all chains, so that simulations of the dark state and the light state can be treated similarly. Both structures contain FMN as a cofactor, and the FMN force field from a previous study 26 was used in this study. Two new states, named the transient dark state (forcing the Cysteinyl-Flavin C4a adduct in the dark-state structure) and the transient light state (breaking the Cysteinyl-Flavin C4a adduct in the light-state structure), were constructed to fully explore the protein conformational space. Two monomers (Figure 1A) and a dimer (Figure 1B) were simulated in the dark states and light states, respectively. The crystal structures, with hydrogen atoms added, were solvated within a rectangular water box using the TIP3P water model. 27 Sodium and chloride ions were added for charge neutralization. Energy minimization was performed for each water box. Each system was then subjected to 20 picoseconds (ps) of MD simulation to raise the temperature from 0 K to 300 K and another 20 ps of simulation for equilibration.
Then, 10 nanoseconds (ns) of isothermal-isobaric ensemble (NPT) MD simulation under 1 bar of pressure were conducted. The canonical ensemble (NVT) is usually applied in production runs to investigate allosteric processes. 28,29 For each production run, 1.1 microseconds (µs) of canonical ensemble (NVT) Langevin MD simulation at 300 K were carried out. The Langevin friction coefficient, which couples the system to the heat bath, was set to 1 ps−1, 30,31 with minimal perturbation to the dynamical properties of the protein system. 32 For all production simulations, the first 100 ns were treated as the equilibration stage and excluded from the analysis. For each structure, three independent MD simulations were carried out, and a total of 12 µs of simulation was used in the analysis. All chemical bonds involving hydrogen atoms were constrained with the SHAKE method. A 2 femtosecond (fs) step size was used, and trajectory frames were saved every 100 ps. Periodic boundary conditions (PBC) were applied in the simulations. Electrostatic interactions were calculated with the particle mesh Ewald (PME) algorithm 33 and a cutoff of 1.2 nanometers (nm). Simulations were conducted using the GPU-accelerated OpenMM 34 platform with the CHARMM 35 simulation package (version c41b1) and the CHARMM27 force field. 36 In MD simulations, protein structures are represented as atom positions in Cartesian coordinates. However, this representation is neither rotation-invariant nor feasible for analysis purposes due to the significant number of atoms, with a total of 3N degrees of freedom. To represent the protein structures with rotational invariance and essential structural information, pairwise backbone Cα distances were selected to represent the overall protein configuration. Following our previously proposed feature processing method, 37 the distances were encoded with a rectified linear unit (ReLU) 38 -like activation function and flattened into a feature vector.
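As an illustration of this featurization, the sketch below computes a rotation- and translation-invariant pairwise Cα distance vector for a single frame. It is a minimal NumPy example with a toy four-atom structure standing in for real Cα coordinates; the ReLU-like encoding step from the cited method is omitted.

```python
import numpy as np

def pairwise_ca_distances(coords):
    """Upper-triangle pairwise distances between C-alpha atoms.

    coords: (n_atoms, 3) array of Cartesian C-alpha coordinates for one frame.
    Returns a 1D feature vector of length n_atoms * (n_atoms - 1) / 2 that is
    invariant to rigid rotation and translation of the whole structure.
    """
    diff = coords[:, None, :] - coords[None, :, :]  # (n, n, 3) displacements
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # (n, n) distance matrix
    iu = np.triu_indices(len(coords), k=1)          # indices above the diagonal
    return dist[iu]

# Toy frame: four "C-alpha" atoms on the corners of a unit square.
frame = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])
features = pairwise_ca_distances(frame)  # 6 distances for 4 atoms
```

Because only inter-atom distances enter the feature vector, rotating or translating the frame leaves the features unchanged, which is the property the text requires.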
Dimensionality Reduction Methods

ivis

ivis is a deep learning-based method for structure-preserving dimensionality reduction. The framework is built on siamese neural networks, which implement a novel architecture to rank similarity among input data. Three identical networks are included in the SNN. Each network consists of three dense layers and an embedding layer. The size of the embedding layer was set to 2, aiming to project high-dimensional data into a 2D space. The scaled exponential linear unit (SELU) 39 activation function is used in the dense layers, and the LeCun normal distribution is applied to initialize the weights of these layers. For the embedding layer, a linear activation function is used, and the weights are initialized with Glorot's uniform distribution. To avoid overfitting, dropout layers with a default dropout rate of 0.1 follow each dense layer. A triplet loss function is used for training:

L_tri = [D_{a,p} − min(D_{a,n}, D_{p,n}) + m]_+

where a, p, and n are anchor, positive, and negative points, respectively, D is the Euclidean distance, and m is the margin. Anchor points are points of interest. The triplet loss aims to minimize the distance between anchor and positive points while maximizing the distance between anchor and negative points. The distance between positive and negative points is also taken into account, through the min(D_{a,n}, D_{p,n}) term in the above equation. The k-nearest neighbors (KNNs) are used to obtain data for the triplet loss function; k is a tuning parameter and is set to 100. For each round of calculation, one point in the dataset is selected as an anchor. A positive point is randomly selected among the nearest k neighbors of the anchor, and a negative point is randomly selected outside those neighbors. For each training epoch, the triplet selection is updated to maximize the differences in both local and global distances.
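A minimal sketch of this triplet loss variant, assuming the hinged form [D(a,p) − min(D(a,n), D(p,n)) + m]_+ described above (NumPy only; a real ivis training loop would evaluate this loss over KNN-selected triplets inside a neural network):

```python
import numpy as np

def euclidean(x, y):
    """Euclidean (L2) distance between two points."""
    return np.sqrt(((x - y) ** 2).sum())

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss in which the positive-negative distance also contributes
    via min(D(a,n), D(p,n)); a hinge keeps the loss non-negative."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    d_pn = euclidean(positive, negative)
    return max(d_ap - min(d_an, d_pn) + margin, 0.0)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor -> small D(a,p)
n = np.array([5.0, 0.0])   # far from both -> the hinge clips the loss to zero
loss = triplet_loss(a, p, n, margin=1.0)
```

When the negative point is far from both the anchor and the positive point, the loss vanishes; moving the negative point close to the pair makes the loss positive, which is what drives same-cluster points together and different-cluster points apart.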
If the data set can be classified into different groups based on intrinsic properties, ivis can also be used as a supervised learning method by combining the distance-based triplet loss with a classification loss. The supervision weight is a tuning parameter that controls the relative importance of the classification loss. The neural network is trained using the Adam optimizer with a learning rate of 0.001. Early stopping is a method to prevent overfitting in neural network training and is applied in this study to terminate training if the loss does not decrease for 10 consecutive epochs.

Time-structure Independent Component Analysis (t-ICA)

The t-ICA method finds the slowest motions in molecular simulations and is commonly used as a dimensionality reduction method for macromolecular simulations. 11 For given n-dimensional data, t-ICA solves the generalized eigenvalue problem

C F = C_0 F K

where K is the eigenvalue matrix, F is the eigenvector matrix, C is the time-lagged correlation matrix, and C_0 is the covariance matrix. The results calculated by t-ICA are linear combinations of input features that are highly autocorrelated.

Principal Component Analysis (PCA)

PCA finds the projection vectors that maximize the variance by conducting an orthogonal linear transformation. 10 In the new coordinate system, the direction of greatest variance in the data defines the first principal component. Principal components can be obtained through singular value decomposition (SVD). 40 Given a mean-centered data matrix X, the covariance matrix can be calculated as

C = X^T X / (n − 1)

where n is the number of samples. C is a symmetric matrix and can be diagonalized as

C = V L V^T

where V is a matrix of eigenvectors and L is a diagonal matrix with eigenvalues λ_i in descending order.

t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE is a nonlinear dimensionality reduction method that embeds objects that are similar in the high-dimensional space as nearby points in a low-dimensional space.
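The PCA procedure above can be sketched in a few lines of NumPy: the singular values of the centered data matrix give the covariance eigenvalues via λ_i = s_i²/(n − 1). The anisotropic toy dataset below is illustrative only.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA through singular value decomposition.

    X: (n_samples, n_features) data matrix. Returns the projection onto the
    leading principal components and the corresponding covariance eigenvalues
    (SVD returns singular values in descending order, so no sorting is needed).
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = S ** 2 / (X.shape[0] - 1)      # eigenvalues of the covariance
    return Xc @ Vt[:n_components].T, eigvals[:n_components]

rng = np.random.default_rng(0)
# Anisotropic toy data: standard deviations 3, 1, and 0.1 along x, y, z.
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.1])
proj, var = pca_svd(X, n_components=2)       # first PC captures the x spread
```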
12 t-SNE has been demonstrated to be a suitable dimensionality reduction method for protein simulations. 41 The calculation consists of two stages. First, a conditional probability is calculated to represent the similarity between two objects:

p_{j|i} = exp(−||x_i − x_j||² / 2σ_i²) / Σ_{k≠i} exp(−||x_i − x_k||² / 2σ_i²)

where σ_i is the bandwidth of the Gaussian kernel. Because the conditional probability is not symmetric (p_{j|i} is not equal to p_{i|j}), the joint probability is defined as

p_{ij} = (p_{j|i} + p_{i|j}) / (2n)

To better represent the similarity among objects in the reduced map, the similarity q_{ij} is defined with a Student-t kernel as

q_{ij} = (1 + ||y_i − y_j||²)^{−1} / Σ_{k≠l} (1 + ||y_k − y_l||²)^{−1}

Combining the joint probability p_{ij} and the similarity q_{ij}, the Kullback-Leibler (KL) divergence is used to determine the coordinates y_i:

KL(P||Q) = Σ_{i≠j} p_{ij} log(p_{ij} / q_{ij})

The KL divergence measures the difference between the high-dimensional data and the low-dimensional points and is minimized through gradient descent. A drawback of the traditional t-SNE method is its slow training time. To speed up the dimensionality reduction process, Multicore t-SNE 42 is used and abbreviated as t-SNE in this study. Several assessment criteria are applied to quantify and compare the performance of each dimensionality reduction method. The root-mean-square deviation (RMSD) is used to measure the conformational change of each frame with respect to a reference structure. Given a molecular structure, the RMSD is calculated as

RMSD = sqrt( (1/N) Σ_{i=1}^{N} ||r_i − r_i⁰||² )

where r_i is the Cartesian coordinate vector of the i-th atom and r_i⁰ is that of the i-th atom in the reference structure. The Pearson correlation coefficient (PCC) 43 reflects the linear correlation between two variables. The PCC has been rigorously applied to estimate the linear relation between distances in the original space and the reduced space. 44 The L2 distance, also called the Euclidean distance, is used for the distance calculation:

d(x, y) = sqrt( Σ_i (x_i − y_i)² )

Based on the L2 distances, the PCC is calculated as

r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / sqrt( Σ_{i=1}^{n} (x_i − x̄)² · Σ_{i=1}^{n} (y_i − ȳ)² )

where n is the sample size and x_i, y_i, x̄, ȳ are the distances and the mean values of the distances in the two spaces, respectively.
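The distance-preservation PCC can be sketched as follows: compute the condensed pairwise L2 distances in both spaces and correlate them. The random-rotation and axis-truncation examples are illustrative assumptions, not data from the paper; a distance-preserving map yields a PCC of 1, while a lossy projection scores lower.

```python
import numpy as np

def pairwise_l2(X):
    """Condensed vector of Euclidean distances between all pairs of rows."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d[np.triu_indices(len(X), k=1)]

def distance_pcc(X_high, X_low):
    """Pearson correlation between pairwise distances in the original
    (high-dimensional) space and the reduced (low-dimensional) space."""
    return np.corrcoef(pairwise_l2(X_high), pairwise_l2(X_low))[0, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
# A rigid rotation preserves all pairwise distances -> PCC of exactly 1.
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))   # random orthogonal matrix
pcc_rot = distance_pcc(X, X @ Q)
# A lossy projection onto the first 2 axes gives a lower correlation.
pcc_cut = distance_pcc(X, X[:, :2])
```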
Spearman's rank-order correlation coefficient is used to quantitatively analyze how well distances between all pairs of points in the original space are preserved in the reduced dimensions. Specifically, the Spearman correlation coefficient measures the difference in distance rankings:

ρ = 1 − 6 Σ d_i² / (n(n² − 1))

where d_i is the difference in paired ranks and n is the total number of samples. The Mantel test is a non-parametric method originally used in genetics 45 that tests the correlation between two distance matrices. A common problem in evaluating the correlation coefficient is that the distances are not independent of each other, so the significance of the correlation cannot be assessed directly. The Mantel test overcomes this obstacle through permutations of the rows and columns of one of the matrices; the correlation between the two matrices is calculated at each permutation. The MantelTest GitHub repository 46 was used to implement the algorithm. While chemical information from the original space may be lost to a certain degree in the reduced space, dimensionality reduction methods are expected to retain as much information as possible. The Shannon information content (IC) is applied to test information preservation in the reduced space:

I(X) = − Σ_x P(x) log P(x)

where P(x) is the probability of a specific event x. To avoid possible dependency among different features in the reduced dimensions, the original space was reduced to one dimension (1D) to calculate the IC. The 1D values were sorted into 100 bins of equal width. The bins were treated as events, and the corresponding probabilities were calculated as the ratio of the number of samples in each bin to the total number of samples. The Markov state model has been widely used to partition the protein conformational space into kinetically separated macrostates 47 and to estimate relaxation times that characterize long-timescale dynamic behavior. 6 MSMBuilder 48 (version 3.8.0) was employed to implement the Markov state model.
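The binned information-content calculation can be sketched as below, assuming base-2 logarithms (the text does not specify the base) and synthetic 1D projections in place of real reduced coordinates:

```python
import numpy as np

def shannon_ic(values, n_bins=100):
    """Shannon information content of a 1D projection: sort values into
    equal-width bins, treat the bins as events, and use the empirical
    frequencies as probabilities."""
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()    # drop empty bins (0 log 0 = 0)
    return -(p * np.log2(p)).sum()           # entropy in bits

rng = np.random.default_rng(2)
uniform = rng.uniform(size=100_000)          # spreads over all 100 bins
peaked = np.full(100_000, 0.5)               # collapses into a single bin
ic_uniform = shannon_ic(uniform)             # near the log2(100) upper bound
ic_peaked = shannon_ic(peaked)               # zero: nothing distinguishes samples
```

A projection that spreads samples over many bins retains more information than one that collapses them, which is the comparison the text draws between methods.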
The k-means clustering method was used to generate 1,000 microstates. A series of lag times at equal intervals was used to calculate the transition matrix. The corresponding second eigenvalue was used to estimate the relaxation timescale, calculated as

t = −τ / ln(λ_1)

where λ_1 is the second eigenvalue and τ is the lag time. The generalized matrix Rayleigh quotient (GMRQ), 49 obtained by combining cross-validation with the variational approach, was used to assess the effectiveness of the MSM. The k-means clustering was used in the reduced dimensions to partition a total of 120,000 frames from the AuLOV MD trajectories into 1,000 microstates. Within each cluster, RMSDs were calculated for each structure pair; the RMSD value of a cluster is defined as the average RMSD over all structure pairs within that cluster. The average RMSD results of the five dimensionality reduction models are shown in Figure 5A. The poor performance of the t-SNE model may be due to the fact that t-SNE is a nonlinear method, so distances in the high-dimensional space are not linearly projected onto the low-dimensional space, as reported in other studies. 58,59 While the ivis models showed good ability in keeping the linear projection relation, the Spearman correlation coefficient fails to overcome the problem that the features are not independent. The pairwise distances are coupled through the motions of the Cα atoms: changing the coordinates of one Cα atom affects all distances involving that atom. Therefore, to address this issue, the Mantel test was used to randomize the Euclidean distances. Permutations of the rows and columns of the Euclidean distance matrix were performed 10,000 times, with the Pearson correlation coefficient calculated at each permutation. The results of the Mantel test are plotted in Figure 6B.
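The relaxation-timescale formula t = −τ/ln λ₁ can be sketched on a toy two-state transition matrix (MSMBuilder computes this internally; the matrix here is illustrative, not from the AuLOV data):

```python
import numpy as np

def implied_timescale(T, lag_time):
    """Relaxation timescale from the second-largest eigenvalue of a
    row-stochastic transition matrix T estimated at the given lag time."""
    eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
    lam = eigvals[1]                         # second eigenvalue (first is 1)
    return -lag_time / np.log(lam)

# Two-state toy model: each state keeps its identity with probability 0.9.
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
t2 = implied_timescale(T, lag_time=1.0)      # -1/ln(0.8), in lag-time units
```

The eigenvalues of this matrix are 1 and 0.8, so the single nontrivial relaxation process decays on a timescale of −1/ln(0.8) ≈ 4.5 lag units; slower eigenvalues (closer to 1) yield longer implied timescales.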
Both unsupervised ivis and supervised ivis showed remarkable results in preserving the correspondence relationship under randomized ordering, with mean coefficients of 0.83 and 0.95, respectively. During dimensionality reduction, information is inevitably lost to some degree. To measure the information retained through the dimensionality reduction process, the Shannon information is applied to the coordinates in the low-dimensional space. However, when dealing with multiple variables, especially the dependent Cα distances, the total Shannon information is not equal to the sum of the Shannon information of each variable. To reduce the computational complexity, the high-dimensional features were reduced to 1D for the calculation, and the results are plotted in Figure 6C. The supervised ivis model is superior in preserving information content, with the least information loss. It is also worth noting that t-SNE performed better than the unsupervised ivis model. To study how the performance of the Markov state model depends on the number of dimensions and the dimensionality reduction method, the generalized matrix Rayleigh quotient was calculated for each dimension and method (Figure 6D). The methods showed different trends: supervised ivis and t-ICA were the least and most affected by the number of dimensions, respectively, while for PCA and t-SNE the optimal number of dimensions lies in an intermediate range. The transition probabilities among macrostates in the ivis projections are shown in Figure 8. In macrostate 1, Gln350 moves further from FMN and Phe331 moves closer. The transition probability from the native dark state to macrostate 1 is 7.7%, while the transition probability to macrostate 3 is 1.0%. Therefore, starting from the native dark state, the AuLOV allosteric process is more likely to proceed through macrostate 1 than through macrostate 3.
However, macrostate 1 is buried within the native dark state in the PCA projections, and both macrostates 1 and 3 are buried within the native dark state in the t-SNE and t-ICA projections, making such a comparison ambiguous with these methods. Thus, the ivis framework proves superior for residue-level mechanism studies. The effectiveness of the MSM depends on the projected 2D space, where appropriate discrete states are produced by clustering the original data points in the projection space. The number of macrostates is determined based on the implied timescales obtained at different lag times in the different reduced spaces. In this study, 9, 9, 7, 9, and 7 macrostates were selected for unsupervised ivis, supervised ivis, PCA, t-SNE, and t-ICA, respectively. The samples were clustered through Perron-cluster cluster analysis (PCCA). The dataset was further split into a training set (70%) and a testing set (30%). Two machine learning methods (random forest and artificial neural network) were applied to predict the macrostate of each data point based on the pairwise Cα distances. The prediction accuracy results are plotted in Figures 10A and 10B and show that the supervised ivis framework is the best among the five dimensionality reduction methods. Surprisingly, although the unsupervised ivis model was trained without class labels in the loss function, its high prediction accuracy demonstrates the good quality of its 2D projections. Random forest is often applied to distinguish the macrostates, since it provides feature importances, which aid the interpretation of biological systems. The accumulated feature importance of the random forest model on the supervised ivis projections is plotted in Figure 10C. The top 490 features account for 90.2% of the overall feature importance. The high prediction accuracy of the supervised ivis framework suggests that supervised ivis is more promising in elucidating the conformational differences among macrostates.
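The macrostate-prediction procedure can be sketched with scikit-learn's RandomForestClassifier on synthetic stand-in data (Gaussian blobs replace the real macrostate-labeled pairwise Cα distance features; the 70/30 split follows the text):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Stand-in for pairwise C-alpha distance features: three synthetic
# "macrostates", each a Gaussian blob in a 20-dimensional feature space.
n_per_state, n_features = 200, 20
centers = rng.normal(scale=4.0, size=(3, n_features))
X = np.vstack([c + rng.normal(size=(n_per_state, n_features)) for c in centers])
y = np.repeat([0, 1, 2], n_per_state)

# 70/30 train/test split, as in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)

# Feature importances sum to 1 and can be ranked to find the distances
# that best separate the macrostates.
order = np.argsort(clf.feature_importances_)[::-1]
```

Ranking `clf.feature_importances_` is what yields figures like the accumulated-importance curve described above, where a small fraction of features carries most of the discriminative weight.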
In order to identify key residues and structures that are important in the dimensionality reduction process, the 32,131 feature weights of the last layer were treated as feature importances and are shown as a protein contact map in Figure 11. Global structures are encoded by features farther from the diagonal. In Figure 11, local information is shown in red rectangles (the Cα and Dα helices of the AuLOV system), and global information is shown in black rectangles (the Gβ and Hβ strands). While regions 2 (protein interactions from chain A to chain B) and 3 (protein interactions from chain B to chain A) are mostly symmetrical, we found asymmetrical behavior (red circle in Figure 11): the interaction between Jα in chain A and the linkers in chain B is stronger than the interaction between Jα in chain B and the linkers in chain A. To examine the important residues identified in the protein contact map, the feature weight of each Cα distance was accumulated onto the two residues forming that pair, quantifying the significance of residues and structures. The top 20 residues are listed in Table 1, with experimentally identified important residues 22,63−66 shown in bold font. The accumulated importance of each secondary structure is shown in Table 2, which shows that the A'α helix, the Jα helix, and the protein linkers are important in AuLOV allostery.

ivis is more computationally efficient than t-ICA and t-SNE

A key factor in comparing different dimensionality reduction methods is their computational cost, which can become prohibitively expensive for large, high-dimensional datasets. To compare the computational efficiency of the different methods with regard to sample size and feature size, three randomly generated datasets with a uniform distribution between 0 and 1 were applied for each dataset size. The relation between runtime and sample size, with a feature size of 1,000, is shown in Figure 12A.
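Before turning to the runtime results, the residue-level accumulation of feature weights described above can be sketched as follows. The example uses a toy 4-residue system with hand-picked weights, standing in for the real 32,131 last-layer weights:

```python
import numpy as np

def residue_importance(weights, n_residues):
    """Accumulate per-feature weights of pairwise residue-residue distances
    onto the two residues forming each pair.

    weights: 1D array over upper-triangle pairs (i < j), in np.triu order.
    Returns one accumulated importance score per residue.
    """
    importance = np.zeros(n_residues)
    iu, ju = np.triu_indices(n_residues, k=1)
    for i, j, w in zip(iu, ju, np.abs(weights)):
        importance[i] += w
        importance[j] += w
    return importance

# Toy example: 4 residues -> 6 pairwise features. Every feature involving
# residue 2 is made heavy, so residue 2 should rank first.
n_res = 4
iu, ju = np.triu_indices(n_res, k=1)
w = np.where((iu == 2) | (ju == 2), 1.0, 0.1)
scores = residue_importance(w, n_res)
top = int(np.argmax(scores))
```

Summing residue scores over the residues belonging to each helix, strand, or linker gives the secondary-structure importances of the kind reported in Table 2.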
While t-SNE is stable and fast on small datasets (≤ 10,000 samples), its runtime grows the fastest among the five models, making it infeasible for large datasets. The t-ICA and PCA curves overlap with each other, as both models are only weakly affected by sample size. Unsupervised ivis and supervised ivis exhibited similar runtimes. The relation between runtime and feature size, with a sample size of 10,000, is shown in Figure 12B. t-ICA and t-SNE show similar runtime growth: both are fast at small feature sizes (≤ 10,000) but impractical in higher dimensions. While both ivis models are slower than PCA, their runtime is acceptable for large sample sizes and high dimensions. The training process of supervised ivis is further displayed in Figure 13. The triplet loss stabilized after 4 epochs, and training stopped at 32 epochs under the early-stopping patience of 10. As a deep learning-based algorithm, the ivis framework was originally designed for single-cell experiments to provide a new approach for visualization and interpretation. In this study, ivis is applied to MD simulations of the allosteric protein AuLOV for dimensionality reduction. Evaluated against several performance criteria, ivis is demonstrated to be effective in keeping both local and global features while offering key insights into the mechanism of protein allostery. Various dimensionality reduction methods have been used for protein systems, such as PCA, t-ICA, and t-SNE. As linear methods, PCA and t-ICA aim to capture the maximum variance and autocorrelation of protein motions, respectively. Nonlinear dimensionality reduction methods, such as t-SNE, have been shown to be superior to linear methods in keeping the similarity between the high dimension and the low dimension. 41 Nevertheless, limitations of t-SNE, such as susceptibility to system noise 67 and poor performance in extracting global structure, hinder further interpretation of biological systems.
Compared with these dimensionality reduction methods, ivis is outstanding in preserving distances in the low-dimensional space and can be utilized for interpreting biological systems. In the process of AuLOV dimerization, several residues have been experimentally confirmed as important in promoting allostery; however, substantial study is still necessary to establish a detailed mechanism. In the 2D projections of the ivis framework, two important macrostates and the corresponding protein structures can be extracted for residue-level mechanism study. The comparison of native structures reveals that the orientational changes (hydrogen bond breaking) in Gln350 and Asn329 near the cofactor FMN are about 7 times more likely than the conformational changes in Phe331 and Gln350. However, because of the overlap within the dark states, these two macrostates are missing in the projections from the other dimensionality reduction methods. The protein contact map further demonstrates the superiority of the ivis dimensionality reduction method in that ivis retains both local and global information. Unexpectedly, the asymmetrical nature of the AuLOV dimer is revealed by comparing the protein-protein interactions. Several important residues are identified by the ivis framework. Met313, Leu331, and Cys351 have been reported as light-induced rotamers near the cofactor FMN. 22 These key residues are located on the surface of the β-sheet, which is consistent with the signaling-mechanism concept that signals originating from the core of the Per-ARNT-Sim (PAS) domain generate conformational change mainly within the β-sheet. 63,64 Gln365 is important for the stability of the Jα helix through hydrogen bonding with Cys316. 65 Leu248, Gln250, and Asn251, reported as part of the A'α linker, were also found to be important in modulating allostery within a single chain, while Asn329 and Gln350 function as FMN stabilizers.
66 Through AuLOV dimerization, the A'α and Jα helices undergo conformational changes and are expected to account for large importance, as shown in Table 2. However, the protein linkers, as well as the Cα helix and the Hβ and Iβ strands, also showed high importance. The significance of the protein linkers in the current study is consistent with both experimental and computational findings 69-72 that protein linkers are indispensable components in allostery and biological function. Together, these unexpected structures are vital in AuLOV allostery and are worth further study. Overall, the key residues and secondary structures identified by the ivis framework agree with the experimental findings, which consolidates the good performance of ivis in elucidating the protein allosteric process. Computational cost should be considered when comparing dimensionality reduction methods, since it can become prohibitive for large datasets, especially for proteins. From this perspective, the different models were benchmarked using a dummy dataset. The results showed that PCA requires the least computational resources and is largely insensitive to both sample size and feature size. This is likely because the PCA implementation in scikit-learn uses SVD for acceleration. Further, since the dataset was large, randomized truncated SVD was applied to reduce the time complexity to O(n_max² · n_components) with n_max = max(n_samples, n_features). 73 While t-SNE is comparable with ivis on several assessments, its computational cost can be prohibitively expensive for large datasets, as t-SNE has a time complexity of O(N²D), 74 where N and D are the numbers of samples and features, respectively. Though tree-based algorithms have been developed to reduce the complexity to O(N log N), 75 it remains challenging for high-dimensional protein systems. ivis exhibited lower computational cost at larger sample sizes and higher dimensions.
Further, as shown in Figure 13, the loss of the ivis model converges quickly, and the overall computational cost could be reduced further by tuning the early-stopping patience. Combining the performance criteria and the runtime comparison, the ivis framework is demonstrated to be a superior dimensionality reduction method for protein systems and can be an important member of the analysis toolbox for MD trajectories. Although originally developed for single-cell technology, the ivis framework is applied in this study as a dimensionality reduction method for molecular dynamics simulations of biological macromolecules. ivis is superior to other dimensionality reduction methods in several aspects, from preserving both local and global distances and maintaining the similarity between data points in the high-dimensional space and their projections, to retaining the most structural information across a series of performance assessments. ivis also shows great potential for interpreting biological systems through the feature weights of its neural network layers. Overall, ivis strikes a balance between dimensionality reduction performance and computational cost and is therefore promising as an effective tool for the analysis of macromolecular simulations.
How Fast-Folding Proteins Fold
Millisecond-Scale Molecular Dynamics Simulations on Anton
OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
Collective Motions in Proteins: A Covariance Analysis of Atomic Fluctuations in Molecular Dynamics and Normal Mode Simulations
Allosteric Mechanism of the Circadian Protein Vivid Resolved Through Markov State Model and Machine Learning Analysis
Ten-Microsecond Molecular Dynamics Simulation of a Fast-Folding WW Domain
Nonlinear Dimensionality Reduction by Locally Linear Embedding
A Global Geometric Framework for Nonlinear Dimensionality Reduction
Quasi-Harmonic Method for Studying Very Low Frequency Modes in Proteins
Slow Dynamics in Protein Fluctuations Revealed by Time-Structure Based Independent Component Analysis: The Case of Domain Motions
Visualizing Data Using t-SNE
Reducing the Dimensionality of Data with Neural Networks
Nonlinear Dimensionality Reduction in Molecular Simulation: The Diffusion Map Approach
Multi-Dimensional Reduction and Transfer Function Design Using Parallel Coordinates
Accurate Estimation of Protein Folding and Unfolding Times: Beyond Markov State Models
Dimensionality Reduction Methods for Molecular Simulations
Structure-Preserving Visualisation of High Dimensional Single-Cell Datasets
Siamese Neural Networks for One-Shot Image Recognition. ICML Deep Learning Workshop
In Defense of the Triplet Loss for Person Re-Identification
A Photoreceptor Required for Photomorphogenesis in Stramenopiles
Blue Light-Induced LOV Domain Dimerization Enhances the Affinity of Aureochrome 1a for Its Target DNA Sequence
Bacterial Bilin- and Flavin-Binding Photoreceptors
The LOV Domain Family: Photoresponsive Signaling Modules Coupled to Diverse Output Domains
The Protein Data Bank and the Challenge of Structural Genomics
Signaling Mechanisms of LOV Domains: New Insights From Molecular Dynamics Studies
Comparison of Simple Potential Functions for Simulating Liquid Water
Long-Range Conformational Transition of a Photoswitchable Allosteric Protein: Molecular Dynamics Simulation Study
Autoencoder-Based Detection of Dynamic Allostery Triggered by Ligand Binding Based on Molecular Dynamics
A Vulnerability in Popular Molecular Dynamics Packages Concerning Langevin and Andersen Dynamics
Allosteric Modulation of Binding Specificity by Alternative Packing of Protein Cores
Best Practices for Foundations in Molecular Simulations
A Smooth Particle Mesh Ewald Method
OpenMM: A Hardware-Independent Framework for Molecular Simulations
CHARMM: The Biomolecular Simulation Program
All-Atom Empirical Force Field for Nucleic Acids: I. Parameter Optimization Based on Small Molecule and Condensed Phase Macromolecular Target Data
Deciphering the Protein Motion of S1 Subunit in SARS-CoV-2 Spike Glycoprotein Through Integrated Computational Methods
Rectified Linear Units Improve Restricted Boltzmann Machines
Self-Normalizing Neural Networks. Advances in Neural Information Processing Systems
t-Distributed Stochastic Neighbor Embedding Method with the Least Information Loss for Macromolecular Simulations
Cohen, I. Noise Reduction in Speech Processing
Quantifying Colocalization by Correlation: The Pearson Correlation Coefficient Is Superior to the Mander's Overlap Coefficient
Statistical Model Selection for Markov Models of Biomolecular Dynamics
Statistical Models for Biomolecular Dynamics
Variational Cross-Validation of Slow Dynamical Modes in Molecular Kinetics
Classification and Regression by randomForest
Scikit-learn: Machine Learning in Python
Neural Networks for Perception
A Method for Stochastic Optimization
Everything You Wanted to Know About Markov State Models but Were Afraid to Ask
Progress and Challenges in the Automated Construction of Markov State Models for Full Protein Systems
Automatic Discovery of Metastable States for the Construction of Markov Models of Macromolecular Conformational Dynamics
Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection
Using Global t-SNE to Preserve Inter-Cluster Data Structure
Dynamical Behavior of β-Lactamases and Penicillin-Binding Proteins in Different Functional States and Its Potential Role in Evolution
Machine Learning Classification Model for Functional Binding Modes of TEM-1 β-Lactamase
GRadient Adaptive Decomposition (GRAD) Method: Optimized Refinement Along Macrostate Borders in Markov State Models
Conformational Switching in the Fungal Light Sensor Vivid
Structure and Signaling Mechanism of Per-ARNT-Sim Domains
Blue-Light-Induced Unfolding of the Jα Helix Allows for the Dimerization of Aureochrome-LOV From the Diatom Phaeodactylum tricornutum
Structure of a LOV Protein in Apo-State and Implications for Construction of LOV-Based Optical Tools
A More Globally Accurate Dimensionality Reduction Method Using Triplets
Dynamic Allostery: Linkers Are Not Merely Flexible
An Analysis of Protein Domain Linkers: Their Classification and Role in Protein Folding
Role of Linkers in Communication Between Protein Modules
Finding Structure with Randomness: Stochastic Algorithms for Constructing Approximate Matrix Decompositions
Interpretable Dimensionality Reduction of Single Cell Transcriptome Data with Deep Generative Models
Accelerating t-SNE Using Tree-Based Algorithms
Research reported in this paper was supported by the National Institute of General