TDSVD: A Way to Get the Single Solution from Time-Resolved Spectroscopy
Author: Chen, Renli
Date: 2020-03-31

This article recalls the major contribution of orthogonality to species assignment in time-resolved (TR) spectroscopy. In the field of particles, complete orthogonality is ubiquitous between any two different kinds of identical particles, and this complete orthogonality (at the microscopic level) evolves into partial spectral orthogonality between different species at the mesoscopic level. On this basis, we extend SVD to time-dependent SVD (TDSVD) so as to reveal the relative amounts of the different species in each time interval, and it is the first time that the single solution can be drawn from transient absorption (TA) spectroscopy data alone, without any kind of a priori information. Previous articles have prominently addressed 'the underdetermination of GTA' (global and target analysis), so it is no wonder that researchers are no longer keen to search for a mathematical method of analyzing TR data that yields the single solution. However, TDSVD offsets the underdetermination of GTA and becomes an effectively autonomous method for two-dimensional data analysis. Mathematically, as long as there is enough orthogonality between any two 'species' in a data matrix, obtaining the single solution is no longer a problem. Therefore, TDSVD can be applied in many fields.

Several researchers have already addressed 'the underdetermination of GTA' (Nagle, Parodi et al. 1982; Van Stokkum, Larsen et al. 2004). Most researchers think that the single solution cannot be determined from bare TR-spectroscopy data (Van Stokkum, Larsen et al. 2004; Ruckebusch, Sliwa et al. 2012). As a result, fewer and fewer researchers pay attention to other analytic methods, including SVD.
To the writer's knowledge, SVD was first used in the analysis of TR spectroscopy to determine the number of principal components (Warner, Christian et al. 1977); it was then applied to determine the values of rate constants (much like global lifetime analysis, GLA) (Hofrichter, Henry et al. 1985), and even the compartmental model (much like GTA) (Hug, Lewis et al. 1990). The question is: can SVD bring us more information about the single solution? The writer of this paper thinks the answer is 'YES!'.

A. SVD

a. Where does the orthogonality come from?

The whole story in this article begins with identical particles. In the realm of spectroscopy, we are concerned only with the distinguishability between different kinds of identical particles. As is well known, identical particles are particles that cannot be distinguished from one another, even in principle, thanks to quantization. The orthogonality between any two kinds of identical particles resides in their 'definite' (even if probabilistic) values of all kinds of properties. One significant fact is that this orthogonality is complete (not 'partial').

What, then, is the correlation between 'the complete orthogonality between any two identical particles' and 'the spectral distinguishability between any two species'? The former directly causes the latter, and the complete orthogonality of the former degenerates into the 'partial' orthogonality of the latter because the latter is observed at the macroscopic scale. 'Partial' orthogonality means that the observer is sometimes unsure about the origin of some spectra at the existing measurement accuracy. In other words, 'partial' orthogonality leads to ambiguity in distinguishing different species. Here, the diversity within the ensemble of each kind of identical particle is ignored.
The case of degeneracy (only kinetic degeneracy, because spectral degeneracy will not happen in the real world) is ignored, too.

b. The idea of PCA

"Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors (each being a linear combination of the variables and containing n observations) are an uncorrelated orthogonal basis set." ("Principal component analysis," 2020)

PCA is associated with another matrix factorization, the singular value decomposition (SVD). Let M be a real $m \times n$ matrix. Then the SVD of M is a factorization of the form

$$M = U \Sigma V^{T} \qquad (1)$$

where

1. The superscript T indicates the matrix transpose. U and V are orthogonal square matrices ($m \times m$ and $n \times n$, respectively). Σ is an $m \times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal. The diagonal entries $\sigma_i$ of Σ are known as the singular values of M. A common convention is to list the singular values in descending order on the diagonal of Σ. ("Singular value decomposition," 2020)

2. The columns $u_1, \dots, u_m$ of U yield an orthonormal basis of $\mathbb{R}^m$ and the columns $v_1, \dots, v_n$ of V yield an orthonormal basis of $\mathbb{R}^n$. The former columns are called the left-singular vectors and the latter columns are called the right-singular vectors.
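The properties listed above can be checked numerically. The following is a minimal sketch, assuming NumPy and a random stand-in matrix; the sizes `m` and `n` and the variable names are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                      # hypothetical sizes of the data matrix
M = rng.normal(size=(m, n))      # stand-in for a real m x n data matrix

# full_matrices=True returns the square factors U (m x m) and V^T (n x n)
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Build the rectangular diagonal matrix Sigma (m x n) from the singular values
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

reconstruction_ok = np.allclose(M, U @ Sigma @ Vt)    # M = U Sigma V^T
U_orthogonal = np.allclose(U.T @ U, np.eye(m))        # columns of U are orthonormal
V_orthogonal = np.allclose(Vt @ Vt.T, np.eye(n))      # columns of V are orthonormal
nonnegative = bool(np.all(s >= 0))                    # singular values are non-negative
descending = bool(np.all(np.diff(s) <= 0))            # listed in descending order

print(reconstruction_ok, U_orthogonal, V_orthogonal, nonnegative, descending)
```

NumPy returns the singular values as a one-dimensional array, so the rectangular Σ has to be assembled by hand when the full factorization is wanted.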
The linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$, $x \mapsto Mx$, has a particularly simple description with respect to these orthonormal bases: we have ("Singular value decomposition," 2020)

$$T(v_i) = \sigma_i u_i, \quad i = 1, \dots, \min(m, n) \qquad (3)$$

where $\sigma_i$ is the i-th singular value, and $u_i$ and $v_i$ are the i-th left- and right-singular vectors.

The foremost fact in SVD is that the singular values (SV's) of M can be viewed as the scaling factors of the corresponding principal components (PC's). One particularly important fact is this: the stronger the spectral orthogonality between any two species, the more closely the principal components approximate the spectra of the species in one-to-one correspondence. Namely, as long as there is spectral orthogonality between any two species, the SV's carry some information about the relative amounts of all the species. This is the motivation for time-dependent SVD (TDSVD).

The definition of TDSVD

Conventionally, SVD in TR spectroscopy is applied to the whole data matrix, so only the 'whole' SV's are obtained. Here, the whole data matrix is divided into many block matrices along the time dimension (Eqn. 4). (The spectral dimension of every block matrix is the same as that of the whole matrix.) The SVD of each block matrix is then calculated, which gives the time-dependent SV (TDSV) of every block matrix. According to 'the fundamental prerequisite of SVD' (see above), the SV's of each block matrix carry some information about the relative amounts of all the species in the corresponding block matrix. As a result, a sufficient number of TDSV's can indicate the concentration profiles of all the species. If the number of species involved is p and the spectral orthogonality between any two species is non-zero (this is certainly true!), then the number of non-zero SV's in every block matrix should equal p (only for a noise-free data set). Therefore, the number of non-zero TDSV in every block matrix is p, too. (Eqn.
4) Only the non-zero TDSV's (after 'careful' normalization) are plotted against the central time of the corresponding time interval. The ('carefully' normalized) curves of the TDSV's indicate the concentration profiles of all the species.

The ideal case for TDSVD is this: the spectral half of all the species is orthogonal, and the kinetic half of all the species is orthogonal within every block matrix. Then the ('carefully' normalized) curves of the TDSV's are identical to the concentration profiles of all the species, because in every block matrix the left-singular vectors are the same as the normalized concentration profiles of all the species in that time interval, and the right-singular vectors are the same as the normalized difference spectra of all the species. The curves of the TDSV's contain 'peaks' and 'valleys', both of which are informative. In the real world, however, a TR data set is not noise-free and has many kinds of observational errors. As long as the S/N (signal-to-noise ratio) is not too poor, the result can still be indicative of the single solution.

(A small note on names: TDSV means the time-dependent singular value(s) in one block matrix; TDSV's means two or more TDSV that come from different block matrices.)

According to the definition of PCA, the first PC has the largest possible variance. If one species is dominant, namely, it has the highest relative content, then the first TDSV is larger than the remaining TDSV, and the curve of the first TDSV's shows a 'peak' in the time interval where this species dominates. When the dominant species converts into other incoming species, so that they are present in approximately the same amounts, the share of the first TDSV relative to all the TDSV together is smaller, and the curve of the first TDSV's shows a 'valley' in the time interval where the conversion takes place.
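The block-wise procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `tdsvd`, the use of equal-sized, non-overlapping time blocks, and the normalization (dividing each block's retained singular values by their sum) are all choices made here, since the article does not specify how the 'careful' normalization is done. The synthetic two-species data set built from Lorentzian profiles is likewise hypothetical.

```python
import numpy as np

def lorentzian(x, x0, gamma):
    """Unnormalized Cauchy (Lorentzian) profile centered at x0."""
    return 1.0 / ((x - x0) ** 2 + gamma ** 2)

def tdsvd(data, times, block_size, n_keep):
    """Split `data` (rows = time points, columns = wavelengths) into
    consecutive time blocks and return, for each block, the central time
    and the leading `n_keep` singular values normalized to sum to one
    (the normalization is an assumption of this sketch)."""
    centers, tdsv = [], []
    for start in range(0, data.shape[0] - block_size + 1, block_size):
        block = data[start:start + block_size, :]
        s = np.linalg.svd(block, compute_uv=False)[:n_keep]
        total = s.sum()
        tdsv.append(s / total if total > 0 else s)
        centers.append(times[start:start + block_size].mean())
    return np.array(centers), np.array(tdsv)

# Hypothetical noise-free data set with two species
times = np.linspace(0.0, 10.0, 200)
wavelengths = np.linspace(0.0, 20.0, 100)
kinetics = np.column_stack([lorentzian(times, 3.0, 1.0),
                            lorentzian(times, 7.0, 1.0)])        # shape (200, 2)
spectra = np.vstack([lorentzian(wavelengths, 5.0, 1.0),
                     lorentzian(wavelengths, 15.0, 1.0)])        # shape (2, 100)
data = kinetics @ spectra                                        # shape (200, 100)

centers, tdsv = tdsvd(data, times, block_size=20, n_keep=2)
for t, row in zip(centers, tdsv):
    print(f"t = {t:5.2f}  normalized TDSV = {np.round(row, 3)}")
```

Plotting each column of `tdsv` against `centers` gives the TDSV's curves. How closely those curves follow the concentration profiles depends on the kinetic and spectral orthogonality within each block, so the sketch only shows the mechanics of the decomposition.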
According to the definition of PCA, each succeeding PC in turn has the highest variance possible under the restriction that it is orthogonal to the preceding PC's. Where the curve of the preceding TDSV's has 'valleys', 'peaks' should emerge at the same positions in the curve of the following TDSV's. Conversely, where the curve of the preceding TDSV's has 'peaks', 'valleys' should emerge at the same positions in the curve of the following TDSV's. In most cases, because the remaining non-zero TDSV are relatively small and noise degrades the data at about the same magnitude, the remaining non-zero TDSV carry little significance for the researcher unless the S/N is high enough. If the curves of all the TDSV's are carefully normalized in one plot, the set of these curves should (somewhat) resemble the set of all the concentration profiles.

In the non-ideal case for TDSVD, there are some deviations. Most of them are caused by the 'partial' (non-perfect) orthogonality of both the spectral half and (especially) the kinetic half (FIG. 2, 4). For a simulated data matrix, some of the deviations are caused by truncation errors. For an experimental data matrix, some are caused by severe observational errors.

C. The result of the first TDSV's

(Only the curve of the first TDSV's is plotted in the main body of this article; the remaining results are in the Appendix. This article adopts the following convention: in the notation ('weak/strong, weak/strong'), the left part refers to the kinetic orthogonality and the right part to the spectral orthogonality.)

The intuitive examples of the first TDSV's

Four types of intuitive examples demonstrate how the combination of kinetic orthogonality and spectral orthogonality affects the results of TDSVD (FIG. 1-2).
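One way to put a number on 'weak' versus 'strong' orthogonality is the cosine of the angle between two profile vectors: a value near 0 means nearly orthogonal, a value near 1 means nearly parallel. The sketch below is a hedged illustration with Lorentzian profiles; the grid, the centers and the two widths are hypothetical values chosen for this example and are not the parameter sets used in the article's figures.

```python
import numpy as np

def lorentzian(x, x0, gamma):
    """Unnormalized Cauchy (Lorentzian) profile centered at x0."""
    return 1.0 / ((x - x0) ** 2 + gamma ** 2)

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

x = np.linspace(-20.0, 30.0, 2001)     # hypothetical sampling grid

# Same pair of centers, two different widths (both widths are assumptions)
narrow_a, narrow_b = lorentzian(x, 0.0, 0.5), lorentzian(x, 4.0, 0.5)
broad_a, broad_b = lorentzian(x, 0.0, 5.0), lorentzian(x, 4.0, 5.0)

cos_narrow = cosine(narrow_a, narrow_b)   # little overlap, closer to orthogonal
cos_broad = cosine(broad_a, broad_b)      # heavy overlap, far from orthogonal

print(f"narrow pair: cos = {cos_narrow:.3f}")
print(f"broad pair:  cos = {cos_broad:.3f}")
```

Because the profiles are non-negative, the cosine always lies between 0 and 1 here, which makes it convenient for comparing the kinetic half and the spectral half on the same scale.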
The cosine of the angle between two vectors measures the degree of orthogonality between them. The only kind of function used in the intuitive examples is the Cauchy distribution function,

$$f(x, x_0, \gamma) = \frac{1}{(x - x_0)^2 + \gamma^2}.$$

Eight groups of parameters of the Cauchy distribution function are used here (the focused functions: $x_0$ = 0, 0.4, 0.8, 1.2; $\gamma$ = 0.8; the diffused functions: $x_0$ = 0, 4, 8, 12; $\gamma$ = 0.8). A diffused function is more likely to couple with other functions, especially with other functions of its own kind. A focused function, in contrast, is less likely to couple with other functions, especially with other functions of its own kind. Within one data matrix, the 'kinetic' half consists only of diffused functions or only of focused functions, and the same holds for the 'spectral' half. Therefore, the following chain of statements is reasonable: a diffused/focused appearance (in either the kinetic half or the spectral half) on the contour plot corresponds to the diffused/focused functions used (in that half), which in turn corresponds to weak/strong orthogonality (in that half).

One thing is for sure (FIG. 2): the stronger the orthogonality in both the kinetic half and the spectral half, the more evident the single solution from TDSVD. And obviously, the stronger the orthogonality in the kinetic half, the better the distinguishability. The measurements of the orthogonality in both halves and the results for all the TDSV's can be found in the Appendix. As long as there is enough orthogonality in both halves, TDSVD should always be decisive in finding the single solution.

In FIG. 3-4, every kinetic half is an evolution model made up of four species. For simplicity, the spectral half is the same as before. (Because it is the degree of spectral orthogonality rather than the pattern of each spectrum that matters! And in the real world, the degree of spectral orthogonality is usually high enough.
Therefore, the real world 'favors' us in finding the single solution from measurements.)

In FIG. 3, the conclusion is about the same as before. However, because the kinetic half is made up of exponentials and the orthogonality between exponentials is poor, the distinguishability of the first TDSV's from TDSVD is not very clear. (However, this is not the result of the underdetermination of GTA!) The measurements of the orthogonality in both halves and the results for all the TDSV's can be found in the Appendix. As long as there is enough orthogonality in both halves (especially the kinetic half), TDSVD should always be decisive in finding the single solution.

[Figure caption: ... the case of ('strong, strong'); (c) the case of ('weak, weak'); (d) the case of ('strong, weak'). The legend at the top applies to all panels.]

It appears that the single solution of the data matrices in AREA Ⅰ, AREA Ⅱ, or AREA Ⅳ can be revealed by TDSVD, especially in AREA Ⅱ and AREA Ⅳ (FIG. 5). In a TA data set, the spectral orthogonality is strong enough and the kinetic orthogonality is somewhat weak (FIG. 5). For a noise-free data set, truncation error is the main reason for the failure of TDSVD. For a real data set, the observational errors can even be insuperable for TDSVD.

For any data matrix analyzed with SVD, because 'accidental' non-orthogonality is rare, the number of PC's should equal the number of species involved. The SV's can be viewed as the scaling factors of the corresponding PC's, and the relative amounts of the PC's are directly correlated with the relative amounts of all the species. Therefore, the relative magnitudes of the SV's can indicate the relative concentrations of all the species, and the set of curves of the non-zero TDSV's can tell the researcher the single solution of a chemical kinetic process.

Dedicated to the strongest woman I know: Cuiyin Du (杜翠英), my mama.
Thanks to USTC for providing the essential software. Due to Covid-19 (great gratitude to the health care staff responding to the outbreak!!!), I had to stay home to finish this article, so there might be some errors. You are welcome to contact me at the email above.

Appendix

In S1 and S2, the vectors in the kinetic half are the concentration profiles and the vectors in the spectral half are the spectra.

S1: the respective measurements of orthogonality in FIG. 1 and FIG. 2
S2: the respective measurements of orthogonality in FIG. 3 and FIG. 4
S4: the respective whole TDSV's in FIG. 3 and FIG. 4

References

Hofrichter, Henry et al. (1985). Nanosecond optical spectra of iron-cobalt hybrid hemoglobins: geminate recombination, conformational changes, and intersubunit communication.
Hug, Lewis et al. (1990). Nanosecond photolysis of rhodopsin: Evidence for a new blue-shifted intermediate.
Nagle, Parodi et al. (1982). Procedure for testing kinetic models of the photocycle of bacteriorhodopsin.
Ruckebusch, Sliwa et al. (2012). Comprehensive data analysis of femtosecond transient absorption spectra: A review.
Van Stokkum, Larsen et al. (2004). Global and target analysis of time-resolved spectra.
Warner, Christian et al. (1977). Analysis of multicomponent fluorescence data.
"Principal component analysis" (2020).
"Singular value decomposition" (2020).