Title
Review and performance evaluation of trait-based between-community dissimilarity measures

Author details
Attila Lengyel1* & Zoltán Botta-Dukát2*
*Centre for Ecological Research, Institute of Ecology and Botany, Alkotmány u. 2-4., H-2163 Vácrátót, Hungary
1 corresponding author, lengyel.attila@ecolres.hu
2 botta-dukat.zoltan@ecolres.hu

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.06.425560; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract
1. In recent years a variety of indices have been proposed to quantify functional dissimilarity between communities. These indices follow different approaches to account for between-species similarities in the calculation of community dissimilarity, yet all of them have been proposed as straightforward tools.
2. In this paper we reviewed the trait-based dissimilarity indices available in the literature, contrasted the approaches they follow, and evaluated their performance in terms of correlation with an underlying environmental gradient, using individual-based community simulations with different gradient lengths. We tested how strongly dissimilarities calculated by different indices correlate with environmental distances. Using random forest models, we tested the importance of gradient length, data type (abundance vs. presence/absence), the transformation of between-species similarities (linear vs. exponential), and the dissimilarity index in predicting the correlation value.
3. We found that many indices behave very similarly and reach high correlation with environmental distances. Only a few indices (e.g. Rao's $D_Q$ and representatives of the nearest neighbour approach) performed consistently worse than the others. By far the strongest determinant of correlation with environmental distance was gradient length, followed by data type. The choice of dissimilarity index and transformation method did not seem to be crucial when correlation with an underlying gradient is to be maximized.
4. Synthesis: We provide a framework of functional dissimilarity indices and discuss the approaches they follow. Although these indices are formulated in different ways and follow different approaches, most of them perform similarly well. At the same time, sample properties (e.g. gradient length) determine the correlation between trait-based dissimilarity and environmental distance more fundamentally.

Keywords
beta diversity, dissimilarity index, distance metric, community ecology, functional traits

Abbreviations
CDF = cumulative distribution function, CWM = community-weighted mean, FDissim = functional dissimilarity, VIS = variable importance score

Introduction
Understanding and explaining the variation of living communities along dimensions of space and time has long been a focus of ecological research. The widely applied scheme of Whittaker (1960, 1972) for tackling different aspects of community variation divides community diversity into alpha (within-community), beta (between-community) and gamma (across-community) components.
It is no exaggeration to say that, among these three, beta diversity has sparked the most controversy, owing to the multitude of ways in which it can be formulated (Tuomisto 2010a,b, Anderson et al. 2011, Podani & Schmera 2011, Baselga & Leprieur 2015). One of the most popular approaches to beta diversity builds upon quantifying the variation between pairs of communities using dissimilarity indices (Anderson et al. 2006, Legendre & De Cáceres 2013, Ricotta 2017). A broad spectrum of such dissimilarity indices is available for many specific purposes, providing elementary tools for different fields of ecology and beyond (see reviews by Legendre & Legendre 1998, Podani 2000). Nevertheless, choosing from so many options requires a more or less subjective decision from the researcher, which may affect the final result of the analysis. Comparative reviews of dissimilarity indices (Faith et al. 1987, Koleff et al. 2003) and evaluations of the effects of methodological decisions (Lengyel & Podani 2015) are therefore valuable aids in making these decisions.

For a long time, the most popular, though not the only, interpretations of diversity treated species as variables unrelated to each other. In the last two decades, however, the functional approach to ecological questions has gained unprecedented attention (Díaz & Cabido 2001, McGill et al. 2006). This approach relies on the fact that species are not all maximally different from each other; rather, they can be considered related with respect to similarities in the traits thought to represent their roles in ecosystems (Violle et al. 2007).
The need to account explicitly for between-species relatedness generated a wave of methodological developments that introduced new ways of calculating diversity. Alongside a lively scientific discussion on how functional alpha diversity can be appropriately quantified (Mason et al. 2005, Petchey & Gaston 2006, Villéger et al. 2008, Mouchet et al. 2010), suggestions were also made for expressing functional beta diversity (Swenson 2011, Botta-Dukát 2018, Chao et al. 2019). Among them, a large variety of indices have been proposed for calculating dissimilarity between pairs of communities on the basis of the traits of their species (e.g. Ricotta & Burrascano 2008, Cardoso et al. 2014, Ricotta & Pavoine 2015). Although these indices have been introduced as straightforward measures of between-community dissimilarity based on traits, they rest on very different concepts, and we still lack a comparative review of them.

In this paper we aim to provide an overview and a conceptual framework for the pairwise functional dissimilarity (hereafter FDissim) measures available in the literature, to the best of our knowledge. We start with (1) a short overview of the concept and indices of ecological (dis-)similarity that do not account for the relatedness of species, then (2) we review and classify FDissim indices according to their conceptual basis, and (3) we test the performance of FDissim indices.
Short overview of taxon-based (dis-)similarity methods
Most FDissim measures are generalizations of simple indices originally designed to express dissimilarity based on species composition (that is, ignoring similarities between species). We start the review of trait-based (dis-)similarity measures with a brief summary of these species-based indices. Then, we present a framework of approaches including several families of trait-based dissimilarity indices.

Species-based indices
Most indices can be written in either similarity ($s$) or dissimilarity ($d = 1 - s$) form; when the form need not be specified, we call them 'resemblances'. For presence/absence data, these indices are based on the well-known 2×2 contingency table whose cells give the number of shared species (denoted by $a$) and the numbers of species occurring in only one of the communities ($b$ and $c$). The fourth cell of the contingency table, the number of shared absences, is disregarded by these indices and rarely used in ecological analyses (but see Tamás et al. 2001). All these indices express similarity as the proportion of shared diversity to total diversity; hence, all of them range between 0 and 1. For presence/absence data, the number of shared species, $a$, in the numerator stands for shared diversity in all indices, while the denominators differ. In the Sørensen index ($s_S$) the denominator is the arithmetic mean of the species numbers of the two communities, in the Ochiai index ($s_O$) it is their geometric mean, in the Kulczynski index ($s_K$) it is their harmonic mean, while in the Simpson index ($s_{Si}$) it is the richness of the species-poorer community. If the two communities are equally species-rich, these indices are equal; otherwise $s_S < s_O < s_K < s_{Si}$. In the Jaccard index ($s_J$) the denominator is the total number of species in the two communities, while in the Sokal & Sneath index ($s_{SS}$) species occurring in a single community are counted with double weight. There is a direct and monotonic relationship between the Jaccard, Sørensen, and Sokal & Sneath indices (see Appendix S1). Table 1 summarizes the similarity and dissimilarity forms of the above indices.

For abundance data, the resemblance of two communities is derived by summing species-wise differences, the simplest examples being the Euclidean and the Manhattan distances, respectively:

Eq. 1. $d_{\mathrm{Eu}}(j,k) = \sqrt{\sum_{i=1}^{S_{jk}} (x_{ij} - x_{ik})^2}$

Eq. 2. $d_{\mathrm{Man}}(j,k) = \sum_{i=1}^{S_{jk}} |x_{ij} - x_{ik}|$

where $x_{ij}$ and $x_{ik}$ are the abundances of species $i$ in communities $j$ and $k$, and $S_{jk}$ is the total number of species in $j$ and $k$. For both indices the minimum is 0, but the maximum of the Euclidean distance is the square root of the sum of squared abundances, while for the Manhattan distance the maximum is the sum of abundances. Their dependence on total abundance makes these index values difficult to compare across samples; therefore, indices including a standardization have become more popular in ecological studies. Standardization is possible in several ways. The first option is to standardize the raw species contributions to between-community dissimilarity ($x_{ij} - x_{ik}$) and then sum them. To this end, each species-level difference in abundance is divided by a scaling factor chosen so that the maximal species-level difference is 1, reached when the species is present in only one of the compared communities. Summing $x_{ij}$ and $x_{ik}$ in the denominator satisfies this requirement and gives a well-known distance measure, the Canberra index:

Eq. 3. $d_{\mathrm{Can}}(j,k) = \sum_{i=1}^{S_{jk}} \frac{|x_{ij} - x_{ik}|}{x_{ij} + x_{ik}}$

However, the Canberra index still ranges between 0 and $S_{jk}$. According to Ricotta & Podani (2017), the normalized Canberra index can be derived by unweighted averaging of the species contributions:

Eq. 4. $d_{\mathrm{nCan}}(j,k) = \frac{1}{S_{jk}} \sum_{i=1}^{S_{jk}} \frac{|x_{ij} - x_{ik}|}{x_{ij} + x_{ik}}$

Alternatively, species-level differences can be divided by $\max(x_{ij}, x_{ik})$, which also yields unity if a species occurs in only one of the plots. Ricotta & Podani (2017) called this the modified Canberra index, whose normalized version is:

Eq. 5. $d_{\mathrm{nmCan}}(j,k) = \frac{1}{S_{jk}} \sum_{i=1}^{S_{jk}} \frac{|x_{ij} - x_{ik}|}{\max(x_{ij}, x_{ik})}$

Calculated from binary data, both the normalized Canberra and the normalized modified Canberra index give the Jaccard dissimilarity.

A different way of standardization is to sum the raw species-level differences and divide the sum by the sum of their theoretical maxima. In this case, the denominator can follow the logic of the Canberra index, leading to the Bray-Curtis index:

Eq. 6. $d_{\mathrm{BC}}(j,k) = \frac{\sum_{i=1}^{S_{jk}} |x_{ij} - x_{ik}|}{\sum_{i=1}^{S_{jk}} (x_{ij} + x_{ik})}$

Analogously to the normalized modified Canberra index, the denominator may instead contain the maximum of the abundances, resulting in the formula known as the Marczewski-Steinhaus index:

Eq. 7. $d_{\mathrm{MS}}(j,k) = \frac{\sum_{i=1}^{S_{jk}} |x_{ij} - x_{ik}|}{\sum_{i=1}^{S_{jk}} \max(x_{ij}, x_{ik})}$
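The abundance-based indices of Eqs. 1-7 map directly onto array operations. The Python sketch below (using NumPy; the function name `species_based_dissimilarities` is ours, for illustration only) computes all seven for one pair of abundance vectors, restricting the calculation to the $S_{jk}$ species present in at least one of the two communities. It assumes that at least one species is present, and it checks the binary-data reductions mentioned in the text.

```python
import numpy as np


def species_based_dissimilarities(xj, xk):
    """Eqs. 1-7 for the abundance vectors of communities j and k.

    Hypothetical helper for illustration: species absent from both
    communities are dropped, so S_jk counts only the pooled species.
    """
    xj = np.asarray(xj, dtype=float)
    xk = np.asarray(xk, dtype=float)
    present = (xj + xk) > 0          # species occurring in j or k
    xj, xk = xj[present], xk[present]
    s_jk = present.sum()             # S_jk, assumed to be > 0

    diff = np.abs(xj - xk)           # |x_ij - x_ik|
    total = xj + xk                  # Canberra / Bray-Curtis scaling
    larger = np.maximum(xj, xk)      # modified Canberra / Marczewski-Steinhaus scaling

    return {
        "euclidean": float(np.sqrt(np.sum(diff ** 2))),          # Eq. 1
        "manhattan": float(diff.sum()),                            # Eq. 2
        "canberra": float(np.sum(diff / total)),                   # Eq. 3
        "norm_canberra": float(np.sum(diff / total) / s_jk),       # Eq. 4
        "norm_mod_canberra": float(np.sum(diff / larger) / s_jk),  # Eq. 5
        "bray_curtis": float(diff.sum() / total.sum()),            # Eq. 6
        "marczewski_steinhaus": float(diff.sum() / larger.sum()),  # Eq. 7
    }


# Binary data: a = 1 shared species, b = 2 and c = 1 unique species.
binary = species_based_dissimilarities([1, 1, 1, 0, 0], [1, 0, 0, 1, 0])
print(binary["norm_canberra"], binary["marczewski_steinhaus"])  # Jaccard dissimilarity, 3/4
print(binary["bray_curtis"])                                    # Sørensen dissimilarity, 3/5
```

Because Eqs. 1-3 are not standardized, their values grow with total abundance or species number, whereas Eqs. 4-7 stay within [0, 1].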
It is worth noting that the Bray-Curtis and Marczewski-Steinhaus indices calculated on presence/absence data return the values of the Sørensen and Jaccard indices in dissimilarity form, respectively. Moreover, several abundance-based indices can be expressed by generalizing the quantities $a$, $b$, and $c$ used in the definitions of indices for presence/absence data (Tamás et al. 2001):

Eq. 8. $a' = \sum_{i=1}^{S_{jk}} \min(x_{ij}, x_{ik})$

Eq. 9. $b' = \sum_{i=1}^{S_{jk}} \left[\max(x_{ij}, x_{ik}) - x_{ik}\right]$

Eq. 10. $c' = \sum_{i=1}^{S_{jk}} \left[\max(x_{ij}, x_{ik}) - x_{ij}\right]$

Substituting $a$, $b$ and $c$ with $a'$, $b'$ and $c'$ in the formula of the Sørensen index gives the Bray-Curtis index, and doing so in the Jaccard index gives the Marczewski-Steinhaus index. Abundance versions of all other presence/absence indices can be created in the same manner.

A classification of FDissim indices
FDissim indices incorporate trait information into the calculation of dissimilarity in different ways. The simplest solution is to calculate summary statistics or distributions for the two communities and then a measure of distance or segregation between them. We call this the summary-based class, and in our review we include two approaches within it: the typical value approach and the distribution-based approach. In the second class we include indices that use a symmetric species-by-species (dis-)similarity matrix and link it directly, through matrix operations, with the compositional matrix. We call this the dissimilarity-based class, which includes the probabilistic, the ordinariness-based, the diversity partitioning, and the nearest neighbour approaches. The third class includes methods that use between-species (dis-)similarities to classify species; therefore, we call it the classification-based class. The classification either transforms the original structure of the dissimilarity matrix into discrete groups of species, which can be used as functional types, or expresses dissimilarities in the form of a tree graph in which between-species dissimilarities are organized in an inclusive hierarchy. This is a widespread approach for accounting for phylogenetic relatedness, since phylogenies are commonly summarized in the form of cladograms. Such methods rely heavily on the algorithm chosen for the classification, including decisions about the number of clusters and the method for breaking tied values. Examples are provided by Hérault & Honnay (2007), Nipperess et al. (2010), and Cardoso et al. (2014), while a review is available in Pavoine (2016). As there is no general recommendation for the classification method, we omit this class from the framework detailed below and from the comparative test. The classification of trait-based dissimilarity indices and their main properties are summarized in Table 2.

Typical value approach
Indices following this approach represent each community with a typical trait value and calculate a distance metric between these values. The most commonly applied typical trait value is the community-weighted mean (CWM; Garnier et al. 2004). The rationale behind the CWM can be linked to the mass ratio hypothesis (Grime 1998), which states that the effect of species on ecosystem functioning is proportional to their relative abundances. Although several issues have emerged regarding its limited applicability in statistical inference (Hawkins et al. 2017, Peres-Neto et al. 2017, Zeleny 2018) and its neglect of within-community variation (Muscarella & Uriarte 2016), the difference in CWM is still considered a reliable indicator of robust changes in trait composition induced by selective forces such as environmental matching or succession (De Bello et al. 2007, 2013, Kleyer et al. 2012). Ricotta et al. (2015) investigated the relationship between the distance between CWMs and the probabilistic approach (see Probabilistic approach below) and showed its applicability to phylogenetic data. Owing to its modest computational requirements, Lengyel et al. (2020) used the Euclidean distance between trait CWMs of phytosociological relevés for the trait-based numerical classification of grasslands of Poland, with a sample size of 6985 sites and 885 species. Another advantage of this method is its Euclidean property. Besides the community-weighted mean, other typical values, e.g. the median or the mode, might be considered, depending on the scale of the trait variable and on specific research aims.

Distribution-based approach
Instead of typical values, the distribution of trait values can be considered a more reliable representation of the trait composition and variability of a community. Continuous distributions can be defined by a density function and discrete distributions by the probabilities of the possible values, while both types can be characterized by a cumulative distribution function (CDF). A useful analogue of the distance between typical values might be the distance between discrete distributions, density functions or CDFs.
If data on intraspecific trait variation are available, trait values form a continuous distribution. First, separate density functions have to be fitted within each species. Then, the density function of the community-level distribution can be calculated as the weighted sum of the species-level density functions (Carmona, de Bello, Mason, & Lepš, 2016). If such data are not available, relative abundances can be used as estimates of the probabilities of the corresponding trait values. Pairs of trait values and their probabilities form a discrete distribution.
The similarity of density functions can be measured by their overlap (see Appendix S2 for an overview of overlap measures). Overlap between within-species trait distributions has already proved useful in quantifying between-species niche segregation (MacArthur & Levins 1967, Mouillot et al. 2005) and trait-based dissimilarity of species (Lepš et al. 2006, De Bello et al. 2013). Nevertheless, overlap measures are equally applicable at the community level.
Gregorius et al. (2003) proposed an index called delta for quantifying differences between discrete trait distributions. Delta is the minimal sum of frequencies shifted from one trait state to another, weighted by the differences between the respective states. Minimizing the sum of shifted frequencies is known in linear programming as the transportation problem (Hitchcock 1941). Owing to its relatively high computational demand, delta is infeasible for the large compositional and trait data matrices typically used in ecological research; therefore, we exclude this index from our comparison.
The difference between two CDFs can be calculated at every possible trait value (i.e. not only at the observed ones), and the sum of these differences can then be used as a trait-based dissimilarity measure. In Appendix S3 we introduce the distance between CDFs in more detail.

Maximally distinct communities
Species-based dissimilarities, except the Euclidean, Manhattan and (non-normalized) Canberra distances, equal unity, their maximum, when the two compared communities do not share any species. In this context, we could call such communities maximally distinct. However, when traits are considered, two communities can be similar even if they do not share any species. For example, if every species of community A is replaced by a similar species in community B, the two communities have no shared species, but from a functional point of view they are similar. In this context, two communities are maximally distinct when the similarity of every species in the first community to every species in the other community is zero. It is a desirable property for a functional similarity index to take the value 0 if and only if the two compared communities are maximally distinct.

Probabilistic approach
This approach can be traced back to the diversity framework proposed by Rao (1982) and recently extended by Pavoine & Ricotta (2014).
Rao's within-community diversity is defined as the expected dissimilarity between two individuals drawn randomly from a single community:

Eq. 11. $Q(\mathbf{p}) = \sum_{i=1}^{S}\sum_{j=1}^{S} p_i p_j \delta_{ij}$

where $p_i$ is the relative abundance of the $i$th species in the community and $\delta_{ij}$ is the dissimilarity between species $i$ and $j$. This has become a widely used index of functional alpha diversity (Botta-Dukát 2005). Likewise, a between-community component of diversity, $Q(\mathbf{p},\mathbf{q})$, can be defined as the expected dissimilarity between two random individuals, each selected from a different community:

Eq. 12. $Q(\mathbf{p},\mathbf{q}) = \sum_{i=1}^{S}\sum_{j=1}^{S} p_i q_j \delta_{ij}$

Using the symmetry of $\delta$, between-community diversity can be expressed through the within-community diversities of the two original communities ($Q(\mathbf{p})$ and $Q(\mathbf{q})$) and of the community with mean relative abundances, $(\mathbf{p}+\mathbf{q})/2$:

Eq. 13. $2Q\left(\frac{\mathbf{p}+\mathbf{q}}{2}\right) = 2\sum_{i=1}^{S}\sum_{j=1}^{S} \frac{p_i+q_i}{2}\,\frac{p_j+q_j}{2}\,\delta_{ij} = \frac{1}{2}\sum_{i=1}^{S}\sum_{j=1}^{S}\left(p_i p_j + q_i q_j + 2 p_i q_j\right)\delta_{ij} = \frac{Q(\mathbf{p}) + Q(\mathbf{q})}{2} + Q(\mathbf{p},\mathbf{q})$

Subtracting the mean within-community diversity from the between-community diversity leads to Rao's dissimilarity (also called DISC):

Eq. 14. $D_Q = \sum_{i=1}^{S}\sum_{j=1}^{S} p_i q_j \delta_{ij} - \frac{1}{2}\left(\sum_{i=1}^{S}\sum_{j=1}^{S} p_i p_j \delta_{ij} + \sum_{i=1}^{S}\sum_{j=1}^{S} q_i q_j \delta_{ij}\right) = Q(\mathbf{p},\mathbf{q}) - \frac{Q(\mathbf{p}) + Q(\mathbf{q})}{2} = 2Q\left(\frac{\mathbf{p}+\mathbf{q}}{2}\right) - Q(\mathbf{p}) - Q(\mathbf{q})$

where $p_i$ and $q_i$ are the relative abundances of species $i$ in the two communities. Champely and Chessel (2002) proved that if $\delta$ has the squared Euclidean property, Rao's quadratic entropy is a concave function, i.e. $Q((\mathbf{p}+\mathbf{q})/2)$ is greater than or equal to the mean of $Q(\mathbf{p})$ and $Q(\mathbf{q})$. Thus, under this condition, $D_Q \ge 0$. If $0 \le \delta_{ij} \le 1$, then $\sum_i\sum_j p_i q_j \delta_{ij}$, which is a weighted average of between-species dissimilarities, must also lie within this range. Therefore, $0 \le D_Q \le 1$.
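Eqs. 11-14 are quadratic forms, so $Q(\mathbf{p})$, $Q(\mathbf{p},\mathbf{q})$ and $D_Q$ can be sketched in a few lines of NumPy. The helper names below and the three-species trait matrix are hypothetical. For comparison, the sketch also computes CWMdis, the Euclidean distance between community-weighted mean trait vectors, and builds between-species dissimilarities of the form $\delta_{ij} = d_{ij}^2/2$ from Euclidean trait distances; under that assumption $D_Q$ equals half the squared CWMdis (Pavoine 2012).

```python
import numpy as np


def rao_q(p, delta):
    """Eq. 11: expected dissimilarity of two individuals from one community."""
    return float(p @ delta @ p)


def rao_between(p, q, delta):
    """Eq. 12: expected dissimilarity of two individuals, one from each community."""
    return float(p @ delta @ q)


def rao_dq(p, q, delta):
    """Eq. 14: D_Q = Q(p, q) - [Q(p) + Q(q)] / 2."""
    return rao_between(p, q, delta) - 0.5 * (rao_q(p, delta) + rao_q(q, delta))


def half_squared_euclidean(traits):
    """delta_ij = d_ij^2 / 2, with d_ij the Euclidean distance between species' trait vectors."""
    diff = traits[:, None, :] - traits[None, :, :]
    return 0.5 * np.sum(diff ** 2, axis=2)


def cwm_distance(p, q, traits):
    """CWMdis: Euclidean distance between community-weighted mean trait vectors."""
    return float(np.linalg.norm(p @ traits - q @ traits))


# Three species described by two hypothetical standardized traits.
traits = np.array([[0.0, 0.2], [0.5, 0.9], [1.0, 0.4]])
p = np.array([0.5, 0.5, 0.0])   # relative abundances in community 1
q = np.array([0.0, 0.2, 0.8])   # relative abundances in community 2
delta = half_squared_euclidean(traits)

print(rao_dq(p, q, delta), 0.5 * cwm_distance(p, q, traits) ** 2)  # equal up to rounding
```

With taxonomic dissimilarities ($\delta_{ij} = 1$ for $i \ne j$ and 0 otherwise), the same `rao_dq` returns half the squared Euclidean distance between the two relative abundance vectors.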
However, $D_Q$ may be much less than 1 even if the two communities are completely distinct, when $Q(\mathbf{p})$ and $Q(\mathbf{q})$ are high. Therefore, Pavoine & Ricotta (2014) suggested dividing $D_Q$ by its theoretical maximum (see equations 3 and 4 in Pavoine & Ricotta 2014). They recognized that the resulting indices are representatives of a broader family of indices, hereafter called dsimcom, which are in fact implementations of Rao's between-community and within-community components of diversity in the similarity formulae designed for presence/absence data. For this family, it is necessary to introduce the similarity between species, $\varepsilon_{ij} = 1 - \delta_{ij}$. The expected similarity between individuals of different communities, $A = \sum_i\sum_j p_i q_j \varepsilon_{ij}$, is taken as analogous to the shared diversity, $a$, of the similarity indices for presence/absence data that disregard species properties, while the expected similarities within communities ($A + B = \sum_i\sum_j p_i p_j \varepsilon_{ij}$ and $A + C = \sum_i\sum_j q_i q_j \varepsilon_{ij}$) are analogous to the species numbers ($a + b$, $a + c$). In this way, Pavoine & Ricotta (2014) presented formulae following the Sokal & Sneath, Jaccard, Sørensen, and Ochiai indices. Additionally, a formula analogous to Whittaker's effective species turnover (β = γ/α - 1; Whittaker 1972, Tuomisto 2010a) is suggested for two communities, which in similarity form is shown to be identical to the overlap index of Chiu et al. (2014). In this formulation, γ = A + B + C and α = (2A + B + C)/2. Pavoine & Ricotta (2014) showed that members of the dsimcom family provide meaningful values even if absolute abundances, percentage values or binary occurrences are used instead of relative abundances.
When $\varepsilon_{ij}$ contains taxonomic similarities, its off-diagonal elements are 0, and with binary occurrences A = a, B = b, and C = c.

It is worth noting the inherent link between $D_Q$ and CWMdis, based on the geometric interpretation by Pavoine (2012) and Ricotta et al. (2015). Pavoine (2012) showed that if between-species dissimilarities are of the form $\delta_{ij} = d_{ij}^2/2$ and $d_{ij}$ is Euclidean embeddable, $D_Q$ is half the squared Euclidean distance between the centroids of the two communities, a function monotonically related to CWMdis, the simple Euclidean distance between community centroids. As Ricotta et al. (2015) argue, if species relatedness is described only by a dissimilarity matrix, which is the common case in phylogenetic analyses, species can be mapped into a principal coordinate analysis ordination using $d_{ij}$. Given that $d_{ij}$ is Euclidean embeddable, this ordination should produce $S - 1$ or fewer ordination axes, all with positive eigenvalues. Ordination scores of species can be used as traits, and therefore community centroids and (squared) Euclidean distances between communities can be calculated. In the special case when the $d_{ij}$ are Euclidean distances calculated directly from the traits, the centroids are the community-weighted means themselves, so $D_Q$ equals half the squared CWMdis.

It is also notable that Swenson et al. (2011) and Swenson (2011) use the quantity $Q(\mathbf{p},\mathbf{q})$ as a standalone index of pairwise beta diversity and call it $D_{pw}$ or "Rao's D". The latter name is misleading, since Rao (1982) himself used $D_{ij}$ to denote the DISC (or $D_Q$) index.
$Q(\mathbf{p},\mathbf{q})$ measures dissimilarity between two communities, but the dissimilarity of a community from itself is not zero. Swenson (2011) also presents a standardized version of $Q(\mathbf{p},\mathbf{q})$ under the name Rao's H. With this formula the dissimilarity of a community to itself is scaled to 1; however, its transformation to a meaningful scale on which each community has zero dissimilarity to itself is not elaborated. Because of this drawback, we do not consider these indices in our review of functional dissimilarity measures.

Schmidt et al. (2017) proposed probabilistic indices, with weighted and unweighted versions, for expressing community similarity on the basis of taxa interaction networks (TINA, taxa interaction-adjusted) and phylogenetic relatedness (PINA, phylogenetic interaction-adjusted). TINA and PINA differ only in the type of data the interaction matrix contains. Notably, the weighted TINA formula is identical to the Ochiai version of dsimcom. However, the unweighted TINA, abbreviated TU, is not a special case of TINA, which we consider an inconsistency. Therefore, we did not include TU as a separate index.

Ordinariness-based approach
With respect to functional alpha diversity, Leinster & Cobbold (2012) introduced the concept of species ordinariness, defined as the weighted sum of the relative abundances of species similar to a focal species within the same community, or, in other words, the expected similarity between an individual of the focal species and an individual chosen randomly from the same community.
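Computationally, ordinariness is a matrix-vector product: with a species-by-species similarity matrix $Z$ (with $z_{ii} = 1$) and relative abundances $\mathbf{p}$, the ordinariness of species $i$ is $(Z\mathbf{p})_i$. The minimal sketch below uses a hypothetical three-species similarity matrix and a hypothetical helper name. It illustrates that with taxonomic similarities ($Z$ equal to the identity matrix) ordinariness reduces to relative abundance, and that the abundance-weighted mean ordinariness equals $1 - Q(\mathbf{p})$ when $z_{ij} = 1 - \delta_{ij}$.

```python
import numpy as np


def ordinariness(p, similarity):
    """Ordinariness of each species: (Z p)_i = sum_j z_ij * p_j."""
    return np.asarray(similarity, dtype=float) @ np.asarray(p, dtype=float)


p = np.array([0.6, 0.3, 0.1])
# Hypothetical between-species similarities z_ij = 1 - delta_ij, with z_ii = 1.
z = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])

o = ordinariness(p, z)
print(o)                               # never below p, since z_ii = 1 and z_ij >= 0
print(ordinariness(p, np.eye(3)))      # taxonomic case: identical to p
print(p @ o, 1.0 - p @ (1.0 - z) @ p)  # mean ordinariness equals 1 - Q(p)
```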
According to Ricotta & Pavoine (2015), it is straightforward to replace abundances with ordinariness values in the species-based (dis-)similarity indices. Following this concept, Ricotta & Pavoine (2015) introduced a new family of trait-based similarity measures called dissABC, which applies the schemes of the Jaccard, Sørensen, Ochiai, Kulczynski, Sokal & Sneath, and Simpson indices. Either relative or absolute abundances can be chosen as input values. Species ordinariness values can be calculated either with respect to the pooled species list of the two communities under comparison or with respect to the total species list of the data matrix.

For species-based analyses, Ricotta & Podani (2017) suggested a general formula of distance measures in which community dissimilarity is calculated by weighted averaging of species-level differences in abundance. From this formula, the normalized Canberra distance, the Bray-Curtis distance, the Marczewski-Steinhaus index, and an evenness-based dissimilarity index (Ricotta 2018) can be derived. According to Pavoine & Ricotta (2019), replacing species abundances with species ordinariness values in this formula yields a meaningful dissimilarity index, called generalized_Tradidiss. Additionally, this index contains a factor that weights the contribution of each species to the overall dissimilarity between the two communities. This weight can be set to give even weight to all species or to weight them proportionally to their relative abundance in the pooled communities.
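The substitution of abundances by ordinariness can be illustrated by plugging ordinariness values into the generalized quantities $a'$, $b'$ and $c'$ of Eqs. 8-10, and into a weighted average of Canberra-type species-level differences. This is a minimal sketch of the idea under our own assumptions (hypothetical function names; abundances and a similarity matrix $Z$ with $z_{ii} = 1$ as input; a Canberra-type denominator), not a reproduction of the published dissABC or generalized_Tradidiss formulae, which may differ in details. The `pooled_only` flag mirrors the choice between the pooled species list of the two communities and the total species list.

```python
import numpy as np


def _pooled(xj, xk, similarity, pooled_only):
    """Optionally restrict abundances and Z to species present in j or k."""
    xj = np.asarray(xj, dtype=float)
    xk = np.asarray(xk, dtype=float)
    z = np.asarray(similarity, dtype=float)
    if pooled_only:
        keep = (xj + xk) > 0
        xj, xk, z = xj[keep], xk[keep], z[np.ix_(keep, keep)]
    return xj, xk, z


def ordinariness_abc(xj, xk, similarity, pooled_only=True):
    """a', b', c' (Eqs. 8-10) computed on ordinariness instead of abundance."""
    xj, xk, z = _pooled(xj, xk, similarity, pooled_only)
    oj, ok = z @ xj, z @ xk                  # ordinariness in communities j and k
    top = np.maximum(oj, ok)
    return np.minimum(oj, ok).sum(), (top - ok).sum(), (top - oj).sum()


def jaccard_type(a, b, c):
    return (b + c) / (a + b + c)


def sorensen_type(a, b, c):
    return (b + c) / (2 * a + b + c)


def weighted_canberra(xj, xk, similarity, weights="even", pooled_only=True):
    """Weighted average of |o_ij - o_ik| / (o_ij + o_ik) over species.

    weights="even": equal weight per species; weights="abundance": weight
    proportional to x_ij + x_ik, i.e. to abundance in the pooled communities.
    """
    xj, xk, z = _pooled(xj, xk, similarity, pooled_only)
    oj, ok = z @ xj, z @ xk
    total = oj + ok
    use = total > 0
    terms = np.abs(oj - ok)[use] / total[use]
    if weights == "even":
        w = np.full(terms.size, 1.0 / terms.size)
    elif weights == "abundance":
        pooled = (xj + xk)[use]
        w = pooled / pooled.sum()
    else:
        raise ValueError("weights must be 'even' or 'abundance'")
    return float(np.sum(w * terms))


xj, xk = [4, 0, 1], [1, 2, 1]
identity = np.eye(3)                                              # taxonomic case
print(jaccard_type(*ordinariness_abc(xj, xk, identity)))          # Marczewski-Steinhaus, 5/7
print(sorensen_type(*ordinariness_abc(xj, xk, identity)))         # Bray-Curtis, 5/9
print(weighted_canberra(xj, xk, identity, weights="abundance"))   # Bray-Curtis again

# Two communities without shared species, made of very similar species.
z = np.array([[1.0, 0.9], [0.9, 1.0]])
print(jaccard_type(*ordinariness_abc([1, 0], [0, 1], z)))         # about 0.1 instead of 1
```

With even weights and $Z$ equal to the identity matrix, `weighted_canberra` returns the normalized Canberra index (Eq. 4).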

Diversity partitioning approach
Following the work of Hill (1973), a community with diversity of order q, qD, is as diverse as a theoretical community containing qD equally abundant species. The order of diversity, q, expresses the weight given to differences in species abundance, with q = 0 representing the presence/absence case and q = ∞ considering only the relative abundance of the most abundant species in the community. Without accounting for interspecific similarities, there is an emerging consensus that effective numbers (also called numbers equivalents) offer a straightforward way of partitioning diversity into within-community (alpha), between-community (beta) and across-community (gamma) components (Jost 2007). Of these three, the between-community component, beta diversity, can be interpreted as a form of dissimilarity when applied to two communities (Ricotta 2017). Beta diversity can be derived from alpha and gamma diversity in a multiplicative (beta = gamma/alpha) or an additive (beta = gamma − alpha) way. Jost (2007) and Chao et al. (2012) argued that multiplicative beta diversity is a useful way of quantifying community differentiation; however, because it ranges between 1 and N (N being the number of communities), it is not comparable across samples containing different numbers of communities. To remove this dependence, they offer three solutions with which the value of multiplicative beta can be normalized.
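The multiplicative partitioning and the three normalizations of Chao et al. (2012) presented in Eqs. 15, 16 and 17 can be sketched in Python as follows. The sketch assumes that each community is given as relative abundances summing to 1 and that communities are equally weighted, in which case Jost’s (2007) alpha is ((1/N) Σ_j Σ_i p_ij^q)^(1/(1−q)) and gamma is the Hill number of the pooled, averaged abundances. For βsegregation at q = 1 the limiting value ln(qβ)/ln(N) is used. Species similarities are ignored here.

```python
import math


def hill_number(p, q):
    """Hill number of order q for relative abundances p (absent species ignored)."""
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1 - q))


def multiplicative_beta(communities, q):
    """Beta = gamma / alpha for equally weighted communities (assumption)."""
    N = len(communities)
    if q == 1:
        alpha = math.exp(sum(math.log(hill_number(p, 1)) for p in communities) / N)
    else:
        mean_sum = sum(sum(x ** q for x in p if x > 0) for p in communities) / N
        alpha = mean_sum ** (1.0 / (1 - q))
    pooled = [sum(column) / N for column in zip(*communities)]
    gamma = hill_number(pooled, q)
    return gamma / alpha


def beta_turnover(beta, N):
    """Eq. 15: relative turnover rate per community."""
    return (beta - 1) / (N - 1)


def beta_heterogeneity(beta, N):
    """Eq. 16: complement of the homogeneity measure."""
    return 1 - (1 / beta - 1 / N) / (1 - 1 / N)


def beta_segregation(beta, N, q):
    """Eq. 17: complement of the overlap measure; q = 1 uses the limiting value."""
    if q == 1:
        return math.log(beta) / math.log(N)
    return 1 - ((1 / beta) ** (q - 1) - (1 / N) ** (q - 1)) / (1 - (1 / N) ** (q - 1))


# Presence/absence (q = 0): two species shared, one unique species in each community.
X = [1 / 3, 1 / 3, 1 / 3, 0.0]
Y = [1 / 3, 1 / 3, 0.0, 1 / 3]
b0 = multiplicative_beta([X, Y], 0)
print(b0, beta_turnover(b0, 2), beta_heterogeneity(b0, 2), beta_segregation(b0, 2, 0))
```

For two communities sharing two of their three species each, q = 0 gives qβ = 4/3, so βturnover equals the Sørensen dissimilarity (1/3) and βheterogeneity the Jaccard dissimilarity (1/2).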
Although for pairwise comparisons N is always 2, it seems reasonable to follow these recommendations, since scaling between 0 and 1 has several advantages and most other indices also share this property. The rescaling formulae of Chao et al. (2012) embody different concepts of community (dis-)similarity, which together we call the family of multiplicative beta indices. The first formula is the relative turnover rate per community, which is a linear transformation of beta to the normed scale:
Eq. 15. $$\beta_{\mathrm{turnover}}\langle q\rangle = \frac{{}^{q}\beta - 1}{N - 1}$$
Here 0 means identical species composition, while 1 indicates totally distinct communities. In the pairwise comparison (N = 2), βturnover〈q〉 = qβ − 1.
The second index measures homogeneity and is a linear transformation of the inverse of beta. Since the complement of homogeneity is heterogeneity, we call its dissimilarity form βheterogeneity:
Eq. 16. $$\beta_{\mathrm{heterogeneity}}\langle q\rangle = 1 - \frac{1/{}^{q}\beta - 1/N}{1 - 1/N}$$
When N = 2, βheterogeneity〈q〉 = 2 − 2/qβ. With q = 0 (presence/absence case) the index is identical to the Jaccard index, while with q = 2 it is the Morisita-Horn index.
The third index measures overlap between communities, whose counterpart is segregation; thus we call it βsegregation:
Eq. 17. $$\beta_{\mathrm{segregation}}\langle q\rangle = 1 - \frac{\left(1/{}^{q}\beta\right)^{q-1} - \left(1/N\right)^{q-1}}{1 - \left(1/N\right)^{q-1}}$$
With q = 0, βsegregation〈q〉 = βturnover〈q〉, and both give the Sørensen index.
According to Leinster & Cobbold (2012), it is possible to incorporate species similarities into the calculation of effective numbers.
Implemented this way, qDZ is the diversity of a theoretical community with qDZ equally abundant and maximally different species. Hence, both unevenness in the abundance structure and between-species similarities decrease the effective number of species. Because diversity is measured in effective numbers, it can be partitioned into alpha, beta and gamma components in the multiplicative way (Leinster & Cobbold 2012; Botta-Dukát 2018). This multiplicative beta can then be rescaled using the formulae proposed by Chao et al. (2012). These indices behave consistently only if abundances are taken into account as relative abundances.

Nearest neighbour approach
The earliest representatives of this family were presented by Clarke & Warwick (1998) and Izsák & Price (2001), followed by Ricotta & Burrascano (2008) and Ricotta & Bacaro (2010; see the DCW and DIP indices). Later, Ricotta et al. (2016) introduced a new, general family called PADDis. All these indices were primarily defined for presence/absence data. The approach is based on a redefinition of the b and c quantities of the 2×2 contingency table. Regarding species as maximally different, and taking X and Y as the two communities under comparison, b can be viewed as the total uniqueness of community X. The uniqueness of a single species in X is 1 if it is absent from Y, and 0 otherwise; therefore, b is the sum of species uniqueness values. However, from a functional perspective, the uniqueness of a species present only in X should lie between 0 and 1 if a similar species is present in Y. Therefore, it is possible to define an analogue of b, B, which accounts for similarities between species:
Eq. 18. $$B = \sum_{i \in X} \left(1 - \max_{j \in Y} s_{ij}\right) = S_X - \sum_{i \in X} \max_{j \in Y} s_{ij}$$
where s_ij is the similarity between species i and j, and S_X is the number of species in community X.
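Eq. 18 can be sketched in Python for presence/absence data as follows: each species of X contributes one minus its similarity s_ij to the most similar species of Y. The similarity function is an assumption supplied by the user (here, for illustration, one minus the absolute difference of a single hypothetical trait scaled to [0; 1]); it must return 1 for a species compared with itself, and Y must not be empty. Calling the same function with the two communities swapped gives the analogous quantity for community Y, provided that similarities are symmetric.

```python
def uniqueness(X, Y, sim):
    """Eq. 18: total uniqueness B of community X with respect to community Y.

    X, Y: lists of species present in each community (presence/absence; Y not empty).
    sim:  function giving the similarity s_ij in [0, 1], with sim(i, i) == 1.
    Each species of X contributes 1 minus its similarity to the most similar species of Y.
    """
    return sum(1.0 - max(sim(i, j) for j in Y) for i in X)


# Hypothetical single-trait example (trait values scaled to [0, 1]).
traits = {"a": 0.0, "b": 0.5, "c": 0.6, "d": 1.0}


def trait_sim(i, j):
    return 1.0 - abs(traits[i] - traits[j])


def identity_sim(i, j):
    return 1.0 if i == j else 0.0


X = ["a", "b", "c"]
Y = ["a", "b", "d"]
print(uniqueness(X, Y, identity_sim))  # species-based case: count of species only in X
print(uniqueness(X, Y, trait_sim))     # trait-based case: c is close to b, so B is small
print(uniqueness(Y, X, trait_sim))     # swapped arguments: the analogous quantity for Y
```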
The same logic applies to c, the uniqueness of community Y, whose trait-based analogue C expresses the degree of uniqueness:
Eq. 19. $$C = \sum_{j \in Y} \left(1 - \max_{i \in X} s_{ij}\right) = S_Y - \sum_{j \in Y} \max_{i \in X} s_{ij}$$
Ricotta et al. (2016) define the quantity A as follows:
Eq. 20. $$A = (a + b + c) - (B + C)$$
Having A, B and C defined as analogues of a, b and c, it is now possible to design trait-based similarity measures following the logic of the Jaccard, Sørensen, Sokal & Sneath, Kulczynski, Ochiai and Simpson indices. It is notable that Ricotta et al. (2016) define A as a quantity that ensures that A, B and C add up to a + b + c, but without an explicit biological interpretation. Notably, DIP and DCW are identical to the Sørensen and Kulczynski forms of PADDis. The generalization of DIP and DCW to relative abundances, DCW(Q), was also derived by Ricotta & Bacaro (2010). For these two versions, it is not necessary to define the A component explicitly. Using the relationships between the Jaccard, Sørensen, Kulczynski, Ochiai and Sokal & Sneath indices, it is theoretically possible to derive the extension of PADDis to relative abundances from DCW(Q); however, the biological interpretation of A remains dubious in this framework.

Methods
The performance of FDissim indices can be tested reliably only on data sets with known background processes driving community assembly, a requirement that real data can hardly satisfy. Therefore, we compared the performance of FDissim indices using simulated data sets. The data sets were generated using the comm.simul function of the comsimitv R package (Botta-Dukát & Czúcz 2016; Botta-Dukát 2020). This function follows an individual-based
model for a meta-community comprising N communities and a regional pool of S species. Local communities include J individuals and are distributed equidistantly along a continuous environmental gradient (with gradient values between 0 and 1). Each individual possesses three traits: an ‘environmental’, a ‘competitive’ and a neutral trait, all ranging on [0; 1]. Intraspecific variation in trait values is neglected in the simulation, that is, individuals belonging to the same species are identical. The environmental trait defines the optimum of the species along the environmental gradient. The closer the position of a community along the environmental gradient to the environmental trait value of a species, the more suitable it is for that species (Eq. 21). In Eq. 21, σ (sigma) is adjustable so as to change the niche width of the species and, hence, the length of the gradient (see later). The competitive trait represents the resource acquisition strategy of the individual. The more similar this trait value between two individuals, the stronger the competition between them, which means that intraspecific competition is the strongest. The neutral trait has no effect on community assembly; thus it is not considered in our study. The simulation starts with the random assignment of all individuals of all communities to species. The second step is a ‘disturbance’ event, when one individual ‘dies’ in each community. This individual is replaced by an offspring of other individuals within the same community or of those in other communities. Each individual produces one offspring or does not reproduce.
Probability of reproduction depends on the strength of competition. The offspring remains in the same community or disperses randomly into any of the other communities. Finally, the dead individual is replaced by one new individual from the seeds produced and dispersed. The probability that an individual of a certain species replaces the dead individual is defined by the number of seeds of that species and the suitability of the habitat. Steps between the disturbance event and the establishment of a new individual constitute a single ‘generation’. Community composition is evaluated after many generations. The strength of environmental filtering can be adjusted by the sigma parameter. When sigma is 0, all species are maximally specialist, which means that they can occur only at the optimum point of the gradient (that is, at the exact value of their environmental trait). If sigma is infinite, species are maximally generalist and all points along the environmental gradient are equally suitable for them. Therefore, sigma is the parameter that defines the suitability of each point of the gradient for each species based on its distance from the respective optimum. We generated data sets with sigma values 0.01, 0.1, 0.25, 0.5, 1 and 5 in order to simulate situations with different strengths of environmental filtering. The number of communities was 30, each community comprised 200 individuals, the number of species in the species pool was 300, the simulation iterated for 100 generations, and we allowed no intraspecific trait variation.
For all the other parameters, we used the default options.
However, the real situations represented by the six simulated levels of environmental filtering needed further explanation. To provide a reference and assist interpretation, we calculated two species-based beta-diversity measures: the multiplicative beta (Whittaker 1960) and the gradient length of the first axis of a detrended correspondence analysis (DCA) ordination (Hill & Gauch 1980; Appendix S5, Fig. S5.1). The former gives the number of distinct communities present in the total species pool of the gradient, while the latter is the minimal number of average niche breadths (also called turnover units) necessary for covering the total gradient length. Moreover, we plotted the abundance of species in the sample units along the gradient as a visual tool for assessing gradient length (Appendix S5, Fig. S5.2). All these methods indicated that with sigma = 0.01 the gradient is extremely long: there are more than 10 distinct communities and nearly 20 turnover units along the gradient. Samples with such high beta diversity are very rare and special in real ecological research; therefore, findings from simulations with sigma = 0.01 are mostly of theoretical importance. Beta diversity values from sigma = 0.1 to sigma = 1 are more similar to real study situations, hence they should be more relevant for practice. At sigma = 5, environmental filtering is practically not operating, and between-community variation is driven by interspecific relations and chance.
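The role of sigma described above can be illustrated with a Gaussian-shaped suitability function. This kernel is an assumption made for illustration only and is not necessarily the form of Eq. 21 or the function implemented in comm.simul. The sketch shows how the suitability of a community located 0.2 gradient units from a species’ optimum changes across the six sigma values used in the simulations.

```python
import math


def suitability(position, optimum, sigma):
    """ASSUMED Gaussian-shaped suitability of a gradient position for a species.

    Equals 1 at the species' optimum (its environmental trait value) and decays with
    distance at a rate set by sigma. Not necessarily the function used by comm.simul.
    """
    if sigma == 0:
        # maximal specialist: only the exact optimum is suitable
        return 1.0 if position == optimum else 0.0
    if math.isinf(sigma):
        # maximal generalist: every position is equally suitable
        return 1.0
    return math.exp(-((position - optimum) ** 2) / (2.0 * sigma ** 2))


sigmas = [0.01, 0.1, 0.25, 0.5, 1, 5]  # sigma values used in the simulations
for s in sigmas:
    # community 0.2 gradient units away from the optimum of the species
    print(s, round(suitability(0.7, 0.5, s), 4))
```

Under this assumption, suitability at that distance is practically zero at sigma = 0.01 and almost 1 at sigma = 5, mirroring the shift from strong to negligible environmental filtering.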
We calculated between-species dissimilarities as the Gower distance between their environmental trait values, which in this case equals the Euclidean distance scaled to [0; 1]. These distances had to be transformed to similarities according to the requirements of the FDissim indices. Several formulae are available for this; however, they may assume different functional relationships between similarity and distance. One of the formulae we used is the linear transformation Similarity = 1 − Distance. Besides this, we also used Similarity = e^(−u×Distance), which assumes a curvilinear relationship between similarity and distance (Leinster & Cobbold 2012). With this exponential formula, it is possible to weight the importance of small Gower distances between species relative to large distances. By changing the parameter u, it is possible to adjust how steeply similarity decreases with increasing distance. We set u = 10, which leads to a relatively steep decline. Although after this transformation the minimal value of similarity is higher than zero, we considered it negligibly low (e^(−10) ≈ 0.000045), so we did not apply the transformation proposed by Botta-Dukát (2018). For all FDissim indices where it was necessary, we used as input either the similarity matrix or a dissimilarity matrix calculated as Dissimilarity = 1 − Similarity. The dissimilarity matrix is identical to the Gower distance matrix if the similarities were calculated in the linear way, but in the other case it keeps the exponential relationship between distance and (dis-)similarity.
Dissimilarity matrices were calculated for the six community data sets with different sigma values, with the two functions transforming Gower distances, and across a broad range of
available FDissim indices. For indices where either absolute or relative abundances could be taken into account, we opted for relative abundances for the sake of better comparability. With generalized_Tradidiss, we calculated both the ‘even’ and the ‘uneven’ weighting versions. The entire analysis was run with both abundance and presence/absence data. Some FDissim indices are only suitable for binary data; thus the numbers of indices applied to relative abundance and binary data were 25 and 31, respectively. For indices handling both data types, we used exactly the same version of the index as with abundance data; hence communities with different numbers of species were given equal weight due to division by community totals. Additionally, dissimilarity matrices were also calculated using the Bray-Curtis index (for binary data: the Sørensen index in dissimilarity form) to provide a contrast with the case that disregards between-species dissimilarities.
Then, for each dissimilarity matrix, we conducted two types of analyses. First, we compared how strongly the dissimilarity indices correlate with environmental distance using Kendall’s tau rank correlation. This gives an estimate of how well a dissimilarity index reveals the monotonic relationship between the trait composition of local communities and the environmental gradient. We visually assessed the shape of the relationship between dissimilarity and environmental distance in the case of the lowest sigma (i.e., the longest gradient), when the distortion of the linear relationship between the two is expected to be the strongest.
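Kendall’s tau depends only on the ordering of the pairwise values, so it rewards any monotonic relationship between dissimilarity and environmental distance, whether linear or not, and two indices that are strictly increasing functions of each other are perfectly rank-correlated (e.g. the taxon-based Sørensen dissimilarity equals J/(2 − J), where J is the Jaccard dissimilarity). The Python sketch below illustrates both points with a from-scratch tau-a (no tie correction) on hypothetical values; it is not the implementation used in the analyses.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; no tie correction."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


# Hypothetical pairwise environmental distances and a saturating (asymptotic) dissimilarity.
env_dist = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
dissim = [d / (d + 0.1) for d in env_dist]
print(kendall_tau_a(env_dist, dissim))  # strictly increasing relationship, although nonlinear


# Taxon-based Jaccard and Sørensen dissimilarities from (a, b, c) of the 2x2 table.
def jaccard(a, b, c):
    return (b + c) / (a + b + c)


def sorensen(a, b, c):
    return (b + c) / (2 * a + b + c)


tables = [(5, 1, 1), (3, 2, 1), (1, 2, 2), (4, 0, 1), (2, 3, 3)]
J = [jaccard(*t) for t in tables]
S = [sorensen(*t) for t in tables]
print(kendall_tau_a(J, S))  # Sørensen = J / (2 - J) is monotonic in J
```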
Then, to disentangle the effects of different methodological decisions and the sigma parameter on the correlation between FDissim indices and environmental distance, we fitted a random forest model. In this model the dependent variable was the Kendall tau correlation coefficient, while the independent variables were sigma, the data type (abundance vs. presence/absence), the transformation method for Gower distances (linear vs. exponential), and the FDissim method. Within approaches, FDissim methods were often strongly correlated, which resulted in very similar Kendall’s tau values. Therefore, only the Sørensen/Bray-Curtis versions of dsimcom, dissABC, PADDis/DCW and generalized_Tradidiss with uneven weights, as well as βturnover, CWMdis and CDFdis, were included in this analysis. Variable importance scores (VIS) in the random forest were estimated by the permutation approach based on the mean decrease in log-likelihood, using the varimp function of the partykit package. The effects of the model terms were also illustrated by heat maps.
All statistical analyses were done in R (R Core Team 2019) using the FD (Laliberté & Legendre 2010; Laliberté et al. 2014), adiv (Pavoine 2020a,b), comsimitv (Botta-Dukát 2020), vegan (Oksanen et al. 2019), DescTools (Signorell et al. 2020) and partykit (Hothorn et al. 2006; Strobl et al. 2007; Strobl et al. 2008; Hothorn & Zeileis 2015) packages.

Results
Kendall tau correlation coefficients decreased as the strength of environmental filtering decreased (that is, with increasing sigma) in all examined cases.
For FDissim indices which handled both data types, presence/absence data resulted in lower correlations than abundance data for all indices. For most indices, this difference was highest at intermediate sigma values. These trends were consistent between the linear and the exponential transformations. Correlations for all indices at all sigma values with linear transformation are shown in Table 3 for abundance data and in Table 4 for presence/absence data.
In most simulation scenarios, the FDissim indices correlated more strongly with the environmental gradient than the species-based Bray-Curtis index. However, on several occasions, indices belonging to the nearest neighbour family performed worse than the species-based dissimilarity. Notably, at the highest sigma and with presence/absence data, all indices showed correlations near zero, but among them the Bray-Curtis index had the highest correlation with environmental distance.
As expected, we found perfect rank correlations among the Jaccard, Sørensen, Sokal-Sneath and Whittaker’s beta versions of dsimcom, among the Jaccard, Sørensen and Sokal-Sneath forms of dissABC, between DIP and the Sørensen form of PADDis (only for presence/absence data), between DCW and the Kulczynski form of PADDis (only for presence/absence data), and between DIP and DCW (for abundance data).
Dissimilarity indices showed various shapes of relationship with environmental distance (Appendix S4).
At the strongest environmental filtering, all FDissim indices had dissimilarity values near zero at minimal environmental distance; only the species-based Bray-Curtis index had a dissimilarity near 0.4 at the smallest environmental distances. In the case of linear transformation of Gower distances and presence/absence data, an approximately linear relationship was found for CWMdis, CDFdis, DQ, the Sørensen and Ochiai forms of dsimcom, the Jaccard form of dissABC, the Marczewski-Steinhaus form of generalized_Tradidiss with both weighting versions, βheterogeneity and βsegregation, while most other indices showed only a small degree of distortion from a linear function (Figure S4.1). An exponential relationship was found for the evenness-based (PE) form of generalized_Tradidiss. Notably, the taxon-based Bray-Curtis index had the steepest asymptotic function of all. In the case of exponential transformation, all other indices relying on between-species dissimilarities showed an asymptotic curve (Figure S4.2).
In the random forest, niche width (that is, sigma) acquired by far the highest variable importance score (VIS = 0.114). The less important variables were the data type (VIS = 0.0176), the dissimilarity method (VIS = 0.0037) and the transformation (VIS = −0.00001). The heat map (Figure 1) also revealed a strong decrease in correlation with increasing sigma. It also clearly shows that in most cases abundance data resulted in considerably higher correlation than presence/absence data. The difference between the linear and exponential transformation methods was not always visible. Regarding variation between dissimilarity indices, the most striking
pattern was the relatively poor performance of the PADDis/DCW indices. For all indices except the latter, the combination of abundance data and linear transformation of dissimilarities led to the highest correlation with environmental distance.

Discussion
General patterns in the correlation with environmental distance
We ran different simulation scenarios with varying strength of environmental filtering. We expected the correlation between FDissim indices and environmental distance to be highest when environmental filtering is strongest, and to approach zero when environmental filtering is not effective. When environmental filtering was strongest (that is, with minimal overlap of species niches along the environmental gradient), all FDissim indices correlated highly with the environmental gradient. As expected, the correlation between trait dissimilarity and environmental distance decreased as filtering weakened; moreover, differences between families of indices became more apparent. This result suggests that all tested methods are able to reveal strong environmental filtering processes.
As the contribution of competitive exclusion and stochastic processes approaches or overrides environmental filtering, the correlation between FDissim indices and the background gradient becomes weaker. This decrease itself is not a drawback of the FDissim methods; rather, it is a consequence of our study design, since we applied a series of scenarios in which the effect of niche filtering was decaying. However, we think that the degree of the decrease reflects the sensitivity of the FDissim indices to the underlying trait-environment relationship.
Indices which showed high correlation with environmental distance may be capable of revealing the environmental signal even when it is weak. In fact, in our tests most indices reached similarly high correlation, and only a few combinations of simulation parameters resulted in a decreased correlation with environmental distance for some dissimilarity indices.
Determinants of the correlation based on the random forest model
The random forest model revealed that gradient length is the most important determinant of the correlation between dissimilarity and environmental distance, while methodological decisions had much lower variable importance. These observations suggest that the absolute value of the correlation between dissimilarity and environmental distance depends primarily on the sample at hand, and can be influenced by methodological decisions only to a limited extent.
Correlations were stronger with abundance than with presence/absence data. This finding is at least partly attributable to our simulation design, in which community composition was driven by individual-based processes: birth, fitness difference, reproduction, and death. As a result, species relative abundances were expected to be proportional to their environmental suitability in the local community. Transforming such data to a binary scale loses meaningful information and weakens the correlation between dissimilarity in trait composition and the environmental background.
In cases when presences and absences of species respond more robustly to the main environmental gradient while relative abundances change stochastically, or when abundance estimates are inaccurate, the binary data type might be more appropriate.
Transforming between-species dissimilarities can help to meet distributional requirements, to approximate expert intuitions about the relatedness of species, or to tune the sensitivity to functional differences according to specific research aims. For most indices, across the tested range of gradient length and data type, the exponential transformation resulted in a somewhat lower correlation than the linear transformation. More insight is provided by examining the shape of the relationships in addition to the correlation value itself. After linear transformation of Gower distances, most dissimilarity indices showed a linear or slightly curved function along environmental distance, although the scatter of the evenness-based generalized_Tradidiss deviated considerably from a straight line towards an exponentially increasing curve. After exponential transformation of between-species trait dissimilarities, all indices in the direct dissimilarity-based class showed a rather steeply increasing asymptotic function. This result suggests that with the exponential transformation of between-species dissimilarities, it is possible to make FDissim indices more sensitive to smaller differences in functional composition.
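The two transformations of Gower distances described in the Methods (Similarity = 1 − Distance, and Similarity = e^(−u×Distance) with u = 10) can be compared in a short Python sketch. For a single quantitative trait, the Gower distance is taken here as the absolute difference divided by the trait range. The comparison shows that the exponential form changes steeply at small distances and hardly at all at large ones, which is consistent with the increased sensitivity to small differences in functional composition discussed in this paragraph.

```python
import math


def gower_1d(t_i, t_j, t_min, t_max):
    """Gower distance for one quantitative trait: absolute difference / trait range."""
    return abs(t_i - t_j) / (t_max - t_min)


def linear_similarity(d):
    """Similarity = 1 - Distance."""
    return 1.0 - d


def exponential_similarity(d, u=10.0):
    """Similarity = exp(-u * Distance); u = 10 as in the Methods."""
    return math.exp(-u * d)


# Drop in similarity for the same 0.05 increase in distance, at small and at large distances.
for d1, d2 in [(0.05, 0.10), (0.50, 0.55)]:
    lin_drop = linear_similarity(d1) - linear_similarity(d2)
    exp_drop = exponential_similarity(d1) - exponential_similarity(d2)
    print(d1, d2, round(lin_drop, 4), round(exp_drop, 4))

# Minimum similarity under the exponential form (Distance = 1), i.e. e^(-10).
print(exponential_similarity(1.0))
```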
Naturally, summary-based indices (CWMdis, CDFdis) are not affected by this transformation, since they are not based on between-species dissimilarities.
Comparison of taxon-based vs. trait-based dissimilarity
The basic assumption of functional ecology is that the traits of individuals should be more closely related to ecological properties than their taxonomic status. Following this argument, we expected trait-based dissimilarity measures to correlate more strongly with the environmental background than species-based indices. Conversely, a higher correlation for species-based than for trait-based dissimilarity would indicate a loss of information with the introduction of between-species similarity, which would be nonsensical, since our data were simulated to possess a strong trait-environment relationship. We used the Sørensen/Bray-Curtis index in dissimilarity form as a reference method representing species-based dissimilarity calculations that disregard traits. Our expectation was fulfilled by all indices except the members of the nearest neighbour family (DIP, DCW and PADDis). We suspect two potential reasons behind the low performance of this latter group of indices. The first is the improper scaling factor used for standardizing the ‘operational part’ of the indices (see the description of the PADDis family and its discussion under the paragraph “Within-family variation of indices”). Second, these indices rely on the quantities of minimally different species in the two communities under comparison.
However, the minimum is a less robust descriptor of any sample distribution because of its dependence on sampling error; therefore, it might provide a poor representation of total community dissimilarity.
Although we did not include dissimilarity values at exactly zero distance, the y-intercept (also called ‘nugget’) of the dissimilarity vs. environmental distance functions can be extrapolated with negligible error (Fortin & Dale 2005). Brownstein et al. (2012) argued that the nugget of the distance decay relationship is a direct estimate of the amount of chance in the variation between local communities. In this respect, it is worth noting that the nugget with the species-based Bray-Curtis index was near 0.4, while with all trait-based indices it was near zero. This suggests that without accounting for species similarities, the environmental distance between communities, as inferred from their compositional dissimilarity, can be overestimated due to similar species replacing each other.
Within-family variation of indices
The perfect correlation between the Jaccard, Sørensen and Sokal-Sneath forms of the dsimcom and dissABC families was expected, since the original, taxon-based Jaccard, Sørensen and Sokal-Sneath indices are algebraically related, too (Janson & Vegelius 1981). However, for PADDis, the Jaccard, Sørensen and Sokal-Sneath forms showed correlations below 1. In this family, the B and C components of the 2×2 contingency table are defined as measurable quantities with a clear interpretation: the sum of species uniqueness values within each community. The total diversity (A+B+C) is defined to be equal to the species richness of the pooled pair of communities (a+b+c), and the quantity A is derived by subtracting (B+C) from it. With this definition, A remains a virtual quantity with no biological interpretation. In PADDis indices, the trait-based quantities B and C appear in the numerator (the ‘operational part’ sensu Ricotta et al.
2016) of the indices, while in the denominators (i.e., in the 'scaling factor') the taxon-based quantities a, b and c are used. We argue that the inconsistent behaviour of PADDis is due to the use of taxon-based quantities as scaling factors for trait-based operational parts. At the same time, we acknowledge that we see no obvious way to define total diversity or shared diversity more realistically in line with the uniqueness-based idea behind PADDis. In the generalized_Tradidiss family, the trait-based analogue of the Bray-Curtis index can be obtained by calculating the generalized Canberra distance with uneven weighting of species. We expected this to be perfectly correlated with the Marczewski-Steinhaus form of the generalized_Tradidiss index with uneven weighting, since the Bray-Curtis and Marczewski-Steinhaus indices are the abundance forms of the Sørensen and Jaccard indices, respectively. However, the correlation between them was less than perfect. In the generalized_Tradidiss family, between-community dissimilarity is calculated as a weighted sum of standardized differences in species ordinariness values. Species ordinariness is calculated from species abundances and trait values; however, the weights used for adjusting species-level contributions are derived solely from abundances. Therefore, generalized_Tradidiss also follows a 'hybrid' approach in accounting for taxon-based vs. trait-based information.
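The role of consistent definitions can be checked numerically. For any consistently defined contingency quantities, the Jaccard and Sørensen dissimilarities satisfy $D_J = 2D_S/(1+D_S)$, so they are monotonically related and rank plot pairs identically. The sketch below contrasts (i) forms in which A is derived as (a+b+c) - (B+C) and used together with B+C in both numerator and denominator, with (ii) a hypothetical hybrid in which the trait-based B+C enters the numerator while the taxon-based a, b and c form the denominator. The hybrid formulas, the helper names and all numbers are illustrative assumptions, not the published PADDis or generalized_Tradidiss definitions.

```python
from typing import Tuple


def jaccard_dis(a: float, b: float, c: float) -> float:
    """Jaccard-type dissimilarity (b + c) / (a + b + c)."""
    return (b + c) / (a + b + c)


def sorensen_dis(a: float, b: float, c: float) -> float:
    """Sorensen-type dissimilarity (b + c) / (2a + b + c)."""
    return (b + c) / (2 * a + b + c)


def consistent_forms(a: int, b: int, c: int, bc_trait: float) -> Tuple[float, float]:
    """A derived as (a + b + c) - (B + C); A and B + C are used throughout."""
    big_a = (a + b + c) - bc_trait
    # Only the sum B + C matters, so it is passed as one argument with C = 0
    return jaccard_dis(big_a, bc_trait, 0.0), sorensen_dis(big_a, bc_trait, 0.0)


def hybrid_forms(a: int, b: int, c: int, bc_trait: float) -> Tuple[float, float]:
    """Hypothetical hybrid: trait-based numerator, taxon-based denominator."""
    return bc_trait / (a + b + c), bc_trait / (2 * a + b + c)


if __name__ == "__main__":
    # Two hypothetical plot pairs given as (a, b, c, B + C)
    pair1 = (8, 1, 1, 1.6)
    pair2 = (1, 4, 5, 1.5)
    for name, forms in [("consistent", consistent_forms), ("hybrid", hybrid_forms)]:
        j1, s1 = forms(*pair1)
        j2, s2 = forms(*pair2)
        # Do the Jaccard and Sorensen forms rank the two pairs the same way?
        print(name, "same ranking:", (j1 > j2) == (s1 > s2))
        # Does the identity D_J = 2 D_S / (1 + D_S) hold for pair 2?
        print(name, "identity holds:", abs(j2 - 2 * s2 / (1 + s2)) < 1e-12)
```

With the consistent definition both checks print True; with the hybrid definition both print False, i.e. the Jaccard and Sørensen forms order the two plot pairs differently, which is the kind of discrepancy that pushes their rank correlation below 1.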
We argue that this hybrid construction is the reason why the algebraic relationships between the original Sørensen and Jaccard indices do not apply to the Sørensen/Bray-Curtis-type and Jaccard/Marczewski-Steinhaus-type forms of generalized_Tradidiss. To sum up, we point out that the Jaccard, Sørensen and Sokal-Sneath forms of certain families of indices do not satisfy the algebraic relationships they are supposed to, which opens space for potential confusion. These algebraic relationships hold only if the A, B and C quantities are explicitly and consistently defined.

Families of FDissim indices combine the abundance differences of species between plots and interspecific trait differences in a unique way, while indices belonging to the same family differ in how they relate this amount of 'unshared' variation (summarized as the b and c portions of the contingency table) to the shared (a) variation. Some indices are able to handle abundances either as absolute or as relative abundances (e.g. dsimcom, generalized_Tradidiss, dissABC), while others divide absolute abundances by their sum over the respective community and thus work only with relative abundances. When indices in the former group are set to consider absolute abundances, they become sensitive to variation in the summed abundances of the communities under comparison. To place our tests on a common ground, we simulated communities with equal total numbers of individuals and set all indices, where relevant, to work with relative abundances. Hence, we removed the effect of differences in total abundance.
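The effect of total abundance can be illustrated with the species-based Bray-Curtis index, used here only because it is the simplest abundance-based dissimilarity; the same reasoning applies to any index that is fed absolute rather than relative abundances. In this sketch, the second community has exactly the same proportions as the first but twice as many individuals. The helper names `bray_curtis` and `relativize` are illustrative, not functions from the packages cited in Table 2.

```python
from typing import List, Sequence


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum |x_i - y_i| / sum (x_i + y_i)."""
    num = sum(abs(xi - yi) for xi, yi in zip(x, y))
    den = sum(xi + yi for xi, yi in zip(x, y))
    return num / den


def relativize(x: Sequence[float]) -> List[float]:
    """Divide abundances by the community total (relative abundances)."""
    total = sum(x)
    return [xi / total for xi in x]


if __name__ == "__main__":
    comm1 = [10, 5, 5]
    comm2 = [20, 10, 10]  # same proportions, twice the total abundance
    print("absolute:", bray_curtis(comm1, comm2))
    print("relative:", bray_curtis(relativize(comm1), relativize(comm2)))
```

With absolute abundances the two communities appear one-third dissimilar, whereas after relativization their dissimilarity is zero. This is the kind of total-abundance effect that equal community sizes and relative abundances remove from the comparisons.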
The constant number of individuals might have increased both the similarity between FDissim indices belonging to the same family and their correlation with the environmental gradient. In real study situations, the sum of abundances, whatever quantitative scale it is measured on, may vary considerably due to the aggregated distribution of individuals or uneven sampling effort. Therefore, our findings are more likely to be valid in settings where the sum of abundances is relatively stable, e.g. when sampling effort is controlled and individuals are dispersed evenly, or when abundances are recorded on a percentage scale.

Limitations of our study

In our study, we simulated a research situation in a simplistic way. We applied only one environmental gradient, which operated as an environmental filter driving convergence in a single trait. In addition, we applied another trait that was constantly affected by a low level of competitive exclusion. These two traits were uncorrelated. Random drift also had some effect on community composition due to the probabilistic components of the simulation algorithm. We varied the strength of environmental filtering, so its contribution relative to competitive exclusion and stochasticity differed among scenarios. In real research situations, local trait composition is influenced by a wide range of processes, including several abiotic and biotic filters acting simultaneously. Unless they are manipulated as parts of an experimental system, the full set of such filters is usually unknown to researchers. The multiplicity of filters may reduce the ability of FDissim indices to recover trait-environment relationships. Further research should clarify how increasing complexity of the sample affects the behaviour of FDissim indices.

Conclusions

Considering the diversity of concepts they are built upon, FDissim indices showed unexpectedly low variation in performance. CWMdis, dsimcom and generalized_Tradidiss achieved the highest correlation with environmental distance in all simulation scenarios; therefore, they seem to be equally suitable for quantifying pairwise beta diversity based on traits. Nevertheless, the most important determinant of the match between trait-based dissimilarity and environmental distance is the length of the trait gradient. Besides this, the data type (presence/absence vs. abundance) also affected the correlation more strongly than the choice of FDissim method. Extending the comparative tests of FDissim measures to more complex gradients and real data sets could offer further insight into their behaviour.

Data availability

Simulated data were generated using the comsimitv R package. Our own functions for the functional dissimilarity indices are available from the Zenodo public repository: doi:10.5281/zenodo.4323590.

Author contributions

A.L. designed and carried out the analysis and led the writing; Z.B.D. discussed the concept and the results, wrote parts of the manuscript and commented on it.

References

Anderson, M. J., Crist, T. O., Chase, J.
M., Vellend, M., Inouye, B. D., Freestone, A. L., Sanders, N. J., Cornell, H. V., Comita, L. S., Davies, K. F., Harrison, S. P., Kraft, N. J. B., Stegen, J. C. & Swenson, N. G. (2011). Navigating the multiple meanings of β diversity: a roadmap for the practicing ecologist. Ecology Letters, 14(1), 19-28. doi:10.1111/j.1461-0248.2010.01552.x
Anderson, M. J., Ellingsen, K. E. & McArdle, B. H. (2006). Multivariate dispersion as a measure of beta diversity. Ecology Letters, 9(6), 683-693. doi:10.1111/j.1461-0248.2006.00926.x
Baselga, A. & Leprieur, F. (2015). Comparing methods to separate components of beta diversity. Methods in Ecology and Evolution, 6, 1069-1079. doi:10.1111/2041-210X.12388
Botta-Dukát, Z. & Czúcz, B. (2016). Testing the ability of functional diversity indices to detect trait convergence and divergence using individual-based simulation. Methods in Ecology and Evolution, 7, 114-126. https://doi.org/10.1111/2041-210X.12450
Botta-Dukát, Z. (2005). Rao's quadratic entropy as a measure of functional diversity based on multiple traits. Journal of Vegetation Science, 16, 533-540. https://doi.org/10.1111/j.1654-1103.2005.tb02393.x
Botta-Dukát, Z. (2018). The generalized replication principle and the partitioning of functional diversity into independent alpha and beta components. Ecography, 41, 40-50. doi:10.1111/ecog.02009
Botta-Dukát, Z. (2020). comsimitv: Flexible Framework for Simulating Community Assembly. R package version 0.1.4. https://CRAN.R-project.org/package=comsimitv
Brownstein, G., Steel, J.B., Porter, S., Gray, A., Wilson, C., Wilson, P.G. & Wilson, J. B.
(2012). Chance in plant communities: a new approach to its measurement using the nugget from spatial autocorrelation. Journal of Ecology, 100, 987-996. https://doi.org/10.1111/j.1365-2745.2012.01973.x
Cardoso, P., Rigal, F., Carvalho, J.C., Fortelius, M., Borges, P.A.V., Podani, J. & Schmera, D. (2014). Partitioning taxon, phylogenetic and functional beta diversity into replacement and richness difference components. Journal of Biogeography, 41, 749-761. doi:10.1111/jbi.12239
Carmona, C. P., de Bello, F., Mason, N. W. H. & Lepš, J. (2016). Traits without borders: integrating functional diversity across scales. Trends in Ecology and Evolution, 31(5), 382-394. doi:10.1016/j.tree.2016.02.003
Champely, S. & Chessel, D. (2002). Measuring biological diversity using Euclidean metrics. Environmental and Ecological Statistics, 9, 167–177. https://doi.org/10.1023/A:1015170104476
Chao, A., Chiu, C. & Hsieh, T.C. (2012). Proposing a resolution to debates on diversity partitioning. Ecology, 93, 2037-2051. https://doi.org/10.1890/11-1817.1
Chao, A., Chiu, C.-H., Villéger, S., Sun, I-F., Thorn, S., Lin, Y.-C., Chiang, J.-M. & Sherwin, W. B. (2019). An attribute-diversity approach to functional diversity, functional beta diversity, and related (dis)similarity measures. Ecological Monographs, 89(2), e01343. doi:10.1002/ecm.1343
Chiu, C.-H., Jost, L. & Chao, A. (2014). Phylogenetic beta diversity, similarity, and differentiation measures based on Hill numbers. Ecological Monographs, 84(1), 21-44.
Clarke, K.R. & Warwick, R.M. (1998). Quantifying structural redundancy in ecological communities.
Oecologia, 113(2), 278-289.
De Bello, F., Carmona, C.P., Mason, N.W.H., Sebastià, M.-T. & Lepš, J. (2013). Which trait dissimilarity for functional diversity: trait means or trait overlap? Journal of Vegetation Science, 24, 807-819. doi:10.1111/jvs.12008
De Bello, F., Lepš, J., Lavorel, S. & Moretti, M. (2007). Importance of species abundance for assessment of trait composition: an example based on pollinator communities. Community Ecology, 8(2), 163–170. https://doi.org/10.1556/ComEc.8.2007.2.3
Díaz, S. & Cabido, M. (2001). Vive la différence: plant functional diversity matters to ecosystem processes. Trends in Ecology and Evolution, 16(11), 646–655. https://doi.org/10.1016/S0169-5347(01)02283-2
Faith, D. P., Minchin, P. R. & Belbin, L. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio, 69, 57-68.
Fortin, M.-J. & Dale, M.R.T. (2005). Spatial Data Analysis: a Guide for Ecologists. Cambridge University Press, Cambridge.
Garnier, E., Cortez, J., Billès, G., Navas, M., Roumet, C., Debussche, M., Laurent, G., Blanchard, A., Aubry, D., Bellmann, A., Neill, C. & Toussaint, J. (2004). Plant functional markers capture ecosystem properties during secondary succession. Ecology, 85, 2630-2637. doi:10.1890/03-0799
Gregorius, H.-R., Gillet, E.M. & Ziehe, M. (2003). Measuring Differences of Trait Distributions Between Populations. Biometrical Journal, 45, 959-973. https://doi.org/10.1002/bimj.200390063
Grime, J. P. (1998). Benefits of plant diversity to ecosystems: immediate, filter and founder effects. Journal of Ecology, 86, 902–910.
Hawkins, B.A., Leroy, B., Rodríguez, M.Á., Singer, A., Vilela, B., Villalobos, F., Wang, X. & Zelený, D. (2017). Structural bias in aggregated species-level variables driven by repeated species co-occurrences: a pervasive problem in community and assemblage data. Journal of Biogeography, 44, 1199-1211.
Hérault, B. & Honnay, O. (2007). Using life-history traits to achieve a functional classification of habitats. Applied Vegetation Science, 10(1), 73–80. https://doi.org/10.1111/j.1654-109X.2007.tb00505.x
Hill, M. O. & Gauch, H. G. (1980). Detrended Correspondence Analysis: An Improved Ordination Technique. Vegetatio, 42, 47–58.
Hill, M. O. (1973). Diversity and evenness: a unifying notation and its consequences. Ecology, 54(2), 427–432.
Hitchcock, F.L. (1941). Distribution of a product from several sources to numerous localities. Journal of Mathematical Physics, 20, 224-230.
Hothorn, T., Hornik, K., Van de Wiel, M. A. & Zeileis, A. (2006). A Lego System for Conditional Inference. The American Statistician, 60(3), 257–263.
Hothorn, T. & Zeileis, A. (2015). partykit: A Modular Toolkit for Recursive Partytioning in R. Journal of Machine Learning Research, 16, 3905-3909. http://jmlr.org/papers/v16/hothorn15a.html
Izsák, C. & Price, R. G. (2001). Measuring β-diversity using a taxonomic similarity index, and its relation to spatial scale. Marine Ecology Progress Series, 215, 69–77.
Janson, S. & Vegelius, J. (1981). Measures of ecological association. Oecologia, 49(3), 371-376.
Jost, L. (2007). Partitioning diversity into independent alpha and beta components.
Ecology, 88, 2427–2439.
Kleyer, M., Dray, S., de Bello, F., Lepš, J., Pakeman, R.J., Strauss, B., Thuiller, W. & Lavorel, S. (2012). Assessing species and community functional responses to environmental gradients: which multivariate methods? Journal of Vegetation Science, 23, 805-821. doi:10.1111/j.1654-1103.2012.01402.x
Koleff, P., Gaston, K. J. & Lennon, J. J. (2003). Measuring beta diversity for presence–absence data. Journal of Animal Ecology, 72, 367-382. doi:10.1046/j.1365-2656.2003.00710.x
Laliberté, E. & Legendre, P. (2010). A distance-based framework for measuring functional diversity from multiple traits. Ecology, 91, 299-305.
Laliberté, E., Legendre, P. & Shipley, B. (2014). FD: measuring functional diversity from multiple traits, and other tools for functional ecology. R package version 1.0-12.
Legendre, P. & Legendre, L. (1998). Numerical ecology. Elsevier, Amsterdam, NL.
Legendre, P. & De Cáceres, M. (2013). Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters, 16, 951–963.
Leinster, T. & Cobbold, C.A. (2012). Measuring diversity: the importance of species similarity. Ecology, 93, 477-489. doi:10.1890/10-2402.1
Lengyel, A. & Podani, J. (2015). Assessing the relative importance of methodological decisions in classifications of vegetation data. Journal of Vegetation Science, 26, 804-815. doi:10.1111/jvs.12268
Lengyel, A., Swacha, G., Botta-Dukát, Z. & Kacki, Z. (2020). Trait-based numerical classification of mesic and wet grasslands in Poland. Journal of Vegetation Science, 31, 319–330.
https://doi.org/10.1111/jvs.12850
Lepš, J., de Bello, F., Lavorel, S. & Berman, S. (2006). Quantifying and interpreting functional diversity of natural communities: practical considerations matter. Preslia, 78, 481–501.
MacArthur, R. & Levins, R. (1967). The limiting similarity, convergence, and divergence of coexisting species. American Naturalist, 101, 377–387.
Mason, N. W. H., Mouillot, D., Lee, W. G. & Wilson, J. B. (2005). Functional richness, functional evenness and functional divergence: the primary components of functional diversity. Oikos, 111, 112-118. doi:10.1111/j.0030-1299.2005.13886.x
McGill, B., Enquist, B. J., Weiher, E. & Westoby, M. (2006). Rebuilding community ecology from functional traits. Trends in Ecology and Evolution, 21(4), 178-185.
Mouchet, M.A., Villéger, S., Mason, N.W.H. & Mouillot, D. (2010). Functional diversity measures: an overview of their redundancy and their ability to discriminate community assembly rules. Functional Ecology, 24, 867-876. doi:10.1111/j.1365-2435.2010.01695.x
Mouillot, D., Stubbs, W., Faure, M., Dumay, O., Tomasini, J.A., Wilson, J.B. & Chi, T.D. (2005). Niche overlap estimates based on quantitative functional traits: a new family of non-parametric indices. Oecologia, 145, 345–353.
Muscarella, R. & Uriarte, M. (2016). Do community-weighted mean functional traits reflect optimal strategies? Proceedings of the Royal Society B, 283, 20152434. https://doi.org/10.1098/rspb.2015.2434
Nipperess, D.A., Faith, D.P. & Barton, K. (2010). Resemblance in phylogenetic diversity among ecological assemblages. Journal of Vegetation Science, 21, 809-820.
doi:10.1111/j.1654-1103.2010.01192.x
Oksanen, J., Blanchet, F.G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P. R., O'Hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. M., Szoecs, E. & Wagner, H. (2019). vegan: Community Ecology Package. R package version 2.5-6. https://CRAN.R-project.org/package=vegan
Pavoine, S. & Ricotta, C. (2014). Functional and phylogenetic similarity among communities. Methods in Ecology and Evolution, 5, 666–675.
Pavoine, S. & Ricotta, C. (2019). Measuring functional dissimilarity among plots: Adapting old methods to new questions. Ecological Indicators, 97, 67-72.
Pavoine, S. (2012). Clarifying and developing analyses of biodiversity: towards a generalisation of current approaches. Methods in Ecology and Evolution, 3, 509-518. doi:10.1111/j.2041-210X.2011.00181.x
Pavoine, S. (2016). A guide through a family of phylogenetic dissimilarity measures among sites. Oikos, 125, 1719-1732. doi:10.1111/oik.03262
Pavoine, S. (2020). adiv: An R package to analyse biodiversity in ecology. Methods in Ecology and Evolution, 11, 1106–1112. https://doi.org/10.1111/2041-210X.
Peres-Neto, P.R., Dray, S. & ter Braak, C.J.F. (2017). Linking trait variation to the environment: critical issues with community-weighted mean correlation resolved by the fourth-corner approach. Ecography, 40, 806-816.
Petchey, O. L. & Gaston, K. J. (2006). Functional diversity: back to basics and looking forward. Ecology Letters, 9, 741-758. doi:10.1111/j.1461-0248.2006.00924.x
Podani, J. & Schmera, D. (2011).
A new conceptual and methodological framework for exploring and explaining pattern in presence–absence data. Oikos, 120, 1625-1638. doi:10.1111/j.1600-0706.2011.19451.x
Podani, J. (2000). Introduction to the exploration of multivariate biological data. Backhuys, Leiden, NL.
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Rao, C. R. (1982). Diversity and dissimilarity coefficients: a unified approach. Theoretical Population Biology, 21, 24-43.
Ricotta, C. & Burrascano, S. (2008). Beta diversity for functional ecology. Preslia, 80, 61–71.
Ricotta, C. & Bacaro, G. (2010). On plot-to-plot dissimilarity measures based on species functional traits. Community Ecology, 11, 113–119.
Ricotta, C. & Podani, J. (2017). On some properties of the Bray-Curtis dissimilarity and their ecological meaning. Ecological Complexity, 31, 201–205.
Ricotta, C. & Pavoine, S. (2015). Measuring similarity among plots including similarity among species: an extension of traditional approaches. Journal of Vegetation Science, 26, 1061-1067. doi:10.1111/jvs.12329
Ricotta, C. (2017). Of beta diversity, variance, evenness, and dissimilarity. Ecology and Evolution, 7, 4835–4843. https://doi.org/10.1002/ece3.2980
Ricotta, C. (2018). A family of (dis)similarity measures based on evenness and its relationship with beta diversity. Ecological Complexity, 34, 69-73. doi:10.1016/j.ecocom.2018.03.002
Ricotta, C., Bacaro, G., Caccianiga, M., Cerabolini, B.E.L. & Moretti, M. (2015).
A classical measure of phylogenetic dissimilarity and its relationship with beta diversity. Basic and Applied Ecology, 16(1), 10-18. https://doi.org/10.1016/j.baae.2014.10.003
Ricotta, C., Podani, J. & Pavoine, S. (2016). A family of functional dissimilarity measures for presence and absence data. Ecology and Evolution, 6, 5383–5389. doi:10.1002/ece3.2214
Schmidt, T., Matias Rodrigues, J. & von Mering, C. (2017). A family of interaction-adjusted indices of community similarity. ISME Journal, 11, 791–807. https://doi.org/10.1038/ismej.2016.139
Signorell, A. et mult. al. (2020). DescTools: Tools for descriptive statistics. R package version 0.99.38.
Strobl, C., Boulesteix, A.L., Kneib, T., Augustin, T. & Zeileis, A. (2008). Conditional Variable Importance for Random Forests. BMC Bioinformatics, 9, 307. http://www.biomedcentral.com/1471-2105/9/307
Strobl, C., Boulesteix, A.L., Zeileis, A. & Hothorn, T. (2007). Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics, 8, 25. http://www.biomedcentral.com/1471-2105/8/25
Swenson, N. G., Anglada-Cordero, P. & Barone, J. A. (2011). Deterministic tropical tree community turnover: evidence from patterns of functional beta diversity along an elevational gradient. Proceedings of the Royal Society B, 278, 877–884.
Swenson, N. G. (2011). Phylogenetic Beta Diversity Metrics, Trait Evolution and Inferring the Functional Beta Diversity of Communities. PLoS ONE, 6(6), e21264. https://doi.org/10.1371/journal.pone.0021264
Tamás, J., Podani, J. & Csontos, P. (2001).
An extension of presence/absence coefficients to abundance data: a new look at absence. Journal of Vegetation Science, 12, 401-410. doi:10.2307/3236854
Tuomisto, H. (2010a). A diversity of beta diversities: straightening up a concept gone awry. Part 1. Defining beta diversity as a function of alpha and gamma diversity. Ecography, 33, 2-22. doi:10.1111/j.1600-0587.2009.05880.x
Tuomisto, H. (2010b). A diversity of beta diversities: straightening up a concept gone awry. Part 2. Quantifying beta diversity and related phenomena. Ecography, 33, 23-45. doi:10.1111/j.1600-0587.2009.06148.x
Villéger, S., Mason, N.W.H. & Mouillot, D. (2008). New multidimensional functional diversity indices for a multifaceted framework in functional ecology. Ecology, 89, 2290-2301. doi:10.1890/07-1206.1
Violle, C., Navas, M.-L., Vile, D., Kazakou, E., Fortunel, C., Hummel, I. & Garnier, E. (2007). Let the concept of trait be functional! Oikos, 116, 882-892. doi:10.1111/j.0030-1299.2007.15559.x
Whittaker, R. H. (1960). Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30, 280–338.
Whittaker, R. H. (1972). Evolution and measurement of species diversity. Taxon, 21, 213-251. doi:10.2307/1218190
Zelený, D. (2018). Which results of the standard test for community weighted mean approach are too optimistic? Journal of Vegetation Science, 29, 953-966.

Tables and Figures

Table 1.
Similarity and dissimilarity forms of resemblance indices for presence-absence data (a: number of species shared by the two communities; b, c: numbers of species present only in the first or only in the second community)

Name of the index | Similarity version | Dissimilarity version
Sørensen | $S_{Sor} = \frac{2a}{2a+b+c}$ | $D_{Sor} = \frac{b+c}{2a+b+c}$
Ochiai | $S_{Och} = \frac{a}{\sqrt{(a+b)(a+c)}}$ | $D_{Och} = 1 - \frac{a}{\sqrt{(a+b)(a+c)}}$
Kulczynski | $S_{Kul} = \frac{1}{2}\left(\frac{a}{a+b} + \frac{a}{a+c}\right)$ | $D_{Kul} = \frac{1}{2}\left(\frac{b}{a+b} + \frac{c}{a+c}\right)$
Simpson | $S_{Si} = \frac{a}{a+\min(b,c)}$ | $D_{Si} = \frac{\min(b,c)}{a+\min(b,c)}$
Jaccard | $S_{Jac} = \frac{a}{a+b+c}$ | $D_{Jac} = \frac{b+c}{a+b+c}$
Sokal & Sneath | $S_{SS} = \frac{a}{a+2(b+c)}$ | $D_{SS} = \frac{2(b+c)}{a+2(b+c)}$

Table 2. Classification of trait-based dissimilarity indices. In the input data type column, X marks indicate whether abundance (A), relative abundance (R) and presence-absence (P/A) data can be used as input.

Class | Approach | Family | References | Input data type (A, R, P/A) | R function
Summary-based | Typical value | CWM-based | Ricotta et al. (2015) | X X X | FD:::functcomp
Summary-based | Distribution-based | CDF-based | Appendix S3 | X X X | our new functions, see Data availability
Direct dissimilarity | Probabilistic | DISC/DQ | Rao (1982), Pavoine & Ricotta (2014) | X X X | adiv::SQ
Direct dissimilarity | Probabilistic | dsimcom | Pavoine & Ricotta (2014) | X X X | adiv:::dsimcom
Direct dissimilarity | Ordinariness-based | dissABC | Ricotta & Pavoine (2015) | X X X | adiv:::dissABC
Direct dissimilarity | Ordinariness-based | generalized_Tradidiss | Pavoine & Ricotta (2019) | X X | adiv:::generalized_Tradidiss
Direct dissimilarity | Diversity partitioning | multiplicative beta | Chao et al. (2012) | X | our new functions, see Data availability
Direct dissimilarity | Nearest neighbour | DCW, DCW(Q) | Clarke & Warwick (1998), Ricotta & Bacaro (2010) | X X | our new functions, see Data availability
Direct dissimilarity | Nearest neighbour | DIP | Izsák & Price (2001), Ricotta & Bacaro (2010) | X X | our new functions, see Data availability
Direct dissimilarity | Nearest neighbour | PADDis | Ricotta et al. (2016) | X | adiv:::PADDis
Classification-based | not discussed | not discussed | Hérault & Honnay (2007), Nipperess et al. (2010), Cardoso et al. (2014), Pavoine (2016) | |

Table 3.
Kendall tau correlations between environmental distance and the functional dissimilarity measures at different values of sigma, with abundance data

Index | Sigma=0.01 | Sigma=0.1 | Sigma=0.25 | Sigma=0.5 | Sigma=1 | Sigma=5
CWMdis | 0.974 | 0.905 | 0.846 | 0.828 | 0.649 | 0.251
CDFdis | 0.974 | 0.904 | 0.845 | 0.83 | 0.646 | 0.255
D(Q) | 0.974 | 0.912 | 0.832 | 0.828 | 0.637 | 0.243
dsimcom.SS | 0.974 | 0.911 | 0.832 | 0.829 | 0.638 | 0.243
dsimcom.Jac | 0.974 | 0.911 | 0.832 | 0.829 | 0.638 | 0.243
dsimcom.Sor | 0.974 | 0.911 | 0.832 | 0.829 | 0.638 | 0.243
dsimcom.Och | 0.974 | 0.911 | 0.832 | 0.829 | 0.639 | 0.243
dsimcom.Beta | 0.974 | 0.911 | 0.832 | 0.829 | 0.638 | 0.243
dissABC.Jac | 0.967 | 0.899 | 0.82 | 0.813 | 0.617 | 0.243
dissABC.Sor | 0.967 | 0.899 | 0.82 | 0.813 | 0.617 | 0.243
dissABC.SS | 0.967 | 0.899 | 0.82 | 0.813 | 0.617 | 0.243
dissABC.Och | 0.968 | 0.899 | 0.819 | 0.814 | 0.618 | 0.243
dissABC.Kul | 0.968 | 0.898 | 0.819 | 0.814 | 0.619 | 0.243
dissABC.Si | 0.954 | 0.867 | 0.789 | 0.791 | 0.616 | 0.243
Tradidiss.GC.even | 0.974 | 0.908 | 0.816 | 0.829 | 0.626 | 0.245
Tradidiss.MS.even | 0.974 | 0.908 | 0.814 | 0.828 | 0.623 | 0.243
Tradidiss.PE.even | 0.974 | 0.907 | 0.828 | 0.831 | 0.639 | 0.25
Tradidiss.GC.uneven | 0.967 | 0.901 | 0.827 | 0.815 | 0.622 | 0.244
Tradidiss.MS.uneven | 0.966 | 0.899 | 0.823 | 0.813 | 0.618 | 0.243
Tradidiss.PE.uneven | 0.969 | 0.905 | 0.837 | 0.821 | 0.637 | 0.249
βturnover | 0.974 | 0.911 | 0.837 | 0.829 | 0.641 | 0.251
βheterogeneity | 0.974 | 0.911 | 0.837 | 0.829 | 0.641 | 0.251
βsegregation | 0.974 | 0.911 | 0.837 | 0.829 | 0.641 | 0.251
DIP | 0.923 | 0.778 | 0.68 | 0.565 | 0.338 | 0.034
DCW | 0.923 | 0.778 | 0.68 | 0.565 | 0.338 | 0.034
Bray-Curtis (species-based) | 0.711 | 0.832 | 0.778 | 0.678 | 0.455 | 0.086
Table 4. Kendall tau correlations between environmental distance and the functional dissimilarity measures at different values of sigma, with presence/absence data.

| Index | sigma = 0.01 | sigma = 0.1 | sigma = 0.25 | sigma = 0.5 | sigma = 1 | sigma = 5 |
|---|---|---|---|---|---|---|
| CWMdis | 0.944 | 0.818 | 0.691 | 0.568 | 0.353 | 0.014 |
| CDFdis | 0.943 | 0.820 | 0.700 | 0.594 | 0.306 | -0.001 |
| D(Q) | 0.941 | 0.818 | 0.701 | 0.59 | 0.313 | -0.003 |
| dsimcom.SS | 0.944 | 0.821 | 0.705 | 0.593 | 0.316 | -0.003 |
| dsimcom.Jac | 0.944 | 0.821 | 0.705 | 0.593 | 0.316 | -0.003 |
| dsimcom.Sor | 0.944 | 0.821 | 0.705 | 0.593 | 0.316 | -0.003 |
| dsimcom.Och | 0.944 | 0.822 | 0.707 | 0.592 | 0.323 | 0.000 |
| dsimcom.Beta | 0.944 | 0.821 | 0.705 | 0.593 | 0.316 | -0.003 |
| dissABC.Jac | 0.946 | 0.819 | 0.704 | 0.592 | 0.292 | -0.006 |
| dissABC.Sor | 0.946 | 0.819 | 0.704 | 0.592 | 0.292 | -0.006 |
| dissABC.SS | 0.946 | 0.819 | 0.704 | 0.592 | 0.292 | -0.006 |
| dissABC.Och | 0.946 | 0.819 | 0.704 | 0.591 | 0.293 | -0.006 |
| dissABC.Kul | 0.946 | 0.819 | 0.704 | 0.591 | 0.294 | -0.006 |
| dissABC.Si | 0.939 | 0.835 | 0.701 | 0.556 | 0.324 | 0.019 |
| Tradidiss.GC.even | 0.945 | 0.820 | 0.707 | 0.593 | 0.305 | -0.005 |
| Tradidiss.MS.even | 0.945 | 0.819 | 0.707 | 0.592 | 0.304 | -0.005 |
| Tradidiss.PE.even | 0.947 | 0.821 | 0.704 | 0.595 | 0.323 | -0.002 |
| Tradidiss.GC.uneven | 0.946 | 0.819 | 0.702 | 0.592 | 0.308 | -0.006 |
| Tradidiss.MS.uneven | 0.946 | 0.819 | 0.703 | 0.592 | 0.307 | -0.006 |
| Tradidiss.PE.uneven | 0.947 | 0.820 | 0.698 | 0.593 | 0.326 | -0.003 |
| βturnover | 0.943 | 0.817 | 0.699 | 0.585 | 0.331 | -0.003 |
| βheterogeneity | 0.943 | 0.817 | 0.699 | 0.585 | 0.331 | -0.003 |
| βsegregation | 0.943 | 0.817 | 0.699 | 0.585 | 0.331 | -0.003 |
| DIP | 0.905 | 0.696 | 0.597 | 0.435 | 0.158 | -0.017 |
| DCW | 0.904 | 0.694 | 0.593 | 0.431 | 0.160 | -0.017 |
| PADDis.Jac | 0.904 | 0.679 | 0.575 | 0.418 | 0.154 | -0.025 |
| PADDis.Sor | 0.905 | 0.696 | 0.597 | 0.435 | 0.158 | -0.017 |
| PADDis.SS | 0.902 | 0.662 | 0.546 | 0.396 | 0.144 | -0.034 |
| PADDis.Och | 0.904 | 0.697 | 0.596 | 0.436 | 0.159 | -0.017 |
| PADDis.Simp | 0.881 | 0.620 | 0.474 | 0.343 | 0.158 | 0.030 |
| PADDis.Kul | 0.904 | 0.694 | 0.593 | 0.431 | 0.160 | -0.017 |
| Sørensen (species-based) | 0.698 | 0.724 | 0.606 | 0.415 | 0.127 | 0.048 |

Figure 1. Heat maps showing the interactive effects of niche width (sigma), transformation of between-species dissimilarities (lin = linear, exp = exponential), data type (ABUND = abundance, P/A = presence/absence), and dissimilarity index (1 – CWMdis, 2 – CDFdis, 3 – DQ, 4 – dsimcom/Sørensen, 5 – dissABC/Sørensen, 6 – generalized_Tradidiss/generalized Canberra, uneven weighting, 7 – βturnover, 8 – DCW) on the correlation with environmental distance.
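The species-based reference rows in Tables 3 and 4 (Bray-Curtis for abundance data, Sørensen for presence/absence data) ignore traits entirely. The minimal Python sketch below implements both standard formulas on hypothetical toy vectors for illustration. It is not the code used to produce the tables. On presence/absence data the two formulas coincide.

```python
def bray_curtis(a, b):
    """Species-based Bray-Curtis dissimilarity of two abundance vectors (same species order)."""
    if len(a) != len(b):
        raise ValueError("both vectors must list the same species in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both communities are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def sorensen(a, b):
    """Species-based Sørensen dissimilarity (b + c) / (2a + b + c) on presence/absence."""
    if len(a) != len(b):
        raise ValueError("both vectors must list the same species in the same order")
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, y in enumerate(b) if y > 0}
    shared = len(present_a & present_b)   # a: species present in both plots
    unique = len(present_a ^ present_b)   # b + c: species present in only one plot
    if shared + unique == 0:
        raise ValueError("both communities are empty")
    return unique / (2 * shared + unique)


# Hypothetical abundance vectors for four species.
plot1 = [5, 3, 0, 2]
plot2 = [4, 0, 6, 2]
print(round(bray_curtis(plot1, plot2), 3))  # 0.455
print(round(sorensen(plot1, plot2), 3))     # 0.333

# Plots with no species in common are maximally dissimilar,
# however close or distant they are on the gradient.
print(bray_curtis([3, 1, 0, 0], [0, 0, 2, 5]))  # 1.0
```

The last line suggests one plausible reading of the low species-based values at sigma = 0.01: with very narrow niches, distant plots may share no species. Their species-based dissimilarities then saturate at 1 and tie, which limits the rank correlation with environmental distance. Trait-based indices can still separate such plots by how different their species' traits are.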