key: cord-0448166-rz0pj5a3
authors: Dodds, P. S.; Minot, J. R.; Arnold, M. V.; Alshaabi, T.; Adams, J. L.; Dewhurst, D. R.; Reagan, A. J.; Danforth, C. M.
title: Probability-turbulence divergence: A tunable allotaxonometric instrument for comparing heavy-tailed categorical distributions
date: 2020-08-30
journal: nan
DOI: nan
sha: 758a4c6c55b8d21ae91f61482fe185824dbec7b6
doc_id: 448166
cord_uid: rz0pj5a3

Real-world complex systems often comprise many distinct types of elements as well as many more types of networked interactions between elements. When the relative abundances of types can be measured well, we further observe heavy-tailed categorical distributions for type frequencies. For the comparison of type frequency distributions of two systems, or of a system with itself at different points in time (a facet of allotaxonometry), a great range of probability divergences are available. Here, we introduce and explore 'probability-turbulence divergence', a tunable, straightforward, and interpretable instrument for comparing normalizable categorical frequency distributions. We model probability-turbulence divergence (PTD) after rank-turbulence divergence (RTD). While probability-turbulence divergence is more limited in application than rank-turbulence divergence, it is more sensitive to changes in type frequency. We build allotaxonographs to display probability turbulence, incorporating a way to visually accommodate zero probabilities for 'exclusive types', which are types that appear in only one system. We explore comparisons of example distributions taken from literature, social media, and ecology. We show how probability-turbulence divergence either explicitly or functionally generalizes many existing kinds of distances and measures, including, as special cases, $L^{(p)}$ norms, the Sørensen-Dice coefficient (the $F_1$ statistic), and the Hellinger distance. We discuss similarities with the generalized entropies of Rényi and Tsallis, and the diversity indices (or Hill numbers) from ecology. We close with thoughts on open problems concerning the optimization of the tuning of rank- and probability-turbulence divergence.

Driven by an interest in developing allotaxonometry [1], the detailed comparison of any two complex systems comprising many types of elements, we find ourselves needing to compare Zipf distributions: heavy-tailed categorical distributions of type frequencies [2] [3] [4] [5]. We take a relaxed definition of what a heavy tail means for a distribution: a slow decay over orders of magnitude in type rank. Though not required, power-law decay tails are emblematic signatures of heavy-tailed distributions commonly presented by complex systems [6] [7] [8] [9] [10], both observed and theoretical, and provide important examples to contemplate in our efforts to develop a comparison tool.

Across fields, efforts to measure and explain how two probability distributions differ have led to the development of a great many probability divergences [11] [12] [13] [14]. Divergences have been constructed for a host of motivations quite apart from our focus here on allotaxonometry, with example families scaffolded around $L^{(p)}$-norms, inner products, and information-theoretic structures. As we will discuss, for heavy-tailed distribution comparisons which exhibit variable 'probability turbulence' [1, 15], we find these divergences lack appropriate adaptability.

* peter.dodds@uvm.edu

Here, we introduce a tunable, interpretable instrument that we call probability-turbulence divergence (PTD), along with related allotaxonographs: visualizations which show in detail how two categorical distributions differ according to a given measure. We refer the reader to Ref. [1] for our motivation for creating allotaxonometry and allotaxonographs, the notion of rank turbulence, and a detailed justification for the form of rank-turbulence divergence (RTD) we developed there. We establish probability-turbulence divergence using largely the same arguments. We will therefore be concise in our presentation and expand only when the probability version's behavior departs from that of its rank counterpart.

In Sec. II, we formally define probability-turbulence divergence. We describe the divergence's general analytic behavior as a function of its single parameter, α, and we determine its form for the two limits of the parameter, α=0 and α=∞. When α=0, in particular, we find an interesting departure from the equivalent tuning for rank-turbulence divergence, which we will later connect to the Sørensen-Dice coefficient [16, 17] and the $F_1$ score [18].

In Sec. III, we then provide realizations of probability-turbulence divergence as an instrument through example allotaxonographs. For three disparate examples, we consider: 1. Frequency of n-gram use in Jane Austen's Pride and Prejudice, 2. Frequency of n-gram use on Twitter, and 3. Tree species abundance [19]. We show that, for the kinds of heavy-tailed distributions we are interested in, a probability-turbulence divergence histogram can be constructed to accommodate both a logarithmic scale and the presence of zero probabilities. Similar to rank-turbulence divergence histograms, these graphs clearly show whether or not a probability-based divergence is a suitable choice for any given comparison. To feature the tunability of probability-turbulence divergence, we present flipbooks as part of our Online Appendices (compstorylab.org/allotaxonometry/) for allotaxonometry.

arXiv:2008.13078v1 [physics.soc-ph] 30 Aug 2020

In Sec. IV, we show that probability-turbulence divergence is either a generalization of or may be connected to a number of other kinds of divergences and similarities (e.g., the Sørensen-Dice coefficient), and then discuss limited functional similarities with the Rényi entropy and diversity indices [20] [21] [22] [23] [24] .

We outline data and allotaxonometry code in Sec. V and offer some concluding thoughts in Sec. VI.

We aim to compare two systems $\Omega^{(1)}$ and $\Omega^{(2)}$, for both of which we have a list of component types and their probabilities. For simplicity, we will speak of probability, acknowledging that relative frequency, rate of usage, or some other term may be the more appropriate descriptor for a given system. We denote a type by $\tau$ and its probability in the two systems as $p_{\tau,1}$ and $p_{\tau,2}$. We represent the probability distributions for the two systems as $P_1$ and $P_2$. We call types that are present in one system only 'exclusive types'. In general, we will use expressions of the form $\Omega^{(1)}$-exclusive and $\Omega^{(2)}$-exclusive to indicate to which system a type solely belongs.

In general, we are interested in divergences that are some function of a sum of contributions by type. Here, we will consider a single-parameter family of divergences that are of the simplest form, i.e., a direct sum of contributions:

$$ D^{\mathrm{P}}_{\alpha}(P_1 \,\|\, P_2) = \sum_{\tau \in R_{1,2;\alpha}} \delta D^{\mathrm{P}}_{\alpha,\tau}(P_1 \,\|\, P_2). \tag{1} $$

By the rank-ordered set $R_{1,2;\alpha}$, we indicate the union of all types from both systems, sequenced such that the contributions $\delta D^{\mathrm{P}}_{\alpha,\tau}(P_1 \,\|\, P_2)$ are monotonically decreasing (hence the necessity of an α subscript). We impose this order for general good housekeeping, secondarily allowing us to handle possibilities such as truncated summations due to sampling, or convergence issues for theoretical examples.

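Following on from Ref. [1], we define probability-turbulence divergence as:

$$ D^{\mathrm{P}}_{\alpha}(P_1 \,\|\, P_2) = \frac{1}{N^{\mathrm{P}}_{1,2;\alpha}} \, \frac{\alpha+1}{\alpha} \sum_{\tau \in R_{1,2;\alpha}} \left| \, [p_{\tau,1}]^{\alpha} - [p_{\tau,2}]^{\alpha} \, \right|^{1/(\alpha+1)}, \tag{2} $$

where the parameter α may be tuned from 0 to ∞ and $N^{\mathrm{P}}_{1,2;\alpha}$ is a normalization factor. Per Ref. [1] and below in Sec. II B, the roles of the prefactor $(\alpha+1)/\alpha$ and the power $1/(\alpha+1)$ are to govern the behavior of PTD in the limit $\alpha \to 0$.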

By construction and regardless of the choice of normalization factor, we can see from Eq. (2) that probability-turbulence divergence will equal 0 when both distributions are the same. (We show below that for α = 0, distinct distributions can also register $D^{\mathrm{P}}_{\alpha}(P_1 \,\|\, P_2) = 0$.) The core of Eq. (2) is the absolute value of the difference between each type τ's two probabilities, each raised to the power α:

$$ \left| \, [p_{\tau,1}]^{\alpha} - [p_{\tau,2}]^{\alpha} \, \right|. \tag{3} $$

This α-tuned quantity controls the order of contributions by types to the overall value of PTD. As α → 0, lower probabilities, corresponding to the rare types, are relatively accentuated. For α → ∞, the higher of the two probabilities will dominate (unless they are equal), meaning the most common types will come to the fore.

As for rank-turbulence divergence, we choose $N^{\mathrm{P}}_{1,2;\alpha}$ so that when the two systems are entirely disjoint (that is, they share no types), probability-turbulence divergence maximizes at 1. The normalization is thus specific to the two distributions being compared. We imagine that the types in each system have an extra descriptor specifying belonging to $\Omega^{(1)}$ or $\Omega^{(2)}$. With no matching types, the probability of a type present in one system is zero in the other, and the sum can be split between the two systems' types:

$$ N^{\mathrm{P}}_{1,2;\alpha} = \frac{\alpha+1}{\alpha} \sum_{\tau \in R_1} [p_{\tau,1}]^{\alpha/(\alpha+1)} + \frac{\alpha+1}{\alpha} \sum_{\tau \in R_2} [p_{\tau,2}]^{\alpha/(\alpha+1)}, \tag{4} $$

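where $R_1$ and $R_2$ are the Zipf-ordered sets of types for each system. We can more compactly express the normalization as:

$$ N^{\mathrm{P}}_{1,2;\alpha} = \frac{\alpha+1}{\alpha} \sum_{\tau \in R_{1,2;\alpha}} \left( [p_{\tau,1}]^{\alpha/(\alpha+1)} + [p_{\tau,2}]^{\alpha/(\alpha+1)} \right), \tag{5} $$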

where, as for the definition of probability-turbulence divergence in Eq. (2), the sum for the normalization is again over the ordered set $R_{1,2;\alpha}$. Types that appear in both systems will have their contributions $[p_{\tau,1}]^{\alpha/(\alpha+1)}$ and $[p_{\tau,2}]^{\alpha/(\alpha+1)}$ counted appropriately.
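As a concrete reading of Eqs. (2) and (5), the following minimal Python sketch computes probability-turbulence divergence for 0 < α < ∞. It assumes each per-type term has the form ((α+1)/α)·|p1^α − p2^α|^(1/(α+1)), with matching normalization terms ((α+1)/α)·p^(α/(α+1)) as described in the text. The function name `ptd` and the dictionary representation of a distribution are illustrative choices, not the authors' released code.

```python
from typing import Dict


def ptd(p1: Dict[str, float], p2: Dict[str, float], alpha: float) -> float:
    """Sketch of probability-turbulence divergence for 0 < alpha < infinity."""
    if alpha <= 0:
        raise ValueError("this sketch covers 0 < alpha < infinity only")
    prefactor = (alpha + 1.0) / alpha
    power = 1.0 / (alpha + 1.0)
    total = 0.0  # unnormalized sum of per-type contributions
    norm = 0.0   # value the sum would take if the systems were disjoint
    for t in set(p1) | set(p2):
        a = p1.get(t, 0.0)  # a missing type has probability zero
        b = p2.get(t, 0.0)
        total += prefactor * abs(a**alpha - b**alpha) ** power
        norm += prefactor * (a ** (alpha * power) + b ** (alpha * power))
    return total / norm
```

Under these assumptions, identical distributions score 0 and distributions with no shared types score 1, matching the two anchoring properties stated above.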

B. Limit of α=0 for probability-turbulence divergence

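The limit of α=0 requires some care and will vary from the equivalent limit for rank-turbulence divergence [1]. First, at the level of individual type contribution, if both $p_{\tau,1} > 0$ and $p_{\tau,2} > 0$, then

$$ \lim_{\alpha \to 0} \frac{\alpha+1}{\alpha} \left| \, [p_{\tau,1}]^{\alpha} - [p_{\tau,2}]^{\alpha} \, \right|^{1/(\alpha+1)} = \left| \ln \frac{p_{\tau,1}}{p_{\tau,2}} \right|. \tag{6} $$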

If instead a type τ is exclusive to one system, meaning either $p_{\tau,1} = 0$ or $p_{\tau,2} = 0$, then the limit diverges as 1/α, which would seem problematic. We nevertheless arrive at a well-behaved divergence through the normalization term $N^{\mathrm{P}}_{1,2;0}$. Requiring, as we have, that the extreme of disjoint systems has a divergence of 1, we observe that each of the types in the case of disjoint systems would contribute 1/α. Therefore, in the α → 0 limit, with $N_1$ and $N_2$ the numbers of types in each system, we must have:

$$ N^{\mathrm{P}}_{1,2;\alpha} \to \frac{1}{\alpha} \left( N_1 + N_2 \right). \tag{7} $$

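Because the normalization also diverges as 1/α, the divergence will be zero when there are no exclusive types and non-zero when there are exclusive types. We can combine these cases into a single expression:

$$ D^{\mathrm{P}}_{0}(P_1 \,\|\, P_2) = \frac{1}{N_1 + N_2} \sum_{\tau \in R_{1,2;0}} \left( \delta_{p_{\tau,1},0} + \delta_{0,p_{\tau,2}} \right). \tag{8} $$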

The term $\delta_{p_{\tau,1},0} + \delta_{0,p_{\tau,2}}$ returns 1 if either $p_{\tau,1} = 0$ or $p_{\tau,2} = 0$, and 0 when both $p_{\tau,1} > 0$ and $p_{\tau,2} > 0$. (By construction, we cannot have $p_{\tau,1} = p_{\tau,2} = 0$, as each type must be present in either one or both systems.) We see then that $D^{\mathrm{P}}_{0}(P_1 \,\|\, P_2)$ is the ratio of the number of types that are exclusive to one system relative to the total possible number of such types, $N_1 + N_2$. If and only if all types appear in both systems, with whatever variation in probabilities, then $D^{\mathrm{P}}_{0}(P_1 \,\|\, P_2) = 0$. The limit of α = 0 therefore exhibits special behavior because, for α > 0, probability-turbulence divergence only scores 0 for exactly matching distributions.
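The α = 0 case described above reduces to counting, which a short Python sketch can make explicit. It assumes distributions are dictionaries from types to probabilities and that a type with probability zero is treated as absent; the function name `ptd_alpha_zero` is hypothetical.

```python
from typing import Dict


def ptd_alpha_zero(p1: Dict[str, float], p2: Dict[str, float]) -> float:
    """Fraction of exclusive types relative to N1 + N2 (the alpha = 0 limit)."""
    present1 = {t for t, p in p1.items() if p > 0}
    present2 = {t for t, p in p2.items() if p > 0}
    exclusive = present1 ^ present2  # types found in exactly one system
    return len(exclusive) / (len(present1) + len(present2))
```

In this sketch, two distributions over the same set of types score 0 regardless of how their probabilities differ, which is the special behavior noted for this limit.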

C. Type contribution ordering for the limit of α=0

In terms of contribution to the divergence score, all exclusive types supply a weight of $1/(N_1 + N_2)$. We can order them by preserving their ordering as α → 0, which amounts to ordering by descending probability in the system in which they appear.

And while types that appear in both systems make no contribution to $D^{\mathrm{P}}_{0}(P_1 \,\|\, P_2)$, we can still order them according to the log ratio of their probabilities, Eq. (6).
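The log-ratio limit for shared types can be checked numerically. The sketch below assumes the unnormalized per-type term is ((α+1)/α)·|p_a^α − p_b^α|^(1/(α+1)) and compares it with |ln(p_a/p_b)| as α shrinks. The probabilities used are arbitrary illustrative values, and `shared_type_term` is a hypothetical helper name.

```python
import math


def shared_type_term(p_a: float, p_b: float, alpha: float) -> float:
    """Unnormalized per-type term for a type present in both systems."""
    prefactor = (alpha + 1.0) / alpha
    return prefactor * abs(p_a**alpha - p_b**alpha) ** (1.0 / (alpha + 1.0))


p_a, p_b = 0.02, 0.005  # illustrative probabilities only
log_ratio = abs(math.log(p_a / p_b))
for alpha in (1e-2, 1e-4, 1e-6):
    # The second printed column should approach the third as alpha decreases.
    print(alpha, shared_type_term(p_a, p_b, alpha), log_ratio)
```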

The overall ordering of types by divergence contribution for α=0 is then: (1) exclusive types by descending probability and then (2) types appearing in both systems by descending log ratio.
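The two-stage ordering for α = 0 can be sketched in Python as follows. We assume that 'descending log ratio' means descending absolute log ratio, in line with the absolute value in the shared-type limit; that reading, the tie-breaking by type name, and the function name are assumptions of this sketch.

```python
import math
from typing import Dict, List


def order_types_alpha_zero(p1: Dict[str, float], p2: Dict[str, float]) -> List[str]:
    """Exclusive types by descending probability, then shared types by |log ratio|."""
    exclusive, shared = [], []
    for t in set(p1) | set(p2):
        a, b = p1.get(t, 0.0), p2.get(t, 0.0)
        if a > 0 and b > 0:
            shared.append((abs(math.log(a / b)), t))
        elif a > 0 or b > 0:
            exclusive.append((max(a, b), t))  # probability in its own system
    exclusive.sort(reverse=True)
    shared.sort(reverse=True)
    return [t for _, t in exclusive] + [t for _, t in shared]
```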

D. Limit of α=∞ for probability-turbulence divergence

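The α → ∞ limit is straightforward and in line with that of rank-turbulence divergence [1]:

$$ D^{\mathrm{P}}_{\infty}(P_1 \,\|\, P_2) = \frac{1}{N^{\mathrm{P}}_{1,2;\infty}} \sum_{\tau \in R_{1,2;\infty}} \left( 1 - \delta_{p_{\tau,1},p_{\tau,2}} \right) \max\left( p_{\tau,1}, p_{\tau,2} \right), \tag{9} $$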

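where the normalization from Eq. (5) has become

$$ N^{\mathrm{P}}_{1,2;\infty} = \sum_{\tau \in R_{1,2;\infty}} \left( p_{\tau,1} + p_{\tau,2} \right) = 2. \tag{10} $$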

The dominant contributions to probability-turbulence divergence in the α → ∞ limit therefore come from the most common types in each system, provided they are not equally abundant.
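A minimal Python sketch of the α → ∞ limit follows. It assumes each type with unequal probabilities contributes the larger of its two probabilities, and that the normalization is the total probability mass of both systems (2 for normalized distributions). The function name `ptd_alpha_inf` is hypothetical.

```python
from typing import Dict


def ptd_alpha_inf(p1: Dict[str, float], p2: Dict[str, float]) -> float:
    """Sum of max(p1, p2) over types with unequal probabilities, normalized."""
    total = 0.0
    for t in set(p1) | set(p2):
        a, b = p1.get(t, 0.0), p2.get(t, 0.0)
        if a != b:  # equally abundant types contribute nothing
            total += max(a, b)
    norm = sum(p1.values()) + sum(p2.values())
    return total / norm
```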

We assert that the successful use of our rank- and probability-turbulence divergences is best achieved through consideration of rich graphical representations by what we have called allotaxonographs. In this section, we present and describe three sets of allotaxonographs comparing probability distributions using probability-turbulence divergence for:

• Normalized usage frequencies of 2-grams in the first and second halves of Jane Austen's Pride and Prejudice [29] for α=3/4, 0, and ∞ (Figs. 1, 2, and 3);

• Normalized usage frequencies of n-grams in all English-identified tweets on 2020/03/12 and 2020/05/30 for n = 1, 2, and 3 and for α = 1/4, 3/4, and ∞ (Figs. 4, 5, and 6);

• Relative abundances of tree species on Barro Colorado Island for five-year censuses concluding in 1985 and 2015 (Fig. 7).

The Pride and Prejudice examples show how PTD may be adjusted to fit well (α=3/4) or poorly (α=0 and ∞).

FIG. 1. We bin all non-zero probability pairs $(\log_{10} p_{\tau,1}, \log_{10} p_{\tau,2})$ in logarithmic space. Colors indicate counts of 2-grams per cell, and we highlight example 2-grams along the edges of the histogram. For pairs where one of the probabilities is zero, we add a separate rectangular panel along the bottom of each axis (lighter gray and lighter blue). Contour lines indicate where probability-turbulence divergence is constant (the jump to the zero probability region necessitates a break in smoothness). Based on the histogram, we choose α=3/4 to engineer an approximate fit to the histogram's periphery. The gray scale for 2-grams is indexed by their percentage contribution to probability-turbulence divergence, $\delta D^{\mathrm{P}}_{3/4,\tau}$, showing a mixture of rare and common 2-grams. Ranked list on the right: We order the most salient 2-grams according to their overall contribution $\delta D^{\mathrm{P}}_{3/4,\tau}$, which we mark by bar length. We show the rank pair for each 2-gram in light gray opposite each 2-gram. Corresponding Flipbook: Flipbooks S1, S2, and S3 in the paper's Online Appendices (compstorylab.org/allotaxonometry/) show how the instrument changes for the same comparison with α being tuned from 0 to ∞ for 1-, 2-, and 3-grams. See Ref. [1] for a general introduction and motivation for allotaxonometry and allotaxonographs in the context of rank-turbulence divergence. The examples for 2-grams and 3-grams can also be seen as demonstrations of possible comparisons of features of complex networks and systems (e.g., 2-grams in text as directed edges).

As for rank-turbulence divergence [1] but with some key modifications, our allotaxonographs for probability-turbulence divergence pair two complementary visualizations: a map-like histogram and a ranked list.

In isolation, both the histogram and the ranked list have important but limited descriptive power. The histogram helps us see how well our choice of α performs, information that is entirely lost by the ranking process. And the ranked list would be difficult to intuit from the histogram alone. Many aspects of our allotaxonographs are configurable. On Gitlab, we provide our universal code for generating allotaxonographs for rank-turbulence divergence, probability-turbulence divergence, and other probability divergences (see Sec. V B).

In the paper's Online Appendices (compstorylab.org/allotaxonometry/), we complement all of our allotaxonographs with PDF flipbooks which move systematically through a range of α values.

FIG. 2. Compare with the good fit afforded by α=3/4 for the allotaxonograph in Fig. 1. The contour lines for α=0 do not conform well to the histogram. At this extreme of the parameter's range, probability-turbulence divergence elevates exclusive types above all types that appear in both systems, and the ranked list on the right comprises only system-exclusive 2-grams. Per Sec. II B and Eq. (8), exclusive types each equally contribute $1/(N_1 + N_2)$ to $D^{\mathrm{P}}_{0}$, while types appearing in both systems have zero weight. The ordering of 2-grams is determined by maintaining their contribution order as α approaches 0. We force the contour lines in the main body of the histogram to remain equally spaced, even as they all represent 0 in the α → 0 limit. See Sec. IV for the connection between $D^{\mathrm{P}}_{0}$ and the Sørensen-Dice coefficient [16, 17, 25] and the $F_1$ score [18, 26]. See also Flipbooks S1, S2, and S3 in the paper's Online Appendices (compstorylab.org/allotaxonometry/).

For a primary, familiar example to help us explain our probability-turbulence divergence allotaxonographs, we compare the normalized usage frequency distributions of 2-gram usage between the first and second halves of Pride and Prejudice [29] . We note that we are mostly intent here on showing how allotaxonographs function. We are not attempting to reveal astonishing insights into one of the most well regarded and well studied novels of all time.
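To make the inputs to such a comparison concrete, here is a small Python sketch that builds normalized n-gram usage frequencies for the two halves of a tokenized text. How the authors tokenized the novel and where exactly they split it are not stated here, so whitespace tokens and a midpoint split by token count are assumptions of this sketch, as are the function names.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_halves(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Split a token list at its midpoint (an assumed splitting rule)."""
    mid = len(tokens) // 2
    return tokens[:mid], tokens[mid:]


def ngram_probabilities(tokens: List[str], n: int = 2) -> Dict[Tuple[str, ...], float]:
    """Normalized usage frequency of each n-gram in a token list."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}


# Toy usage: each half yields its own 2-gram distribution.
first, second = split_halves("a b a b c d".split())
p1, p2 = ngram_probabilities(first), ngram_probabilities(second)
```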

In Fig. 1 , we show an allotaxonograph with α=3/4 which we will contend below provides a good fit.

First, the histogram on the left bins all pairs $(\log_{10} p_{\tau,1}, \log_{10} p_{\tau,2})$. The form we see here is typical of comparisons between systems with heavy-tailed Zipf distributions. We rotate the axes so as not to privilege one system over the other, as this might lead to a false sense of an independent-dependent variable relationship [1, 30]. We indicate counts per cell using the perceptually uniform colormap magma [31]. Because of the logarithmic scale, the cells start to separate for lower values of probability, corresponding to counts of 0, 1, 2, and so on.

In general, types that are common in both systems will be located towards the top of the histogram, those that are rare in both will be at the bottom, and those appearing more prevalently in one system will appear further away from the vertical midline. Exactly how much this latter category matters is a function of the divergence at hand.

The most extreme cells can be readily understood. The bottommost pair of cells represent all 2-grams that appear once in one half of Pride and Prejudice and zero times in the other. The contour lines show constant values of $\delta D^{\mathrm{P}}_{3/4,\tau}$. We provide a key for the contour lines in the upper right with the histogram removed. As a guide, the gray scale for the annotations varies with each 2-gram's contribution to the overall divergence score, with darker meaning higher contribution. For α=3/4, 'Miss Bingley' stands out in particular, while 'irretrievable that' on the left is backgrounded.

For all of our allotaxonographs, we place 10 contour lines on each half of the histogram's diamond. These contour lines are evenly spaced not by height but are rather anchored to the bottom axes of the main histogram where they are evenly spaced in logarithmic space.

While it may appear that we have omitted annotations internal to the histogram merely to display the histogram more cleanly, our annotations are intentional. Because individual 2-grams internal to the histogram will never dominate standard divergences, highlighting them would be badly misleading [14, 32]. For our allotaxonographs, the annotations along the bottom of the histogram potentially fall into this trap: 'irretrievable that' and 'were during', each appearing once overall, are just two examples of tens of thousands of such 2-gram hapax legomena. Such types may matter in aggregate but not individually.

Now, our allotaxonographs for probability-turbulence divergence must depart from those of rank-turbulence divergence because we have to accommodate instances of $\log_{10} p$ when $p = 0$. For ranks, types with 0 counts in one system (exclusive types) are assigned a tied rank for last place, necessarily a finite number. Here, on a logarithmic scale, our exclusive types would have to be located on one axis at $\log_{10} p = -\infty$. We end the main histogram's domain at the lowest value of $\log_{10} p_{\tau,i}$ for which $p_{\tau,i} > 0$. We then add lighter colored regions to the bottom of both sides of the histogram, and locate $p_{\tau,i} = 0$ along their midlines. The transition is a discrete jump (we do not smoothly interpolate), and we connect the contour lines with a dotted line. As can be seen in Fig. 1, the jump is increasingly clear for contour lines for lower values of $\delta D^{\mathrm{P}}_{3/4,\tau}$, nearer the histogram's vertical midline.
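The binning with a separate region for zero probabilities can be sketched in Python as below. This is not the authors' plotting code: it only counts types per logarithmic cell and marks a zero probability with `None` in place of a cell index, standing in for the lighter colored panels. The cell width (`cells_per_decade`) and the function name are assumptions of the sketch.

```python
import math
from collections import Counter
from typing import Dict, Optional, Tuple

Cell = Tuple[Optional[int], Optional[int]]


def bin_log_pairs(
    p1: Dict[str, float], p2: Dict[str, float], cells_per_decade: int = 2
) -> "Counter[Cell]":
    """Count types per (log10 p1, log10 p2) cell; None marks a zero probability."""

    def cell(p: float) -> Optional[int]:
        if p == 0:
            return None  # exclusive type: lives in the zero-probability panel
        return math.floor(math.log10(p) * cells_per_decade)

    counts: "Counter[Cell]" = Counter()
    for t in set(p1) | set(p2):
        counts[(cell(p1.get(t, 0.0)), cell(p2.get(t, 0.0)))] += 1
    return counts
```

Keeping zeros as a distinct marker, instead of substituting a small pseudo-probability, mirrors the discrete jump described above.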

The last piece for the histogram is the list of balances at the bottom right. These summary quantities are intended to be both informative and diagnostic, and they have important if subtle differences. The first bars show the balance of total 2-gram counts which for our example of Pride and Prejudice is 50/50 by construction. The second and third balances refer to sizes of lexicons, and these are well balanced too. If we create a lexicon of all 2-grams for Pride and Prejudice, about 58.3% of them appear in the first half and 58.4% in the second. If we instead create separate lexicons of 2-grams for the two halves of Pride and Prejudice, the third line of the balances records the percentage of 2-grams that are exclusive to each half.
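The three balances can be computed directly from raw counts. The Python sketch below follows the description above: share of total counts, share of the combined lexicon present in each system, and share of each system's own lexicon that is exclusive to it. The dictionary keys and the function name are illustrative, not taken from the authors' code.

```python
from typing import Dict, Tuple


def balances(counts1: Dict[str, int], counts2: Dict[str, int]) -> Dict[str, Tuple[float, float]]:
    """Three summary balances for a pair of type-count dictionaries."""
    types1 = {t for t, c in counts1.items() if c > 0}
    types2 = {t for t, c in counts2.items() if c > 0}
    union = types1 | types2
    total1, total2 = sum(counts1.values()), sum(counts2.values())
    return {
        # Balance of total counts between the two systems.
        "total_count_share": (total1 / (total1 + total2), total2 / (total1 + total2)),
        # Fraction of the combined lexicon that appears in each system.
        "share_of_union_lexicon": (len(types1) / len(union), len(types2) / len(union)),
        # Fraction of each system's own lexicon that is exclusive to it.
        "exclusive_share_of_own_lexicon": (
            len(types1 - types2) / len(types1),
            len(types2 - types1) / len(types2),
        ),
    }
```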

While it could be said that we are ultimately creating a simple 2-d histogram for a joint probability distribution with heavy tails, to our knowledge, there have been relatively few other attempts to do so [14, 32] . As we have described them, we believe our histograms are crafted with a number of special details that make them well suited to their task.

The ranked list on the right maps the two dimensions of the histogram onto an ordered single dimension of divergence contributions, largest first. The left-right arrangement is solely done to be consistent with the histogram: all contributions are positive. The light gray numbers opposite each 2-gram (e.g., 430 and 31 for 'Miss Bingley') indicate the 2-gram's Zipf rank in the first and second half of Pride and Prejudice (and in general, $\Omega^{(1)}$ and $\Omega^{(2)}$). We see dominant contributions from character names and references to characters (first half: 'her uncle'), functional 2-grams (first half: 'she had', second half: 'in the'), and non-specific references to places and people (second half: 'the room', 'young ladies', 'young man').

FIG. 5. Allotaxonograph using probability-turbulence divergence to compare normalized 2-gram usage ranks on two days of English-language Twitter, 2020/03/12 and 2020/05/30. Details are the same as for Figs. 1 and 4. We see that the comparison of 2-gram distributions produces a different, broader histogram than that formed by 1-gram distributions (Fig. 4). We choose α=3/4 to provide a balance of 2-grams across five orders of magnitude for non-zero probability. In contrast to the 1-gram version, the top 2-grams are more evenly distributed on both sides of the list. While some 2-grams are function words combined with the 1-grams we saw in Fig. 4, meaningful 2-grams also appear ('Tom Hanks', 'toilet paper', 'George Floyd', and 'police brutality'). See Flipbook 5 at the paper's Online Appendices (compstorylab.org/allotaxonometry/) for the instrument's variation as a function of α.

In the ranked list, we add open triangles to types if they are exclusive to one system, corresponding to those appearing in the zero probability expansions of the histogram. For example, 'to Brighton' appears only in the first half of Pride and Prejudice, and 'the Parsonage' only in the second.

Finally, we see that the choice of α=3/4 generates a list with 2-grams from across the rare-to-common spectrum. The balanced darker shadings of annotations in the histogram add further support.
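The ranked list can be thought of as a sort over per-type contributions. The Python sketch below assumes the per-type term ((α+1)/α)·|p1^α − p2^α|^(1/(α+1)) for 0 < α < ∞, divides by the disjoint-systems normalization, and records which system each type favors and whether it is exclusive (the open triangles). The tuple layout and function name are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float, int, bool]  # (type, contribution, favored system, exclusive?)


def ranked_contributions(
    p1: Dict[str, float], p2: Dict[str, float], alpha: float, top: int = 10
) -> List[Row]:
    """Per-type contributions to the divergence, largest first."""
    prefactor = (alpha + 1.0) / alpha
    power = 1.0 / (alpha + 1.0)
    rows, norm = [], 0.0
    for t in set(p1) | set(p2):
        a, b = p1.get(t, 0.0), p2.get(t, 0.0)
        raw = prefactor * abs(a**alpha - b**alpha) ** power
        norm += prefactor * (a ** (alpha * power) + b ** (alpha * power))
        side = 1 if a > b else 2 if b > a else 0  # 0 means equally abundant
        rows.append((t, raw, side, a == 0 or b == 0))
    rows.sort(key=lambda r: r[1], reverse=True)
    return [(t, raw / norm, side, excl) for t, raw, side, excl in rows[:top]]
```

Summing the contributions over all types recovers the overall divergence score under the same assumptions.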

In reducing such a high dimensional categorical space, where each unique type represents a dimension, we have first collapsed the data to a 2-d histogram, and then to a 1-d list. Being able to find a shape in the histogram to which we can apply an instance of probability-turbulence divergence gives us some suggestive proof in the pudding.

We see in both allotaxonographs for the extremes of α=0 and α=∞ (Figs. 2 and 3) that the contour lines do not match well with the edges of the histogram, in contrast to those realized by the choice of α=3/4. We also see how α=0 favors exclusive 2-grams (i.e., those that appear in only one half of the novel), while α=∞ privileges 2-grams that are common in one or both halves.

FIG. 6. Allotaxonograph using probability-turbulence divergence to compare 3-gram usage ranks on two days of English-language Twitter, 2020/03/12 and 2020/05/30. Details are the same as for Figs. 1 and 4. The histogram has broadened even further out from the 2-gram version, and now is well suited to probability-turbulence divergence with the extreme of α=∞, $D^{\mathrm{P}}_{\infty}$. Fragmentary and meaningful 3-grams appear alongside each other, including 'World Health Organization' and 'black lives matter'. Social amplification is also apparent as 3-grams for highly retweeted tweets dominate the ranked list. See Flipbook 6 at the paper's Online Appendices (compstorylab.org/allotaxonometry/) for the instrument's variation as a function of α.

As we showed in Sec. II B, when α=0, the divergence contribution for all types that appear in both systems is $\delta D^{\mathrm{P}}_{0,\tau} = 0$, while for all exclusive types, $\delta D^{\mathrm{P}}_{0,\tau} = 1/(N_1 + N_2)$ (as reflected in the equal bars in the ranked list). Thus, the vertical contour lines in Fig. 2, which again are present because we anchor them at evenly spaced locations along the bottom of the main histogram, all correspond to a divergence contribution of $\delta D^{\mathrm{P}}_{0,\tau} = 0$. The dashed parts of the visible contour lines then collapse to the bottom zero point, showing how the exclusive types provide the only non-zero contributions to $D^{\mathrm{P}}_{0}$. Finally, in Flipbooks S1, S2, and S3 in the paper's Online Appendices (compstorylab.org/allotaxonometry/), we give the reader a view into how our allotaxonometric comparisons of the usage frequencies of 1-grams, 2-grams, and 3-grams in the two halves of Pride and Prejudice behave as a function of α. All of the flipbooks are especially informative in showing how contour lines and ranked lists change with α.

For our second set of allotaxonographs, we compare two key dates of two major events through the lens of English-speaking Twitter: 2020/03/12, the date that COVID-19 became the major story in the United States, and 2020/05/30, four days after the murder of George Floyd in Minneapolis, Minnesota by police officer Derek Chauvin. We compare day-scale normalized usage frequency distributions for 1-, 2-, and 3-grams for these two dates in Figs. 4, 5, and 6.

We choose 2020/03/12 as a key date for the COVID-19 pandemic for several reasons. First and primarily, the World Health Organization (WHO) officially declared the COVID-19 outbreak to be a pandemic on 2020/03/11, a decision that was amplified immediately online but discussion of which most strongly appeared in the news and on Twitter on the following day.

The date of 2020/03/11 also saw a confluence of three major events that jolted the United States and dramatically elevated the story of the pandemic, all occurring tightly around a 15 minute period between 9 and 10 pm EDT (1 am to 2 am UTC). First, the National Basketball Association (NBA) abruptly suspended its season. The central event was the abandoning of a game just before tipoff between the Utah Jazz and Oklahoma City Thunder, upon the league learning that Rudy Gobert, a center for the Utah Jazz, had tested positive for COVID-19. Other players would test positive in the coming days and weeks, as would staff for teams and members of the media. Just a few days earlier, Gobert had joked with the media about his perception of institutional overreaction to the coronavirus, by touching microphones at an interview.

Second, Tom Hanks announced that both he and his wife Rita Wilson had tested positive for COVID-19 while Hanks was working on a Baz Luhrmann film in Australia. Hanks was at the time the most high profile figure known to have contracted COVID-19.

Third, President Donald Trump gave an Oval Office Address, the second of his presidency, "On the Coronavirus Pandemic." The address marked a strong shift in Trump's rhetoric regarding the danger of the COVID-19 outbreak. The main decision announced was a ban on travel from Europe to the US for 30 days, which later needed to be clarified as not also meaning a ban on trade. Futures on the US stock market dropped during the speech.

Combined, these disparate events were a major part of the COVID-19 pandemic becoming the dominant story for what would become weeks and then months ahead.

The murder of George Floyd on 2020/05/25, Memorial Day in the US, precipitated Black Lives Matter protests and civilian-police confrontations in Minneapolis. The protests would grow over the following weeks, and begin to spread around the world. And, at least in the first week, George Floyd's murder overtook coronavirus as the dominant story in the US [33] .

With the above context in mind, we can sensibly examine the allotaxonographs of Figs. 4, 5, and 6.

Our primary observation is that the three histograms vary considerably as we move through 1-, 2-, and 3-grams. The histograms broaden with increasing n, with the 3-gram histogram losing a scaling form and squaring up in the axes. The rapidly growing combinatoric possibilities of n-grams with increasing n mean that we see more and more exclusive n-grams as we look across the three allotaxonographs. For 1-grams, around 60% of each date's lexicon is exclusive; for 2-grams, the percentage increases to around 70%; and for 3-grams we reach 80% (see the bottom of the three balance summaries in each allotaxonograph).

The maximum count per cell is $10^6$ for 1-grams, $10^7$ for 2-grams, and $10^8$ for 3-grams. The cells with the most n-grams are of course the hapax legomena (the bottommost two cells in the histogram): those n-grams which appear once on one of the dates and not at all on the other.

To obtain good balance for the most dominant n-grams, we select α=1/4, 3/4, and ∞. Different kinds of terms dominate depending on n, with 'coronavirus', 'the coronavirus', and 'tested positive for' leading on 2020/03/12, and 'Minneapolis', 'George Floyd', and 'of George Floyd' at the top on 2020/05/30.

Because social amplification is encoded in Twitter's data stream through retweets, dominant 2-grams and especially 3-grams are liable to belong to the most retweeted messages of the day, and may lead to some variation in the dominant n-grams. (By contrast, we do not have a measure of popularity of individual phrases or sentences within Pride and Prejudice with just the bare text.) For example, 'toilet paper' and 'World Health Organization' appear as dominant 2-grams and 3-grams but none of their five distinct 1-grams are near the top of the ranked list in Fig. 4. On the other hand, some dominant 1-grams may be used in diverse 2-grams and 3-grams and thus may not appear in the ranked lists for 2-grams and 3-grams. Examples from Fig. 4 are 'antifa' and 'Breonna'.

For all three n-gram comparisons of these two dates on Twitter, we provide Flipbooks 4, 5, and 6 at the paper's Online Appendices (compstorylab.org/allotaxonometry/). Readers may use these to easily explore how the choice of α affects the fit for the contour lines in the histogram and the ordering of which n-grams dominate probability-turbulence divergence.

We include one final allotaxonograph from an entirely different field of research, ecology. In Fig. 7 we show a probability-turbulence divergence allotaxonograph for tree species abundance in Barro Colorado Island for censuses completed in 1985 and 2015. This example also shows how allotaxonographs can be used to inspect how well divergence measures perform for data sets that are much smaller than our examples from literature and Twitter. The species that dominates the overall divergence score is one that has diminished in abundance, Piper cordulatum [34] [35] [36] [37] . In Ref. [1] , we compared these distributions with rank-turbulence divergence, and the overall orderings of dominant species are broadly consistent.

In Flipbook 7 at the paper's Online Appendices (compstorylab.org/allotaxonometry/), we show how the dominant contributions of species vary as a function of α.

FIG. 7. Allotaxonograph using probability-turbulence divergence to compare tropical forest tree species abundance on Panama's Barro Colorado Island (BCI) for 5 year censuses completed in 1985 and 2015 [19]. The choice of α=5/12 produces a set of dominant species reasonably well balanced across the abundance spectrum. See Ref. [1] for the corresponding rank-turbulence divergence allotaxonographs. See Flipbook 7 in the Supplementary Information for the instrument's variation as a function of α.

A. Links to existing probability-based divergences

Probability-turbulence divergence shares some characteristics with other divergences (see Refs. [11] and [12] for two example compendia). In particular, we find known distances and similarities which correspond with or function similarly to probability-turbulence divergence for α=0, 1/2, 1, and ∞.

For α=0, probability-turbulence divergence partners the similarity measure known as the Sørensen-Dice coefficient, $S^{\rm SD}(P_1 \,\|\, P_2)$ [16, 17, 25], which was independently developed in the context of ecology by Dice (1945) and Sørensen (1948) (see also Ref. [38]). For two systems, the Sørensen-Dice coefficient is the number of shared types relative to the mean of the number of types in each system. Using our notation, and referring back to Eq. (8), we have:

$$ S^{\rm SD}(P_1 \,\|\, P_2) = \frac{ \sum_{\tau \in R_{1,2;0}} \left( 1 - \delta_{p_{\tau,1},0} - \delta_{p_{\tau,2},0} \right) }{ \frac{1}{2} \left( N_1 + N_2 \right) }, $$

where $N_1$ and $N_2$ are the numbers of distinct types in each system, and we are again summing over the union of types $R_{1,2;0}$. The quantity $1 - \delta_{p_{\tau,1},0} - \delta_{p_{\tau,2},0}$ is 1 when a type appears in both systems and 0 otherwise.

The Sørensen-Dice coefficient has arisen in many settings, with different names. For example, in statistics, the Sørensen-Dice coefficient is the $F_1$ score of a test's accuracy [18, 26].
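As a concrete illustration of the verbal definition above (shared types relative to the mean number of types), the following is a minimal Python sketch. The function name `sorensen_dice` and the toy count dictionaries are hypothetical and are not part of our MATLAB scripts; the sketch assumes each system is given as a mapping from type to a non-negative count, and that at least one system is non-empty.

```python
def sorensen_dice(counts_1, counts_2):
    """Sørensen-Dice coefficient for two type -> count mappings (illustrative sketch)."""
    # A type is present in a system only if its count is strictly positive.
    types_1 = {t for t, c in counts_1.items() if c > 0}
    types_2 = {t for t, c in counts_2.items() if c > 0}
    if not types_1 and not types_2:
        raise ValueError("at least one system must contain a type")
    n_shared = len(types_1 & types_2)
    # Mean of the number of types in each system.
    mean_n_types = (len(types_1) + len(types_2)) / 2
    return n_shared / mean_n_types


# Toy systems (assumed data): types "b" and "c" are shared,
# "a" is exclusive to system 1, "d" and "e" are exclusive to system 2.
system_1 = {"a": 5, "b": 3, "c": 1}
system_2 = {"b": 4, "c": 2, "d": 7, "e": 1}
print(sorensen_dice(system_1, system_2))  # 2 shared types / mean of (3, 4) types
```

Only the presence or absence of types matters here; the counts themselves play no role, which is the sense in which α=0 discards all frequency information.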

Examples of divergences matching the internal structure of $D^{\rm P}_{1/2}$ include the Hellinger distance [39], the Matusita distance [40], and the squared-chord distance [12].

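In terms of the probability-turbulence divergence's internal structure of $\left| [p_{\tau,1}]^{\alpha} - [p_{\tau,2}]^{\alpha} \right|$, a large selection of divergences match up with the α=1 instance. These include $L^{(p)}$-norm type constructions of the form:

$$ \left( \sum_{\tau} \left| p_{\tau,1} - p_{\tau,2} \right|^{p} \right)^{1/p}. $$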

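While the overall divergence values for these various divergences will differ, the rank orderings of the contributing types will be identical to that of $D^{\rm P}_{1}$, because each type's contribution is a monotonically increasing function of $\left| p_{\tau,1} - p_{\tau,2} \right|$. Finally, in the α=∞ limit, $D^{\rm P}_{\infty}$ agrees, up to a normalization factor of 2, with the Motyka distance [11].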
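The ordering claim can be checked numerically. The following minimal Python sketch works only with the unnormalized core terms $\left| [p_{\tau,1}]^{\alpha} - [p_{\tau,2}]^{\alpha} \right|$ and omits the divergence's prefactor, normalization, and any outer exponent, none of which affect how types are ordered. The helper names `contributions` and `ranked_types` and the toy distributions are hypothetical, and the sketch assumes α > 0 (the α=0 limit requires separate handling).

```python
def contributions(p1, p2, alpha):
    """Unnormalized per-type terms |p1^alpha - p2^alpha| over the union of types.

    Assumes alpha > 0; a type missing from a system has probability 0 there.
    """
    types = set(p1) | set(p2)
    return {t: abs(p1.get(t, 0.0) ** alpha - p2.get(t, 0.0) ** alpha) for t in types}


def ranked_types(scores):
    """Types ordered by decreasing score (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy distributions (assumed data): "c" and "d" are exclusive types.
p1 = {"a": 0.6, "b": 0.3, "c": 0.1}
p2 = {"a": 0.4, "b": 0.35, "d": 0.25}

order_alpha_1 = ranked_types(contributions(p1, p2, 1.0))
order_alpha_quarter = ranked_types(contributions(p1, p2, 0.25))

# An L^(p)-type construction with p = 3 raises each alpha = 1 term to a power,
# which changes the values but not the ordering of types.
lp_terms = {t: v ** 3 for t, v in contributions(p1, p2, 1.0).items()}
order_lp = ranked_types(lp_terms)

print(order_alpha_1)        # ordering at alpha = 1
print(order_lp)             # same ordering as alpha = 1
print(order_alpha_quarter)  # smaller alpha promotes the rare exclusive type "c"
```

In this toy case, lowering α from 1 to 1/4 moves the rare exclusive type "c" above the common shared type "a", while the $L^{(p)}$-type terms leave the α=1 ordering unchanged.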

While none of these other divergences provide direct tunability of the type probability (a severe limitation, as we hope our examples have conveyed), there are well-established quantities which do.

As we observed for rank-turbulence divergence in Ref. [1], the parameter α's effect is similar to its counterparts in various kinds of generalized entropy [20-22] and, more directly, the diversity indices (or Hill numbers) from ecology [23, 24].

Rényi entropy, ${}^{\alpha}H$, and the associated diversity index, ${}^{\alpha}N$, are defined as:

$$ {}^{\alpha}H = \frac{1}{1-\alpha} \ln \sum_{\tau} p_{\tau}^{\alpha}, \qquad {}^{\alpha}N = \exp\left( {}^{\alpha}H \right) = \left( \sum_{\tau} p_{\tau}^{\alpha} \right)^{1/(1-\alpha)}, $$

where α ≥ 0. We acknowledge that, at the risk of a minor dislocation from relevant literature, we have had to confront some notational peril here, as a standard notation for the diversity index is ${}^{\alpha}D$. We have also already used $N$ in our present paper but this choice tracks sensibly: As α → 0, we retrieve the natural logarithm of the number of distinct types $N$ (species richness in ecology) for Rényi entropy, and therefore the diversity index is ${}^{0}N = N$. As α → ∞, the most abundant type will dominate, with min-entropy the limit: ${}^{\infty}N = \min_{\tau} 1/p_{\tau} = 1/\max_{\tau} p_{\tau}$. In the α → 1 limit, we recover Shannon's entropy, $H$, as well as ${}^{1}N = e^{{}^{1}H} = e^{H}$.

There are similar aspects for probability-turbulence divergence and the diversity index in the limits of α=0 and ∞. For α=0, for example, both reduce to quantities involving simple counts of distinct types. Nevertheless, we note that we cannot construct probability-turbulence divergence from manipulations of Rényi entropy or the diversity index. We can, roughly speaking, only create a difference of sums whereas we need a sum of absolute differences with suitable exponents.
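The limits described above can be verified with a short calculation. The following minimal Python sketch uses the standard definitions of Rényi entropy, $\ln\left(\sum_{\tau} p_{\tau}^{\alpha}\right)/(1-\alpha)$, and of the diversity index as its exponential, with the α=1 and α=∞ cases handled as explicit limits. The function names and the toy distribution are hypothetical and not taken from our scripts.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy (natural log) of a normalized distribution, for alpha >= 0."""
    p = [x for x in probs if x > 0]  # zero-probability types do not contribute
    if alpha == 1:
        # alpha -> 1 limit: Shannon entropy.
        return -sum(x * math.log(x) for x in p)
    if math.isinf(alpha):
        # alpha -> infinity limit: min-entropy, set by the most abundant type.
        return -math.log(max(p))
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)


def diversity_index(probs, alpha):
    """Diversity index (Hill number): the exponential of the Rényi entropy."""
    return math.exp(renyi_entropy(probs, alpha))


# Toy distribution over four types (assumed data).
probs = [0.5, 0.25, 0.125, 0.125]

print(diversity_index(probs, 0))         # number of distinct types (here 4)
print(diversity_index(probs, 1))         # exp(Shannon entropy) = 2**1.75
print(diversity_index(probs, math.inf))  # 1 / max probability (here 2)
```

As α increases, the diversity index falls from the raw count of types toward the reciprocal of the largest probability, reflecting the growing dominance of the most abundant types.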

Pride and Prejudice: We sourced a plain text version of Jane Austen's Pride and Prejudice from Project Gutenberg (http://www.gutenberg.org/ebooks/1342).

Normalized n-gram usage frequency on Twitter: We collected around 10% of all tweets sent on these dates based on Coordinated Universal Time (UTC), meaning they covered 4:00:00 am to 3:59:59 am Eastern Daylight Time (EDT) and 7:00:00 am to 6:59:59 am Pacific Daylight Time (PDT). For the eastern and central time zones in the United States, especially, this shift provides a better, more functional coverage of people's activity. We provide historical access to the top $10^6$ 1-grams, 2-grams, and 3-grams across more than 100 languages as part of our Storywrangler for Twitter project [28, 41].

Species abundance on Barro Colorado Island: We accessed the dataset for BCI censuses performed roughly every 5 years over 35 years through the online repository described in Ref. [19] .

All scripts and documentation reside on GitLab: https://gitlab.com/compstorylab/allotaxonometer.

For the present paper, we wrote the scripts to generate the allotaxonographs in MATLAB (Laboratory of the Matrix). We produced all figures and flipbooks using MATLAB Version R2020a. We welcome ports to other languages.

As is, the core script is highly configurable and can be used to create a range of allotaxonographs as well as simple unlabeled rank-rank and probability-probability histograms.

Instruments accommodated by the script include rank-turbulence divergence [1] , probability-turbulence divergence, and generalized symmetric entropy divergence which includes Jensen-Shannon divergence as a special case.

We have defined, analyzed, and demonstrated the use of probability-turbulence divergence as an instrument of allotaxonometry. As the probability-based analog of our rank-turbulence divergence, the instrument is able to perform well when comparing heavy-tailed Zipf distributions of type frequencies. We have shown further that probability-turbulence divergence generalizes a range of existing probability-based divergences, either by matching them in exact form or by producing the same ordering of types by contribution.

While we view rank-turbulence divergence as our most general, interpretable instrument, for systems in which probabilities (or rates) of types occurring are well defined, and the resulting distributions involved are heavy-tailed, probability-turbulence divergence provides a more nuanced instrument.

We also favor divergences which compare distributions in as transparent a way as possible. To that end, we have made the core of probability-turbulence divergence a simple difference of powers of probabilities (Eq. (3)). By contrast, we view some divergences as being problematic in being overly constructed. We venture that Jensen-Shannon divergence (JSD), which we ourselves have used elsewhere, is one such instrument. The creation of an artificial mixed distribution is a contrivance we avoid here, and is perhaps indicative of taking information theory too far [42, 43] .

In our experience, we have also found that the visual information delivered by our allotaxonographs, especially in their coupling of histograms and ranked lists, has been essential to working effectively with divergences of all kinds.

One caution we make is that in the examples we have explored in the present paper, we have taken distributions as they are. That is, we have not contended with issues of sub-sampling and missing tail data [44]. We can say that types appearing at a high rate (e.g., common n-grams on Twitter) will not be affected by accessing more data, as their rates are already well estimated. In our paper on rank-turbulence divergence, we examined how truncation of distributions affects allotaxonographs, and such an approach is always available for any divergence.

Finally, in our present paper and in Ref. [1], we have so far made choices of α based on inspection of the relevant histogram. A clear next step is to find ways to determine an optimal α for any given pair of distributions, and to do so only when sufficiently robust scaling is apparent. From a storytelling perspective, we are concerned with finding an α that returns a ranked list of distinguishing types for two distributions such that the list comprises a balance of types from across the full range of observed probabilities [45]. We have performed some preliminary work for such an optimization, and note here that simple regression is made difficult by the overwhelming weight of rare types relative to common ones.

Allotaxonometry and rank-turbulence divergence: A universal instrument for comparing complex systems

On a class of skew distribution functions

Power laws, Pareto distributions and Zipf's law

Power-law distributions in empirical data

Emergence of scaling in random networks

Complexity: A Guided Tour

Growth, innovation, scaling, and the pace of life in cities

Measuring Biological Diversity

The Mechanics of Earthquakes and Faulting

Dictionary of distances

Comprehensive survey on distance/similarity measures between probability density functions

Families of Alpha-Beta-and Gamma-divergences: Flexible and robust measures of similarities

Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict

Is language evolution grinding to a halt? The scaling of lexical turbulence in English fiction suggests it is not

Measures of the amount of ecologic association between species

A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons

Information retrieval

Complete data from the Barro Colorado 50-ha plot: 423617 trees, 35 years, v3, DataONE Dash, dataset

On measures of entropy and information

Nonextensive statistical mechanics and thermodynamics: Historical background and present status

Simpson diversity and the Shannon-Wiener index as special cases of a generalized entropy

Diversity and evenness: A unifying notation and its consequences

Entropy and diversity

Adaptation of Sørensen's k (1948) for estimating unit affinities in prairie vegetation

The truth of the F-measure

The growing amplification of social media: Measuring temporal and social contagion dynamics for over 150 languages on Twitter for

Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter

Pride and prejudice

Why scatter plots suggest causality, and what we can do about it

Somewhere over the rainbow: An empirical assessment of quantitative colormaps

Scattertext: A browser-based tool for visualizing how corpora differ

Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy

The Piperaceae of Panama

The flora of Barro Colorado Island, Panama

Phenology of neotropical pepper plants (Piperaceae) and their association with their main dispersers, two short-tailed fruit bats, Carollia perspicillata and C. castanea (Phyllostomidae)

Hierarchical fruit selection by neotropical leaf-nosed bats (Chiroptera: Phyllostomidae)

Stigler's law of eponymy

Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen

Decision rules, based on the distance, for problems of fit, two samples, and estimation

(2020), storywrangling.org

The bandwagon

On the limitations of Jensen-Shannon divergence and its generalizations: Allotaxonographs and critique

Robust estimation of microbial diversity in theory and in practice

Text mixing shapes the anatomy of rank-frequency distributions

The authors are grateful for the computing resources provided by the Vermont Advanced Computing Core, which was supported in part by NSF award No. OAC-1827314, and for financial support from the Massachusetts Mutual Life Insurance Company and Google Open Source under the Open-Source Complex Ecosystems And Networks (OCEAN) project.