key: cord-0110979-2ecfg11j authors: Nielbo, Kristoffer L.; Haestrup, Frida; Enevoldsen, Kenneth C.; Vahlstrup, Peter B.; Baglini, Rebekah B.; Roepstorff, Andreas title: When no news is bad news -- Detection of negative events from news media content date: 2021-02-12 sha: 8695d64ba8beb88f4f39b4b206bffce7aaac1bd2 doc_id: 110979 cord_uid: 2ecfg11j

During the first wave of Covid-19, information decoupling could be observed in the flow of news media content. Readers experienced content alignment within and between news sources, as all news was transformed into Corona-news. As a corollary, the novelty of news content went down as media focused monotonically on the pandemic event. This all-important Covid-19 news theme turned out to be quite persistent as the pandemic continued. The result was a situation that is paradoxical from a news media perspective: the same news was repeated over and over. This information phenomenon, where novelty decreases and persistence increases, has previously been used to track change in news media. In this study we specifically test the claim that the news information decoupling behavior of media can be used to reliably detect change in news media content originating in a negative event, using a Bayesian approach to change point detection.

A peculiar behavior could be observed in news media when the first wave of Covid-19 spread across the world. In response to this pandemic event, the ordinary rate of change in news content was disrupted because every story became associated with Covid-19. On the one hand, content novelty went down, because every story became more similar to previous stories. On the other hand, the Covid-19 association became more prevalent, resulting, at least initially, in an increase in content persistence.
A recent study [1] argues that this behavior is an example of the news information decoupling (NID) principle. According to this principle, the information dynamics of news media are (initially) decoupled by temporally extended catastrophes. Content novelty decreases as media focus monotonically on the catastrophic event, while the resonance of that content increases as its continued relevance propagates throughout the news information system. The authors further argued that NID can be used to detect significant change in news media that originates in catastrophic events. Previous studies have shown that variation in newspapers' word usage is sensitive to the dynamics of socio-cultural events [2, 3, 4], can detect event-driven shifts [5], and can accurately model effects of change in comprehensive collections of newspapers [6]. Furthermore, the associative structure of newspapers has been shown to accurately capture thematic development [7]. When modelled dynamically, it is also indicative of the evolution of cultural values and biases [3, 8]. Adaptive fractal analysis of word frequencies over time has been used to discriminate between different classes of catastrophic events, which display class-specific fractal signatures in, among other things, word usage in newspapers [9]. Several studies have shown that information-theoretic constructs can be used to detect fundamental conceptual differences between distinct periods [2] and concurrent normative and ideological movements [10]. They can even detect the development of ideational factors (e.g., creative expression) in temporally dependent writings [11, 12, 13]. More specifically, a set of methodologically related studies have applied windowed relative entropy to thematic text representations. The resulting signals capture novelty as a reliable content difference from the past, and resonance as the degree to which future information conforms to that novelty [10, 11].
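The windowed novelty and resonance signals mentioned above can be sketched in plain Python. This is a minimal sketch and not the authors' code. It assumes each article is already a probability vector over K topics (for instance an LDA topic distribution) and that the articles are sorted in time. It uses the Jensen-Shannon variant described in the methods (Appendix A), with base-2 logarithms. Novelty averages the divergence from the w preceding articles, transience averages the divergence from the w following ones, and resonance is novelty minus transience. The helper `nr_slope` is the least-squares slope of resonance on novelty (the N × R slope). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: each article is a probability vector over K topics
# (e.g., an LDA topic distribution), and the list is sorted in time.


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p | q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence against the midpoint M = (p + q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def novelty(docs: Sequence[Sequence[float]], j: int, w: int) -> float:
    """Mean divergence of article j from the w articles before it."""
    return sum(jsd(docs[j], docs[j - d]) for d in range(1, w + 1)) / w


def transience(docs: Sequence[Sequence[float]], j: int, w: int) -> float:
    """Mean divergence of article j from the w articles after it."""
    return sum(jsd(docs[j], docs[j + d]) for d in range(1, w + 1)) / w


def novelty_resonance(
    docs: Sequence[Sequence[float]], w: int
) -> Tuple[List[float], List[float]]:
    """Novelty and resonance (novelty - transience) for every article with a
    full window on both sides, i.e. indices w .. len(docs) - w - 1."""
    n_vals: List[float] = []
    r_vals: List[float] = []
    for j in range(w, len(docs) - w):
        n = novelty(docs, j, w)
        n_vals.append(n)
        r_vals.append(n - transience(docs, j, w))
    return n_vals, r_vals


def nr_slope(n_vals: Sequence[float], r_vals: Sequence[float]) -> float:
    """Ordinary least-squares slope beta_1 of resonance regressed on novelty."""
    n_mean = sum(n_vals) / len(n_vals)
    r_mean = sum(r_vals) / len(r_vals)
    sxx = sum((x - n_mean) ** 2 for x in n_vals)
    sxy = sum((x - n_mean) * (y - r_mean) for x, y in zip(n_vals, r_vals))
    return sxy / sxx


if __name__ == "__main__":
    # Toy series: five articles with one topic mix, then an abrupt shift.
    a, b = [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]
    docs = [a] * 5 + [b] * 5
    N, R = novelty_resonance(docs, w=2)
    print([round(x, 3) for x in N])
    print([round(x, 3) for x in R])
    print(round(nr_slope(N, R), 3))
```

In the toy series, novelty peaks at the first article after the shift. Resonance is positive there because the new topic mix persists. The article just before the shift has zero novelty but high transience, which gives it negative resonance.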
Three recent studies report the following findings:
- Successful social media content shows a strong association between novelty and resonance [14].
- Under normal conditions, legacy news media display a remarkably similar medium-to-strong association between novelty and resonance across the political spectrum [1].
- Variation in the novelty-resonance association can predict significant change points in historical data [15].

This study tests the claim of [1] that NID-like behavior can provide input for change point detection algorithms. Specifically, we test whether two change points, Lockdown and Opening, are observable in news media during the first phase of Covid-19, using a Bayesian approach to change point detection.

Figure 1 displays a prototypical example of NID during the first phase of Covid-19 [1]. Covid-19 news items date back to December 2019 ('Wuhan'). Even so, newspaper content is not impacted until the period after the first national outbreak, in this case in Denmark ('Virus'). From the phase 1 lockdown ('Lockdown') to the opening ('Opening'), the newspaper shows a valley in novelty and, initially, a peak in resonance. Both processes then approximately return to normal after the opening.

Table 1: Estimated temporal change points at 94% highest density intervals for novelty. Column one contains the name of the newspaper and column two its class (Broadsheet or Tabloid). NID Start and NID End are the beginning and end of the lockdown as represented in the newspaper. The final column indicates whether the specific source supported the NID principle.

To validate the observed behavior, we tested for two change points in novelty using a Bayesian model (see Appendix A for methods). The first change point, 'NID Start', should separate pre-lockdown from lockdown and be centered on week 11. The second, 'NID End', should separate lockdown from post-opening and be centered on week 16.
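The two-change-point test above can be illustrated without a probabilistic programming library. The sketch below is not the authors' PyMC3/NUTS model (Appendix A). It swaps in a brute-force enumeration of every admissible (τ1, τ2) pair. Segment means and a shared σ are plugged in at their maximum-likelihood values, so the normalised weights are a profile-likelihood approximation under a uniform prior on the change points, not a full posterior. The function names, the minimum segment length `min_seg`, and the synthetic series are hypothetical. `hdi_set` returns the smallest set of locations that holds 94% of the weight, which is a discrete analogue of the highest density intervals reported in Table 1.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def _rss(xs: Sequence[float]) -> float:
    """Residual sum of squares around the segment mean (the MLE of mu_i)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def two_change_points(series: Sequence[float], min_seg: int = 2) -> Dict[Pair, float]:
    """Normalised weights for every admissible (tau1, tau2).

    Segments: t < tau1, tau1 <= t < tau2, t >= tau2. Means and the shared
    sigma are set to their maximum-likelihood values (profile likelihood),
    with a uniform prior over change-point pairs.
    """
    n = len(series)
    if n < 3 * min_seg:
        raise ValueError("series too short for three segments")
    log_w: Dict[Pair, float] = {}
    for t1 in range(min_seg, n - 2 * min_seg + 1):
        for t2 in range(t1 + min_seg, n - min_seg + 1):
            rss = _rss(series[:t1]) + _rss(series[t1:t2]) + _rss(series[t2:])
            # With sigma^2 = rss / n, the log-likelihood is -n/2 * log(rss) + const.
            log_w[(t1, t2)] = -0.5 * n * math.log(max(rss, 1e-12))
    top = max(log_w.values())  # log-sum-exp shift for numerical stability
    z = sum(math.exp(v - top) for v in log_w.values())
    return {k: math.exp(v - top) / z for k, v in log_w.items()}


def marginal(weights: Dict[Pair, float], which: int) -> Dict[int, float]:
    """Marginal weight of tau1 (which=0) or tau2 (which=1)."""
    out: Dict[int, float] = {}
    for pair, wgt in weights.items():
        out[pair[which]] = out.get(pair[which], 0.0) + wgt
    return out


def hdi_set(marg: Dict[int, float], mass: float = 0.94) -> List[int]:
    """Smallest set of change-point locations holding at least `mass` weight."""
    chosen: List[int] = []
    total = 0.0
    for value, wgt in sorted(marg.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(value)
        total += wgt
        if total >= mass:
            break
    return sorted(chosen)


if __name__ == "__main__":
    # Synthetic weekly novelty: a dip from index 10 to 15, small fixed "noise".
    levels = [0.5] * 10 + [0.2] * 6 + [0.5] * 14
    series = [lv + 0.01 * ((7 * t) % 5 - 2) for t, lv in enumerate(levels)]
    weights = two_change_points(series)
    print(max(weights, key=weights.get))
    print(hdi_set(marginal(weights, 0)), hdi_set(marginal(weights, 1)))
```

The enumeration scores roughly n² pairs at O(n) each. That is cheap for a few dozen weekly values, and prefix sums would reduce the cost per pair to O(1).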
Table 1 shows the estimated change points for six national newspapers, two of which are tabloid newspapers (Class) and the remainder broadsheets. The model indicates that all broadsheet newspapers support the NID principle in novelty. The first change point is placed in weeks 10-11, but the second is more a matter of contention. The opening change point lies within April and displays a delayed response of about a month. Finally, the tabloid press shows no indication of NID behavior.

That novelty decreases during a catastrophic event is nevertheless only half the story. For NID to be supported by the data, resonance should increase during the lockdown, so that the medium-to-strong association between novelty and resonance is momentarily weakened. Following [1], we inspected the time-windowed linear fits of resonance on novelty, N × R, to confirm this (see Figure 3). All broadsheet newspapers display a slope decrease during the lockdown, thereby conforming to the NID principle. Tabloids, on the other hand, follow an inverse pattern: the N × R slope increases during the lockdown period.

In conclusion, this study sought to validate the news information decoupling (NID) principle on a sample of six national Danish newspapers during the first phase of Covid-19. Using a Bayesian approach to change point detection, we found that content novelty in broadsheet newspapers does indeed display statistically reliable points of change during the Covid-19 lockdown. The N × R slopes further corroborated NID, indicating a decoupling of resonance from novelty during the lockdown. Several observations can be made from the findings. First, the estimated change points for 'Pre-lockdown → Lockdown' are spread over a two-week interval. This reflects that a lockdown could reasonably be predicted as early as the first Covid-19 incident in Denmark.
Second, in a similar vein, the 'Lockdown → Opening' change points are spread over an entire month, from April 8 to May 8. The Danish government during the period was center-left. The model's uncertainty in determining the opening may reflect political observations [1] that center-right newspapers (e.g., Berlingske and Jyllands-Posten) were more sceptical of the government's implementation of an opening than center-left ones (e.g., Politiken). In other words, the center-right might have been more reluctant to acknowledge the opening as a return to normal. Third, tabloid newspapers do not show any indication of news information decoupling. On the contrary, their N × R slopes momentarily increase during the lockdown. This increase does not, however, provide much useful information. As the change point detection model shows, the periodization is not meaningful for the two tabloid newspapers.

Validation of the NID principle is still needed for multilingual data, and its value for crisis management should be further tested. For change detection, the scope of the principle needs additional testing. Does NID generalize beyond a small set of negative events to, for instance, temporally extended significant events (e.g., the moon landing, the fall of the Berlin Wall)? Finally, several comparisons already hint at interesting avenues for media and journalism researchers: left- vs. right-wing newspapers, tabloid vs. broadsheet newspapers, and the silly season and other seasonal effects.

The data set consists of all linguistic content (title and body text) from the front pages of six Danish national newspapers: Berlingske, BT, Ekstrabladet, Jyllands-Posten, Kristeligt Dagblad, and Politiken. The newspapers were sampled from December 1, 2019 to July 1, 2020. Content not produced by the newspaper, e.g., advertisements, was excluded from the sample.
To normalize linguistic content, numerals and highly frequent function words were removed, and the remaining data were lemmatized and casefolded. The data were then represented as a bag-of-words (BoW) model. Latent Dirichlet allocation was applied to this model to generate a dense low-rank representation of each article. Note that with a few modifications to equations (4) and (5), the approach works for any probabilistic or geometric vector representation of documents. Novelty and resonance were estimated in windows of one week (w = 7).

Two related information signals were extracted from the temporally sorted BoW model. The first is novelty, defined as an article $s^{(j)}$'s reliable difference from past articles $s^{(j-1)}, s^{(j-2)}, \ldots, s^{(j-w)}$ in window $w$:

$$ N_w(j) = \frac{1}{w} \sum_{d=1}^{w} JSD\left(s^{(j)} \mid s^{(j-d)}\right) \tag{1} $$

The second is resonance, defined as the degree to which future articles $s^{(j+1)}, s^{(j+2)}, \ldots, s^{(j+w)}$ conform to article $s^{(j)}$'s novelty:

$$ R_w(j) = N_w(j) - T_w(j) \tag{2} $$

where $T$ is the transience of $s^{(j)}$:

$$ T_w(j) = \frac{1}{w} \sum_{d=1}^{w} JSD\left(s^{(j)} \mid s^{(j+d)}\right) \tag{3} $$

The novelty-resonance model was originally proposed in [10]. Here we propose a symmetrized and smoothed version using the Jensen-Shannon divergence (JSD):

$$ JSD\left(s^{(j)} \mid s^{(k)}\right) = \frac{1}{2} D\left(s^{(j)} \mid M\right) + \frac{1}{2} D\left(s^{(k)} \mid M\right) \tag{4} $$

with $M = \frac{1}{2}\left(s^{(j)} + s^{(k)}\right)$. Here $D$ is the Kullback-Leibler divergence over the $K$ dimensions of the representation:

$$ D\left(s^{(j)} \mid s^{(k)}\right) = \sum_{i=1}^{K} s_i^{(j)} \log_2 \frac{s_i^{(j)}}{s_i^{(k)}} \tag{5} $$

Finally, to describe the information states before and after an event (e.g., Lockdown, Opening), we fit resonance on novelty to estimate the N × R slope $\beta_1$ in the specific time windows:

$$ R = \beta_0 + \beta_1 N + \epsilon \tag{6} $$

A Bayesian approach was used to estimate the change points. Following previous considerations, we assume that the time series contains two change points, $\tau_1$ and $\tau_2$. Apart from the change points, the series is assumed to be stable and to follow a normal distribution with a segment-specific mean $\mu_i$ and a shared variance $\sigma$. This gives the following model for the observed novelty $N_t$:

$$ N_t \sim \begin{cases} \mathcal{N}(\mu_1, \sigma) & \text{if } t < \tau_1 \\ \mathcal{N}(\mu_2, \sigma) & \text{if } \tau_1 \le t < \tau_2 \\ \mathcal{N}(\mu_3, \sigma) & \text{if } t \ge \tau_2 \end{cases} $$

for which we wish to estimate the locations of the change points $\tau_i$, the means $\mu_i$, and the variance $\sigma$, i.e.
the following posterior:

$$ P(\mu_i, \sigma, \tau_i \mid N_t) = P(\mu_1, \mu_2, \mu_3, \sigma, \tau_1, \tau_2 \mid N_t) $$

To estimate the posterior, we used NUTS sampling as implemented in PyMC3 [16] with 4000 samples. The estimation used naive to slightly conservative priors. These priors assume that the change points $\tau_i$ can be anywhere in the sequence (with $\tau_2 > \tau_1$) and that the variance $\sigma$ is stable across change points. Note that the half-Cauchy prior distribution has a series of beneficial properties [17, 18], including a fat tail that allows for extreme values. These assumptions were modelled using the following priors:

All data are proprietary and have been collected through Infomedia's API: https://infomedia.dk/. For inquiries regarding models and derived data, please contact kln@cas.au.dk. The source code for the methods is available on GitHub: https://bit.ly/3beahFd. More details on NID detection can be found at NeiC's NDHL website: https://bit.ly/3bfeW9C.

[1] News Information Decoupling: An Information Signature of Catastrophes in Legacy News Media
[2] The Measures of Modernity: The New Quantitative Metrics of Historical Change Over Time and Their Critical Interpretation
[3] The Eurocentric Fallacy. A Digital-Historical Approach to the Concepts of 'Modernity', 'Civilization' and 'Europe' (1840-1990), International Journal for History
[4] 'Workers of the World'? A Digital Approach to Classify the International Scope of Belgian Socialist Newspapers
[5] Mining the Twentieth Century's History from the Time Magazine Corpus
[6] Quantifying "Pillarization": Extracting Political History from Large Databases of Digitized Media Collections
[7] Probabilistic topic decomposition of an eighteenth-century American newspaper
[8] Using Word Embeddings to Examine Gender Bias in Dutch Newspapers
[9] Culturomics meets random fractal theory: insights into long-range correlations of social and natural phenomena over the past two centuries
[10] Individuals, institutions, and innovation in the debates of the French Revolution
[11] Exploration and Exploitation of Victorian Science in Darwin's Reading Notebooks
[12] Automated Compositional Change Detection in Saxo Grammaticus' Gesta Danorum
[13] A curious case of entropic decay: Persistent complexity in textual cultural heritage
[14] Trend Reservoir Detection: Minimal Persistence and Resonant Behavior of Trends in Social Media
[15] Composition and Change in De Civitate Dei: A Case Study of Computationally Assisted Methods
[16] Probabilistic programming in Python using PyMC3
[17] Prior distributions for variance parameters in hierarchical models
[18] On the Half-Cauchy Prior for a Global Scale Parameter

This research was supported by the "HOPE - How Democracies Cope with COVID-19" project funded by The Carlsberg Foundation with grant CF20-0044, by NeiC's Nordic Digital Humanities Laboratory project, and by DeiC Type-1 HPC with project DeiC-AU1-L-000001. The authors would like to thank Berlingske Media, JP/Politikens Hus, and Kristeligt Dagblad for providing access to proprietary data.