key: cord-0255024-8lxs78vj authors: Urban, Federico R.; Camera, Stefano; Alonso, David title: Detecting ultra-high energy cosmic ray anisotropies through cross-correlations date: 2020-05-01 journal: nan DOI: 10.1051/0004-6361/202038459 sha: 24a0e2b261bf86d42a4aef6be8e978beedcd8374 doc_id: 255024 cord_uid: 8lxs78vj We propose an observable for ultra-high energy cosmic ray (UHECR) physics: the harmonic-space cross-correlation power spectrum between the arrival directions of UHECRs and the large-scale cosmic structure mapped by galaxies. This cross-correlation has not yet been considered in the literature, and it permits a direct theoretical modelling of the main astrophysical components. We describe the expected form of the cross-correlation and show how, if the distribution of UHECR sources trace the large-scale cosmic structure, it could be easier to detect with current data than the UHECR auto-correlation. Moreover, the cross-correlation is more sensitive to UHECR anisotropies on smaller angular scales, more robust to systematic uncertainties, and it could be used to determine the redshift distribution of UHECR sources, making it a valuable tool in determining their origins and properties. Ultra-high energy cosmic rays (UHECRs), impacting the atmosphere of the Earth with energies in excess of 1 EeV (10 18 eV), have remained a mystery since their discovery 59 years ago [1, 2] . We do not know what they are: observational data can not yet fully distinguish between several variants of pure and mixed primary compositions [3, 4] . We do not know where they come from: the astrophysical sources that generate and accelerate UHECRs have not been identified yet; the type of acceleration mechanism that is responsible for their formidable energies has not been discovered, either [5] . What we do know is that the highest energy rays are most likely extra-galactic. First, if UHECRs were produced within the Galaxy, their arrival directions in the sky would be very different from what we observe [6] [7] [8] . Second, barring a cosmic conspiracy that puts an end to the injection spectrum at that very energy, UHECR interactions with cosmological background photons produce a sharp cutoff (the Greisen-Zatsepin-Kuzmin limit) in the spectrum corresponding to ∼ 60 EeV [9, 10] , and a cutoff is indeed observed in the data [11, 12] . If the sources of UHECRs are extra-galactic, they most probably correlate with the large-scale distribution of matter (large-scale structure, or LSS). The interactions with the background cold photons limit UHECR propagation to roughly a few hundreds of Mpc (for a review, see Ref. [5] ). Therefore, the UHECR flux distribution in the sky should be to some extent anisotropic, since below 100 Mpc, roughly comparable with the scale of homogeneity expected in the standard cosmological model, LSS are anisotropic [13] [14] [15] . How the anisotropy of UHECR sources manifests itself in the observed flux on Earth then depends on the original anisotropy of the sources, the UHECR chemical composition, and the properties of intervening magnetic fields -Galactic (GMF) and extra-galactic (xGMF) -that deflect UHECRs and distort the original anisotropic patterns. Chemical composition and magnetic fields are degenerate when it comes to UHECRs deflections, since the latter depends on ZB/E, where Z is the atomic number, B the strength of the magnetic field, and E is the UHECR energy: doubling the field strength is equivalent to doubling the charge (or halving the energy). Chemical composition instead is the only factor that determines the UHECR propagation length at a given energy: different nuclei come from different portions of the Universe and carry different anisotropic imprints, but the relationship between the two is non-monotonic and non-trivial (see, e.g., [16, 17] ). To a large extent, the statistics of the anisotropies in the distribution of UHECRs can be characterized by the UHECR angular auto-correlation (AC), which, in harmonic space, takes the form of the angular power spectrum coefficients C . Here, the -th multipole quantifies the variance of the anisotropies on angular scales θ ∼ π/ [18, 19] (see Appendix A for further details). To date, the number of UHECRs collected at the highest energies is low -of the order of a hundred above the cutoff [2] . Because of this, the UHECR flux is dominated by Poisson statistics: the AC is mostly determined by shot noise, making the underlying correlation with the LSS very hard to detect. Indeed, the indications for anisotropy in the data are tenuous: save for a low-energy dipole [8] and a high-energy hot-spot [20] , the angular distribution of UHECR arrival directions appears to be nearly isotropic [21] . Moreover, no anisotropies have been detected at small scales 10 [22, 23] , although there are hints at intermediate scales [24] . In this work, we quantify the possibility of detecting the anisotropy in the UHECR flux through the harmonic-space power spectrum of the cross-correlation (XC) between UHECR counts and the distribution of galaxies. Such XC technique was previously proposed to study the anisotropy of the γ-ray sky by Refs. [25] [26] [27] , and proved successful for several tracers of the LSS [28] [29] [30] [31] . A search for a XC between UHECRs and high-energy photons was performed in [32] . If UHECR sources statistically trace the LSS, then the positions of these sources, and the arrival directions of UHECRs (if not strongly affected by intervening magnetic fields) should have a non-zero correlation with a galaxy sample up to a given distance. Therefore, the detection or non-detection of the XC signal with galaxies at different redshifts would allow us to test whether UHECR sources are distributed according to the LSS, and to quantify the extent to which the UHECR transfer function, determined by energy losses and intervening magnetic fields, does not depend on direction. There are at least three features that differentiate the XC from other methods (see for instance [33] and references therein). First, systematic uncertainties of different 'messengers', or observables, should not cross-correlate, and, under some conditions, statistical noise should also not strongly cross-correlate. This is because different experiments are different machines exploiting different physical effects. However, within a single data set, for instance the set of arrival directions of UHECRs, the AC of the noise and systematic errors for that set are certainly non-zero, and contribute to hiding any underlying 'true' signal. Examples of these systematics for UHECRs would be perturbations in the arrival directions due to deflection by the GMF 1 , or spatial fluctuations in the energy calibration giving rise to leakage from different energy bins 2 . Thus, in this sense the XC is an experimentally cleaner observable. Secondly, in the limit where the UHECR sources are numerous, but UHECR detections themselves are not, we can assume that we observe at most one UHECR per source (as seems to be the case given to the lack of obvious UHECR multiplets [34, 35] ). The much higher number of galaxies leads to a significant improvement in the signal-to-noise ratio of this cross-correlation (see the discussion in section 3). This effectively allows us to probe the anisotropies on smaller scales through the XC than the AC, underlying the importance of using both observables. There are several reasons why those smaller scales ( > 10) are interesting. First of all, the experimental angular resolution of UHECR events is around 1 • , which corresponds to ∼ 200: from an experimental perspective we are not fully taking advantage of the data we already have. Furthermore, small-scale power in the LSS angular distribution is comparable to that at large scales: if UHECRs bear the imprint of the LSS this small-scale power is not completely suppressed by the GMF, especially once the structured component of the GMF is taken into account [36] ; moreover, the sub-structures of the GMF themselves imprint small-scale anisotropies in PeV cosmic rays [37] and it is possible that such structures can be present at higher energies. Lastly, small-scale anisotropies can be separately detected in different regions of the sky, allowing us to probe, for example, different GMF structures independently. Third, while most analyses have looked at the real-space correlation between UHECRs and the large-scale structure [38] [39] [40] [41] [42] ), we will express our results here in terms of harmonicspace power spectra. These are common observables in cosmological studies, based on a natural decomposition of the celestial sphere. They also allow for a straightforward visualization of the main components of the astrophysical model (radial kernels, details of the galaxy-matter connection), which is one of the main novel aspects of this work. In this paper, we will introduce a formalism to model the AC and XC, and apply it to a vanilla proton-only model for UHECR injection in order to quantify the differences between the two observables and the detectability of the anisotropies on different scales with existing experimental facilities. We defer the more detailed discussion of the dependence of the XC on UHECR injection and source properties, a realistic treatment of the UHECR experimental setup, such as non-uniform sky coverage, as well as a full treatment of the effects of the GMF and xGMF on the signal, to upcoming work. This paper is organized as follows. In Section 2 we introduce the formalism to describe the UHECR flux, the distribution of galaxies, and the AC and XC. We apply this formalism to a hypothetical UHECR dataset in Section 3, where we obtain and compare the AC and the XC. We summarize our findings and conclude with an outlook for future work in Section 4. Appendix A collects useful standard formulae pertaining to angular power spectra. Let E(E inj ) be the (angle-integrated, isotropic) emissivity 3 of cosmic rays (CRs) for a given galaxy (number of CRs of energy E inj emitted per unit energy, per unit time): The subscript 'inj' (injection) here indicates quantities evaluated in the rest frame of the emitting source. Due to the expansion of the Universe and to interactions between CRs and cosmic background light, the injected energy of a CR, whose energy at detection is E, is given by E inj (E, z) with z the redshift of the source. In the absence of scattering processes the energy losses are adiabatic E inj = (1 + z)E. The differential emissivity (i.e., per unit solid angle) is := E/4π assuming isotropic emission. We will parameterise the emissivity as a power-law of energy: Energies will always be expressed in EeV for convenience. The quantity measured on Earth is the observed number of events per unit time, energy interval, detector area, solid angle on the sky and (assuming source redshifts can be measured), redshift interval. We can relate this number to the emissivity through where H(z) is the Hubble parameter, n s,c is the volumetric number density of CR sources, we have set c = 1, and we have ignored subdominant light-cone and relativistic effects [43, 44] . We will be interested in the number of UHECRs detected above a given energy threshold E cut (defined in the observer's frame) and integrated over source redshifts, from the direction n: where χ(z) is the radial comoving distance. We can write the number density of sources as n s,c (z, χn) =n s,c (z) [1 + δ s (z, χn)], where δ s is the galaxy overdensity. Assuming a non-evolving galaxy population, namelȳ n s,c (z) =n s,c (0), and a power-law UHECR spectrum (as in Eq. 2.2) we obtain: (2.5) The attenuation factor α(E cut , z; γ, Z) is defined as the number of events reaching the Earth with E > E cut divided by the number of events which would have reached the Earth if there were no energy losses at a given distance: The attenuation α is a function of the energy cut and redshift, as well as the injection spectral slope and chemical composition. In terms of α, the direction-dependent integral flux is In this paper, to introduce the formalism, we choose to work with a toy proton-only model with injection slope γ = 2.6 as in model (4) of [16] , or model (i) of [17] . In order to obtain the attenuation factor for our injection model we have followed 10 6 events with SimProp v2r4 [45] with energies above E = 10 EeV (with an upper cutoff of E = 10 5 EeV), for redshifts up to z = 0.3, and counted the number of events reaching the Earth with E > E cut for different values of E cut . With SimProp we have accounted for all energy losses, adiabatic and interactions with cosmic microwave background (CMB) photons and extra-galactic background photons according to the model [46] . The UHECR radial kernels, defined in the next section, obtained from the attenuation factor α for different energies are shown in Fig. 1 . Note that we assume that cosmic ray energy losses are to first order isotropic, that is, we ignore angular anisotropies in in the CMB and extra-galactic background light, which are completely negligible for our analysis. Moreover, for simplicity here we work with full-sky uniform coverage, but the analysis can be readily generalized to non-uniform and partial sky coverage. We will define the anisotropy in the UHECRs distribution as the over-density of rays detected as a function of sky positionn as whereΦ(E cut ) is the sky-averaged UHECR flux. From the results in the previous section, this quantity is related to the three-dimensional overdensity of UHECR sources δ s (z, χn) through where the UHECR radial kernel (or window function) is . (2.10) Figure 1 shows the radial kernels for UHECRs with energy thresholds E cut = 40, 63, and 100 EeV; as expected the lower the energy the farther UHECRs propagate. shows the approximate redshift distribution of galaxies in the 2MRS sample using the fit found by [47] . The red, yellow, and blue lines show the radial kernel for the UHECR flux (Eq. (2.10)) for the three energy thresholds studied here (40 EeV, 63 EeV, and 100 EeV respectively). We will consider the AC of the UHECR anisotropy, Eq. (2.9), and its XC with the galaxy number count fluctuations. In particular, we will work with the projected overdensity of sources for a given galaxy sample, where N g (n) is the number of galaxies in a given directionn, andN g its average over the celestial sphere. This is related to the three-dimensional galaxy overdensity δ g (z, χn) via where φ g (χ) is the weighted distribution of galaxy distances. In general, we will assume that we have redshift information for all galaxies in the catalog, and that we can use that information to apply a distance-dependent weight w(χ). In that case, the galaxy overdensity kernel φ g (χ) is given by wheren g,c is the comoving number density of galaxies in the sample. If no weights are applied -namely, w(χ) = 1 -then dχ χ 2 w(χ)n g,c (χ) =N Ω,g , (2.14) whereN Ω,g is the angular number density of galaxies (i.e., number of galaxies per steradian). Figure 1 shows the radial kernel for a low-redshift galaxy survey, modelled after the 2MASS Redshift Survey (2MRS) [48] . This constitutes one of the most complete full-sky spectroscopic low-redshift surveys, and we will use it as our fiducial galaxy sample in this paper. In this work we consider full-sky data sets for simplicity, but the generalization of our results for an incomplete sky coverage is straightforward. In the case of a realistic setup based on 2MRS, a sky coverage around 70% will only degrade the signal by a factor of √ 0.7 0.86. We are interested in detecting the intrinsic anisotropies in the distribution of UHECRs by considering the different two-point functions built from ∆ CR and ∆ g . A given observation of any of these fields will consist of both signal s and noise n: ∆ a = s a + n a (where a, b ∈ {CR, g}). Assuming signal and noise to be uncorrelated, the corresponding power spectra can be split into both components, namely where S and N are the power spectra of s and n respectively. In our case, the signal is the intrinsic clustering of both UHECRs and galaxies due to the underlying large-scale structure, while the noise is sourced by the discrete nature of both tracers as Poisson noise. A brief review of the mathematics behind angular power spectra is given in Appendix A. The angular power spectrum S ab between two projected quantities ∆ a and ∆ b is related to their three-dimensional power spectrum P ab (z, k) by where φ a and φ b are the radial kernels of both quantities. The final piece of information needed in order to estimate the expected AC and XC signals is the power spectrum of the three-dimensional overdensities δ s and δ g . In general, the clustering properties of galaxies and UHECR sources will depend on the specifics of the relationship between galaxies and dark matter, and on the astrophysical properties of the UHECR sources. To simplify the discussion, here we will assume that all UHECR sources are also galaxies of the 2MASS sample (i.e. δ s = δ g ). At this point, one might be tempted to use a linear bias prescription [49] to relate the galaxy and matter power spectra. However, as we show in Section 3, since the UHECR radial kernel peaks at z = 0 and covers only low redshifts, the cosmic ray flux auto-correlation probes mostly sub-halo scales for which a non-perturbative description of structure formation is necessary. To achieve this, we use here a halo model prescription [50] , based on the halo occupation distribution model used by Ref. [47] to describe the 2MRS sample. In this model, the galaxy power spectrum is given by two contributions, being the so-called 1-halo and 2-halo terms. The former dominates on small scales and describes the distribution of galaxies within the halo, while the latter is governed by the clustering properties of dark matter haloes. The halo occupation distribution is then based on a prescription to assign central and satellite galaxies to haloes of different masses. Although we have summarized this model in Appendix C, we refer the reader to [47] and references therein, for further details about the specifics of the halo occupation distribution model used. Both projected overdensities, ∆ CR and ∆ g , are associated to discrete point processes, represented by the angular positions of the UHECRs and the galaxies in each sample. In that case, even in the absence of intrinsic correlations between the different fields, their power spectra receive a non-zero white contribution, given by whereN Ω,a (N Ω,b ) is the angular number density of points in sample a or b, andN Ω,a b is the angular number density of points shared in common. In our case this would correspond to the number of UHECRs originating from galaxies in the galaxy sample. For simplicity we will assume that the galaxy survey under consideration is sufficiently complete, so that all UHECRs are associated to an observed galaxy. In this case, the shot-noise contributions to the power spectra are (2.20) Since typicallyN Ω,CR N Ω,g , then N g CR N CR CR , and therefore we will neglect N g CR in what follows. We have explicitly checked that indeed the cross-noise can be safely neglected in all our estimates and numerical results. Note that, when non-flat weights are applied to the galaxy catalog, the resulting noise power spectrum reads For w(χ) = 1, Eq. (2.14) holds, and we recover the result in Eq. (2.19). We can use the results in this section to derive optimal weights w(χ) to maximize the signalto-noise of the galaxy-UHECR cross-correlation. Let us pixelise the celestial sphere and consider the UHECRs in a given pixel p, Φ p , as well as the vector N p,i containing the number of galaxies along the same pixel in intervals of distance χ i . The optimal weights w i := w(χ i ) can be found by maximising the likelihood of Φ p given N p,i [51] , and are given by the so-called Wiener filter, i.e. Here Cov(x, y) is the covariance matrix of two vectors x an y. Assuming Poisson statistics, we can use the results from the previous section to show (see Appendix B) that In hindsight, this result is obvious: by inspecting Eqs. (2.10) and (2.13), we see that the optimal weights modify the radial galaxy kernel φ g to make it identical to the UHECR kernel φ CR , thereby building the most likely estimate of the UHECR flux map from the galaxy positions. As we will see, this involves up-weighting galaxies at low redshifts, from where it is more likely that UHECRs that reach the Earth originate, but few galaxies can be found due to volume effects. Notice that the weights are completely driven by the theoretical model for UHECR propagation, and do not depend on the actual data (and their errors). We will show how the use of optimal weights can improve the signal-to-noise ratio for the XC in section 3.2. The Milky Way is host to a magnetic field of a few µG [52] , which is the screen that befogs UHECR sources. The variety of parametric models of the GMF, which disagree on the GMF functional forms and parameters, particularly on GMF substructures, reflects the complexity of the GMF, and, at the moment, cannot be taken at face value [52, 53] . As a guideline, we can expect the GMF to deflect a UHECR with energy E = 100 EeV by a few degrees for the most part of the sky, except for certain directions close to the Galactic plane [16] (see also Ref. [54] ). UHECRs will also be affected by any intervening xGMF, whose strength, shape, and filling factors vary by several orders of magnitude for different models and estimates [55] ; however, the xGMF is believed to have a subdominant effect on large-scale UHECR propagation [56] . Simple prescriptions to account for part of the effects related to the GMF and the xGMF include smearing the map of sources below a certain angular scale, or mixing that map with an isotropic one (similarly to what is done to take into account catalogue incompleteness beyond a certain distance), see for instance [33] . Smearing the sources map in our language is as simple as introducing a (Gaussian) beam in the signals as for the AC and XC, respectively, where and θ smear is the smearing angle. However, these solutions tend to be rather artificial and inaccurately destroy potential signal or structures in the spectra we are looking at. More precisely, we know that the largest effects due to the small-scale GMF are not isotropic, and in fact vary quite considerably across the sky, see, e.g., [17, 54] . More precisely, in [54] it was found that the UHECR deflections are majorated by the function where b is elevation. Therefore, if we smear the whole sky with the same smearing angle we are not faithfully representing the sky, and, depending on the smearing angle, we either underestimate the deflections in certain regions or overestimate them in other regions, or both. Moreover, these solutions do not account for the large-scale galactic field, which is the dominant effect and can not be described by a simple smearing. For these reasons, and in order to best introduce the method, in this first theoretical work we take a pragmatic approach and, keeping in mind all the caveats listed above, we only discuss briefly the effect of a (constant) smearing angle on the AC and XC (see also Appendix D), whereas we neglect all other effects of intervening magnetic fields. We estimate the signal-to-noise ratio (SNR) of the UHECR anisotropies as the square root of the Fisher matrix element corresponding to an effective amplitude parameter A CR multiplying the signal component of ∆ CR with a fiducial value A CR = 1 [57] , namely where S is a vector containing the signal contribution to the power spectra under consideration, Cov is the covariance matrix of those power spectra, and SNR is the SNR of a single mode. If the fields being correlated are Gaussian (∆ CR , ∆ g in our case), the covariance matrix can be estimated using Wick's theorem to be with ∆ the size of the multipole bin. At this point we can consider three different cases: 1. AC only. In this case we only have a measurement of the UHECR AC, C CR CR . The SNR is given by (3.4) 2. XC only. In this case we only use the XC, C g CR . The SNR is given by (3.5) 3. All data. We use all available data, i.e. a data vector S = (S CR CR , S g CR , S g g ). Although this is the manifestly optimal scenario, XCs are arguably safer than ACs in terms of systematic errors, and therefore it is interesting to quantify the loss of information if only XCs are used. Studying these three cases allows us to explore the benefits of using XCs vs ACs, as well as the relative amount of information in each of the different two-point functions. Given the relatively small number of UHECRs currently measured, shot noise in the UHECR flux is the dominant contribution to the uncertainties. Comparing Eqs. (3.4) and (3.5), we can see that the SNR scales like N −1 CR and N −1/2 CR for cases 1 and 2 respectively, highlighting the potential of XCs to achieve a detection. The energy E cut at which we choose to cut the UHECR integral spectrum determines the UHECR propagation horizon, which in turns determines the strength of the anisotropy. Moreover, the choice of E cut , for a given UHECR spectrum, also determines the number of UHECR events we have to sample the anisotropic angular distribution. We expect a trade-off between the two. At low energies the UHECR sample contains many more events than at high energies because the UHECR spectrum is very steep (soft/red); however, for the range of energies we are interested in, the galaxy sample is much larger, so this does not have as strong an effect for the XC as it does for the traditional AC (whose noise is determined by the number of UHECR events). Moreover, at low energies UHECRs propagate further, and the larger line-of-sight averaging can dilute the expected anisotropy. Lastly, the effects of intervening magnetic fields are stronger-this is expected to have a significant impact on the anisotropies, albeit less so for the XC compared to the AC thanks to its stability against systematics. At high energies the UHECR horizon is smaller, UHECRs undergo smaller deflections, and the anisotropy should be more pronounced, but the number of events drops dramatically. In order to determine at which energy we have the best chances of detecting the XC we chose to work with three energy cuts at: E cut = 10 19.6 eV 40 EeV, E cut = 10 19.8 eV 63 EeV, and E cut = 10 20 eV = 100 EeV. In a realistic scenario, based on data currently available [2] , we can expect to have about N CR = 1000, N CR = 200, and N CR = 30 over the full sky, for the three energy cuts defined above, respectively. While this does not fully reflect a realistic situation, mostly because of magnetic deflections which we do not take into account, and because current experimental facilities are limited in their field of view, our results nonetheless present a fair comparison between the two measures (AC and XC). This is owing to the fact that all the salient information regarding UHECR data sets is represented in our estimates, namely the energy cut and with it all the UHECR energy losses, the available or expected number of events at that energy, and the angular resolution representative of what current experiments can do. In Fig. 2 , we show the expected signal for the AC (left panel) and the XC (right panel). Colours refer to the three energy cuts discussed above, namely red for E cut 40 EeV, yellow for E cut 63 EeV, and blue for E cut = 100 EeV. The dashed and dotted curves show the 1-halo and 2-halo contributions to the total power spectrum, with the sum of both shown by the solid curves. For simplicity, we have not included any beam smoothing in the plot. We can see how the signal for the XC is lower than the AC, as is expected from the fact that the XC mixes two different radial kernels. If we employ optimal weights for the XC the signal would become identical to that of the AC. In our simplistic linear treatment of perturbations, this happens because the UHECR and galaxy kernels would be identical. The statistical uncertainties for both correlation functions, however, would be different, given their different shot-noise levels. To understand better the role of the different uncertainties on the theoretical signal, in Fig. 3 we show again the expected signal as in Fig. 2 and include a 1 • Gaussian smoothing beam to account for the angular resolution of UHECR experiments (for reference, we also show the beam-free prediction as dashed lines). On top of it, we present the corresponding -binned 1σ error bars as shaded boxes for 20 log-spaces multipole bins between min = 2 and max = 1000. If we compare the leftmost and central panels, namely AC vs XC, it is easy to see how the range of multipoles where error bars are small enough to allow a detection is larger for XC than for AC for the E cut 40 EeV and E cut 63 EeV cases. However, for the sparser UHECR sample with E cut = 100 EeV the opposite applies; more precisely, the detectable range of multipoles for the XC is smaller and pushed towards higher compared to the AC. This is due to a combination of two factors: for the higher end of UHECR energies the propagation horizon of UHECRs is small, and the UHECR sky looks more anisotropic, boosting the AC. At the same time, the mismatch in kernels is prominent, the more so the higher the energy, and it drives the XC signal down. Combined with the larger shot noise in the UHECR data, this can explain the performance of the 100 EeV case -indeed, the UHECR shot noise is the main factor that prevents a detection of the signal at mid-values (the per-signal is 1σ compatible with zero, see below). In the rightmost panel of Fig. 3 we show the XC signal when we apply theoretical optimal weights. In this case the highest energy set performs the best, and this is expected from the previous arguments: the signal is boosted back up to the same level of the AC because the kernels of galaxies and UHECRs now coincide. Additionally, while the uncertainty increases with energy as both samples become sparser, it is not large enough to hide the XC signal. It is worth noticing that the increase in galaxy power that we expect towards lower redshifts, is significantly less relevant than the matching of the radial kernels. In practice, using optimal weights may not be possible given the uncertainties in the radial kernel for UHECRs (we do not know yet the actual injection spectrum). The availability of redshift information in the galaxy catalog, however, would allow us to turn this into an advantage: the UHECR kernel could be reconstructed by modifying the galaxy weights to maximize the signal-to-noise, essentially following the 'clustering redshifts' method used to reconstruct unknown redshift distributions in weak lensing data [58] . To quantify the improvement in detectability brought by the XC, in Fig. 4 we present the cumulative SNR for all the data combinations discussed in Sect. 3.1, viz. AC alone (leftmost panel), XC alone (central panel), and all the data combined in a single data vector S (rightmost panel). In each panel, the left half shows the cumulative SNR as a function of the maximum multipole, max , whilst the right half is for the cumulative SNR as a function of the minimum multipole, min . In both cases, the case with all the data combined has unsurprisingly the largest SNR, but the contributions from AC and XC come from different angular scales, which in turn are sensitive to different redshift ranges, depending upon E cut , which sets the propagation depth for UHECRs. This highlights the complementarity of the two observables. The aforementioned sensitivity to different angular scales can be captured better by looking at Fig. 5 , where we show the contribution to the total SNR from each integer multipole, Figure 5 : SNR per multipole, SNR , for the AC signal, the XC signal with both normal and optimal weights, and their combination AC+XC (leftmost, central, and rightmost panel, respectively). Different colours refer to different energy cuts, and the three horizontal, dashed lines show the thresholds for 1, 2, and 3σ detection. SNR . The colour code is the same as throughout the paper, and we mark with horizontal dashed lines the thresholds corresponding to 1, 2 and 3σ evidence for a one-parameter amplitude fit. These panels can be interpreted as the evidence for anisotropy on a given scale, for which it is clear that the XC with galaxies helps to push the detectability of the signal to smaller scales, i.e., larger values. This per-SNR is a useful quantity to assess whether the AC or the XC is the best observable to detect the anisotropy in UHECRs, assuming that UHECRs trace the LSS. The sensitivity of the XC to small-scale anisotropies can be precious in realistic situations for two further reasons. First, a single Earth-based experiment is blind to a large fraction of the sky (roughly speaking one celestial hemisphere); galaxy catalogues can also have incomplete sky coverage. Moreover, it might be advantageous, see our discussion of the direction-dependent magnetic deflections in Sec. 2.4, to restrict the UHECR data set to a portion of the sky to maximise the chances for a clean detection. In all these situations the low harmonic multipoles are the most affected by these sky cuts. Second, if two experiments join their data set as the Telescope Array and Pierre Auger collaborations have done in their harmonic AC analysis, they need to cross-calibrate their sets, and this cross-calibration introduces errors that are significantly larger for low multipoles than for high multipoles [21, 22, 59, 60] . As we have argued in Section 2.4, there is no shortcut to account for the effects of the GMF on the AC and XC. Nonetheless, it is instructive to look at how the signal degrades with a simple smearing of the source map. To this end, we have plotted the total SNR for = [2, 1000] as a function of the smearing angle θ smear of the galaxy map, for the same energy cuts we have used so far in Fig. 6 . According to Eq. (2.26) the deflections for 40 EV rigidity peak at around 7 • near the Galactic centre, whereas more than half of the sky would be well described by a 2.5 • smearing-note that, as we have mentioned in the introduction, the composition of UHECRs at the highest energies is not known [3, 4] , a heavier composition would imply lower rigidities and larger deflections. The smearing impacts the high-multipole regions in the XC more than it does for the AC, as expected, and degrades the XC more Figure 6 : Total SNR, max = min SNR 2 , as a function of the smearing angle for the AC signal, the XC signal with both normal and optimal weights, and their combination AC+XC (leftmost, central, and rightmost panel, respectively). Different colours refer to different energy cuts, and the horizontal, dashed line shows the thresholds for 3σ detection. prominently at larger smearing angles (see also Appendix D). The XC is strongly dependent on energy and choice of weights. This means that we can disentangle small-scale anisotropies caused by the propagation through the GMF from the intrinsic anisotropies inherited from the LSS. In particular, any GMF-induced signal is erased at higher energy because UHECRs go more straight, whereas the LSS-inherited signal is enhanced because of the smaller propagation horizon. Moreover, any GMF-induced signal is indifferent to the weights we apply, whereas the signal from LSS anisotropies is strongly enhanced with the use of optimal weights. Before closing this section, let us remark that in a real experiment there will be modelled and unmodelled systematic errors to take into account. Systematic errors are expected to contribute to the AC more significantly than to the XC, particularly on large scales (lowend), e.g., the cross-calibration of two UHECR data sets. On the other hand, biassed redshift information in galaxy catalogues or UHECR injection properties will affect both observables. To be clear, whereas the galaxy catalogue and the optimal weights, which depend on UHECR data for the reconstruction of the injection properties, do not enter in the AC obtained from UHECR data alone, they are needed once we test the source model, e.g., that UHECRs correlate with the LSS. Hence, once systematic effects are taken into account, the SNR for the AC may decrease more than that of the XC. This is one further motivation to explore the possibilities and improvements from the use of cross-correlations in UHECR anisotropy studies. In this work, we have introduced a new observable for UHECR physics: the harmonic-space cross-correlation between the arrival directions of UHECRs and the distribution of the cosmic LSS as mapped by galaxies, Eq. (2.16). We have developed the main theoretical tools that are necessary to model the signal and its uncertainties. The take-away points of this study are: • The cross-correlation can be easier to detect than the UHECR auto-correlation for a range of energies and multipoles (see Figs. 3 and 5) . This performance is mostly driven by the sheer number of galaxies that can trace the underlying LSS distribution, which is assumed to be the baseline distribution for both the UHECR flux and the galaxy angular distribution. • The cross-correlation is more sensitive to small-scale angular anisotropies than the autocorrelation, and vice versa. It can, therefore, be instrumental in understanding properties of UHECR sources that would not be accessible otherwise. • It is in principle possible to optimize the cross-correlation signal by assigning optimal redshift-dependent weights to sources in the galaxy catalog, to match the UHECR radial kernel as determined by UHECR energy losses. Since matching the kernels has a strong impact on the cross-correlation, it could be possible to use this effect to reverse-engineer the injection model (which defines the radial kernel). • The great disruptor of UHECR anisotropies is the GMF. The cross-correlation, with its higher signal-to-noise ratio and sensitivity to small angular scales, could be very useful in understanding the properties of the GMF (although we have not explored this angle here). Moreover, it may be possible, in the near future, to exploit a tomographic approach to disentangle the effects of intervening magnetic fields from different injection spectra, and study different regions of the sky separately. In our treatment, we do not take any experimental uncertainties into account, besides the experimental UHECR angular resolution. Moreover, we limit ourselves to a proton-only injection model and do not include the effects of the intervening magnetic fields. This choice was made in order to underline the physics behind our proposal and method, and can be readily generalized and extended to include the (theoretical and experimental) properties of the different galaxy and UHECR catalogs, different injection models, and to separate the number of events and energy cut, in order to best forecast the possibilities of present and upcoming UHECR data sets. Moreover, in this first work we have made the case for the XC between UHECRs and galaxies, but the logic and methods we have developed can be applied to other XCs with different matter tracers and different messengers. The distribution of visible matter in the sky can be traced not only by galaxies, but also by the thermal Sunyaev-Zeldovich effect. The latter is produced by the inverse Compton scattering of CMB photons by hot electrons along the line-of-sight. Because a thermal Sunyaev-Zeldovich map is a map of CMB photons, it is very accurate down to angles much smaller than a degree, and its signal peaks at low redshifts [61] . This cross-correlation could therefore be useful in further disentangling the astrophysical properties of UHECR sources. Charged UHECRs are not the only high-energy messengers whose production mechanisms and sources are not known. Recently, the IceCube collaboration has detected a few high energy astrophysical neutrinos, with energies above a PeV [62] . Such neutrinos are expected to be produced in the same extreme astrophysical sources as UHECRs and/or in their immediate surroundings. The cross-correlations between neutrinos and the LSS will then inform about the properties of the highest-energy astrophysical engines, see [63] (see also [64] ). Without the use of cross-correlations, because of the very small number of neutrino events in present data, and in the foreseeable future [63, [65] [66] [67] , the detection of the anisotropic pattern could be challenging. Since neutrinos interact extremely weakly, they can propagate unhampered for long distances: their horizon is almost the entire visible Universe. Therefore, in addition to galaxies, complementary information could be extracted from cross-correlating neutrinos with other tracers, including CMB lensing [68] and cosmic shear surveys [69] , both of which trace the overall matter distribution in the Universe, including both its dark and luminous components, out to higher redshifts with broader kernels (see Refs. [28] [29] [30] [31] for the analogous analysis with γ rays). Measuring these cross-correlations could reveal whether the most energetic particle accelerators in the Universe preferentially reside in high-density visible or dark environments. Three-dimensional fields δ a (x) can be decomposed into their Fourier modes δ a (k) := dk 3 δ a (x) e −ik·x , (A.1) whose covariance is the power spectrum P ab (k). Assuming statistical homogeneity and isotropy, it is implicitly defined by where the angle brackets denote averaging over ensemble realizations of the random fields inside them. Equivalently, two-dimensional fields ∆ a (n) can be decomposed into their harmonic coefficients where Ω = (θ, ϕ) is the solid angle on the sky, Y m are the spherical harmonic functions,n is the line-of-sight direction, and in this work a = {CR, g}. The covariance of the ∆ m is the angular power spectrum S ab , defined as For two projected fields, ∆ a and ∆ b , associated to three-dimensional fields δ a and δ b via radial kernels φ a and φ b (as in Eqs 2.9 and 2.12), their Fourier-and harmonic-space power spectra are related through where j is the spherical Bessel function of order . For broad kernels, we can use the Limber approximation, j (x) ∼ π/(2 + 1)δ( +1/2−x), in which case the previous relation simplifies to Eq. (2.16). Here we derive the choice of optimal weights discussed in Section 2.3.3. The derivation is a standard result in statistics and follows the discussion in Appendix A of [51] . Consider a vector of N measurements x = (x 1 , ..., x N ) and the problem of finding the linear combination of this vector that provides the best estimator of a given quantity y. Assuming Gaussian statistics, the conditional probability is where z ≡ (y, x 1 , ..., x N ), and C ab is the covariance matrix between a and b 4 . The minimumvariance estimator for y given this distribution coincides with its mean, which is given bŷ The linear coefficients w are the so-called Wiener filter. If y is the UHECR flux and x is a set of galaxy overdensity maps at different radial shells with comoving width dχ, in the shot-noise dominated regime C xy and C xx are given by where we have ignored all r-independent prefactors. Therefore, the Wiener filter in this case is . (B.5) Halo occupation distribution models have been used profusely in the literature to describe the connection between the galaxy number density and the matter overdensity. We describe briefly the specifics of the model used here to describe the low-redshift 2MRS sample, which follows [47] . Within the halo model [50, 70] , all matter in the Universe can be found in haloes of different masses, and therefore the fluctuations of a given quantity x can be described in terms of its distribution around haloes as a function of halo mass u x (r, M ) (also called the halo profile), and the correlated distribution of haloes on large scales. In this formalism, the power spectrum between two quantities x and y receives two contributions, coming from the correlations between mass elements belonging to the same halo and mass elements in different haloes (the so called '1-halo' and '2-halo' terms), as P xy = P 1h xy + P 2h xy , with where n h (M ) and b h (M ) are respectively the halo mass function and the halo bias, P lin (k) is the linear matter power spectrum, and u x (k, M ) is the Fourier transform of the halo profile. In our case, we want to model the galaxy overdensity δ g , and therefore we need to specify the halo galaxy density profile. For this, we use the formalism used in [47] : u s (r, M ) = Θ(r max,g − r) (r/r s,g )(r/r s,g + 1) 2 , (C. 6) where Θ is the Heaviside function. The free parameters of the model are M min , M 1 , σ log M , r max,g /r s , r s,g /r s and α, where r s is the mass-dependent halo scale radius. We use the best-fit values found by [47] for these parameters in our calculation. Extremely energetic cosmic-ray event Open Questions in Cosmic-Ray Research at Ultrahigh Energies, Front Highlights from the Pierre Auger Observatory (ICRC2019) Combined Fit of the Spectrum and Composition from Telescope Array (ICRC2019) The Astrophysics of Ultrahigh Energy Cosmic Rays A signature of EeV protons of Galactic origin Search for EeV Protons of Galactic Origin Observation of a Large-scale Anisotropy in the Arrival Directions of Cosmic Rays above 8 × 10 18 eV End to the cosmic ray spectrum? Upper limit of the spectrum of cosmic rays First observation of the Greisen-Zatsepin-Kuzmin suppression Observation of the suppression of the flux of cosmic rays above 4 × 10 19 eV Large scale cosmic homogeneity from a multifractal analysis of the pscz catalogue The WiggleZ Dark Energy Survey: the transition to large-scale cosmic homogeneity Homogeneity and isotropy in the Two Micron All Sky Survey Photometric Redshift catalogue Anisotropy expectations for ultra-high-energy cosmic rays with future high statistics experiments How isotropic can the UHECR flux be? Cosmic ray anisotropy analysis with a full-sky observatory Full sky harmonic analysis hints at large ultra-high energy cosmic ray deflections Indications of Intermediate-Scale Anisotropy of Cosmic Rays with Energy Greater Than 57 EeV in the Northern Sky Measured with the Surface Detector of the Telescope Array Experiment Telescope Array collaboration, Full-sky searches for anisotropies in UHECR arrival directions with the Pierre Auger Observatory and the Telescope Array Large-Scale Distribution of Arrival Directions of Cosmic Rays Detected at the Pierre Auger Observatory and the Telescope Array (ICRC2015) Arrival Directions of Cosmic Rays at Ultra-High Energies Anisotropies of the Highest Energy Cosmic-ray Events Recorded by the Pierre Auger Observatory in 15 years of Operation A Novel Approach in the Weakly Interacting Massive Particle Quest: Cross-correlation of Gamma-Ray Anisotropies and Cosmic Shear Particle dark matter searches in the anisotropic sky, Front Synergies across the spectrum for particle dark matter indirect detection: how HI intensity mapping meets gamma rays Evidence of Cross-correlation between the CMB Lensing and the Γ-ray sky Dark Matter Searches in the Gamma-ray Extragalactic Background via Cross-correlations With Galaxy Catalogs Cross-correlating the γ-ray sky with Catalogs of Galaxy Clusters Detection of cross-correlation between gravitational lensing and gamma rays Searches for correlation between UHECR events and high-energy gamma-ray Fermi-LAT data Testing large-scale (an)isotropy of ultra-high energy cosmic rays Search for signatures of magnetically-induced alignment in the arrival directions measured by the Pierre Auger Observatory The Pierre Auger Observatory: Contributions to the 36th International Cosmic Ray Conference Anisotropies of Ultra-high Energy Cosmic Rays Dominated by a Single Source in the Presence of Deflections Local Magnetic Turbulence and TeV-PeV Cosmic Ray Anisotropies Searching for a Correlation Between Cosmic-Ray Sources Above 10 19 eV and Large-Scale Structure A Search for Correlation of Ultra-High Energy Cosmic Rays with IRAS-PSCz and 2MASS-6dF Galaxies Update on the Correlation of the Highest Energy Cosmic Rays with Nearby Extragalactic matter Searches for Anisotropies in the Arrival Directions of the Highest Energy Cosmic Rays Detected by the Pierre Auger Observatory Cross-Correlation between UHECR Arrival Distribution and Large-Scale Structure The linear power spectrum of observed source number counts What galaxy surveys really measure SimProp v2r4: Monte Carlo simulation code for UHECR propagation Intergalactic photon spectra from the far ir to the uv lyman limit for 0 < Z < 6 and the optical depth of the universe to high energy gamma-rays Angular power spectrum of galaxies in the 2MASS Redshift Survey The 2MASS Redshift Survey-Description and Data Release An Analytic model for the spatial clustering of dark matter halos Halo occupation numbers and galaxy bias Detecting the anisotropic astrophysical gravitational wave background in the presence of shot noise through cross-correlations IMAGINE: A comprehensive view of the interstellar medium, Galactic magnetic fields and cosmic rays Uncertainties in the Magnetic Field of the Milky Way Mapping UHECRs deflections through the turbulent galactic magnetic field with the latest RM data The origin, evolution and signatures of primordial magnetic fields New limits on extragalactic magnetic fields from rotation measures Statistical techniques in cosmology Calibrating Redshift Distributions Beyond Spectroscopic Limits with Cross-Correlations Pierre Auger Observatory and Telescope Array: Joint Contributions to the 33rd International Cosmic Ray Conference Searches for Large-Scale Anisotropy in the Arrival Directions of Cosmic Rays Detected above Energy of 10 19 eV at the Pierre Auger Observatory and the Telescope Array Planck's view on the spectrum of the Sunyaev-Zeldovich effect Observation of High-Energy Astrophysical Neutrinos in Three Years of IceCube Data A Cross-Correlation Study of High-energy Neutrinos and Tracers of Large-Scale Structure Tomographic Constraints on High-Energy Neutrinos of Hadronuclear Origin High energy neutrino astronomy with KM3NeT Performance of two Askaryan Radio Array stations and first results in the search for ultrahigh energy neutrinos Arianna: Current developments and understanding the ice for neutrino detection Planck 2018 results. VIII. Gravitational lensing Weak lensing for precision cosmology Halo Models of Large Scale Structure Acknowledgments FU wishes to thank A. di Matteo for useful correspondence, and P. Tinyakov The effects of the GMF deflections are not the same across harmonic multipoles. We expect that, for both the AC and the XC, small scales would be more affected by the deflections. We visualise this in Fig. D.1 , where we show the equivalent of Fig. 6 but for multipoles in the four half decades: ∈ [3, 10[, ∈ [10, 33[, ∈ [33, 100[, ∈ [100, 333[. As anticipated, the smearing suppresses the power at small scales more incisively, for both the AC and the XC, with the XC being relatively more affected. Nonetheless, in region of the sky where the GMF is small, for example around the Galactic polar cups, the XC has better chances to be detected than the AC at large (small scales). In reading this figure one should keep in mind at least three simplifications: the larger but physically different effects of the large-scale GMF are not included; the smearing angle is constant across the sky-cf. Eq. (2.26); the choice of a Gaussian smearing is arbitrary, as other types of beams might reproduce the actual deflections more faithfully.