key: cord-0494175-fohi7syu authors: Porth, Lucas; Smith, Robert E.; Simon, Patrick; Marian, Laura; Hilbert, Stefan title: Fast estimation of aperture-mass statistics I: aperture mass variance and an application to the CFHTLenS data date: 2020-06-15 journal: nan DOI: nan sha: baf4ef937bb5a1e85097df77897a4df3f3cb5 doc_id: 494175 cord_uid: fohi7syu

We explore an alternative method to the usual shear correlation function approach for the estimation of aperture mass statistics in weak lensing survey data. Our approach builds on the direct estimator method of Schneider (1998). In this paper, to test and validate the methodology, we focus on the aperture mass dispersion. After computing the signal and noise for a weighted set of measured ellipticities we show how the direct estimator can be made into a linear order algorithm that enables a fast and efficient computation. We then investigate the applicability of the direct estimator approach in the presence of a real survey mask with holes and chip gaps. For this we use a large ensemble of full ray-tracing mock simulations. By using various weighting schemes for combining information from different apertures we find that inverse variance weighting the individual aperture estimates with an aperture completeness greater than 70 per cent coverage yields an answer that is in close agreement with the standard correlation function approach. We then apply this approach to the CFHTLenS as a pilot scheme and find that our method recovers to high accuracy the Kilbinger (2013) result for the variance of both the E and B mode signal, after we correct the catalogue for the shear bias in the lensfit algorithm for pairs closer than 9". We then explore the cosmological information content of the direct estimator using the Fisher information approach. We show that there is only a modest loss in cosmological information from the rejection of apertures that are of low completeness. This method unlocks the door to fast and efficient methods for recovering higher order aperture mass statistics in linear order operations.

Weak gravitational lensing of the light from galaxies is a key tool for constraining the cosmological parameters and distinguishing between competing models of the Universe (Blandford et al. 1991; Kaiser 1998; Zhang et al. 2007). The first measurements of the correlations in the shapes of distant background galaxy images are now over two decades old (Bacon, Refregier & Ellis 2000; Kaiser, Wilson & Luppino 2000; Van Waerbeke et al. 2000; Wittman et al. 2000) and the field of cosmic shear has rapidly matured from these early pioneering studies, which mapped of the order of a square degree, to the modern surveys KiDS (kids.strw.leidenuniv.nl), DES (www.darkenergysurvey.org) and HSC (hsc.mtk.nao.ac.jp/ssp/), which are mapping thousands of square degrees (Hildebrandt et al. 2017; Troxel et al. 2018; Aihara et al. 2018; Hikage et al. 2019). The next decade will herald in new surveys like Euclid and LSST that will map volumes close to the entire physical volume of our observable Universe (Laureijs et al. 2011; LSST 2009). This will mean that our ability to extract information from such rich data sets will depend almost entirely on our ability to understand and model the complex nonlinear physics involved and our ability to optimally correct or mitigate the systematic errors.
In the last decade, much effort has been invested in extracting cosmological information from the two-point shear correlation functions, and attempts have been made to carefully account for all systematic effects, such as PSF corrections, bias in the ellipticiy estimator, intrinsic alignments Massey et al. 2013; Troxel & Ishak 2015) . The two-point shear correlation functions are the lowest order statistics that are of interest and if the convergence field were a Gaussian random field, then they would contain a complete description of the statistical properties of the cosmic shear signal. However, the distribution of observed galaxy ellipticities are non-Gaussian due to various effects: firstly, the nonlinear growth of large-scale structure induces the coupling of density modes on different scales (Schneider et al. 1998 ); secondly, the estimator for shear from ellipticity is a nonlinear mapping; thirdly, the violation of the Born approximation and the lens-lens coupling also lead to non-Gaussianity in the shear maps. This all leads to a 'flow' of information into the higher order statistics (Taylor & Watts 2001) . A consequence of this is that the errors on measurements of the convergence power spectrum become highly correlated on small scales, limiting the amount of additional information that can be recovered by pushing down to smaller scales (Sato et al. 2011; Hilbert et al. 2012; Kayo, Takada & Jain 2013; Marian et al. 2013) . The need to go beyond the simple two-point analysis of the data has been highlighted by a number of authors (see for example Sefusatti et al. 2006; Byun et al. 2017) . For example, it is well known that ξ+ and ξ− exhibit a degeneracy between the amplitude of matter fluctuations σ8 and the matter density parameter Ωm, which scales as σ8Ω 0.5 m . One way to break this degeneracy is by combining the information from the 2-point and 3-point shear correlation functions (Kilbinger & Schneider 2005; Semboloni et al. 2011; Fu et al. 2014 ); another way is through adding in the information found in the statistical properties of the peaks in the shear field (Marian et al. 2013; Kacprzak et al. 2016) . Given the potential of the non-Gaussian probes to tighten constraints on cosmology and break model and nuisance parameter degeneracies, it is important to study how to optimally measure them, determine how systematics affect them, and to improve the modelling of them. This work will be essential to undertake, if we are to take full advantage of surveys like KiDS, DES, HSC, Euclid and LSST. One of the bottlenecks for accessing the information in the higher-order statistics is that they are challenging quantities to work with. For example, owing to the fact that the shear is a spin-2 field, there are in principle 2 n correlation functions to measure for each nth order cumulant (Schneider & Lombardi 2003; Takada & Jain 2003; Jarvis, Bernstein & Jain 2004; Kilbinger & Schneider 2005) . Building the necessary computational tools to measure the 3-and 4point shear correlation functions is technically challenging and will require large amounts of CPU time to compute all possible configurations (Jarvis, Bernstein & Jain 2004; Kilbinger, Bonnett & Coupon 2014) . This is especially true if measurement-noise covariance matrices are to be derived from mock catalogues. In addition, the shear correlation functions are not necessarily the best quantity to measure since they are not E/B mode decomposed Schneider & Kilbinger 2007) . 
A powerful method to disentangle systematic effects from cosmic shear signals is the E/B decomposition (Crittenden et al. 2001; . At leading order, pure weak lensing signals are sourced by a scalar lensing potential, which means that their deflection fields are curl free. Equivalently, the ringaveraged cross component of the shear is expected to be zero (the B mode), while the tangential one contains all the lensing signal (the E mode). Thus B modes enable a robust test for the presence of systematic errors. One method to take advantage of this E/B decomposition is the so-called 'aperture mass statistics' first introduced by Schneider et al. (1998) . 'Aperture mass' (Map) and 'Map-Cross' (M×) are obtained by convolving the tangential and cross shear with an isotropic filter function. Therefore by construction they are E/B-decomposed. Taking the second moment leads to the variance of aperture mass, the third to its skewness etc. The standard approach for measuring the aperture mass statistics in data utilises the fact that, for the flat sky, any n-point moment can be expressed in terms of integrals over the n-point shear correlation functions, modulo a kernel function Jarvis, Bernstein & Jain 2004 ). The reason for adopting this strategy stems from the fact that for a real weak lensing survey, the survey mask is a very complicated function: firstly there are survey edges; next, due to the fact that bright stars and their diffraction halo need to be drilled out, chip gaps, if not accounted for in the survey dither pattern, can lead to additional holes. This small-scale structure in the survey mask means that in order to make the most of the survey data one should measure the correlation functions. However, this approach is not without issue: for example, for the correlation function estimator of the aperture mass dispersion to be accurate and E/B decomposed, one needs to measure ξ+ and ξ− in angular bins sufficiently fine for the discretisation of the integrals to be reliable (Fu et al. 2014 ). Further, one also needs to measure the correlation function on scales ϑ ∈ [0, 2ϑ] for the polynomial filter function of Schneider et al. (1998) . Owing to galaxy image blending, signal-to-noise issues and the finite size of the survey, the lower bound is never possible and the upper bound means that biases can occur due to edge effects. This leads to so called E/B leakage (Kilbinger & Schneider 2005) . In addition, while the mean estimate is unbiased, the covariance matrix does require one to carefully account for the mask Friedrich et al. 2016 ). More recent developments that also make use of the shear correlation functions, while circumventing the issues of E/B leakage on small scales are the ring statistics and COSEBIs (Schneider & Kilbinger 2007; Schneider, P., Eifler, T. & Krause, E. 2010) . In this paper, we take a different approach and explore the direct estimators of the aperture mass statistics, which were first proposed in Schneider et al. (1998) . Rather than measuring the correlation functions of the shear polar, only to reduce them by integration to a scalar, we instead directly measure Map for a set of apertures and then use an optimised weighting scheme to average the estimates. As we will show in what follows, this approach has some significant advantages over the correlation function approach. 
In addition to the variance, one can also measure higher order statistics, such as the skewness and kurtosis, with very little additional computational complexity, code modification or CPU expense (see Porth et al. in preparation) . These efficiencies will also potentially enable fast computation of covariance matrices and thus rapid exploration of the likelihood surface for such statistics. The possible down sides to this approach, which we explore, are the potential loss of cosmological information arising due to the fact that some incomplete apertures will be rejected. On the other hand, we will also explore the possibility of not rejecting all incomplete apertures, but accepting/weighting apertures based on criteria such as coverage factor and the signal-to-noise. This will lead to E/B leakage, however, as we will show the levels of leakage can be made sufficiently small so that the statistic is accurate within the required errors. As a practical demonstration of this approach we apply it to the CFHTLenS data and present a careful comparison of it with the two-point correlation function method. Lastly, we make use of a large suite of mock catalogues to study the cosmological information content of the two methods for a nominal CFHTLenS like survey and show that there is no substantial loss of information. The paper breaks down as follows: In §2 we define the key theoretical concepts for weak shear and introduce our notation. In §3 we define the aperture mass and give expressions for the aperture mass variance in terms of the matter power spectrum, we also give the alternative relation between it and the shear correlation func-tions. In §4 we develop the direct estimator methodology, giving an explicit computation for the mean and variance in the presence of ellipticity weights and also show how the direct estimator can be accelerated and made effectively linear order in the number of galaxies and number of apertures. We discuss various strategies for combining estimates from an ensemble of apertures that give both, high signal-to-noise and a small bias induced by including incomplete apertures. In §5 we turn to the analysis of the CFHTLenS data. We give an overview of the data we use and also the mock catalogues that we generate to test for systematic errors. As a preliminary analysis we present the aperture mass maps for the survey. In §6 we investigate the bias of the direct estimator induced by the CFHTLenS mask through measuring the aperture mass variance on the mock catalogues and comparing it to the results obtained when using the correlation function method. After determining the weighting scheme that induces the smallest bias we use it to measure the aperture mass variance on the true CFHTLenS data and compare it to the analysis presented in Kilbinger et al. (2013) . We also check how the results change when removing blended sources from the data. In §7 we use the mock catalogues to investigate the cosmology dependence and the information content of both estimators via the Fisher information. Finally, in §8 we summarise our findings, conclude and discuss future work. In this paper we are principally concerned with the weak lensing of distant background galaxy shapes by the intervening large-scale structure (Blandford et al. 1991; Kaiser 1998; Seitz, Schneider & Ehlers 1994; Jain & Seljak 1997; Schneider et al. 1998 ); see Bartelmann & Schneider (2001) ; Dodelson (2003 Dodelson ( , 2017 for reviews. 
The two fundamental quantities describing this mapping from true to observed galaxy images are the convergence κ and the shear γ which, assuming a metric theory of gravity, are both derived from an underlying scalar lensing potential. In a cosmological setting the convergence at angular position θ and radial distance χ can be connected to the density contrast δ(χθ, χ) as: where Ωm,0 is the total matter density, H0 denotes the Hubble constant, a is the scale factor and c the speed of light. In a real survey we will not necessarily have access to the precise redshifts of each source galaxy. Instead, we will typically have the redshift distribution of sources determined through photometric redshift estimates. Hence, the effective convergence will be obtained by averaging over the source population pχ: where χH is the comoving distance to the horizon and the weight function g(χ) is defined as where we used that the weight function can be equivalently written in terms of the differential number counts by noting that pχ(χ)dχ = pz(z)dz = (dN/dz)/NTOT. In the left panel of Figure 1 we show the redshift distribution of galaxies in the CFHTLenS for the four fields W1, W2, W3 and W4 and the total obtained for the combination of all fields. They were obtained by averaging over the BPZ posterior including the lens weights. One can clearly see that there are significant field to field variations in the redshift distributions, with the W2 and W4 fields showing the largest deviations from the mean in the range z ∈ [0.2, 0.4] for W2 and z ∈ [0.5, 0.7] for W4, respectively. The shaded region in the plot shows the standard error region on the mean. This was estimated using a jackknife resampling of the data. The right panel of Figure 1 shows the lensing weight function g(z) computed using the estimated distribution function p(z) shown in the left panel for the CFHTLenS. We see that while there are features in the p(z) distribution, these are effectively washed out when computing the lensing weights for the population. In fact, the most significant outlier is the W2 field, which appears to have a slightly high amplitude at for redshifts z > 1.5. Owing to the fact that gravity only 'excites' certain shear patterns we wish to rotate the shear into the frame where we can more easily separate out the these modes. This is done by decomposing the shear into 'tangential' and 'cross' components. Consider the shear field at a position vector θ + θ0, where θ0 is an arbitrary location and θ is a radial vector centred on θ0. We may rotate the shear field by the polar angle of the separation vector θ to obtain the tangential and cross components (Bartelmann & Schneider 2001) : (4) where φ is the polar angle associated with the vector θ. The main advantage of this transformation is that for an axially-symmetric mass distribution, the shear is always tangentially aligned relative to the direction towards the origin of the mass distribution and the cross component will vanish. This result is not true for any randomly selected point for the origin. However, if we average the tangential shear over a ring it can be related to the enclosed surface mass density κ: γt(θ; θ0) circ = κ(θ; θ0) − κ(θ; θ0) circ . On the other hand, if we ring average the cross-shear it will vanish: γ×(θ; θ0) circ = 0 (Kaiser 1995; Schneider 1996) . In this paper we are primarily concerned with the statistical properties of the ring averaged tangential shear integrated over a filter function with compact support -the aperture mass. 
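To make the decomposition of Eq. (4) concrete, the short sketch below rotates measured ellipticities (or shears) into their tangential and cross components about a chosen aperture centre. It is a minimal flat-sky illustration in Python/NumPy; the function and variable names are our own illustrative choices and do not correspond to any code released with this paper.

```python
import numpy as np

def tangential_cross(eps1, eps2, x, y, x0, y0):
    """Rotate ellipticities into tangential/cross components about the point (x0, y0).

    eps1, eps2 : arrays of the two ellipticity components
    x, y       : galaxy positions in flat-sky coordinates
    x0, y0     : aperture centre

    Implements eps_t = -Re[eps exp(-2i phi)] and eps_x = -Im[eps exp(-2i phi)],
    with phi the polar angle of the separation vector (cf. Eq. 4).
    """
    eps = eps1 + 1j * eps2
    phi = np.arctan2(y - y0, x - x0)       # polar angle of theta - theta_0
    rotated = -eps * np.exp(-2j * phi)
    return rotated.real, rotated.imag      # (eps_t, eps_x)
```

For an axially symmetric mass distribution centred on (x0, y0) the first returned component carries the full lensing signal, while the second should vanish on average, as described above.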
Aperture mass was developed by Schneider (1996) as a technique for using a weighted set of measured shears within a circular region to estimate the enclosed projected mass overdensity. It can be defined as follows: consider an angular position vector in the survey θ0, and let us compute the tangential shear field around this point. Aperture mass is now defined as the convolution of the tangential shear with a circularly-symmetric filter function Q, with a characteristic scale ϑ, above which the filter functions are typically set to zero. It can be expressed as: In a similar vain one can also define the cross component of aperture mass, which we refer to as 'map-cross': In the absence of systematic errors (B-modes) in the lensing data, map-cross should vanish. Map and M× are therefore said to be E/B decomposed . As was proven by Schneider (1996) , owing to the fact that the shear and convergence are sourced by the same scalar potential, one can derive an equivalent relation to that above, but computed by convolving the convergence κ with a different filter function U : It is important to note that the filter functions Q and U are not independent of one another, but are related (Schneider 1996) : Also, it is worth noting that the U filter is a compensated function (Bartelmann & Schneider 2001) . For this work we will be using a polynominal filter function introduced in Schneider et al. (1998) : where H(x) is the Heaviside function. For cosmic shear, the expectation of the aperture mass around a randomly selected point vanishes, since κ = γt = 0. Thus, the lowest order non-zero quantity of interest is the variance. Using Eq. (6) the variance of the aperture mass can be written as: Using Eq. (8) we see that this can be equivalently written as: The Fourier transform of the convergence,κ, is defined as follows: On using the above transform in Eq. (13) we find: We next use the statistical homogeneity and isotropy of the correlations of κ(θ) to define the convergence power spectrum: On inserting this into Eq. (15) and integrating over the Dirac delta function we see that the aperture mass variance can be written: wherẽ U ( ; ϑ) = R 2 d 2 y e i ·y U (|y| ; ϑ) . To progress we need to relate the convergence power to the matter power spectrum P that, in the small-scale limit and under the Limber approximation, can be related to the convergence power spectrum as: On inserting this relation into Eq. (17) and using the Schneider polynomial filter function Eq. (11) such thatŨ ( ; ϑ) = 24J4( ϑ)/( ϑ) 2 we have (Schneider et al. 1998) : As discussed earlier, the standard method for estimating the aperture mass variance is through the two-point shear correlation functions. Let us make that connection explicit. The complex shear field has two non-vanishing two-point correlation functions that can be written in terms of its tangential and cross-components as : where in this subsection θ ≡ |θ1 − θ2|. It can be shown that ξ+ and ξ− can be written in terms of the convergence power spectrum as: Using the orthogonality of the Bessel functions we can invert the above expressions to obtain the convergence power spectrum: The important consequence of the above relations is that we can now rewrite the aperture mass variance using the shear correlation functions. On substitution of Eqs. (25) and (26) into Eq. (17) one finds : On reordering the integrals over and θ, we see that the above can be written more compactly as: where Once again, on adopting the Schneider polynomial filter Eq. 
(11) we see that the above kernels have an analytic form : where in the above y ≡ θ/ϑ. There are several important things to note about this: first, for the case of the Schneider polynomial filter function, one needs to measure ξ+ and ξ− over the range θ ∈ [0, 2ϑ], meaning that we need information from scales close to zero separation. The correlations on small scales can not be accurately measured and will be dominated by image blending issues and shape noise (Kilbinger, Schneider & Eifler 2006) . Second, the integration to obtain the variance from Eq. (28) can only be approximately done using a set of discrete bins which need to be sufficiently dense and non empty. The result of all of this is that there will be some amount of E/B leakage, which will lead to a suppression of the signal on small scales (Kilbinger, Schneider & Eifler 2006) . The first issue is also a problem for the direct estimator, but the second is not. As discussed in the previous section, there are two approaches to estimating the aperture mass statistics. The correlation function approach outlined in the previous section has been studied in great detail. The direct estimator approach that we explore in this work has not been as well explored, we therefore now describe our extension of this approach in some detail. Here we follow Schneider et al. (1998) , but extend the work to include a set of arbitrary weights for each source galaxy. Let us first introduce the direct estimator of the aperture mass dispersion for a single field. Consider an aperture of angular radius ϑ, centred on the position θ 0,k . The aperture contains N k galaxies with positions θi with complex ellipticities i. For the case of weak lensing the observed ellipticities and intrinsic ellipticities In complete analogy to the definition of tangential and cross shear defined in Eqs. (4) and (5) we define the same quantities for the tangential and cross components of ellipticity: t = − e −2iφ and × = − e −2iφ , where the polar angle φ is relative to the origin θ 0,k . Our estimator for the aperture mass variance is defined as: where wi are weights assigned to the ith galaxy, the Qi ≡ Q(|θi|; ϑ) and where t,i is the observed tangential ellipticity of the ith galaxy measured with respect to the origin θ 0,k . Note that since the double sum will occur repeatedly, we will use the short-hand notation N k i N k j =i → i =j for brevity. We will also suppress the origin θ 0,k and also take N k = N . We show that this provides an unbiased estimator for the true aperture mass dispersion. This can be done through applying three averaging processes: averaging over the intrinsic ellipticity distributions A; then the source galaxy positions P ; and then the ensemble average over the cosmic fields E (following the notation of Schneider et al. 1998) . Ignoring the prefactor and the denominator for a moment, if we perform the A average then we get: Note that in the above we assumed that each galaxies' intrinsic ellipticity is indiviually drawn from the same Gaussian distribution G 0, σ 2 with zero mean and the shape noise σ 2 as variance, i.e. no intrinsic alignments. Next, we perform the average over the spatial positions of the source galaxies: In the first step we took the joint PDF of spatial positions to be simply the product of the independent 1-point PDFs for a uniform random distribution. 
In the second step, on noting that γt,i = γt,i(θi; θ0), where θ0 is the same for all the galaxies, we used the fact that the spatial integral will yield the same result no matter of the indiceshence the change (i, j) → (1, 2). In the last step, we integrated out the remaining PDFs and rewrote the domain. Finally, we perform the expectation over the cosmic fields: In Appendix A we calculate the variance of the estimator Eq. (33) and, for a moment supressing ϑ, we find that it can be written as: where G and Ms,2 are as defined as: Importantly, in the limit where all of the source galaxy weights are equal we recover the expression derived in Schneider et al. (1998) . It is interesting to obtain an approximate form for the above variance. Firstly, let us consider the case where the number of galaxies per aperture is large such that N 1, whereupon we see that all of the partial sums are approximately equivalent to the full sums, e.g. i j =i wiwj ≈ ( i wi) 2 . Consequently, all of the prefactors involving the weights can be dramatically simplified to give: where we defined S2 ≡ i w 2 i /( i wi) 2 . Let us inspect the quantity S2 in more detail: the Cauchy-Schwarz inequality tells us that where the elements of the sets {ui} and {vi} are drawn from the reals. If we take vi = v for any i, then we see that On applying this to our ratio S2 we see that: where w 2 = i w 2 i /N and w = i wi/N . This insight leads us to make our next approximation, since Ms,2 ∼ M 2 ap we see that S2Ms,2 M 2 ap , since S2 ∝ 1/N . This, then, leads us to write: Thirdly, let us further assume that the underlying shear field is Gaussian and hence M 4 ap = 3 M 2 ap 2 . Under these circumstances, which will be fulfilled for large apertures, the variance can be written as: The first term in the bracket is cosmic variance and the last term denotes the shape noise contribution. The left panel of Figure 2 shows the error on a given estimate from a single aperture, using Eq. (42). The right panel shows the corresponding prediction of the signal-to-noise on the aperture mass variance, per aperture, again using Eq. (42). In order to generate these prediction we have used (20) as a model for the cosmic variance contribution. If we were to naively implement the direct estimator approach as given by Eq. (33) then we see that in order to compute the estimate of the variance for a single aperture we need to compute the sum from N (N − 1) galaxies. Thus one might conclude that the method scales as typical N 2 pair counting approach for galaxies inside the aperture. However, we now show that the method can be made to scale linearly with the number of galaxies. Let us consider again the estimator from Eq. (33), and we notice that if we put back the term that has i = j and explicitly subtract it then we have: If we now introduce the estimators for aperture mass and Ms,2 as discretised versions of their definition, we see that Eq. (43) can be rewritten as: where for brevity we used the notation ( Map) k ≡ Map(ϑ|θ 0,k ) and ( Ms,2) k ≡ Ms,2(ϑ|θ 0,k ). Note that both terms in the brackets receive an identical contribution of shape noise, hence the second term should not be neglected. In general, the estimator Eq. (46) is mathematically identical to that of Eq. (33) and therefore is also an unbiased estimator for the variance of the aperture mass. However, algorithmically it has a significant advantage in that it is linear in the number of galaxies. This owes to the fact that all of the terms on the right-hand-side of Eq. (46) are linear in N . 
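To illustrate why Eq. (46) is linear in the number of galaxies, the sketch below evaluates the single-aperture estimator using only per-galaxy sums, applying the identity sum_{i != j} a_i a_j = (sum_i a_i)^2 - sum_i a_i^2 to both the numerator and the denominator of Eq. (33). The polynomial Q filter follows Schneider et al. (1998); the interface and names are illustrative assumptions of ours, not the implementation used in this work.

```python
import numpy as np

def Q_filter(theta, radius):
    """Polynomial Q filter of Schneider et al. (1998), zero outside the aperture radius."""
    x = theta / radius
    return 6.0 / (np.pi * radius**2) * x**2 * (1.0 - x**2) * (x <= 1.0)

def aperture_mass_variance_single(eps1, eps2, x, y, w, x0, y0, radius):
    """Unbiased single-aperture M_ap^2 estimate, linear in the number of galaxies.

    The pair sums of Eq. (33) are rewritten via
        sum_{i != j} a_i a_j = (sum_i a_i)^2 - sum_i a_i^2,
    so that only per-galaxy sums are required (cf. Eqs. 43-46).
    """
    dx, dy = x - x0, y - y0
    theta = np.hypot(dx, dy)
    sel = theta <= radius                          # galaxies inside the aperture
    phi = np.arctan2(dy[sel], dx[sel])
    eps = eps1[sel] + 1j * eps2[sel]
    eps_t = np.real(-eps * np.exp(-2j * phi))      # tangential ellipticity, Eq. (4)
    a = w[sel] * Q_filter(theta[sel], radius) * eps_t

    num = np.sum(a)**2 - np.sum(a**2)              # sum_{i!=j} w_i w_j Q_i Q_j eps_t,i eps_t,j
    den = np.sum(w[sel])**2 - np.sum(w[sel]**2)    # sum_{i!=j} w_i w_j
    return (np.pi * radius**2)**2 * num / den
```

The cross statistic of Eq. (54) follows by the same route, simply replacing eps_t with the cross component eps_x = -Im[eps exp(-2i phi)].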
For example, the estimate of ( Map) k is linear, so too are the correction factors ( Ms,2) k and S. As we will show in the second paper in this series (Porth et al., in prep.), it can be shown that this can be naturally extended to higher order aperture mass statistics. This acceleration of the method to linear order opens the door to a significant advantage in speed for estimation of aperture mass statistics at all order. The estimator Eq. (46) is for a single aperture k and as such will provide a single low-signal-to-noise, albeit unbiased, estimate. We now wish to make use of the full area of the survey available to us. We are therefore confronted as to how to best achieve this. As proposed by Schneider et al. (1998) , one simple approach would be to sample well seperated apertures such that the shear in one field is statistically independent from another. This would yield the estimator: where W k are weights and the sum extends over the Nap apertures. Since the estimates can be considered to be statistically independent then the noise can be minimised by choosing the weights to be given by: However, this approach would be suboptimal in that it does not take advantage of the full area of the survey. In this case, the signal-tonoise on the estimate for the full field can be achieved by multiplying the estimates for the aperture mass variance per single aperture by the square root of the number of independent apertures. A much better approach, which makes better use of the full survey area, is to oversample the apertures. Since the estimate for the survey is still given by Eq. (47) and since it is a linear combination of the estimates for the single field, it too is unbiased: However, the variance of the estimate for the survey is no longer trivial to determine. This in turn means that the weights W k from Eq. (48) are no longer optimal. Computing the optimal weights will be further complicated if we include incomplete apertures in the estimate -which we discuss next. We next turn to the problem of aperture completeness. In real surveys there are regions of the survey that are masked out due to bright stars, chip gaps and the survey boundaries. The question now arises: what do we do if an aperture has some fraction of its area overlapping with the mask? The simple answer would be that we exclude all such apertures from the estimator. The problem with this approach is that depending on the size of the aperture this may significantly impact the available survey area from which to compute the estimate and thus make the approach sub-optimal. Here we will explore the idea of effectively including all apertures that fit within the survey boundary, but apply weights to each of the form: (42) as well as the combined result. Note that in adopting (42) we have assumed that the convergence is Gaussianly distributed. The theoretical signal-to-noise per aperture as a function of the aperture size ϑ in the CFHTLenS data. The black dotted line shows the cosmic variance limit. where c k ≡ A k /A is the completeness factor for the kth aperture, where A k is the available area of the aperture and A is the unmasked area of the aperture, such that we have c k ≤ 1. Var is related to the variance of the estimate in the aperture. The ellipsis denotes that in general the weights could depend on other factors. 
In this work we will explore three distinct choices: The first case corresponds to accepting all apertures whose completeness factor c k ≥ α and for those that do we combine them in a simple average with equal weights to arrive at our estimate for M 2 ap . The second case corresponds to accepting all apertures, irrespective of the completeness factor, but combining all of the estimates using an inverse variance weighted estimate, where the variance is approximated by Eq. (42). The third case is simply the product of the first and second case. It is important to note that unless α = 1 our estimator given by Eq. (47) will formally become biased. This means that we will expect some leakage of E/B modes. Postponing a thorough analytical and numerical analysis of incomplete aperture coverage and de-biasing strategies to a companion paper, we will content our selves by investigating the degree of bias that is introduced by computing the aperture-cross statistics. For reference, these are defined in direct analogy with Eq. (33): It can be proven that the expectation of this estimator vanishes; that is provided we have no bias in the estimate we have M 2 × (ϑ|θ 0,k ) = 0. However, the variance of the estimator does not vanish and it should be given by the pure shape noise contribution to Eq. (37). Under the approximations of Eq. (42) this is: Finally, before moving on, we note that it is important to appreciate that the weights W k apply to how different fields are combined, and that the weights wi from Eq. (33) apply to how the source galaxies are combined in arriving at an estimate for a single field. We assume that these latter weights have been pre-computed by the method for estimating galaxy ellipticities. Before moving on to the computation of the estimator with real data let us estimate the computational cost for an evaluation of M 2 ap . As described above, the actual implementation is built from a series of algorithmic blocks. (i) We first construct a KD-tree data structure for the galaxy catalogue. (ii) The full survey is tiled with overlapping apertures, where the centres are separated by a distance d. (iii) The aperture coverage map is computed to give the c k values for every aperture. (iv) For apertures that pass the selection cut (H(c k − α)), a KDtree range search locates all particles that lie inside the aperture radius ϑ. (v) Estimate the aperture mass statistics and its variance according to Eq. (37) for the kth aperture. (vi) Combine the Nap estimates through a weighted mean of the resulting estimates. In the above algorithm we shall assume that Steps (i)-(iii), and Step (vi) are performed once and therefore are not the limiting factors for the execution of the code. We do note, however, that the construction of the KD-tree may have a large memory footprint and will take some non-negligible time for the initial construction. The parts of the method that require some consideration are Steps (iv) and (v). Step (iv) is a range search routine and Step (v) is a routine that evaluates the sums in Eq. (46). To compute the complexity for these steps we first identify some parameters: let p specify the order of the statistics; ϑ describe the aperture radius; and ζ be a parameter that determines the spacing d between apertures: d ≡ ϑ/ζ. We note that for a non-overlapping field of apertures whose circumferences just touch each other, we would set ζ = 1/2. Further, the number of apertures is thus a function of ϑ and ζ, Nap(ϑ, ζ). 
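Before counting operations, steps (ii)-(iii) of the algorithm above, tiling the survey with overlapping apertures at spacing d = ϑ/ζ and assigning each aperture a completeness factor c_k, can be sketched as follows. The snippet assumes the survey mask is available as a boolean pixel grid; this, together with all names and defaults, is an assumption made for the illustration rather than a description of the implementation used for the measurements in this paper.

```python
import numpy as np

def tile_apertures(mask, pix_scale, radius, zeta=4.0):
    """Tile a pixelised survey mask with overlapping apertures (cf. steps ii-iii).

    mask      : 2D boolean array, True where the survey is unmasked
    pix_scale : angular size of one pixel, in the same units as `radius`
    radius    : aperture radius (the filter scale theta)
    zeta      : spacing parameter; centres are separated by d = radius / zeta
                (zeta = 4 corresponds to the d = 0.25 theta spacing quoted later)

    Returns pixel coordinates of the aperture centres and their completeness c_k.
    """
    ny, nx = mask.shape
    r_pix = radius / pix_scale
    half = int(np.ceil(r_pix))                     # half-width of the aperture stamp
    d_pix = max(int(round(r_pix / zeta)), 1)       # centre spacing in pixels

    # Boolean disc stamp marking the pixels inside the aperture radius.
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    disc = (xx**2 + yy**2) <= r_pix**2

    centres, completeness = [], []
    for cy in range(half, ny - half, d_pix):
        for cx in range(half, nx - half, d_pix):
            stamp = mask[cy - half:cy + half + 1, cx - half:cx + half + 1]
            centres.append((cy, cx))
            completeness.append(stamp[disc].mean())  # unmasked fraction c_k
    return np.array(centres), np.array(completeness)
```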
The order for the complexity can thus be computed as follows: statistic|N = 1, p) . (55) The first thing to notice is that the number of apertures scales all of the computations, so if we fix the parameter ζ, then the total number of apertures will scale as Nap ∝ Ωs/ϑ 2 , where Ωs is the survey area. The first term in the square brackets gives the computational time for a range search to deliver back the N k galaxies in the aperture. If the distribution of source galaxies is roughly randomly distributed on the sky, then we make the approximation N k ≈ N = NTotA/Ωs. Each such range search operation then takes of the order O(log NTot) time to execute, but this factor will also scale with the aperture radius and how clustered the source galaxy data are, and also the depth we need to go in the tree walk. Considering the second term in square brackets, this is the required time for computation of the estimate for the pth order aperture mass statistic. As was described earlier for M 2 ap , the estimator scales linearly with the number of galaxies in the aperture, thus in the second line we simply scale up the complexity to estimate the statistic for a single galaxy by the number of galaxies in the aperture. As we will explore in our companion paper, owing to this linear scaling, there is no additional overhead in extending the method to compute higher order statistics, beyond the variance, such as the skewness p = 3 and kurtosis p = 4. We now turn to estimating the aperture mass variance in the CFHTLenS data and in a large series of mocks generated by raytracing through N -body simulations. The Canada-France-Hawaii Telescope Lensing Survey (hereafter CFHTLenS) is a weak lensing survey that was completed around 2010. It covers 154 deg 2 of the sky in five optical bands {u * , g , r , i , z } with a ∼ 5σ point source limiting magnitude in the i band of i AB ∼ 25.5. The survey measures galaxy ellipticities for use in weak lensing analysis from multicolour data obtained as part of the CFHT Legacy Survey 6 . The survey data is distributed into four well spaced fields, three of which (W1, W2 & W4) lie close to the equatorial plane, and the third (W3) lies at high declination. Full details of the survey can be found in Heymans et al. (2012) . In this work we make use of the final public data release. The combined data set of W1, W2, W3 and W4 contains ellipticity measurements for 4,121,680 galaxies. In Table 1 we provide a summary overview of the data. Associated with each galaxy are: the angular positions RA and DEC, in radians; the x-and y-pixel coordinates in the projected tangent map; the ellipticity estimates 1 and 2, the lens weights w; the shear bias correction c and the multiplicative bias correction m; photometric redshift estimate z phot . Figure 1 shows the redshift distribution and lensing efficiency for the sources in each of the four CFHTLenS fields. In order to understand the statistical properties of the data we have generated a large set of simulated CFHTLenS skies. These mock data were generated from ray-tracing through N -body simulations. We used the zHORIZON simulations, performed on the zBOX-2 and zBOX-3 supercomputers at the University of Zürich, described in detail in Smith (2009) . Each of the zHORIZON simulations was performed using the publicly available Gadget-2 code (Springel 2005) , and followed the nonlinear evolution under gravity of N = 750 3 equal-mass particles in a comoving cube of length Lsim = 1500 h −1 Mpc; the softening length was l soft = 60 h −1 kpc. 
For all realizations 11 snapshots were output between redshifts z = [0, 2]; further snapshots were at redshifts z = {3, 4, 5}. The transfer function for the simulations was generated using the publicly available cmbfast code (Seljak & Zaldarriaga 1996) , with high sampling of the spatial frequencies on large scales. Initial conditions were set at redshift z = 50 using the serial version of the publicly available 2LPT code (Scoccimarro 1998; . The simulations correspond to several cosmological models, with parameters varying around a fiducial model. The latter closely matched the results of the WMAP experiment (Komatsu et al. 2009 ). There are 8 simulations of the fiducial model, and 4 of each variational model, matching the random realization of the initial Gaussian field with the corresponding one from the fiducial model. Table 2 summarizes the cosmological parameters that we simulated. From each zHORIZON simulation, 16 large fields of view were generated by choosing different observer positions within the simulation box. These large fields have side lengths of 12 deg and are covered by a regular mesh of 4096 2 pixels. For each pixel, a light ray was traced back through the simulation by a multiple-lens-plane algorithm (see Hilbert, Metcalf & White 2007; Hilbert et al. 2009 ), and its distortion due to gravitational lensing was recorded for a set of 45 source redshifts between 0 and 4. Each of the large fields was used to create four simulated mock-CFHTLenS wide field source galaxy catalogues for each of the different CFHTLenS fields W1 to W4 (i.e., 64 mock catalogues per CFHTLenS field and zHORIZON simulation). The basis for the simulated source galaxy catalogues are the actual CFHTLenS source catalogues, from which the angular positions and redshifts were taken. The reduced shear g ≡ γ/(1−κ) for each galaxy in the mock catalogues was computed by multi-linear interpolation of the sim- ulated lensing distortions onto the source galaxy's angular position and redshift (using a different angular offset within the 12×12 deg 2 simulated fields for each mock catalogue). The 'observed' source galaxy ellipticities in the simulated catalogues were then computed by combining the reduced shear from the ray-tracing and the randomly rotated observed ellipticities from the actual source galaxy catalogue. As a first step in our analysis of the CFHTLenS data, we compute several aperture based maps for the four fields of the CFHTLenS. To generate these maps the survey area was pixelated and the corresponding maps were computed for apertures located at the pixel centers. We furthermore only include apertures that are at least 20 per cent complete, the map values for all less complete apertures are set to the minimum value and therefore appear as blue pixels. Images of the official CFHTLenS masks are presented in Appendix B. In Figure 3 we show the aperture mass map and its corresponding signal-to-noise map for the W1 field. The aperture masses are estimated using Eq. (44) while the noise for each aperture was estimated as (Hetterscheidt, M. et al. 2005) : The results are shown for the Schneider filter Eq. (11) with aperture extent in the set ϑ ∈ [5 , 10 , 20 ]. It is interesting to note that near the survey mask boundaries the value of the aperture mass obtains large positive and negative values. This is due to the fact that for incomplete apertures the ring averaged tangential shear becomes biased as the mask induces non-vanishing B-modes (cross-shear). 
How to deal with the effects of masked apertures will be discussed in detail below and in a companion paper. In Figure 4 we show the aperture completeness map as well as a map of the aperture weights derived from Eq. (53), using the shot noise dominated limit of Eq. (42). We rescaled the latter map by its inverse mean such that the mean weight becomes unity. From there we can explicitly see that such aperture weights depend on the aperture completeness, cosmic structures and on the local depth of the survey. Analogous maps for the other three fields are presented in Appendix C. We now turn to the estimation of the aperture mass variance from the CFHTLenS data using the direct and correlation function estimators. Before performing our statistical analysis of the real CFHTLenS data we first make a study of the direct and correlation function estimators as applied to the mocks. From this we will be able to determine whether the methods are consistent with one another to within the errors and also which of the three weighting schemes given by Eqs. (51) and (53) provides the better method for estimating M 2 ap (ϑ). The left panel of Figure 5 shows the variance of the aperture mass estimated from the mocks, as a function of the aperture radius. For the results presented in this section we used a spacing of d = 0.25 ϑ between the apertures. The thin coloured lines show the results from the direct estimator approach where the estimates from individual apertures are combined with equal weight, but where a completeness thresholds c k has been adopted -this is equivalent to weight scheme W1 c.f. Eq. (51). For example the magenta lines show a conservative case, where only apertures with completeness c k 90% are taken, whereas the red lines are the most relaxed where all apertures with c k 10% are allowed. The thick dashed line shows the results obtained from the correlation function method, where the pair counts have been measured using the TreeCorr code of Jarvis, Bernstein & Jain (2004) 7 . The grey shaded regions shows the standard error for the correlation function method and the error bars are the errors on direct estimator. In all cases these were determined from the ensemble of 512 mocks of our fiducial model. The right panel of Figure 5 shows the same as the left panel, however the results from the direct estimator have been binned in completeness. Both panels also show the theoretical prediction evaluated via Eq. (20). The figure shows that there is a clear bias in the direct estimator approach when the apertures have a low completeness. This is manifest as an increased amplitude of the signal on all scales. However, for apertures with c k 75% completeness (blue line) we find that the results are in good agreement with the case of c k 90% (magenta line) and that these are fully consistent with the correlation function results on large scales, to within the errors. We note that on scales smaller than ∼ 5 the results from TreeCorr appear to be biased slightly low to those from the direct estimator approach. We furthermore note a difference between the theoretical prediction and the measurements for small aperture radii which is due to shot noise and the line-of-sight discretization from which the simulated data suffers (Hilbert et al. 2020) . 
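For concreteness, a minimal sketch of how the per-aperture estimates can be combined under the three weighting schemes W1-W3 of Eqs. (51)-(53) is given below. The per-aperture variances are taken as a precomputed input array (in the analysis above these are approximated by the shape-noise-dominated limit of Eq. 42, which we do not reproduce here); the interface and names are illustrative choices of ours.

```python
import numpy as np

def combine_apertures(map2_k, completeness, var_k, alpha=0.7, scheme="W3"):
    """Combine single-aperture M_ap^2 estimates into a survey estimate, cf. Eq. (47).

    map2_k       : per-aperture estimates of M_ap^2
    completeness : aperture completeness factors c_k
    var_k        : per-aperture variance estimates (e.g. shape-noise limit of Eq. 42)
    alpha        : completeness threshold
    scheme       : 'W1' = equal weights above the threshold,
                   'W2' = inverse-variance weights for all apertures,
                   'W3' = inverse-variance weights above the threshold
    """
    map2_k = np.asarray(map2_k, dtype=float)
    accept = np.asarray(completeness) >= alpha
    if scheme == "W1":
        weights = accept.astype(float)
    elif scheme == "W2":
        weights = 1.0 / np.asarray(var_k)
    elif scheme == "W3":
        weights = accept / np.asarray(var_k)
    else:
        raise ValueError("scheme must be one of 'W1', 'W2', 'W3'")
    return np.sum(weights * map2_k) / np.sum(weights)
```

The default alpha = 0.7 mirrors the completeness level found above to balance bias against the number of usable apertures; it is a parameter of this sketch rather than a fixed prescription.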
The left panel of Figure 6 again shows the aperture mass variance estimated from the mocks for the two methods, but this time the estimates from individual apertures are combined using the inverse variance weighting schemes with a completeness thresholdthat is we now employ W2 and W3, c.f. Eqs. (52) and (53). The right panel of Figure 6 shows the variance of the cross component of the aperture mass M 2 × (ϑ) from Eq. (54), which for B-mode free fields should vanish. There are a number of interesting points to note from this analysis. First, we see that all of the estimates from the direct estimator are consistent with one another and that they all lie within the error bars of the TreeCorr result. Nevertheless, for scales ϑ > 20 we still observe that the direct estimator approach appears to be slightly biased high on large scales for aperture completeness values c k 40% compared to the correlation function method. However, for aperture completeness levels c k 75% we see excellent agreement between the two methods. On small scales, ϑ < 5 , the correlation function method gives slightly lower results than the direct estimator. Here we believe that the direct estimator is correct, since as was noted in Kilbinger, Schneider & Eifler (2006) , the correlation function method is biased low on scales of the order ϑ ∼ 1 due to the absence of correlation function bins on small scales. In the TreeCorr code the small-scale cutoff in the pair counts is set at ϑ > 6 . Note that in the mock data there is no image blending and so the direct estimator should not suffer from this suppression. For the observed CFHTLenS data this is not necessarily the case. We also note that, as shown in the right panel of Figure 6 , the cross component of the aperture mass variance is consistent with zero to within the error bars for all completeness fractions used. However, the c k 40% shows a small positive offset from zero at the level of below 5 × 10 −7 . On large and small scales the bias is very small for c k 75%. This gives us further confidence that the discrepancy between the direct estimator and TreeCorr on small scales is due to the bias in the correlation function approach. Another point of interest, is that we see the error bars on the most conservative completeness cuts, c k 95%, are significantly larger than those obtained from the correlation function method. This makes sense, since, for the case of the most conservative cuts, one can find only a few apertures that meet the criterion. On the other hand, for completeness fractions of the order c k 70%, the error bars between the two methods are comparable. Based on the discussion above, we will be using the weighting scheme W3 for the analysis of the observed CFHTLenS data. We now turn to the analysis of the observed data. As discussed in §4 the aperture mass variance for a set of apertures can be directly estimated using Eqs. (47) and (33). However, owing to calibration errors in the lensfit shape estimation algorithm (Miller et al. 2013 ), each galaxies ellipticity has a corresponding multiplicative m and additive c bias. Hence, the observed and true ellipticity components of the ith galaxy are related through: obs 2,i = (1 + mi) true 2,i + c2,i . For the CFHTLenS c1 was found to be consistent with zero, however c2 was found to have a S/N and size dependent bias that was subtracted from each galaxy. On average this correction was of the order 2 × 10 −3 . Correction for shear correlation functions: following Miller et al. 
(2013) , the shear correlation functions can be corrected for multiplicative bias through the following procedure. The 'raw' shear correlation functions are first estimated from the 'observed' shears using the estimator: where θij ≡ |θi − θj| and where Π(θij|θα) is the pair binning function which is unity if θij lies in the range [θα − ∆θ/2, θα + ∆θ/2). If we now insert Eqs. (57) and (58) into the above estimator (taking c1,i = 0 and c2,i = 0) we find: where in the above we introduced the new weights w i ≡ wi(1+mi), then we see that the above equation can be rewritten: The first term on the right in parenthesis is the 'calibrated' true shear correlation which we can write as ξ cal ± , hence we can write: Correction for direct aperture mass variance estimator: following in the footsteps of the shear correlation function approach, we define the raw uncorrected aperture mass variance estimate as: recalling that the sums only extend over the galaxies in the aperture. If we now insert Eqs. (57) and (58) into the above estimator, as before, then we find: Again, on redefining the lensfit weights this leads us to write: Thus we see that the calibrated estimate of the aperture mass variance can be written: where the normalisation factor is In left panel of Figure 7 we show the aperture mass variance and in the right the variance of the cross component variance, estimated from the CFHTLenS. The results shown are for the full combination of the W1, W2, W3 and W4 survey areas. For the direct aperture mass variance approach the estimate was obtained using the weighting scheme W3, c.f. Eq. (53), and the error was obtained using a weighted jackknife resampling of the data in patches having an area of roughly (1 deg) 2 . For the method of measuring the aperture mass variance from the shear correlation functions (thick, black dashed line), when combining the results from each field, we have simply weighted the estimates in proportion to the field area. In Fig. 7 the vertical dashed line indicates the scale ϑ = 5.5 identified in Kilbinger et al. (2013) , above which the E/B-mode leakage is claimed to be 1.5%. The E/B mode leakage on smaller scales originates from the 9 cut-off in the shear-correlation functions, which originates from the blending of galaxy images and a shear bias for close pairs (see §6.4). The important point to note from the figure is that on scales ϑ > 5.5 the measurements from the direct and correlation function estimators are in very good agreement to within the errors for a wide range of aperture completeness thresholds. In addition, with the exception of all but the lowest c k thresholds, the variance of the cross component of the aperture mass variance is consistent with zero on the same scales. On scales ϑ ≤ 5.5 , both estimators appear to have non-zero B-modes, which rise sharply on scales ϑ 2 . As noted in Miller et al. (2013) , lensfit galaxies with separations closer than 9 tend to have a bias in their ellipticities which is in the direction of the line connecting the centres of the galaxy images. For the shear correlation functions, Kilbinger et al. (2013) stated that this bias can be removed by only computing the shear correlation functions down to 9 . Their justification was that the alignment direction of the pair is, to a very good approximation, randomly orientated. 
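Before examining the close-pair bias further, we note that the multiplicative-bias normalisation of the direct estimator derived above can be sketched as follows. Since the intermediate equations are not reproduced here, the snippet simply mirrors the logic of the text: the raw pair sum built from the observed (c-corrected) ellipticities and lensfit weights is normalised by the corresponding pair sum of the rescaled weights w'_i = w_i (1 + m_i). The function name and interface are our own illustrative choices.

```python
import numpy as np

def calibrated_map2_single(eps_t_obs, Q, w, m, radius):
    """Multiplicative-bias-corrected M_ap^2 estimate for a single aperture.

    eps_t_obs : observed tangential ellipticities (additive c bias already removed)
    Q         : filter values Q(|theta_i|; radius) for each galaxy in the aperture
    w         : lensfit weights w_i
    m         : multiplicative bias values m_i

    The raw pair sum over observed ellipticities is normalised by the pair sum of
    the rescaled weights w'_i = w_i (1 + m_i), mirroring the calibration described
    in the text; both sums are evaluated in linear time as in Eq. (46).
    """
    a = w * Q * eps_t_obs
    wp = w * (1.0 + m)                           # rescaled weights w'_i
    num = np.sum(a)**2 - np.sum(a**2)            # sum_{i!=j} w_i w_j Q_i Q_j eps_t,i eps_t,j
    den = np.sum(wp)**2 - np.sum(wp**2)          # sum_{i!=j} w'_i w'_j
    return (np.pi * radius**2)**2 * num / den
```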
We now investigate how such a bias may contaminate the direct estimator approach for aperture mass variance by comparing the measurements of the full catalogue to a reduced one in which clustered galaxies below the blending scale are removed. We proceed by describing our algorithm for removing close pairs of galaxies to generate a reduced catalogue. (i) We begin by initializing a Boolean mask value for each galaxy. These values are all initially set to unity. (ii) We then spatially organize the data using a hierarchical KDtree. (iii) We next set the pair-cut-off scale θpc, within which galaxy pairs are to be expunged from the catalogue. (iv) We now loop over all galaxies and perform a range search. If any of the galaxy positions lie within the sphere of radius θpc and have their Boolean flag set to unity, then the Boolean mask value associated with the current galaxy is set to zero and the galaxy will henceforth be excluded when estimating statistics from the data. We apply this method to the CFHTLenS data and set the pair-cutoff scale θpc = 9 . This yields a reduced catalogue that contains ∼70% of the original galaxies. Figure 8 shows the probability density function of the distance to the nearest-neighbour galaxy. Note that these curves have been normalised so that the area under the graph gives unity. Figure 9 shows our measurements of the aperture mass variance and the variance of the cross component of aperture mass, in the full and the close-pair reduced CFHTLenS shear catalogues. We compare the results from both the direct and the correlation function estimators. For the direct estimator approach we have employed the weighting function W3, with the completeness threshold set to c k = 0.7. Some important points can be noted: looking at the left panel of Fig. 9 the result of excluding pairs of galaxies that are closer then 9" lowers the amplitude of M 2 ap by less than 15% for small aperture scales (∼ 10 ). For larger apertures the changes are smaller. Interestingly, the results from the TreeCorr analysis of the full and reduced catalogue also show similar differences: for small aperture scales, the aperture mass variance appears to be lower in the reduced catalogue. In addition, we see that all of the estimators agree to within the errors on all scales. However, the agreement between the direct estimator and the TreeCorr result is exceptionally good for the measurements from the reduced catalogue. It is also interesting to compare these results with the published measurement from Kilbinger et al. (2013) (denoted as the black crosses in the plot). Here we see that our measurements are fully consistent with the Kilbinger et al. (2013) measurements, to within the errors. The right panel of Figure 9 shows that the variance of the cross-component of the aperture mass is reassuringly consistent with zero for both estimators applied to both the full and reduced catalogues. We now turn our attention to the question of addressing the possible information loss in using the direct estimator approach. We will do this using the Fisher matrix formalism. If we assume that the likelihood function for measuring the aperture mass variance for a set of N d aperture scales is Gaussian, and that the priors on the cosmological parameters are flat, then the Fisher information matrix can be written (Tegmark, Taylor & Heavens 1997) : where µ T = M 2 ap (ϑ1), . . . , M 2 ap (ϑN d ) is the set of model means measured at the bin centres and C is the model covariance matrix. 
The vector p is the set of cosmological parameters. In this study we will restrict our attention to the cosmological parameters σ8 and Ωm,0, as these are the most readily constrained from lensing data. The minimum variance bounds on a given cosmological parameter, after marginalising over all other parameters, can be obtained as: In order to simplify the calculation, we will assume that the first term on the right-hand-side of Eq. (68) is significantly smaller than the other term. As was noted in §6 of Smith & Marian (2015) this can be justified in the high-k limit through mode counting arguments, however, it is in general an incorrect assumption 8 . In order to compute the second term of Eq. (68) we follow the approach lain down in Smith et al. (2014) and use the mocks to evaluate all quantities. That is we measure the derivatives of the model mean with respect to the cosmological parameters and we also estimate the precision matrix C −1 . To do these tasks we make use of the ray traced mock CFHTLenS data. The derivatives are estimated using: where Nens is the number of realisation of the ensemble, and M 2 ap,(j) (θi|pα +∆pα) is the estimate of the aperture mass variance on scale θi in the jth realisation of the mocks, for the simulation with cosmological parameters pα + ∆pα. In computing Eq. (70) we use the 256 mock ray-tracing simulations of each cosmological variation of the CFHTLenS data. Note that the above estimator will reduce the cosmic variance, since when running the modified cosmology simulations we used Fourier phase realisations that were identical to the fiducial model runs. In addition, when estimating the precision matrix we take account of the bias in the estimator using the method described in Hartlap, Simon & Schneider (2007) : where N d is the dimension of the data vector and C is the standard maximum likelihood estimator of the covariance matrix of the data. Figure 10 shows the derivatives of the aperture mass variance with respect to the cosmological parameters, estimated from the 256 ray-traced mock simulations. The agreement between the results from the direct estimator approach is excellent, for all of the c k values that we considered, where we used the inverse variance weighting approach of W3 to combine estimates from individual apertures. In addition, these also agree with the results from the correlation function approach to a high degree of accuracy. The only issue to note is that for high completeness fractions c k ∼ 1 the estimates become very noisy 9 . Figure 11 shows the 2D confidence contour for Ωm,0 and σ8, and also the 1D posterior distributions, marginalised over all other parameters, for the same two parameters. Focusing on the 2D contours, we notice a number of interesting points: first, considering the results for the direct estimator, we see that as we decrease the threshold completeness value c k , the ellipses rotate clockwise by a small amount. This difference in orientation can be explained by noting that for a CFHTLenS like survey the off-diagonal elements of the covariance matrix are getting more and more noisy for increasing values of c k . The reason for this is that the effective survey footprint changes for different aperture radii and that this difference is most prominent for conservative aperture completeness cuts. When choosing a c k of around 0.7 or lower, the covariance matrices become stable and so does the orientation and area of the ellipse. 
Figure 11 shows the 2D confidence contours for Ωm,0 and σ8, and also the 1D posterior distributions for these two parameters, marginalised over all other parameters. Focusing on the 2D contours, we notice a number of interesting points. First, considering the results for the direct estimator, we see that as we decrease the threshold completeness value c_k the ellipses rotate clockwise by a small amount. This difference in orientation can be explained by noting that, for a CFHTLenS-like survey, the off-diagonal elements of the covariance matrix become increasingly noisy for increasing values of c_k. The reason for this is that the effective survey footprint changes with aperture radius, and this difference is most prominent for conservative aperture completeness cuts. When choosing a c_k of around 0.7 or lower, the covariance matrices become stable, and so do the orientation and area of the ellipse.

Similar observations can be made for the 1D marginalised posterior distributions, which leads us to conclude that, within the Fisher matrix formalism, the information content of the direct estimator is comparable with that of the correlation function method. Some caveats to the interpretation of these results must be mentioned. Firstly, both the precision matrix and the model means are estimated from the mock simulations of the CFHTLenS, and are therefore subject to errors. This means that the forecasted constraining power should also come with an error. Since we are only interested in the relative information, we have not taken this into account; a more detailed study would be required in order to make a more precise statement, and this is beyond the scope of this work.

In this paper we have explored an alternative method for estimating the aperture mass statistics in weak lensing cosmic shear surveys. Our method is a direct estimator of the variance. With the use of a hierarchical KD-tree algorithm for ordering the data, we found that the computational time for the execution of this estimator is linear in the number of galaxies per aperture and in the number of apertures used in the estimate. This paper, the first in a series, focused on the two-point statistics, and in particular on the aperture mass variance. In summary:

In §2 and §3 we reviewed the background theory of weak lensing in the cosmological context, the aperture mass variance and its connection to the matter power spectrum. Here we also discussed the standard approach for estimating this quantity, which relies on measurements of the shear correlation functions.

In §3 we introduced the direct estimator for the aperture mass variance that we employ. We showed that, when including ellipticity weights, the estimator is unbiased. We also computed the variance of the estimator and showed that, in the limit of a Gaussian shear signal, no source clustering and a large number of galaxies per aperture, the variance reduces to a simple expression. We then showed how the original estimator introduced by Schneider et al. (1998) can be accelerated to linear order in the number of galaxies per aperture, discussed various weighting schemes for combining the estimates from different apertures, and finally examined the computational complexity of our estimator.

In §4 we gave an overview of the CFHTLenS data, and we described our method for generating mocks of the survey using full gravitational ray-tracing through N-body simulations. As a first test, we measured the aperture mass maps from the survey data.

In §6 we computed the aperture mass variance from our mock surveys, using both the direct estimator and the correlation function method. We found that if we included incomplete apertures in the direct estimate, and if we combined all apertures equally, there was a significant bias in the final result. This bias vanished if only complete apertures were used; however, the errors on the estimates then increased significantly. We therefore explored an alternative weighting scheme, in which the apertures were combined using an inverse variance weighting, with the variance assumed to be dominated by shape noise.
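As a schematic illustration of such a combination (a sketch only; the specific weighting schemes used in this work, such as W3, are defined earlier in the paper and are not reproduced here), an inverse-variance weighted average of the per-aperture estimates can be written as

\widehat{\left\langle M^2_{\rm ap}\right\rangle}(\vartheta) = \frac{\sum_{k} v_k\,\widehat{M^2_{{\rm ap},k}}(\vartheta)}{\sum_{k} v_k}\,, \qquad v_k \propto \frac{1}{{\rm Var}\!\left[\widehat{M^2_{{\rm ap},k}}\right]}\,,

with the sum running over apertures whose completeness c_k exceeds the chosen threshold, and with the per-aperture variance approximated by its shape-noise contribution.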
The results in this case were found to be in excellent agreement with the correlation function method, even when incomplete apertures were included in the estimate. We found that an aperture completeness threshold of ∼ 0.7 gave very good results, with only a small residual B-mode signal that was well within the error tolerance.

We then turned to the application of the method to the CFHTLenS data. It was necessary to account for two additional observational biases: first, we derived the correction factors required to account for the ellipticity bias in our estimator; second, we created a modified source galaxy catalogue from which we removed pairs of galaxies whose images lie in close projection on the sky, since their ellipticities are biased by an artefact of the lensfit ellipticity estimation algorithm. On taking account of these, we found that our direct estimator approach and our estimates using the shear correlation function were in excellent agreement with the published results of Kilbinger et al. (2013).

In §7 we explored the information content of the direct estimator approach and compared it with that of the correlation function method. We found that the 1D marginalised posterior distribution for σ8 was less constraining for high aperture completeness than that from the shear correlation function method, but that once some incomplete apertures were included in the estimate the distributions became very similar. This trend was mirrored for Ωm,0, except that for high completeness the distribution for the direct estimator was the most constraining; this may be due to errors in the forecast arising from uncertainties in the precision matrix and the derivatives. The 2D confidence contours for Ωm,0 and σ8 were of roughly the same size for all aperture completeness thresholds, but they rotated into the same orientation as those of the correlation function approach as lower thresholds were adopted. This leads us to conclude, at this stage, that the information content of the two estimators is comparable.

The main advantage of this development is not to replace the correlation function approach as the way to measure aperture mass statistics and other associated statistics, but to show that it is a credible method. As mentioned earlier, the real advantage of this approach is that it can easily be generalised to enable the measurement of higher order aperture mass statistics, such as the skewness and kurtosis, with very little extra effort. While these can of course also be estimated using the shear three-point and four-point correlation functions, the task of measuring these correlation functions and all of their configurations becomes increasingly onerous and time consuming, whereas the method that we have developed scales linearly with the number of galaxies in the aperture. This will be the subject of our upcoming study. Besides this, for application to more recent large-scale lensing surveys like DES and KiDS the analysis will need to be extended to include curved-sky effects, and we see no obvious issues with this.

We can also calculate the variance of the estimator:

{\rm Var}\left[\widehat{M^2_{\rm ap}}(\vartheta)\right] = \left\langle \frac{(\pi\vartheta^2)^2 \sum_{i\neq j} w_i w_j Q_i Q_j\, \epsilon_{t,i}\epsilon_{t,j}}{\sum_{i\neq j} w_i w_j} \; \frac{(\pi\vartheta^2)^2 \sum_{k\neq l} w_k w_l Q_k Q_l\, \epsilon_{t,k}\epsilon_{t,l}}{\sum_{k\neq l} w_k w_l} \right\rangle - \left\langle M^2_{\rm ap}\right\rangle^2
= \frac{(\pi\vartheta^2)^4}{\left[\sum_{i\neq j} w_i w_j\right]^2} \sum_{i\neq j}\sum_{k\neq l} w_i w_j w_k w_l\, Q_i Q_j Q_k Q_l \left\langle \epsilon_{t,i}\epsilon_{t,j}\epsilon_{t,k}\epsilon_{t,l}\right\rangle - \left\langle M^2_{\rm ap}\right\rangle^2\,. \qquad (A1)
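Note that pair sums over i ≠ j of the kind appearing in Eq. (A1) can be evaluated in linear time by expanding them into single sums, which is the essence of the linear-order acceleration referred to in the main text. The following minimal sketch illustrates this for a single aperture; the variable names are hypothetical, it assumes Q has already been evaluated at each galaxy position relative to the aperture centre, and it is not the paper's actual implementation.

import numpy as np

def aperture_mass_variance_direct(w, Q, eps_t, theta):
    """O(N) evaluation of the i != j pair sums entering the direct estimator,
    using sum_{i != j} x_i x_j = (sum_i x_i)^2 - sum_i x_i^2."""
    a = w * Q * eps_t                       # x_i = w_i Q_i eps_{t,i}
    num = np.sum(a)**2 - np.sum(a**2)       # sum_{i != j} w_i w_j Q_i Q_j eps_{t,i} eps_{t,j}
    norm = np.sum(w)**2 - np.sum(w**2)      # sum_{i != j} w_i w_j
    # per-aperture estimate with the (pi theta^2)^2 prefactor of Eq. (A1);
    # theta is the aperture radius in the angular units used for Q
    return (np.pi * theta**2)**2 * num / norm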
Following the recipe as for the mean estimator, we first calculate the average over the source galaxies; this yields (Schneider et al. 1998):

{\rm A}\!\left[\epsilon_{t,i}\epsilon_{t,j}\epsilon_{t,k}\epsilon_{t,l}\right] = \gamma_{t,i}\gamma_{t,j}\gamma_{t,k}\gamma_{t,l} + {\rm A}\!\left[\epsilon^{(s)}_{t,i}\epsilon^{(s)}_{t,j}\epsilon^{(s)}_{t,k}\epsilon^{(s)}_{t,l}\right] + \frac{\sigma^2}{2}\left[\gamma_{t,i}\gamma_{t,j}\delta^{\rm K}_{kl} + \gamma_{t,i}\gamma_{t,k}\delta^{\rm K}_{jl} + \gamma_{t,i}\gamma_{t,l}\delta^{\rm K}_{jk} + \gamma_{t,j}\gamma_{t,k}\delta^{\rm K}_{il} + \gamma_{t,j}\gamma_{t,l}\delta^{\rm K}_{ik} + \gamma_{t,k}\gamma_{t,l}\delta^{\rm K}_{ij}\right]\,. \qquad (A2)

Let us now work out the first term on the right-hand side of the above expression. When summing over the indices we have the following possibilities: all indices different (i ≠ j ≠ k ≠ l) → (1a); two indices equal and the remaining two different, plus permutations → (1b); two pairs of indices equal; three indices equal and one not; all indices equal.

• (1a) → (i; j ≠ i; k ≠ i ≠ j; l ≠ i ≠ j ≠ k). Averaging over the source galaxy positions gives an expression involving the quantity

M_{s,2} := \pi\vartheta^2 \int {\rm d}^2\theta_1\, Q^2(\theta_1)\,\gamma_t^2(\theta_1)\,.

We note that there are three identical contributions of this type, which arise from the terms where i = k, j = l, and j = k. Hence, on averaging over the shear-field realisations and putting back the normalisation factors, we see that the sum of these terms gives the corresponding contribution to Eq. (A1).

Turning to the second term on the right-hand side of Eq. (A2), two contributions arise; let us call these two terms (2a) and (2b). On performing the average over the galaxy positions, the first term becomes an expression involving the quantity G := \pi\vartheta^2 \int {\rm d}^2\theta_1\, Q^2(\theta_1). The second term (2b) is identical and so gives us a factor of 2. Finally, the expectation over the ensemble of realisations of the shear field, with the normalisation factors restored, gives us the corresponding contribution to the variance.

Turning now to the third term on the right-hand side of Eq. (A2), we see that when summing over the allowed indices the first and last terms in the square bracket do not contribute. Let us label the remaining four terms (3a)-(3d); these are again evaluated by averaging over the source galaxy positions.

In Figure B1 we show the survey masks for the W1, W2, W3 and W4 fields of the CFHTLenS. This figure clearly illustrates the problem of incomplete sky coverage due to the survey boundaries, the holes drilled due to the diffraction effects of bright stars, and the gaps between chips. One can also notice that, while the W1, W2 and W4 fields are fairly flat projections, the W3 field clearly suffers more from the effects of the curved geometry of the sky.

In the following we show aperture maps for the remaining CFHTLenS fields. The rows are structured identically to those in Figs 3 and 4, but we only show the signal-to-noise map in the left column and the inverse-variance weight map in the right column.

Figure C3. Signal-to-noise and aperture weight map of the W4 field of the CFHTLenS data.
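Finally, as an illustration of how the aperture completeness c_k used throughout this work (the fraction of the aperture area that is unmasked) could be obtained from a pixelised mask such as those shown in Figure B1, consider the following sketch; the pixel conventions and array names are assumptions, not the actual implementation.

import numpy as np

def aperture_completeness(mask, ix, iy, r_pix):
    """Fraction of unmasked pixels inside a circular aperture of radius r_pix
    (in pixels) centred on pixel (ix, iy) of a binary mask (1 = usable, 0 = masked).
    Apertures extending beyond the map edge are simply truncated here; in practice
    the out-of-field area would also count as masked."""
    ny, nx = mask.shape
    yy, xx = np.ogrid[:ny, :nx]
    inside = (xx - ix)**2 + (yy - iy)**2 <= r_pix**2
    return float(mask[inside].mean()) if inside.any() else 0.0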