key: cord-0563011-ravrmyvt authors: Diemer, Benedikt title: Fly-bys, orbits, splashback: subhalos and the importance of the halo boundary date: 2020-07-21 journal: nan DOI: nan sha: d4978a0576ca7507f1a692478a90576dd4168e37 doc_id: 563011 cord_uid: ravrmyvt The classification of dark matter halos as isolated hosts or subhalos is critical for our understanding of structure formation and the galaxy-halo connection. Most commonly, subhalos are defined to reside inside a spherical overdensity boundary such as the virial radius. The resulting host-subhalo relations depend sensitively on the somewhat arbitrary overdensity threshold, but the impact of this dependence is rarely quantified. The recently proposed splashback radius tends to be larger and to include more subhalos than even the largest spherical overdensity boundaries. We systematically investigate the dependence of the subhalo fraction on the radius definition and show that it can vary by factors of unity between different spherical overdensity definitions. Using splashback radii can yet double the abundance of subhalos compared to the virial definition. We also quantify the abundance of flyby (or backsplash) halos, hosts that used to be subhalos in the past. We show that the majority of these objects are mislabeled satellites that are naturally classified as subhalos when we use the splashback radius. We compare our results to self-similar universes to show that the subhalo and flyby fractions are not universal with redshift and cosmology. In the ΛCDM picture of structure formation, dark matter collapses into halos under its own gravity, pulling in baryons that form a galaxy at the center of the halo (Rees & Ostriker 1977) . Even before the inception of this theory, galaxies were thought to collide with each other (e.g. Toomre & Toomre 1972; Ostriker & Tremaine 1975) . These mergers are explained by the idea of hierarchical structure formation, where small halos form first, fall into a larger host halo and become subhalos (e.g. White & Rees 1978; Bond et al. 1991; Lacey & Cole 1993) . Given that structure is roughly self-similar across size scales, the large number of satellites observed in galaxy clusters implies that even galactic halos must contain an abundance of substructure (Katz & White 1993; Moore et al. 1999a; Klypin et al. 1999b) . Even though subhalos are eventually disrupted and merge with their host, they can live for a substantial fraction of a Hubble time because of the long dynamical timescale of halos. The existence of subhalos complicates the picture of structure formation significantly, both observationally and theoretically. Satellite galaxies live in physically different environments with higher density and experience disruptive processes such as ram pressure stripping (Abadi et al. 1999) . Similarly, their subhalos tend to be tidally disrupted, causing a sharp decline in their mass. For many purposes, it is thus important to distinguish between host and subhalos. Theoretically, for instance, the distinction matters when computing mass functions (of host halos) or assembly bias (Villarreal et al. 2017 ). Observationally, it matters when estimating cluster masses based on their richness (the number of satellites, DES Collaboration et al. 2020) or for effects such as galactic conformity (Kauffmann et al. 2013) . The connection between observable galaxies and the dark matter Universe is often established through prescriptions such as halo occupation distributions subhalo, abundance matching, or semi-analytic models, most of which rely on a host-subhalo classification to some extent (see Wechsler & Tinker 2018 , for a review). To define a subhalo's membership in a larger host, we could, for instance, include all sub-groups in halo's friendsof-friends (FOF) group (Davis et al. 1985; Springel et al. 2001 ). However, this definition depends on an arbitrary linking length and is hard to infer from observations (More et al. 2011) . Instead, the most common definition of a subhalo is that it lives inside the spherical overdensity (SO) radius of a larger halo. These radii enclose an overdensity that is set to a fixed or varying multiple of the critical or matter density of the universe, leading to definitions such as R 200c , R vir , or R 200m (e.g., Lacey & Cole 1994) . SO definitions are easy to compute, but differences in the chosen overdensity lead to significant differences in radius and thus in the number of subhalos (see Figure 1 for a visualization). Although understood by practitioners, this difference is rarely quantified in practice because halo catalogs typically give subhalo relations according to only one definition. Moreover, there are reasons to question whether commonly used SO definitions truly capture the physical nature of halos. For instance, infalling subhalos and galaxies begin to lose mass at about two host virial radii on average, indicating that the sphere of influence of host halos is larger than suggested by R vir (Bahé et al. 2013; Behroozi et al. 2014) . Another indication that SO radii do not include the full extent of halos is provided by a large population of "isolated" halos that used to be subhalos but now orbit outside their former host's virial radius. Often labeled "backsplash halos" or "ejected satellites," the vast majority of these systems will eventually fall back onto their past host (Balogh et al. 2000; Mamon et al. 2004; Gill et al. 2005; Pimbblet 2011; Wetzel et al. 2014; Haggar et al. 2020; Knebe et al. 2020) , although there is also a (much smaller) population of genuine, temporary "flyby" events and interactions (Sales et al. 2007; Ludlow et al. 2009; Sinha & Holley-Bockelmann 2012; L'Huillier et al. 2017; An et al. 2019) . To avoid confusion, we summarily refer to all of these phenomena as "flyby." We argue that most flyby halos are misclassified subhalos, a distinction that matters because flyby halos and their galaxies may carry significant imprints Note. -The N-body simulations used in this paper. L denotes the box size in comoving units, N 3 the number of particles, m p the particle mass, and the force softening length in physical units. The simulations cover redshifts from z initial to z final , but snapshots were output only between z f−snap and z final ; the catalogs contain the first halos at z f−cat . The references correspond to Diemer et al. (2013, DKM13) , Diemer & Kravtsov (2014, DK14) , and Diemer & Kravtsov (2015, DK15) . Our system for choosing force resolutions is discussed in DK14. of their interaction with the larger host (Knebe et al. 2011; Muriel & Coenda 2014) . Moreover, flyby halos can lead to a double-counting of mergers (Xie & Gao 2015; Benson 2017) and cause spurious assembly bias signals (Sunayama et al. 2016; Villarreal et al. 2017; Mansfield & Kravtsov 2020) . To mitigate these issues, the splashback 1 radius, R sp , has been put forward as a physically motivated definition of the halo boundary (Diemer & Kravtsov 2014; Adhikari et al. 2014; More et al. 2015) . By definition, R sp corresponds to the apocenter of particles on their first orbit, which, in spherical symmetry, would include the orbits of all particles and subhalos and separate infalling from orbiting material (Fillmore & Goldreich 1984; Bertschinger 1985; Adhikari et al. 2014; Shi 2016) . In practice, measuring R sp is more difficult due to non-sphericity and interactions between halos, but it can be detected based on its accompanying sharp drop in the density field or from particle dynamics (Diemer 2017, hereafter Paper I) . Observationally, the splashback radius has been measured as a drop in the galaxy density and weak lensing signal around clusters (e.g., More et al. 2016; Chang et al. 2018) . Baxter et al. (2017) demonstrated that the drop in density is predominantly caused by red cluster galaxies, while the infalling, bluer population follows a smooth profile, supporting the notion that the splashback radius separates infalling galaxies from those that have orbited at least once (Aung et al. 2020; Tomooka et al. 2020, see Figure 2 for a visualization). While the relationship between SO and splashback radii was investigated in More et al. (2015) and Diemer et al. (2017, hereafter Paper II) , the impact of R sp on the subhalo assignment has yet to be quantified systematically (although Mansfield & Kravtsov 2020 found it to be substantial). In this paper, we quantify the abundance of subhalos and flyby halos as a function of the radius definition. We consider a number of SO and splashback definitions; the latter are computed using the Sparta code and are encapsulated in the 1 The nomenclature can be misleading. Flyby halos are often referred to as "backsplash halos," which is the original root of the term "splashback radius." However, while the former refers to subhalos that are close to the apocenter of their orbit and thus outside of the virial radius (or a similarly defined boundary), the splashback radius refers to dark matter particles as well as subhalos and aims to include all orbits by construction. We avoid the term "backsplash halos" in this paper to avoid confusion. publicly available halo catalogs and merger trees presented by Diemer (2020a, hereafter Paper III) . The paper is structured as follows. In Section 2, we describe our simulations and halo catalogs. We present our results in Section 3 and summarize our conclusions in Section 5. Throughout the paper, we follow the notation of Paper III. The underlying halo catalogs are available at benediktdiemer.com/data. In order to keep the paper focused on the effects of the radius definition, we avoid a number of important issues. First, we ignore the unphysical numerical disruption of subhalos in N-body simulations, also known as the "over-merging problem" (e.g., Carlberg 1994; van Kampen 1995; Moore et al. 1996 Moore et al. , 1999b Klypin et al. 1999a ). This issue still affects modern simulations, including those used in this work (van den Bosch et al. 2018) . Second, we do not tackle the question of how to define the mass of a subhalo; instead, we give results for a variety of definitions such as the bound-only mass, peak mass, and circular velocity. We return to this issue in Diemer & Behroozi (2020) . 2. SIMULATION DATA In this section, we briefly review our simulations, radius definitions, and our algorithm for assigning subhalos to hosts. We refer the reader to Paper III for details. Our catalogs are based on the Erebos suite of dissipationless N-body simulations, which we summarize in Table 1 . This suite includes seven simulations of a WMAP7 ΛCDM cosmology based on that of the Bolshoi simulation (Komatsu et al. 2011; Klypin et al. 2011 , Ω m = 0.27, Ω b = 0.0469, h = 0.7, σ 8 = 0.82, and n s = 0.95). The simulations span a range from 31 h −1 Mpc to 2000 h −1 Mpc in box size and allow us to investigate a large range of halo masses. The Erebos simulations also contain three boxes of a Planck-like cosmology, but the relevant results are virtually identical to those from the WMAP7 cosmology. We omit them to avoid crowding our figures and emphasize that "ΛCDM" labels refer to the WMAP7 simulations. We also consider self-similar Einstein-de Sitter universes with power-law initial power spectra of slopes −1, −1.5, −2, and −2.5 (Paper III). The initial power spectra for the ΛCDM simulations were computed by Camb (Lewis et al. Figure 1 . Impact of the halo radius definition on the abundance of subhalos. Each panel shows the same background image, the projected density in a slab around a massive halo in the L0125-WMAP7 simulation. Circles show the radii of all host (white) and subhalos (orange) that reached N 200m ≥ 500 at any point in their history. The first four panels show bound-only R 500c , R 200c , R vir , and R 200m as computed by Rockstar. The second four panels show the splashback radii corresponding to the mean, median, 75%, and 90% of the particle apocenter distribution. The splashback radius is not defined for subhalos, we replace it with R 200m,bnd . The visualization of the density field was created using the gotetra code by P. Mansfield (https://github.com/phil-mansfield/gotetra), which uses a tetrahedron-based estimate of the density field Abel et al. 2012; Hahn et al. 2013). 2000). The initial particle grids for all simulations were generated by 2LPTic (Crocce et al. 2006) , and the simulations were run with Gadget2 (Springel 2005) . All figures in this work are based on the merger trees presented in Paper III. We start from halo catalogs and merger trees created with Rockstar and Consistent-Trees (Behroozi et al. 2013a,b) . We then run the Sparta code on each simulation to measure splashback radii and other halo properties (Paper I). The Moria extension recombines the Sparta results with the Rockstar catalogs to create enhanced catalogs and merger trees in a new format (Paper III). We use three types of radius definitions: bound-only SO radii, all-particle SO radii, and splashback radii. Rockstar calculates bound-only radii by removing gravitationally unbound particles from friends-of-friends groups and subgroups in six-dimensional phase space. Given a set of bound parti-cles, SO mass definitions are computed by finding the outermost radius where the density falls below the given SO threshold. We consider four definitions, R 500c , R 200c , R vir , and R 200m , indicating density thresholds of 500 or 200 times the critical or mean density of the Universe. We compute the varying virial overdensity using the approximation of Bryan & Norman (1998) . Second, we consider all-particle SO radii computed by Sparta, which are measured the same way as the bound-only radii but without any unbinding. For the vast majority of host halos, the difference is small, but for subhalos, all-particle masses are ill-defined because they often contain large amounts of host material (Paper III). In this work, we are mostly concerned with host-subhalo relations, meaning that the exact mass of a subhalo does not matter as much as the radius of its host. Nevertheless, we will mostly rely on bound-only radii and show results for the all-particle virial radius for comparison. For brevity, we denote the bound-only and all-particle versions of a definition X as R X,bnd and R X,all . R sp,90% Figure 2 . Visualization of a merger tree based on different mass definitions. Each panel shows the history of the largest halo in TestSim100. The trajectories reflect the distance from this "root" halo in comoving units. The distance is arbitrarily cut off at a radius well inside the root halo, represented by the vertical line on the left of the plots. Each track represents a halo that becomes a subhalo of the root halo, merges into it, or merges into one of its subhalos. The color of the lines indicates the epochs when those halos are hosts (gray), subhalos (of any halo, light blue) or flyby halos (purple). The radius definition chosen to define the host-sub relations changes both the halos included in the tree and their status. Small radii such as the commonly used R 200c produce a large number of flyby halos whereas the splashback radius includes virtually all subhalos by construction. Finally, we use splashback radii computed by Sparta. The code tracks each particle in each halo as it enters for the first time and determines its first apocenter (or splashback) event. From the locations and times of these events, we compute the splashback radius of the halo by smoothing the distribution of particle splashbacks in time and taking its mean (R sp,mn ) or higher percentiles (e.g., R sp,90% for the 90th percentile). At the final snapshots of a simulation, the time-average would be biased because we are missing future particle splashbacks. Sparta corrects for this bias, but this procedure increases the scatter in the splashback results for the final few snapshots (Paper I). Thus, we will study results at z = 0.13 instead of z = 0, which does not alter our conclusions in any way. Any mass M X is understood to include the mass inside the respective radius R X , N X denotes the number of particles in M X . The peak mass M X,peak is the highest mass attained along a halo's most-massive progenitor branch. The Moria catalogs and merger trees contain all halos with N 200m,peak ≥ 200, but we will often apply stricter limits to avoid certain selection effects. Finally, we consider two alternative ways to quantify the relative masses of halos. First, we use the maximum circular velocity, V max , as computed by Rockstar. Second, when comparing halos across redshifts and cosmologies, we express their masses as peak height, ν X . Peak height captures the statistical significance of halos, namely, whether they are rare or common with respect to the overall density field. It is formally defined as ν X = δ c /σ(M X ), where δ c = 1.686 is the threshold overdensity in the top-hat collapse model (Gunn & Gott 1972) and σ(M X ) is the variance of the linear power spectrum on the Lagrangian scale of a halo. This scale corresponds to the comoving radius that encloses the mass M X at the mean density of the Universe (see Paper III or Diemer 2020b for details). We compute peak heights with the Colossus code (Diemer 2018) , using the transfer function of Eisenstein & Hu (1998) to approximate the power spectrum. One of the main innovations of our Moria catalogs is that they contain separate host-subhalo relations for each definition. To compute these relations, Moria orders the list of all halos at a snapshot by V max to avoid making reference to any particular mass definition. Starting with the highest-V max halo, the code searches for all halo centers within its radius in the given definition and assigns them the host's ID as a parent. We then continue with the second-highest V max and so on. If a subhalo already has a host, we do not replace that host's ID. This procedure exactly reproduces the parent assignments of Consistent-Trees if the same radius definition is used. Figure 1 shows a visualization of the host-subhalo assignments for different mass definitions. The details of the percolation algorithm have some impact on the results (García & Rozo 2019), but these differences are not the subject of this paper and are small compared to the changes caused by varying the size of the halo radius. In this section, we quantify the impact of the radius definition on the subhalo and flyby fractions in ΛCDM (Sections 3.1 and 3.2) and compare them across cosmologies to establish whether they follow a universal form (Section 3.3). We define the subhalo fraction, f sub , as the fraction of all halos with a given mass (or similar characteristic) that are subhalos. At first sight, f sub should be easy to measure by counting the number of halos that do and do not have a parent at fixed mass. The difficulty is to define a halo property that can be meaningfully measured for both host and subhalos. As briefly discussed in Section 2.2 (and at length in Paper III), all-particle SO masses are ill-defined for subhalos, but we can use bound-only masses, peak masses, or V max . Since each choice leads to different subhalo fractions, Figure 3 shows f sub based on all three quantities. First, we consider the current M 200m,bnd of hosts and subhalos and compute f sub as follows. We split the mass range shown in the left column of Figure 3 into 20 bins. We apply a lower limit on the number of particles per halo to avoid poorly resolved halos and regions of parameter space where the catalogs may be incomplete (Paper III). For a given simulation with a given mass resolution, we require that the entire mass range in a bin must be resolved with more than N 200m ≥ 500 particles, otherwise we omit the bin. For the SO masses with thresholds other than R 200m , we convert the bin edges' masses to that definition assuming an NFW profile (Navarro et al. 1997 ) and the Diemer & Joyce (2019) mass-concentration relation. This procedure ensures a more or less uniform cut for all definitions. We now add the n halos from all ΛCDM simulations that contribute to a given bin (Table 1) , compute the subhalo fraction f sub , and estimate the fractional uncertainty in each bin from the binomial formula, We find that the simulations are mostly converged for the lower limit of 500 particles per halo, that is, that the fractions measured in individual simulations agree at fixed mass. However, slight non-convergence effects are visible as jagged lines in Figure 3 . We do not show the individual simulations in Figure 3 to avoid crowding the plots. We give more detail on similar techniques to combine simulation results in Paper III and Diemer (2020b) . In the smaller bottom panels of Figure 3 , we compare the subhalo fractions in each definition to the commonly used R vir,bnd . The uncertainties in the comparison panels are obtained by adding the fractional errors of the respective definitions in quadrature. Overall, the subhalo fractions monotonically decrease from between 6% and 32% at the low-mass end to zero at the highest masses. The differences between the mass definitions are striking: compared to R vir , high-threshold SO definitions such as R 200c reduces the subhalo fraction by up to 50%, R 200m increases it by about 20%, and the splashback definitions increase it by 20-40% for the mean and 50-120% for the 90th percentile. At z = 2, R 200c , R vir , and R 200m have become almost indistinguishable because they approach the same overdensity threshold when Ω m (z) ≈ 1. R 500c still reduces the subhalo fraction by a large factor, although slightly less than at z = 0. The splashback definitions lead to somewhat smaller increases in the subhalo fraction because the higher accretion rates at high redshift mean that they shrink compared to R 200m (Paper II). Using R vir,all instead of R vir,bnd makes a relatively small difference, up to 10% at the low-mass end. The differences for other SO thresholds are similar so that our conclusions for bound-only definitions basically apply to their all-particle counterparts as well. This result is congruent with Diemer (2020b) who show that the mass function of all-particle and bound-only SO masses are similar. At this point, we pause to consider the meaning of our comparison at a fixed, bound-only SO mass. After infall, subhalos lose mass whereas host halos of the same initial mass keep growing. As a result, subhalos shift left in Figure 3 . This is a sensible outcome in terms of halo mass but may not reflect the evolution of galaxies, which are thought to retain (and perhaps even slightly grow) their stellar mass for some time after infall. To mimic a selection at fixed galaxy mass, we now consider the peak mass of each halo (middle column in Figure 3 ). The peak is typically attained shortly before infall for subhalos (Behroozi et al. 2014) and commonly used in studies of the galaxy-halo connection (e.g., Guo et al. 2010; Reddick et al. 2013) . The halo selection is somewhat trickier with M peak because halos are biased to be hosts near the cut-off of our catalogs at N 200m,peak ≥ 200. This selection effect occurs due to the mass evolution logic discussed above: if two halos have the same mass close to the threshold and one becomes a subhalo, that subhalo is more likely to narrowly miss the catalog cut in the future. To restore convergence between simulations with different resolution, we increase our threshold to 1000 particles in M 200m,peak . We enforce the cut the same way as for M 200m,bnd , namely by applying it directly to the splashback definitions and converting it to the respective SO definitions. As expected, the subhalo fraction at fixed M peak increases compared to M bnd because subhalos that have lost mass are now shifted into the higher mass bin they had once attained. All definitions shift more or less in unison, the amplitude of the ratio to f sub in R vir,bnd is slightly reduced for all definitions, masses, and redshifts. Finally, in the right column of Figure 3 , we compare the subhalo fractions at fixed V max , another quantity that is commonly used to link halos to galaxies (e.g., Behroozi et al. 2019) . V max contains unique information because it measures the potential within a radius much smaller than R 200m , where subhalos can more easily shield their mass from tidal disruption. We now encounter a different selection effect: the V max of host halos is tightly correlated with their mass, meaning . Subhalo fraction in ΛCDM according to different mass definitions as a function of the current bound-only halo mass (left), the peak mass (center), and V max (right). These definitions cause significant shifts in the inferred subhalo fractions because subhalos tend to lose mass after infall whereas host halos keep growing. The top row shows the relations at z ≈ 0 (left), the bottom row at z = 2. The smaller bottom panels show the fractional difference to the commonly used R vir,bnd definition. The shaded areas highlight the statistical uncertainty. The dashed line refers to R vir,all , which is similar to R vir,bnd ; we omit the all-particle counterparts of the other SO definitions. At fixed V max and z = 0, using R 500c leads to about 60% fewer subhalos at the low-mass end, R 200m to about 20% more. The three representative splashback definitions lead to between 30% and 80% higher subhalo fractions at the low-mass end and up to 50% at cluster masses. All differences are reduced at higher redshift, mostly reflecting the overall shift to lower masses. For the most massive halos at z ≥ 2, the average mass accretion rates are so high that the smaller splashback radii such as R sp,mn lie inside R vir on average. The virial, 200c, and 200m definitions become indistinguishable at high redshift. These results highlight the degree to which the definition of the halo boundary affects our understanding of substructure. that our catalog cut in M 200m,peak selects a well-defined range of V max . For subhalos, however, V max does decrease somewhat as they lose mass, leading to the lowest V max bins being entirely dominated by subhalos. Again, we find that a cut of 1000 particles (enforced the same way as for M 200m,peak ) removes this selection effect. The subhalo fractions at fixed V max are similar to those at fixed M 200m,peak . In summary, the subhalo fraction depends dramatically on the radius definition, highlighting that commonly used choices such as the "virial" radius are by no means unique. The subhalo fraction also depends on whether we compare halos at fixed bound mass, peak mass, or V max . At z = 0, f sub ranges from 6% to 45% at the low-mass end depending on the radius definition and halo selection, but this range would change if we could probe smaller halo masses. Figure 4 . Same as Figure 3 but for the flyby fraction, which also depends strongly on the radius definition. Conversely though, small-radius definitions such as R 500c (dark blue) produce many more flyby halos than large-radius definitions. The bottom panels compare all definitions to R 500c (excluding masses where the fraction approaches zero). See Section 3.2 for a detailed discussion. Before we measure the flyby fraction, f flyby , we should contemplate the definition of a flyby halo and how we expect it to be affected by the halo boundary definition. We define f flyby as the fraction of all host halos that were a subhalo at any time in the past. This set of halos will be composed of two distinct sub-populations: halos that had a close encounter with another, larger halo but genuinely escaped from its sphere of influence and subhalos whose orbits have temporarily taken them outside the host halo radius. The former population should account for a small fraction of all halos and should increase with increasing halo radius (because the smaller halo is more likely to enter inside the larger halo's radius). The second population, often called "backsplash" halos, are orbiting their host and will eventually fall into it. Figure 2 visually demonstrates that we expect a large fraction of all subhalos to experience this type of spurious flyby event at some point if the host halo radius is small compared to the splashback radius. We expect that this population will shrink as the halo radii get larger because they will include more and more of the subhalo orbits. Given the opposite trends of genuine and spurious flyby events, the evolution of the flyby fraction with radius definition will tell us which population dominates. At face value, the flyby fraction is easy to define: the fraction of all host halos that were a subhalo at any point along their main branch progenitor history (according to a given radius definition). In practice, applying this definition to merger trees created by Rockstar and Consistent-Trees (or most other halo finder and tree algorithms) leads to erratic results due to spurious, temporary subhalo periods. In major mergers, for example, the host-subhalo relation can be ambiguous and switch between two halos, after which point the eventual host halo would be classified as a flyby halo. We follow the strategy of Mansfield & Kravtsov (2020) to eliminate such cases: we do not count a halo as a flyby if its former host is no longer alive, if the former host is a subhalo of the halo in question, or if the former host's mass is now smaller than that of the halo in question (all defined given the same radius and mass definition). In some cases, particularly for subhalo epochs at high redshift, the host may not be part of the merger trees because it never exceeded the necessary mass threshold mass. We also discard such events because the peak host mass was clearly smaller than the current mass of the halo in question. We emphasize that this definition of what constitutes a flyby halo is not unique. For example, we could consider future epochs to establish whether a flyby halo will eventually fall into its former host. Similarly, omitting any one of our exclusion criteria causes noticeable changes in the flyby fraction, raising the suspicion that f flyby is not a particularly well-defined quantity. Nevertheless, the relative differences in flyby fractions according to different radius definitions do remain similar, which is the focus of our work. We compute f flyby and its uncertainty in the same way as the subhalo fraction (Section 3.1). As we are dealing with host halos (at the current epoch), we can safely lower our particle number threshold to 200 and apply it separately to each definition. With this limit, the different ΛCDM simulations agree at fixed mass. However, we increase the threshold to 350 particles when comparing at fixed M 200m,peak and at fixed V max because otherwise the relation between the cutoff mass and V max leads to slight non-convergence at the low-mass end. Figure 4 shows the flyby fraction in the WMAP7 cosmology at z ≈ 0 and z = 2. Regardless of the radius definition, mass variable, or redshift, f flyby asymptotes to zero at the highest masses and increases towards low masses. At the smallest masses we can probe, f flyby is still increasing so that we cannot put an upper bound on it. At fixed M 200m,bnd , f flyby varies between about 7% and 12% at the low-mass end. While this range sounds relatively modest, the relative fractions differ substantially between mass definitions, especially at intermediate masses. For instance, at M ≈ 10 12 h −1 M , using R sp,90% leads to only 15% of the flyby halos found when using R 500c . When plotted as a function of peak mass, the fractions are shifted to higher values (middle column of Figure 4) . In reverse, the shift means that flyby halos are more likely to have a high ratio of peak to current mass, meaning that they have lost mass at some point along their trajectory, which makes sense given that they had an encounter with a larger halo. Finally, the right column of Figure 4 shows the same results as a function of V max . In all cases, the relative differences between the radius definitions are similar. At higher redshift (bottom row), all curves are shifted to lower masses. This shift is not entirely captured by the overall growth of the halo population, as we discuss in the next section. Given that larger halo radii significantly reduce the flyby fraction, we conclude that the majority of flyby halos are, indeed, "backsplash" halos that should be classified as subhalos (in agreement with Mansfield & Kravtsov 2020) . By definition, R sp,90% should include at least 90% of all subhalo orbits; in practice, it includes an even higher fraction because subhalos suffer from dynamical friction which shrinks their or- bits (Chandrasekhar 1943; Adhikari et al. 2016) . In summary, splashback definitions produce significantly reduced numbers of flyby halos, which is a desirable feature (as discussed in Sections 1 and 4). So far, we have compared subhalo and flyby fractions at fixed mass or V max , but Figures 3 and 4 do not allow for a fair comparison between cosmologies and redshifts because halo masses grow at different overall rates in different cosmologies. To facilitate such a comparison, we now express masses as peak height (as defined in Section 2.2). Figure 5 shows the subhalo and flyby fraction as a function of peak height. We compute the fractions and ν based on the M 200m,bnd definition, but other definitions lead to the same qualitative conclusions. The dashed blue lines show the WMAP7 cosmology at z = 0 and z = 2; they correspond to the orange lines in the left column of Figure 3 . The redshifts are clearly offset even in peak height space: we observe both more subhalos and flyby halos at higher redshift. This clear trend means that f sub and f flyby are not universal as a function of peak height, at least not in the way we have defined them. This non-universality is in contrast to, say, the halo mass function, which is approximately constant with redshift at fixed ν (Diemer 2020b) . The Planck cosmology gives almost identical results. The solid lines in Figure 5 show results for four self-similar universes with different slopes of the power spectrum, n = d ln P/d ln k. In those simulations, all redshifts give the same results (at fixed ν) and have been combined into one curve per simulation (see Diemer 2020b for details on the chosen redshifts and the procedure). The self-similar results also follow a clear trend whereby shallower n lead to lower subhalo and flyby fractions. This conclusion may be counter-intuitive since a shallower power spectrum means that, at a given scale, there is more substructure to be accreted into halos. However, this logic is reversed here: at a fixed peak height, there are more larger halos in cosmologies with a steeper power spectrum slope, leading to a higher subhalo fraction. Given the trends in z and n, we might try to apply our findings from the self-similar cosmologies to ΛCDM, where the power spectrum slope is about −2 for the largest halos at z = 0 and approaches −3 for small halos at high redshift (e.g., . The two trends seem compatible (more subhalos at steeper slopes and higher redshifts in ΛCDM), but Figure 5 makes it clear that they are not analogous in detail. For example, we would expect the z = 0 lines to lie between n = −2 and −2.5 while they are closer to −1.5 at high peak heights. We conclude that the subhalo and flyby fractions, at least as defined in this work, are not universal quantities: they significantly vary with redshift and cosmology at fixed peak height. Given this non-universality, we do not attempt to construct a fitting function for them because such an approximation would be valid only over a limited range of redshift, mass, and cosmological parameters. It is likely that the non-universality is, at least in part, caused by numerical issues with our simulations and halo finding. We have demonstrated that the definition of the halo boundary has a dramatic impact on the distinction between hosts and subhalos. In this section, we discuss some potential consequences for our understanding of large-scale structure, for the interpretation of certain observations, and for models of the galaxy-halo connection. We also highlight a number of intriguing theoretical questions. Observationally, the issue of where to draw the halo boundary is perhaps most apparent at the transition between the collapsed matter inside halos and the large-scale structure around them (the so-called "1-halo" and "2-halo" terms, e.g., Hayashi & White 2008) . This transition manifests itself as a break in the overall density profile and as a resulting dip in the lensing signal (e.g., Leauthaud et al. 2011; Oguri & Hamana 2011; Tully 2015; Tomooka et al. 2020) . If this region is interpreted based on too small a halo boundary, one might, for example, conclude that some of the 2-halo signal is due to flyby halos (Sunayama et al. 2016) . Another observable that might be impacted by the host-subhalo (or central-satellite) assignment is the total stellar mass within a halo (e.g., Lin & Mohr 2004; Gonzalez et al. 2007; Leauthaud et al. 2012) . This statistic has recently received renewed attention due to its tight connection to halo mass (Tinker et al. 2019; Bradshaw et al. 2020; Huang et al. 2020; DeMaio et al. 2020) . The scatter in this relation should be smallest if all satellites within a group or cluster are considered, not some subset inside a smaller halo boundary. Theoretically, the 1-halo and 2-halo clustering regimes are generally understood based on the so-called "halo model," which posits that all matter resides in halos (e.g., Ma & Fry 2000; Seljak 2000; Zentner et al. 2005 ). On small scales, the clustering follows the halo density profile; on large scales, it follows the linear correlation function times some halo bias (Cole & Kaiser 1989) . The definition of the halo boundary thus matters for the halo model's predictions and its interpretation. This interplay was recently investigated by Garcia et al. (2020) , who left the halo radius as a free parameter and found that a large radius, possibly larger than R sp,90% , provides the best fit to the clustering in simulations. This intriguing result hints at the possibility of constructing a halo model based on the splashback radius. When adding galaxies to our modeling of large-scale structure, adopting a splashback boundary could affect the results via the host-subhalo distinction but also via systematic changes in host masses, for example due to environmentdependent mass accretion rates. Given the subject of this paper, we focus on the former effect. We consider three popular techniques to infer the galaxy-halo connection (Wechsler & Tinker 2018) . First, subhalo abundance matching (SHAM) assigns galaxies to halos by matching rank-ordered lists of a galaxy property, such as stellar mass, and a halo property, such as halo mass (e.g. Kravtsov et al. 2004; Vale & Ostriker 2004; Conroy et al. 2006) . While the SHAM assignment does not necessarily distinguish between hosts and subhalos, the results are sometimes validated against observed satellite fractions based on a group finder (e.g., Yang et al. 2005; Tinker et al. 2011; Reddick et al. 2013; Lehmann et al. 2017 ). However, the differences in f sub due to the halo boundary can be larger than those due to the physics included in the SHAM model, such as the halo property (M peak , V max etc.) or the scatter in the stellar mass-halo mass relation (Behroozi et al. 2010; Reddick et al. 2013) . Thus, the conclusions drawn from comparisons to observed satellite fractions might change depending on the definition of the halo boundary. Second, in a halo occupation distribution (HOD) analysis, we assign one central and any number of satellite galaxies to a halo based on its mass (e.g., Peacock & Smith 2000; Seljak 2000; Berlind & Weinberg 2002; Cooray & Sheth 2002) or other halo properties (e.g., Hearin et al. 2016 ). This assignment is not sensitive to changes in the abundance of subhalos in the halo catalogs, but a larger halo boundary would mean removing hosts that are, by construction, strongly clustered around other halos. The free parameters of the HOD would readjust to match the observations (e.g., clustering signals), possibly leading to a different physical interpretation of the results. Third, semianalytical models (SAMs) constitute simplified descriptions of the sophisticated processes of galaxy formation that are applied to simulated merger trees (Kauffmann et al. 1993; Cole et al. 1994; Somerville & Primack 1999; Benson 2012; Croton et al. 2016; Lagos et al. 2018) . The impact of the hostsubhalo assignment will depend on whether a specific model treats subhalos differently from hosts, for example by explicitly modeling satellite stripping and disruption (e.g., Guo et al. 2011; Stevens et al. 2016) . In summary, we expect the halo boundary definition to have some impact on most types of galaxy-halo modeling, both due to changed host masses and due to the host-subhalo assignment; the importance of these effects will need to be quantified model by model. We have left a number of theoretical and numerical issues for future investigations. For example, Villarreal et al. (2017) showed that assembly bias (Gao et al. 2005) can be mitigated by choosing large halo boundaries, an effect that can now be quantified for splashback radii. On the other hand, the question of assembly bias also highlights a big caveat: our results are derived from spherical halo radii, whether splashback or SO. Recently, Mansfield & Kravtsov (2020) showed that nonsphericity leads to significant differences in the subhalo assignment and thus in the assembly bias signal. Another somewhat unsatisfying aspect of our results is that we have not found any universality of f sub or f flyby with redshift or cosmology (Section 3.3), which is surprising given the approximate self-similarity of structure in ΛCDM universes. A first step towards a universal description could be to understand the low-mass behavior of the subhalo fraction. The asymptotic value of f sub may depend on the smallest possible halo mass and thus on the cutoff scale of the power spectrum (e.g. due to warm dark matter). If this cutoff allowed for, say, earth-mass halos (e.g., Diemand et al. 2005) , it is conceivable that the vast majority of the smallest halos would be subhalos. Testing this hypothesis may demand simulations with an unprecedented dynamic range (e.g., Wang et al. 2019) . Finally, we have made no attempt to correct for numerical issues that lead to unphysical subhalo disruption, for example by tracking undetectable subhalos based on a subset of their constituent particles ("orphans" or "cores", e.g., Wang et al. 2006; Heitmann et al. 2019 ). We will return to this issue in Diemer & Behroozi (2020) , where we track so-called "ghost" subhalos and propose a new definition of their mass. We have systematically investigated the fraction of halos that are a subhalo or flyby halo based on both spherical overdensity and splashback definitions of the halo boundary. We find that both fractions depend strongly on the radius definition. Our main conclusions are as follows. 1. The subhalo fraction depends on the chosen definition of subhalo mass, with lower subhalo fractions at fixed bound-only mass than at fixed peak mass or V max . 2. Compared to the commonly used R vir definition, defining subhalos via R 500c leads to up to 60% fewer subhalos while using splashback radii leads to between 50% and 100% more subhalos at the low-mass end. The differences are slightly smaller at higher redshift but generally persist across cosmic time and cosmology. 3. The flyby fraction follows the opposite trend, where larger radii lead to fewer flyby halos. This trend demonstrates that the vast majority of flyby halos are "backsplash" satellites that should be classified as subhalos. A subhalo assignment based on the splashback radius largely eliminates this issue. 4. Neither the subhalo nor the flyby fractions are universal, meaning that they vary with redshift and cosmology at fixed peak height. It is not clear whether these trends are caused by a fundamental non-universality of structure formation, by our definition of subhalos and flybys, or by numerical issues. We have left numerous open questions for future work, particularly regarding the impact of the radius definition on the galaxy-halo connection. Our catalogs and merger trees are publicly available at benediktdiemer.com/data; we hope that this paper provides motivation for further investigations. I am grateful to Han Aung, Peter Behroozi, Joe DeRose, Philip Mansfield, Surhud More, Daisuke Nagai, and Enia Xhakaj for productive discussions and feedback on a draft. I am especially thankful to Andrew Hearin for his feedback and creative input. This research made extensive use of the python packages NumPy (Oliphant 2006) , SciPy (Virtanen et al. 2019), Matplotlib (Hunter 2007) , and Colossus (Diemer 2018) . Support for Program number HST-HF2-51406.001-A was provided by NASA through a grant from the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Incorporated, under NASA contract NAS5-26555. This work was partially completed during the Coronavirus lockdown and would not have been possible without the essential workers who did not enjoy the privilege of working from the safety of their homes. All computations were run on the Midway computing cluster provided by the University of Chicago Research Computing Center. This research made extensive use of the python packages NumPy (Oliphant 2006) , SciPy (Virtanen et al. 2019), Matplotlib (Hunter 2007) , and Colossus (Diemer 2018) . This research was supported in part by the National Science Foundation under Grant No. NSF PHY-1748958. Support for Program number HST-HF2-51406.001-A was provided by NASA through a grant from the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Incorporated, under NASA contract NAS5-26555. 35 -. 2020a, arXiv e-prints Universal at last? The splashback mass function of dark matter halos Haunted halos: keeping simulated subhalos alive as ghosts Guide to NumPy