Tracking Subsample Particles#

CompaSO subsample particles can be tracked across output redshifts by particle ID (PID). This has many uses, one of which is building merger trees. Many users will want to use the pre-built merger trees for their halo-tracking needs, rather than deal with subsample particles directly. However, some applications might need to track particles at a granularity that the pre-built merger trees do not expose. This tutorial addresses those cases.

For example, beyond just knowing that halo A merged into halo B, one might be interested in knowing where halo A’s subsample particles ended up post-merger.

CompaSO has a notion of “tagged particles” that can help with this. CompaSO tags a bit field in the particle aux structure when a particle is part of an L2 halo (i.e. a halo core). The idea is that these particles can be identified in future outputs as a proxy for where that halo core ended up, even if it is absorbed by a larger halo or disrupted. In practice, most halo subsample particles end up being tagged relatively quickly, so one might not care about the distinction of tagged vs not tagged, unless one is tracking halos from an early time.

Let’s work through an example of identifying the subsample particles of a high-redshift halo and finding those particles again at a lower redshift.

As a reminder, “subsample” particles in CompaSO are divided into two sets: “A” (3% of all particles) and “B” (7% of all particles). The union of the A and B set is a uniform 10% subsample of all particles. Membership in the A and B sets is determined by a hash of the PIDs and is thus consistent across time.

import numpy as np
from abacusnbody.data.compaso_halo_catalog import CompaSOHaloCatalog

Load some small (\(N=250\)) halos from \(z=2\), along with their subsampled particles:

cat = CompaSOHaloCatalog(
    '/mnt/home/lgarrison/ceph/AbacusSummit/AbacusSummit_base_c000_ph000/halos/z2.000/halo_info/halo_info_000.asdf',
    fields='all',
    subsamples={'A': True, 'pid': True, 'pos': True},
    unpack_bits=['pid', 'tagged'],
    filter_func=lambda h: h['N'] == 250,
)

Looking at the subsample particles, we see that we have a column for whether the particle is tagged (i.e. was ever part of a L2 halo):

cat.subsamples

Table length=37300

pos	pid	tagged
float32[3]	int64	uint8
-999.15 .. -953.86	704379748365	1
-999.228 .. -953.896	708674846732	1
-999.258 .. -953.892	704379748363	1
-999.284 .. -953.87	717264650253	1
-999.234 .. -953.954	700084715532	1
-999.216 .. -953.91	708674715659	1
-999.224 .. -953.888	712969617420	1
-998.858 .. 727.898	25598014397177	0
-999.64 .. -773.606	3354390167555	1
...	...	...
-941.49 .. -371.676	9294753366225	1
-941.43 .. -371.566	9286163366095	0
-941.812 .. -293.368	10471577223381	1
-941.882 .. -293.462	10493051994327	1
-941.75 .. -293.356	10480167289042	1
-941.754 .. -293.294	10480167223506	1
-941.75 .. -293.354	10484461994194	1
-941.904 .. -293.298	10467282190548	0
-941.826 .. -293.47	10471577354453	0
-941.796 .. -293.294	10471577092311	1

As mentioned before, most particles are tagged.

Now let’s pick out the subsample particles belonging to a particular halo (selected by hand to be a good example for this tutorial):

h0 = cat.halos[3]
s0 = cat.subsamples[h0['npstartA'] : h0['npstartA'] + h0['npoutA']]
s0

Table length=7

pos	pid	tagged
float32[3]	int64	uint8
-999.018 .. 345.814	19937277640704	1
-999.08 .. 345.792	19941572614910	1
-999.146 .. 345.756	19945867902976	1
-999.188 .. 345.8	19941572739074	1
-999.198 .. 345.814	19941572673539	1
-999.052 .. 345.708	19954457254654	0
-999.258 .. 345.732	19950162673665	1

Since most of the particles are tagged, there’s not much difference in tracking all vs tagged particles. For that reason, we’ll just track all particles. But one could filter to just tagged particles if desired. At higher redshifts, we’d expect a larger difference, and if one were processing all redshifts, one could establish “seen” vs “not yet seen” tagged particles. But that’s out of scope for this tutorial.

Now let’s load the next primary[1] redshift catalog and find these particles:

cat_next = CompaSOHaloCatalog(
    '/mnt/home/lgarrison/ceph/AbacusSummit/AbacusSummit_base_c000_ph000/halos/z1.700/halo_info/halo_info_000.asdf',
    fields='all',
    subsamples={'A': True, 'pid': True, 'pos': True},
    unpack_bits=['pid', 'tagged'],
)

We want to search the subsample PIDs of this catalog to figure out where the \(z=2\) halo ended up. To make the search efficient, we’ll sort by PID. This would be especially important if we were searching for many halos, not just one.

To make it easier to associate subsamples back to halos, we’ll add a halo row index column to the subsamples.

Note that the particles we’re looking for may not be in any halo! Even if they are in a halo, they may not be in this superslab (halo_info_000.asdf). A more robust solution would also look in the field particle files, and look in the neighboring superslabs.

And note that if we didn’t need the association back to halos (i.e. we just wanted to know where the halo particles ended up, not what halo it belonged to now), we could operate on the subsample particle files directly, without going through the CompaSOHaloCatalog interface for cat_next.

# technical detail: this works because the compaso loader has already eliminated L0-but-not-L1 subsample particles
cat_next.subsamples['halo_row_id'] = np.repeat(
    np.arange(len(cat_next.halos)), cat_next.halos['npoutA']
)

subsamples_sorted = cat_next.subsamples.copy()
subsamples_sorted.sort('pid')

s0_sorted = s0.copy()
s0_sorted.sort('pid')

import numba as nb

# Technical note: for small B, a binary search would probably be faster.
# But if B gets large enough such that most values in A are visited,
# visiting them in order (i.e. zipper search) is probably faster.


@nb.njit
def zipper_search(A, B):
    """Search for elements of B in A, assuming both A and B are sorted."""
    indices = np.empty(len(B), dtype=np.int64)

    i = 0  # index for A

    for j in range(len(B)):
        while i < len(A) and A[i] < B[j]:
            i += 1

        if i < len(A) and A[i] == B[j]:
            indices[j] = i
        else:
            indices[j] = -1

    return indices

idx = zipper_search(subsamples_sorted['pid'], s0_sorted['pid'])
idx

array([32948273, 32955212, 32955213, 32955215, 32962347, 32969208,
             -1])

subsamples_sorted[idx[idx != -1]]

Table length=6

pos	pid	tagged	halo_row_id
float32[3]	int64	uint8	int64
-998.996 .. 346.136	19937277640704	1	18003
-998.956 .. 346.12	19941572614910	1	18003
-998.95 .. 346.112	19941572673539	1	18003
-998.906 .. 346.156	19941572739074	1	18003
-998.914 .. 346.178	19945867902976	1	18003
-998.934 .. 346.146	19950162673665	1	18003

s0_sorted[idx != -1]

Table length=6

pos	pid	tagged
float32[3]	int64	uint8
-999.018 .. 345.814	19937277640704	1
-999.08 .. 345.792	19941572614910	1
-999.198 .. 345.814	19941572673539	1
-999.188 .. 345.8	19941572739074	1
-999.146 .. 345.756	19945867902976	1
-999.258 .. 345.732	19950162673665	1

So we see that all but 1 particle was found[2]. And all of those particles ended up in the same halo. Let’s check the halo’s mass:

cat_next.halos[subsamples_sorted['halo_row_id'][idx[idx != -1]][0]]

np.uint32(541)

We can also look at the individual particle displacements (in Mpc/h):

np.asarray(subsamples_sorted['pos'][idx[idx != -1]] - s0_sorted['pos'][idx != -1])

array([[ 0.02203369, -0.18597412,  0.32199097],
       [ 0.12402344, -0.1619873 ,  0.32800293],
       [ 0.24798584, -0.14196777,  0.29800415],
       [ 0.28198242, -0.06799316,  0.35601807],
       [ 0.23199463,  0.01397705,  0.42199707],
       [ 0.3239746 , -0.20599365,  0.41400146]], dtype=float32)

And verify that it (roughly) matches the displacement of the halo:

(
    cat_next.halos[subsamples_sorted['halo_row_id'][idx[idx != -1]][0]]['x_L2com']
    - h0['x_L2com']
)

array([ 0.17645264, -0.16577148,  0.3140564 ], dtype=float32)

To establish something like a “subhalo” catalog, one might be interested in knowing whether the halo that these particles ended up in is the “same” halo that they started in or the result of a major merger. One could probably make a decent guess based on relative masses, but a more robust solution would likely use the merger trees in conjunction with this particle tracking.