Unix Pipes: pipe_asdf#
pipe_asdf is a Python script that unpacks Abacus ASDF files (such as halo catalogs or particle data) and writes them out via a Unix pipe (stdout). The intention is to provide a simple way for C, C++, Fortran, etc., codes to read ASDF files while letting Python handle the details of the file formats, compression, and other things Python does well.
Usage#
pipe_asdf [-h] [-f FIELD] [--nthread NTHREAD] asdf-file [asdf-file ...] | ./client
positional arguments#
- asdf-file
An ASDF file. Multiple may be specified.
optional arguments#
- -h, --help
show this help message and exit
- -f FIELD, --field FIELD
A field/column to pipe. Multiple -f flags are allowed, in which case fields will be piped in the order they are specified. (default: None)
- --nthread NTHREAD
Number of blosc decompression threads (when applicable). For AbacusSummit, use 1 to 4. (default: 4)
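As an illustration of how the flags above combine, here is a minimal Python sketch that assembles the same command line and starts it with a pipe attached to stdout. The helper names build_command and open_pipe are hypothetical and not part of abacusutils, and the sketch assumes pipe_asdf is available on the PATH.

```python
import subprocess


def build_command(files, fields, nthread=4):
    """Assemble a pipe_asdf argument list (hypothetical helper)."""
    cmd = ["pipe_asdf"]
    # One -f flag per field; fields are piped in the order given.
    for field in fields:
        cmd += ["-f", field]
    cmd += ["--nthread", str(nthread)]
    # Positional arguments: one or more ASDF files.
    cmd += list(files)
    return cmd


def open_pipe(files, fields, nthread=4):
    """Start pipe_asdf; the caller reads raw bytes from proc.stdout.

    Assumes the pipe_asdf console script is on the PATH.
    """
    cmd = build_command(files, fields, nthread)
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)
```

Calling open_pipe(["halo_info_000.asdf"], ["N", "x_com"]) would be roughly equivalent to the shell pipeline shown under Usage, with the Python process taking the place of ./client.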
Binary Format of Piped Data#
The binary format of the piped data is simple:
1. an 8-byte int indicating the number of data values
2. a 4-byte int indicating the width in bytes of the primitive data type that composes the data (e.g. 4 for float, 8 for double). Largely provided as a sanity check.
3. the data, consisting of a number of bytes equal to the product of the preceding ints
4. Repeat from (1) for all fields requested
So the expected pattern for the client code is to read the int64 and int32, take the product, allocate that many bytes, then read the data into that allocation.
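The read pattern can be sketched in a few lines of Python. This is only an illustration of the wire format described here, not the project's client code. It assumes the two header ints are written back to back in the machine's native byte order, which the description above does not state. The function name read_field and the in-memory test stream are made up for the example.

```python
import io
import struct

# Assumption: int64 count followed directly by int32 width, native byte
# order, no padding ("=" means native order with standard sizes).
HEADER = struct.Struct("=qi")


def read_field(stream):
    """Read one field record; return (count, width, payload) or None at EOF."""
    header = stream.read(HEADER.size)
    if not header:
        return None  # clean end of stream: no more fields
    if len(header) != HEADER.size:
        raise EOFError("truncated header")
    count, width = HEADER.unpack(header)
    nbytes = count * width  # product of the two ints
    payload = stream.read(nbytes)
    if len(payload) != nbytes:
        raise EOFError("truncated payload")
    return count, width, payload


def demo():
    """Round-trip three float32 values through a fake in-memory stream."""
    values = [1.0, 2.5, -3.0]
    fake = HEADER.pack(len(values), 4) + struct.pack("=3f", *values)
    count, width, payload = read_field(io.BytesIO(fake))
    return count, width, struct.unpack(f"={count}f", payload)
```

In a real client the stream would be sys.stdin.buffer, and read_field would be called in a loop until it returns None, once per requested field.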
When passing multiple files, a single column will be read from all files before moving to the next column. In other words, the client sees the concatenated data.
From a performance perspective, the pipe operation probably amounts to one extra memcpy, so the overhead is likely negligible compared to the actual IO and analysis.
Ultimately, this pipe scheme is not a replacement for direct access to the files, but it may be helpful for applications with simple data access patterns.
Entry Points#
Technically, pipe_asdf is a “console script” alias provided by setuptools to invoke the abacusnbody.data.pipe_asdf module as a script. This alias is usually installed in a directory on the user’s PATH when installing abacusutils via pip, but if not, one can equivalently invoke the script with:
$ python3 -m abacusnbody.data.pipe_asdf
The abacusutils/pipe_asdf directory also contains a symlink to this file, so from this directory one can also run:
$ ./pipe_asdf.py
To-do#
Add a “-k/--key” flag to read header fields. Decide on a wire protocol.
Add CompaSOHaloCatalog hooks to pipe the unpacked data (?)
Python API#
Example C Client Code#
An example C program called client.c that receives data over a pipe is given in the abacusutils/pipe_asdf directory. From this directory, one can build the client program by running
$ make
and run it with:
$ ./pipe_asdf.py halo_info_000.asdf -f N -f x_com | ./client
You can use the example halo_info_000.asdf file symlinked in the pipe_asdf directory to test this.
This program is a stand-in for an analysis code. In this case, it just reads the raw binary data for two columns, N and x_com, and prints the values.