Unix Pipes: pipe_asdf¶
pipe_asdf is a Python script to unpack Abacus ASDF files (such as
halo catalogs or particle data) and write them out via a Unix pipe (stdout).
The intention is to provide a simple way for C, C++, Fortran, etc.,
codes to read ASDF files while letting Python handle the details
of the file formats, compression, and other things Python does well.
pipe_asdf [-h] [-f FIELD] [--nthread NTHREAD] asdf-file [asdf-file ...] | ./client
- asdf-file
An ASDF file. Multiple may be specified.
- -h, --help
show this help message and exit
- -f FIELD, --field FIELD
A field/column to pipe. Multiple -f flags are allowed, in which case fields will be piped in the order they are specified. (default: None)
- --nthread NTHREAD
Number of blosc decompression threads (when applicable). For AbacusSummit, use 1 to 4. (default: 4)
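For a driver that launches the pipe programmatically, the invocation above can be assembled as an argument list. The following is a minimal sketch in Python, not part of abacusutils: build_pipe_asdf_argv and spawn_pipe_asdf are hypothetical helper names, and the sketch uses only the flags documented above together with the python3 -m abacusnbody.data.pipe_asdf form of the command.

```python
import subprocess
import sys


def build_pipe_asdf_argv(asdf_files, fields, nthread=4):
    """Assemble the argument list for pipe_asdf (hypothetical helper)."""
    if not asdf_files:
        raise ValueError("at least one ASDF file is required")
    argv = [sys.executable, "-m", "abacusnbody.data.pipe_asdf"]
    for field in fields:
        # One -f flag per field; fields are piped in the order given.
        argv += ["-f", field]
    argv += ["--nthread", str(nthread)]
    argv += list(asdf_files)
    return argv


def spawn_pipe_asdf(asdf_files, fields, nthread=4):
    """Start pipe_asdf with its stdout attached to a pipe the caller can read."""
    argv = build_pipe_asdf_argv(asdf_files, fields, nthread)
    return subprocess.Popen(argv, stdout=subprocess.PIPE)
```

Building the argument list as a Python list rather than a shell string avoids quoting problems with unusual file names. Running spawn_pipe_asdf requires abacusutils to be installed in the same Python environment.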
Binary Format of Piped Data¶
The binary format of the piped data is simple:
1. an 8-byte int indicating the number of data values
2. a 4-byte int indicating the width of the primitive data type that composes the data (e.g. 4 for float, 8 for double). Largely provided as a sanity check.
3. the data, consisting of a number of bytes equal to the product of the preceding ints
4. Repeat from (1) for all fields requested
So the expected pattern for the client code is to read the int64 and int32, take the product, allocate that many bytes, then read the data into that allocation.
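The read pattern described above can be sketched in a few lines. This example is in Python only to make the framing concrete; a C or Fortran client would follow the same steps with its own read calls. It assumes the two header integers are signed and written in the machine's native byte order with no padding between them, which the text does not state explicitly. The names frame_field, read_exact and read_fields are hypothetical, and frame_field exists only to build test input.

```python
import io
import struct

# Assumed header layout: native byte order, no padding, int64 count + int32 width.
HEADER = struct.Struct("=qi")


def frame_field(payload, width):
    """Frame raw bytes as one field, following the layout described above."""
    if width <= 0 or len(payload) % width:
        raise ValueError("payload length must be a multiple of a positive width")
    return HEADER.pack(len(payload) // width, width) + payload


def read_exact(stream, nbytes):
    """Read exactly nbytes; a pipe may return short reads, so loop until done."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a field")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_fields(stream):
    """Yield (count, width, payload) for each field until the stream ends."""
    while True:
        first = stream.read(1)
        if not first:
            return  # clean end of stream between fields
        count, width = HEADER.unpack(first + read_exact(stream, HEADER.size - 1))
        if count < 0 or width <= 0:
            raise ValueError("implausible header; check the assumed layout")
        # The payload size is the product of the two header integers.
        yield count, width, read_exact(stream, count * width)


if __name__ == "__main__":
    demo = io.BytesIO(frame_field(struct.pack("=3i", 1, 2, 3), 4))
    for count, width, payload in read_fields(demo):
        print(count, width, len(payload))
```

To consume a real pipe, the same generator could be pointed at sys.stdin.buffer instead of the in-memory stream.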
When passing multiple files, a single column will be read from all files before moving to the next column. In other words, the client sees the concatenated data.
From a performance perspective, the pipe operation probably amounts to one extra memcpy of the data. That is a small performance hit, but likely vanishingly small compared to the actual IO and analysis.
Ultimately, this pipe scheme is not a replacement for direct access to the files, but it may be helpful for applications with simple data access patterns.
pipe_asdf is a “console script” alias provided by setuptools to invoke the
abacusnbody.data.pipe_asdf module as a script. This alias is
usually installed in a directory on the user’s PATH when installing
abacusutils via pip, but if not, one could equivalently invoke the module directly:
$ python3 -m abacusnbody.data.pipe_asdf
The abacusnbody/pipe_asdf directory also contains a symlink to this
file, so from this directory one can also run the script through that symlink.
Add a “-k/--key” flag to read header fields. Decide on a wire protocol.
Add CompaSOHaloCatalog hooks to pipe the unpacked data (?)
- abacusnbody.data.pipe_asdf.unpack_to_pipe(asdf_fns, fields, data_key='data', header_key='header', pipe=<_io.BufferedWriter name='<stdout>'>, nthread=4, verbose=True)¶
Invoke the command-line interface
Example C Client Code¶
An example C program called client.c that receives data over a pipe is given in the abacusutils/pipe_asdf directory. From this directory, one can build the client program and then run it with:
$ ./pipe_asdf.py halo_info_000.asdf -f N -f x_com | ./client
You can use the example halo_info_000.asdf file symlinked in the pipe_asdf directory to test this.
This program is a stand-in for an analysis code. In this case, it just reads the raw binary data for two columns, N and x_com, and prints the values.
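The piped header carries only the element width, so a client has to know each column's type in advance in order to turn the raw bytes into values. The sketch below shows that step in Python. The type codes are assumptions made for illustration (N as an unsigned 32-bit integer, x_com as 32-bit floats) and should be checked against the actual catalog. decode_column and ASSUMED_TYPECODES are hypothetical names, and native byte order is assumed.

```python
import struct

# Assumed struct type codes, for illustration only: uint32 and float32.
ASSUMED_TYPECODES = {"N": "I", "x_com": "f"}


def decode_column(name, count, width, payload):
    """Interpret one field's raw bytes as a tuple of numbers."""
    code = ASSUMED_TYPECODES[name]
    # Use the piped width as the sanity check it is meant to be.
    if struct.calcsize("=" + code) != width:
        raise ValueError(f"{name}: piped width {width} does not match the assumed type")
    if len(payload) != count * width:
        raise ValueError(f"{name}: payload is not count * width bytes long")
    return struct.unpack(f"={count}{code}", payload)
```

A width mismatch raises an error instead of silently reinterpreting the bytes, which mirrors the role the text gives the 4-byte width field.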