Coping with BIG DATA Image Formats: Integration of CBF, NeXus and HDF5, A Progress Report
Poster May 24, 2014
Herbert J. Bernstein, Jonathan M. Sloan, Graeme Winter, Tobias S. Richter, NeXus International Advisory Committee, Committee for the Maintenance of the CIF Standard
The BIG DATA demands of the new generation of X-ray pixel array detectors necessitate new storage technologies, because the data rates and file counts involved are reaching the limits of existing file systems. In addition, the modular nature of these detectors makes it possible to construct more complex detector arrays (e.g. the Dectris Pilatus detector at beamline I23 at Diamond Light Source, DLS), which in turn require a more complete description of the detector geometry. Taken together, these create a need to combine the best of CBF/imgCIF (the Crystallographic Binary File, which provides a complete description of the experiment), NeXus (a common data framework for neutron, X-ray and muon science, which gracefully handles large data sets) and HDF5 (Hierarchical Data Format, version 5, the high-performance data format used by NeXus) for the management of such data at synchrotrons. In July 2013, discussions were in progress between COMCIFS (the IUCr Committee for the Maintenance of the CIF Standard) and NIAC (the NeXus International Advisory Committee) on an integrated ontology. Those discussions have progressed. A proof-of-concept API based on CBFlib and the HDF5 API, which was then being developed in a collaboration among Dowling College, Brookhaven National Laboratory and Diamond Light Source, is now in use. The mapping and the combined API continue to develop. (See the January 2014 Computational Crystallography Newsletter.) Releases of CBFlib since CBFlib 0.9.2.12 can store arbitrary CBF files in HDF5 and recover them, support the use of all CBFlib compressions in HDF5 files, and convert sets of miniCBF files to a single NeXus file. Here we present the new format, with examples, together with the implications of its use for software developers and for beamline users.
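As an illustration of what converting a set of miniCBF files to a single NeXus file involves, the following minimal Python sketch gathers per-image header values into one hierarchy of per-frame arrays. It is a toy model only: it does not use CBFlib or HDF5, a plain dictionary stands in for the HDF5 tree, and the three-entry table `HEADER_TO_NEXUS`, the paths in it, and the helper names `parse_minicbf_header` and `merge_frames` are assumptions made for this example, not the agreed CBF-to-NeXus mapping.

```python
import re

# Hypothetical, heavily simplified mapping from miniCBF header keywords to
# NeXus-style paths. The actual mapping is far more complete; these three
# entries and their paths are assumptions chosen for illustration.
HEADER_TO_NEXUS = {
    "Exposure_time": "/entry/instrument/detector/count_time",
    "Detector_distance": "/entry/instrument/detector/distance",
    "Wavelength": "/entry/instrument/beam/incident_wavelength",
}


def parse_minicbf_header(text):
    """Extract numeric values from '# Keyword value unit' header lines.

    Only keywords listed in HEADER_TO_NEXUS are kept; units are ignored.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(\w+)\s+([-+0-9.eE]+)", line)
        if match and match.group(1) in HEADER_TO_NEXUS:
            values[match.group(1)] = float(match.group(2))
    return values


def merge_frames(headers):
    """Combine per-frame header values into one path -> list mapping.

    Each list holds one value per frame (None if the keyword is missing),
    mimicking how many single-image files become arrays in a single file.
    """
    merged = {path: [] for path in HEADER_TO_NEXUS.values()}
    for text in headers:
        values = parse_minicbf_header(text)
        for key, path in HEADER_TO_NEXUS.items():
            merged[path].append(values.get(key))
    return merged


# Two made-up frame headers in the miniCBF comment style.
frames = [
    "# Exposure_time 0.2 s\n# Wavelength 0.9795 A\n# Detector_distance 0.25 m",
    "# Exposure_time 0.2 s\n# Wavelength 0.9795 A\n# Detector_distance 0.25 m",
]
merged = merge_frames(frames)
for path, per_frame in merged.items():
    print(path, per_frame)
```

In a real conversion the image data themselves would be stacked into a single multi-dimensional dataset alongside this metadata, which is what removes the one-file-per-image burden on the file system.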