Used on all runs:
•-a: API for I/O (MPI-IO or HDF5)
•-b 16m: block size used
•-r: read file
•-w: write file
•-i 5: number of repetitions
•-t: transfer size (256k and 16M)
Used on some runs:
•-c: collective I/O (used by …)
•-V: use MPI_File_set_view
•IOR_HINT__MPI__romio_ds_read
•IOR_HINT__MPI__romio_ds_write
•MPICH_ROMIO_NO_RECORD_LOCKING=1
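The options listed above can be turned into a run matrix. The Python sketch below only builds and prints IOR argument lists for every combination of the varied options; it does not launch anything. The `-a` values `MPIIO` and `HDF5` are IOR's API names, but the function names and the choice to enumerate the full cross product (which may include combinations the study did not run) are illustrative assumptions, not the study's actual scripts.

```python
from itertools import product

# Options used on every run (from the options listed above):
# -b 16m block size per client, -i 5 repetitions, -r read, -w write.
COMMON_ARGS = ["-b", "16m", "-i", "5", "-r", "-w"]

# Options varied between runs.
APIS = ["MPIIO", "HDF5"]           # values passed to -a
TRANSFER_SIZES = ["256k", "16m"]   # values passed to -t


def ior_args(api: str, transfer: str, collective: bool, fileview: bool) -> list[str]:
    """Build the IOR argument list for one configuration (launcher not included)."""
    args = ["-a", api, *COMMON_ARGS, "-t", transfer]
    if collective:
        args.append("-c")  # collective I/O
    if fileview:
        args.append("-V")  # use MPI_File_set_view
    return args


def run_matrix() -> list[list[str]]:
    """Full cross product of API, transfer size, collective and fileview.

    Hypothetical: this may include combinations that were not part of the study.
    """
    return [
        ior_args(api, transfer, collective, fileview)
        for api, transfer, collective, fileview in product(
            APIS, TRANSFER_SIZES, (False, True), (False, True)
        )
    ]


if __name__ == "__main__":
    for args in run_matrix():
        # Assumption: prefix with the site's MPI launcher before actually running.
        print("ior " + " ".join(args))
```

Keeping the shared options in one list makes it easy to see which flags change between runs and which stay fixed.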
Comparison of MPI-IO and HDF5 Parallel I/O on the Cray XT3/XT4
Joylika Yvette Adams
Fisk University
Research Alliance in Math and Science
Computer Science and Mathematics Division
Mentor: Dr. Mark Fahey
http://www.csm.ornl.gov/Internships/rams_07/poster/Joylika_Poster.pdf
A special thanks goes out to my faculty advisor, Dr. Stephen Egarievwe of Fisk University, for helping me receive this internship. Many thanks also go out to my mentor, Dr. Mark Fahey, for all of his help, especially when I had no clue what he was talking about. Finally, special thanks go to Debbie McCoy, who made this research experience possible and also exciting.
Conclusions
•MPI-IO and HDF5 can provide good I/O bandwidth on XT platforms with Lustre
•MPI-IO and HDF5 performance can be poor when using the collective and fileview options
•BUT performance can be much better when not-so-well-known tricks are used
  •MPI-IO fileview needs hints set:
    •IOR_HINT__MPI__romio_ds_read
    •IOR_HINT__MPI__romio_ds_write
    •Each set to enable, automatic, or disable
  •HDF5 collective needs an environment variable set:
    •MPICH_ROMIO_NO_RECORD_LOCKING=1
•For tests with a small number of clients, HDF5 rates were almost as fast as MPI-IO rates
•For tests with a large number of clients, MPI-IO rates were twice as fast as HDF5 rates
•A bigger transfer size helps only the collective runs
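The tuning rules in the conclusions can be captured as a small lookup. This Python sketch returns the extra environment variables for one run; the variable names and allowed values come from the poster, but the function, its parameters, and the choice to give both data-sieving hints the same value are hypothetical. How these variables reach the compute nodes depends on the site's job launcher and is not shown.

```python
import os

# Data-sieving hints forwarded to ROMIO through IOR's IOR_HINT__MPI__ prefix.
DS_HINTS = ("IOR_HINT__MPI__romio_ds_read", "IOR_HINT__MPI__romio_ds_write")
DS_VALUES = ("enable", "automatic", "disable")


def io_environment(api: str, collective: bool, fileview: bool, ds_mode: str) -> dict[str, str]:
    """Extra environment variables for one run, following the conclusions above."""
    if ds_mode not in DS_VALUES:
        raise ValueError(f"ds_mode must be one of {DS_VALUES}")
    env: dict[str, str] = {}
    if api == "MPIIO" and fileview:
        # MPI-IO fileview runs: set both data-sieving hints
        # (assumption: both get the same value).
        for name in DS_HINTS:
            env[name] = ds_mode
    if api == "HDF5" and collective:
        # HDF5 collective runs: the variable the conclusions say must be set.
        env["MPICH_ROMIO_NO_RECORD_LOCKING"] = "1"
    return env


if __name__ == "__main__":
    # Merge with the current environment before handing it to the job launcher.
    run_env = {**os.environ, **io_environment("HDF5", True, False, "automatic")}
    print(run_env["MPICH_ROMIO_NO_RECORD_LOCKING"])
```

Making `ds_mode` a required argument avoids silently picking one of the three values, since the poster does not say which one worked best.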
Future Research
•Run more tests to fill in the missing data (HDF5 with hints using the larger number of clients)
•Run netCDF tests to compare with HDF5 and MPI-IO
The Research Alliance in Math and Science program is sponsored by the Office of Advanced Scientific Computing Research, U.S. Department of Energy. The work was performed as part of a joint project funded by the Office of Naval Research Discovery and Innovation Program at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725. This work has been authored by a contractor of the U.S. Government; accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.
Background
•Jaguar
  •11,706 dual-core processors
  •Peak performance of 119 TFlops
  •46 TB of memory
  •Lustre file system
    •600 TB of scratch disk space
•Lustre
  •3 separate file systems (two 150 TB and one 300 TB)
  •Previously measured read and write bandwidth of 43 and 26 GB/s, respectively
The National Center for Computational Sciences (NCCS) was founded in 1992 to provide world-class, high-performance scientific resources to scientists for the purpose of advancing science and technology research. To accomplish these goals, researchers and scientists need to determine how to run their applications most effectively on supercomputers with tens of thousands of processing cores. One of the most daunting challenges is how to write application data to disk efficiently, since a single simulation can produce on the order of hundreds of terabytes.
The tests were performed on Jaguar (Cray XT3/XT4), which uses the Lustre file system. Lustre is a parallel, object-based file system designed to provide large, high-bandwidth storage on large clustered computers. The project continues previous work by evaluating and cataloging the performance of various I/O methodologies and libraries. To evaluate these parallel I/O methodologies, the IOR (Interleaved Or Random) code was used to perform parallel reads and writes to and from the file system through the MPI-IO and HDF5 interfaces. Only these two interfaces were tested, primarily to replicate the I/O done by typical users of the NCCS Cray XT4. With each interface, performance results were obtained by running IOR with a constant buffer size per client while the number of processes increased from 2 to 1024. The collective and fileview options were also tested.
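Because each client keeps the same block size (`-b 16m`) while the process count grows, the shared file grows linearly with the number of processes. The sketch below computes that aggregate size and the number of transfers needed to cover one block at each transfer size. It assumes one segment per process and binary size suffixes (1k = 1024 bytes); the helper names are illustrative.

```python
# Size suffixes, assumed to be binary (1k = 1024 bytes).
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '256k' or '16m' to bytes."""
    suffix = text[-1].lower()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def aggregate_bytes(block: str, processes: int) -> int:
    """Total file size when every process writes one fixed-size block."""
    return parse_size(block) * processes


def transfers_per_block(block: str, transfer: str) -> int:
    """Number of transfers each process issues to cover its block."""
    b, t = parse_size(block), parse_size(transfer)
    if b % t:
        raise ValueError("block size must be a multiple of the transfer size")
    return b // t


if __name__ == "__main__":
    for n in (2, 1024):
        print(n, "processes:", aggregate_bytes("16m", n) / 1024 ** 3, "GiB")
    for t in ("256k", "16m"):
        print(t, "transfer size:", transfers_per_block("16m", t), "transfers per block")
```

Under those assumptions, 2 processes write 32 MiB in total and 1024 processes write 16 GiB, and each block is covered either by 64 transfers of 256k or by a single 16M transfer.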
Previous test results indicated that good parallel I/O performance could be obtained with MPI-IO, but HDF5 performance had yet to be studied. These new tests show that parallel HDF5 rates are nearly as good as MPI-IO rates for relatively small tests and within a factor of 2 for large processor counts. It was also discovered that I/O performance could be very poor for both the MPI-IO and HDF5 interfaces within IOR when using the collective and fileview options. However, using "hints" can make the collective and fileview tests perform very well. These findings will be documented and made available to NCCS users.
Methods
•Used IOR, a benchmark for parallel file systems
•Ran multiple instances of IOR with the HDF5 and MPI-IO interfaces
  •Reported the maximum of the runs in the plots
  •The runs were done in non-dedicated time
•Compared I/O libraries
  •With and without fileview
  •With and without collective I/O
  •With small and large transfer sizes
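With `-i 5`, each configuration is timed five times, and the methods state that the maximum is reported. A minimal sketch of that reduction, assuming the rate is total bytes divided by elapsed time for each repetition; the function names and MiB/s units are hypothetical choices, not IOR's own reporting code.

```python
MIB = 1024 ** 2


def bandwidth_mib_s(total_bytes: int, seconds: float) -> float:
    """Bandwidth of one repetition in MiB/s."""
    return total_bytes / MIB / seconds


def reported_rate(total_bytes: int, seconds_per_rep: list[float]) -> float:
    """Best (maximum) bandwidth over all repetitions of one configuration."""
    if not seconds_per_rep:
        raise ValueError("need at least one repetition")
    # For a fixed amount of data, the fastest repetition gives the highest bandwidth.
    return bandwidth_mib_s(total_bytes, min(seconds_per_rep))
```

One plausible reason for keeping the maximum rather than the mean is that the runs shared the machine with other jobs, so slower repetitions may reflect contention rather than the library under test.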
Motivation
•Researchers have very large I/O requirements [Kothe2007]
  •Some require writing out on the order of 100 GB of data every hour
  •Some will need to write out on the order of 10 TB of data every hour a year from now
•Ensure that users make efficient use of the Lustre file system
  •The I/O portions of their codes should not dominate their run time
  •Produce scientific results
•Continue previous work by further studying the performance of various I/O libraries
  •Identify the best practices
  •Compare different I/O libraries
  •Compare interfaces
    •MPI-IO
    •HDF5
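The requirements in the motivation can be checked against the bandwidth quoted in the background. This back-of-the-envelope sketch assumes decimal units (1 TB = 1000 GB) and that an application sustains the previously measured 26 GB/s aggregate write bandwidth, which is a best case; the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600.0


def write_seconds(volume_gb: float, bandwidth_gb_s: float) -> float:
    """Time to write a volume at a sustained aggregate bandwidth."""
    return volume_gb / bandwidth_gb_s


def hourly_io_fraction(volume_gb_per_hour: float, bandwidth_gb_s: float) -> float:
    """Fraction of each hour spent writing the hourly output volume."""
    return write_seconds(volume_gb_per_hour, bandwidth_gb_s) / SECONDS_PER_HOUR


if __name__ == "__main__":
    write_bw = 26.0  # previously measured Jaguar write bandwidth, GB/s (best case)
    for label, volume in (("100 GB", 100.0), ("10 TB", 10_000.0)):
        t = write_seconds(volume, write_bw)
        print(f"{label}: {t:.0f} s, {hourly_io_fraction(volume, write_bw):.1%} of the hour")
```

Under those assumptions, 10 TB per hour takes roughly 385 s to write, about 11% of each hour, while 100 GB takes only a few seconds; halving the bandwidth actually achieved doubles that fraction.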
Simulation requirements on a 1-PF LC system with 200 TB of memory [Kothe2007]
[Kothe2007] D. Kothe et al., "Computational Science Requirements for Leadership Computing," in preparation, Leadership Computing Facility, Oak Ridge National Laboratory, 2007.
Options Used While Collecting Data
