Movies and popular music are moving from tape to disk, but tape is still on a roll at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego. The center’s huge, updated tape storage system has illustrated its effectiveness by transferring data at 828 megabytes per second. "Nobody in academia has such a fast data archive system," said Phil Andrews, program director for High-End Computing at SDSC.
Supercomputers are ranked by their raw computational speed, but rapid movement of increasingly large data sets has become vital to researchers in scientific disciplines from anatomy and astronomy, to climatology and particle physics. Data sets in those fields have grown over the past decade, by multiples of 1,000, from megabytes (millions of bytes) to gigabytes, to terabytes. Some data sets are now approaching or exceeding one petabyte (one quadrillion bytes).
Even though tape costs a fraction of what disk does, supercomputer centers have been investing heavily in more expensive disk storage because its data-transfer rates are higher, by a factor of at least 10, than those of what had been the best tape systems. At the same time, however, supercomputer centers are struggling to pay for the disk capacity required to keep up with the explosive growth of scientific data.
"While we are expanding our disk storage capacity, we’re also adding new higher-density tapes, faster tape drives and more of them, and other technology in a highly tuned, extremely capable data management system," said Andrews. "It’s all these pieces working together that has allowed us to reach a new milestone in data-transfer speed."
New technology at SDSC involved in the tape-to-disk data transfers includes tape drives from StorageTek® (Storage Technology Corp.), switches from Brocade Communications Systems, fibre channel adapters from QLogic, and end-to-end systems and technology from Sun Microsystems. The combined effect of the new technology has been a reduction–from days to hours–in the time needed to transfer multi-terabyte data sets from SDSC’s tape-storage system to its IBM Blue Horizon supercomputer, which has a peak speed of 1.7 teraflops (trillion floating point operations per second).
"Our supercomputer center is dealing with more multi-terabyte data sets–such as the 10-terabyte Digital Sky astronomy project–and very fast transfers will allow astronomers to make discoveries faster," said Andrews. "With tape, we have achieved a data-transfer rate that very few people realized was feasible."
No industry association keeps track of data-transfer speed records. "As far as the use of tape drives or multiple tape drives is concerned, the data transfer rates of most academic groups have been well under 100 megabytes per second," said John Marshal, program manager at StorageTek. "The groups that have achieved data transfer rates in the range of 800 megabytes per second have not used tape: they have done it with much larger investments in disk drives, switches, and routers."
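To put the "days to hours" claim in concrete terms, the following is a rough back-of-the-envelope sketch, not an SDSC measurement. It assumes decimal units (1 terabyte = 1,000,000 megabytes), a sustained rate for the whole transfer, and no overhead; the function name `transfer_hours` is hypothetical. It compares the reported 828 megabytes per second with the 100-megabytes-per-second ceiling cited for most academic tape systems, using the 10-terabyte Digital Sky data set as the example size.

```python
# Rough estimate of how long a data set takes to move at a sustained rate.
# Assumptions (not from the article): decimal units, constant rate, no overhead.

MB_PER_TB = 1_000_000  # decimal megabytes per terabyte


def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Return the hours needed to move size_tb terabytes at rate_mb_per_s."""
    seconds = size_tb * MB_PER_TB / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    size_tb = 10  # the Digital Sky data set mentioned in the article
    for label, rate in [("SDSC tape archive", 828), ("100 MB/s ceiling", 100)]:
        print(f"{label}: {transfer_hours(size_tb, rate):.1f} hours")
    # SDSC tape archive: 3.4 hours
    # 100 MB/s ceiling: 27.8 hours
```

Because the quoted rates for most academic groups are "well under" 100 megabytes per second, the second figure is a lower bound; real transfers at those rates would take longer than a day.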
SDSC is replacing its 20-gigabyte-capacity tape cartridges with 200-gigabyte native capacity tape cartridges manipulated robotically in five silos with a total storage capacity of six petabytes. The center also installed 24 StorageTek T9940B tape drives, each of which can transfer data to and from tape at roughly 30 megabytes per second without data compression. (Compression of data increases its rate of transmission.) With roughly twofold data compression, SDSC has measured peak speeds of over 60 megabytes per second with the new tape drives. (The actual measured data-transfer rate depends on the degree to which data is compressed, the type of software used, and the capacity of the associated hardware involved in the transfer.)
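The per-drive figures can be related to the 828-megabytes-per-second aggregate with simple arithmetic. The sketch below is illustrative only: it assumes all 24 drives stream in parallel and that throughput adds linearly, neither of which the article states, and the variable names are hypothetical.

```python
# Relating per-drive tape throughput to the reported aggregate rate.
# Assumption (not from the article): all 24 drives run in parallel and
# their throughput simply adds up.

DRIVES = 24                  # StorageTek T9940B drives installed
NATIVE_MB_S = 30             # approximate per-drive rate, no compression
COMPRESSED_PEAK_MB_S = 60    # approximate per-drive peak, ~2x compression
MEASURED_MB_S = 828          # aggregate rate reported by SDSC

# Aggregate ceiling with no compression
native_total = DRIVES * NATIVE_MB_S
# Aggregate ceiling at the measured compressed peak
compressed_total = DRIVES * COMPRESSED_PEAK_MB_S
# Average per-drive rate needed to reach the reported aggregate
per_drive_needed = MEASURED_MB_S / DRIVES

print(f"Uncompressed ceiling: {native_total} MB/s")
print(f"Compressed ceiling:   {compressed_total} MB/s")
print(f"Per-drive average for 828 MB/s: {per_drive_needed:.1f} MB/s")
```

Under these assumptions the reported rate falls between the uncompressed and compressed ceilings, which is consistent with the article's note that measured rates depend on how compressible the data is.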
As part of its data-management system, SDSC has also installed Sun Microsystems’ SAM-FS Advanced Storage Management and QFS High Performance Shared File System software to provide maximum scalability, performance, and throughput for the most data-intensive applications. SDSC has also installed Sun’s High Performance Computing Storage Area Network (SAN) storage solution. Such SANs allow multiple computers, using a range of operating systems, to share data seamlessly. The disk capacity attached to SDSC’s SAN will be increased from 50 terabytes to 500 terabytes by 2003.
SDSC’s recent and planned improvements will enable the center to deliver the unprecedented data-management infrastructure required by the National Science Foundation’s $88 million TeraGrid project. The TeraGrid will be deployed in 2003 as the world's largest, fastest, distributed infrastructure for open scientific research. When completed, the TeraGrid will include more than 20 teraflops of distributed computing power, facilities dedicated to managing and storing nearly one petabyte of disk storage and more than six petabytes of tape capacity, high-resolution visualization environments, and toolkits for grid computing. The components of the TeraGrid will be tightly integrated and connected through a network that will operate at 40 gigabits per second–the world’s fastest research network.
The TeraGrid infrastructure will be distributed among five sites: SDSC, the lead site for the National Partnership for Advanced Computational Infrastructure; the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign; the Center for Advanced Computing Research at the California Institute of Technology; Argonne National Laboratory; and the Pittsburgh Supercomputing Center.
A Second Data-Transfer Achievement
In a second data-transfer feat demonstrated Nov. 21 at the Supercomputing 2002 (SC2002) conference at the Baltimore Convention Center, SDSC computer scientist Bryan Banister transferred data from disk at the supercomputer center in La Jolla, CA, to disk at the convention center at 721 megabytes per second.
Banister performed the disk-to-disk data transfer as a demonstration of what will soon be routine: movement of large data sets from SDSC to the other TeraGrid sites over a high-speed network. The cross-country, disk-to-disk transfer made use of a Qwest Communications 10-gigabit-per-second fiber-optic backbone. It also involved Nishan Systems storage switches, Force10 Networks switches, and Juniper Networks routers.
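For a sense of how much of the backbone the demonstration used, the disk-to-disk rate can be converted from megabytes to gigabits per second. This is a minimal sketch assuming decimal units and ignoring protocol overhead; the function name `link_utilization` is hypothetical.

```python
# Fraction of a network link's nominal capacity used by a payload rate.
# Assumptions (not from the article): decimal units, protocol overhead ignored.

BITS_PER_BYTE = 8
MEGABITS_PER_GIGABIT = 1000


def link_utilization(rate_mb_per_s: float, link_gbit_per_s: float) -> float:
    """Return the fraction of the link capacity consumed by the payload rate."""
    rate_gbit_per_s = rate_mb_per_s * BITS_PER_BYTE / MEGABITS_PER_GIGABIT
    return rate_gbit_per_s / link_gbit_per_s


if __name__ == "__main__":
    used = link_utilization(721, 10)  # 721 MB/s over a 10 Gb/s backbone
    print(f"Payload rate uses about {used:.0%} of the link")
```

At 721 megabytes per second the payload alone is roughly 5.8 gigabits per second, so the transfer occupied a little over half of the nominal 10-gigabit capacity.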
SC2002, this year’s edition of the world’s largest annual high-performance networking and computing conference, is designed to demonstrate supercomputing technology and the breakthrough science it makes possible.