Sidebar 7: Data-intensive computing, cherax, ruby, HSM, DMF, the Scientific Computing Data Store, tape libraries – a retrospective
Last updated: 11 Oct 2023.
Added information on and pictures of tape libraries.
Updated: 9 Aug 2023.
Added links to SC manuals on the Data Store, and a history page.
Added link to Why HSM? presentation.
Added and updated 30-year anniversary note. Added note about DMF Users Group. Added note about STK library retirement.
Robert C. Bell
With the commencement of service of CSIRO’s first Cray Y-MP (“cherax”) in March 1990, the installation of StorageTek cartridge tape drives in mid-1991, and the start of Cray’s Data Migration Facility on 14th November 1991, a unique partnership developed – a high-performance compute service and a large-scale data repository as closely coupled as possible. With the start of automation of tape mounting in June 1993 with the installation of a StorageTek Automated Tape Library, the foundation was set for a platform to support large-scale computing on large-scale data (large-scale for the time).
Users were freed from the need to deal with mountable media (tapes, floppy discs, diskettes, CDs), and their data was preserved safely and carried forward through many generations of tape technology, all done transparently behind the scenes on their behalf – 13 generations of drive/media so far, going from 240 Mbyte to 20 Tbyte (and more with compression) on a single tape cartridge. There was a succession of hosts – three Cray vector systems (cheraxes running UNICOS), four SGI hosts (three cheraxes and ruby running Linux), and now dedicated servers.
Users had virtually infinite storage capacity – though at times very little of it was available on-line. The Hierarchical Storage Management (HSM) software would automatically recall data from lower levels when it was required, or users could issue a command to recall sets of data in batches. It was somewhat difficult for new users to get used to (to them, DMF stood for “Don’t Migrate my Files”), but experienced users valued the ability to have large holdings of data in one place.
Unlike many sites, the Data Store was directly accessible as the /home filesystem on the hosts. This overcame the problem of managing a large shared filesystem: as the filesystem filled, DMF took care of the data by copying it to lower-cost media and removing the data (though not the metadata) from on-line storage. Other sites (such as NCI, Pawsey and the main US supercomputer centres) ran their mass storage as separate systems, with users having to explicitly send data from the compute servers to the storage servers, and recall it explicitly: users then had a minimum of two areas to manage, rather than just one.
One of the main users ran atmospheric models on various HPC platforms in Australia, but always copied data back to the Data Store for safe-keeping, and to enable analysis to be carried out in one place.
The Data Store continues after the de-commissioning of ruby on 30th April 2021, as the filesystem is available (via NFS) on the Scientific Computing cluster systems. There is, however, no longer a general user-accessible host with the Data Store as its home filesystem.
Below is a graph showing the total holdings in the Data Store since 1992. The compound annual growth factor over that period was 1.59, meaning that the amount stored increased by 59% each year.
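To get a feel for how quickly a 1.59 annual growth factor compounds, here is a one-line arithmetic check (illustrative only, using the growth figure quoted above):

```python
# Illustrative arithmetic: how a 1.59 annual growth factor compounds.
annual_factor = 1.59                 # 59% growth per year, as reported
decade_factor = annual_factor ** 10  # growth over ten years
print(f"{decade_factor:.0f}x per decade")  # roughly a hundred-fold increase
```

At that rate, holdings grow by about two orders of magnitude every decade, which is consistent with the overall shape of the growth curve.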
The Data Store depended heavily on the strong support from the staff and technology of the vendors (principally Cray Research, SGI, StorageTek/Sun/Oracle, and lately IBM, HPE and Spectralogic), and the superb support by the systems staff – principally Peter Edwards, but also Ken Ho Le, Virginia Norling, Jeroen van den Muyzenberg and latterly Igor Zupanovic.
In 2009, Peter Edwards and I formed a DMF User Group, to allow sites running DMF to share information. See DMF Users Group.
P.S. Note that CSIRO developed its own operating system (DAD) and own HSM (the document region) in the late 1960s, using drums, disc and tape (manually mounted), with automatic recall of files, and integration with the job scheduler so that jobs blocked until the required files had been recalled! See Sidebar 1.
P.P.S. Here’s a link to a talk given in 2012 entitled, Why HSM?
Thirty-Year Anniversary of the CSIRO Scientific Computing Data Store: CSIRO and DMF
On 14 November 1991, the Data Migration Facility (DMF) was brought into operation on the CSIRO home file system on the Joint Supercomputing Facility Cray Y-MP (cherax).
This became known as the CSIRO Scientific Computing Data Store, and turned thirty years old on 14th November 2021, having been managed by DMF over all those years. The above provides some of the history, and more is provided in the Personal Computing History sidebar. The Data Store became an important part of the services to science, and one that I attempted to champion for the benefit of the users.
It has grown from 1 gigabyte of disc storage in 1990-1991 to over 62 petabytes – a factor of 62 million.
Although it started in its current form in 1991, it contained files from the start of the cherax service in March 1990, and I (and possibly other users) had carried files forward from older systems, e.g. Csironet. I can identify files in the Data Store dating back to the 1970s and 1960s.
For about 20 years from its inception, the Data Store provided storage for the bulk of CSIRO’s climate model output data, and the cheraxes were the primary data-analysis platform.
The physical layer of storage is the sine qua non of data preservation, and the Data Store has provided this.
The Data Migration Facility (DMF) was originally developed by Cray Research for UNICOS. It passed into the hands of SGI when it acquired Cray Research. SGI held on to DMF when it sold Cray Research, and ported DMF to IRIX and Linux. Latterly, HPE acquired both Cray Inc and SGI, and renamed DMF as the Data Management Framework.
Chapter 3 recorded that Csironet acquired a Calcomp/Braegan Automated Tape library in the early 1980s. This used 9-track STK round tape drives, with tape capacities around 160 Mbyte.
The CSIRO Supercomputing Facility acquired a StorageTek ACS4400 tape library which was installed at the University of Melbourne in June 1993. It and subsequent STK libraries could hold about 6000 tape cartridges.
When CSIRO joined the Bureau of Meteorology in the HPCCC, CSIRO acquired a similar STK Powderhorn tape library which was installed at 150 Lonsdale Street Melbourne.
When the HPCCC moved with the Bureau from 150 Lonsdale St to 700 Collins St Docklands, a shell game was played: another Powderhorn was installed at 700 Collins St, and the Bureau and CSIRO staged the transitions until there were two Powderhorns at Docklands, with the old one from 150 Lonsdale Street being traded in.
In 2008, CSIRO acquired an STK/Sun SL8500, to allow support for newer tape drives, and the Powderhorn was decommissioned.
In 2015, the CSIRO HSM was moved from Docklands to Canberra, with backup facilities at Clayton. CSIRO acquired IBM TS4500 tape libraries:
and then Spectralogic TFinity libraries.
Each type of library has its own advantages and disadvantages. Much discussion centred around access times. The Powderhorn libraries had access times of only a few seconds to any tape cartridge, but could deal with only one retrieval at a time. The SL8500 libraries had four levels with one or two robots per level, providing multi-tasking, but cartridges and drives needed careful placement to avoid placing too many demands on the elevators that moved cartridges between levels.
The IBM libraries have a unique feature of being able to store two cartridges in some deep slots, but of course access to the ‘hidden’ cartridge is slower.
The Spectralogic libraries store tapes in small trays (up to 10 cartridges per tray): the robot retrieves a tray, and then selects the desired tape from the tray.
Optimising the layout was difficult for the later libraries. This is on top of the problem of optimising multiple retrievals from a modern tape, which is written in serpentine fashion.
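One way to see why retrieval order matters is with a toy recall planner. The sketch below (hypothetical file and tape names; not DMF’s actual scheduler) groups requested files by the cartridge that holds them, so each tape is mounted once and read in a single pass rather than being remounted per file:

```python
from collections import defaultdict

# Hypothetical recall requests: (file, cartridge holding it).
requests = [
    ("run1.nc", "TAPE01"),
    ("run2.nc", "TAPE02"),
    ("run3.nc", "TAPE01"),
    ("run4.nc", "TAPE01"),
]

# Group by cartridge, so TAPE01 is mounted once for its three files
# instead of three separate times.
by_tape = defaultdict(list)
for path, tape in requests:
    by_tape[tape].append(path)

for tape, files in sorted(by_tape.items()):
    print(tape, files)
```

A real scheduler would go further, ordering the files on each cartridge to minimise seek distance along the serpentine track layout, but the mount-batching step alone already removes the largest cost.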