These pages attempt to give some of the history of CSIRO’s use of computing in its research, focussing mainly on the large shared systems and services.
Sidebar 6: Robert Bell personal computing history
Last updated: 30 April 2021.
Robert C. Bell
Rob Bell with CSIR Mk 1 (CSIRAC) at the Melbourne Museum.
In 2003, I gave a talk at a Dinner hosted by RMIT Engineering, and talked about my career in terms of Technology: Mathematics, Meteorology, Machines.
I was born in Hobart in 1949 (the same year that CSIR Mk1 came to life), and lived there until September 1951 when our family (Father Powell, Mother Gwenyth, brothers Alan and John, and sister Margaret) moved to Melbourne to the suburb of Bentleigh. My family had originally come from NSW – my Mother from Terara near Nowra, and my Father from Sydney.
My Father was the Accountant at the main Melbourne branch of the Commonwealth Savings Bank at 8 Elizabeth St, and was promoted to Inspector before retiring in 1973. My Mother was never in paid employment, working on the family dairy farm until she was married. But there was a history of scholarship in her family, with one Uncle being a school inspector, and another being a civil engineer, being Chief Engineer of the NSW Railways and a president of the Institute of Engineers.
Maths and School
Maths and numbers seemed to be around us. Prior to my going to school, I could tell you the number of each of the animal cards in my collection from the cereal packets. I attended Bentleigh West primary school, and then Brighton High School. In grade 4, after parent/teacher interviews with Mr Curlis, I was encouraged to learn more about Maths, and my parents bought an additional textbook.
While at primary school, we acquired an old typewriter from my Uncle Albert’s business (C. Bell & Son). My Father did the monthly accounts for the business, and he was rewarded by ‘presents’ from Uncle Albert. I set to to type up a multiplication table. I don’t remember how many rows it had, but think it went to at least 20. I typed up successive pages to the right as the table was extended, and I do remember that I reached 60. Pages were sticky-taped together, and rolled up like a scroll. (Of course, the typing was done on the back of old paper: paper was not easy to come by, but my Father brought home from the Commonwealth Savings Bank some surplus paper, seemingly only once per year).
I won a Mothers’ Club scholarship from Bentleigh West.
I started High school in 1961, and soon became interested in the weather through the Geography syllabus. I set up my own weather station at home in September 1961, and started taking twice-daily observations, which continued until I left home in 1974 (the records are on paper at home). Here is an article I wrote which was published in the school magazine Voyager.
I continued to be good at Maths. In year 11 and 12, I benefitted from having Mrs Frietag as Maths teacher. She had a laconic style, and I remember her handing back our exam papers one time, and she reluctantly saying that she couldn’t find anything wrong with my effort! In the year 12 mid-year exams, in one of the Maths subjects, she failed nearly everyone – I think she had used an old end-year exam paper, and we hadn’t covered all the topics yet. There were protests, and the results were scaled so more students passed. I was a bit annoyed that she arbitrarily gave me a score of 95 when the scaling-up produced a result for me around 110 out of 100.
At the end of year 11 in 1965, I was called in by the Principal to encourage me to attend a summer Maths camp at Somers. He said that I could win an exhibition. I attended the camp, and subsequently did gain a general exhibition, awarded for the top 30 students in the state. I also won a state Senior Scholarship, which provided money!
My brother Alan joined the Department of Defence in early 1960 after completing a science degree at the University of Melbourne. (I found out years later that he had done a subject on computing, and had used CSIRAC.) In September 1961, he left home to work for a year at GCHQ in Cheltenham, UK. We (and especially me) at the time had no idea of what he worked on. We knew he was in Defence Signals. Part way through his year for a few weeks, his aerogramme letters started coming from Bletchley Park instead of Cheltenham, and on his way home in November/December 1962, we received letters from Silver Springs, Maryland.
In about 1963-1964, he started to teach me about computers, with a ‘model’ showing pigeon holes as places where numbers and instructions could be stored.
In 1965, I attended an ISCF Science Camp (Inter-School Christian Fellowship) at Belgrave Heights; the camp was run by the Graduates Fellowship, including my brother Alan. We were taught the beginnings of Fortran programming, and he took the programs we wrote to get them punched onto cards, and compiled and ran the programs (on a CDC 3400 at DSD that was under acceptance testing I found out later). Thus I wrote my first computer program in 1965: the program calculated the period of a pendulum for various lengths using the well known formula.
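The formula in question is presumably the small-angle result T = 2π√(L/g). The 1965 Fortran program is not reproduced on this page, so the Python sketch below is only an illustration of the same calculation; the value of g and the list of lengths are assumptions.

```python
import math

G = 9.80665  # standard gravity in m/s^2 (an assumed value)

def pendulum_period(length_m, g=G):
    """Small-angle period in seconds of a simple pendulum of the given length."""
    return 2.0 * math.pi * math.sqrt(length_m / g)

if __name__ == "__main__":
    # Illustrative lengths in metres, not those used in 1965.
    for length in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(f"{length:5.2f} m  {pendulum_period(length):6.3f} s")
```

Because the period grows with the square root of the length, quadrupling the length doubles the period.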
In 1966, my sister Margaret commenced work with the Bureau of Meteorology as a programmer in training, having completed a BSc(Hons) majoring in Applied Maths at Monash University. The Bureau used the CSIRO computing network from then until it acquired the first of its twin IBM 360/65 systems in 1968.
University and Vacation Employment
I was keen on a career in Meteorology, and enrolled in a BSc at Monash University, commencing in 1967. During 1967, I wrote to CSIRO Division of Meteorological Physics at Aspendale, and was subsequently employed as a vacation student for 1967-68. I worked for Reg Clarke. I spent the first hour of each day punching in data from the Wangara Expedition onto cards, and the subsequent hours with a large mechanical desk calculator (the sort with ten rows of ten buttons), and log book and pencil and paper, calculating u.cos(theta) + v.sin(theta) from more Wangara observations. It’s great to see that data from Wangara is now publicly accessible through ARDC – I thought it might be lost! https://researchdata.edu.au/wangara-experiment-boundary-layer/681872
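The expression u.cos(theta) + v.sin(theta) is the component of a vector (u, v) along a direction at angle theta to the u axis. What theta denoted in Reg Clarke's analysis is not stated here, so the following Python sketch of that desk-calculator step uses a hypothetical function name and made-up sample values rather than Wangara data.

```python
import math

def along_direction(u, v, theta_deg):
    """Component of the vector (u, v) along a direction theta_deg degrees from the u axis."""
    theta = math.radians(theta_deg)
    return u * math.cos(theta) + v * math.sin(theta)

if __name__ == "__main__":
    # Hypothetical wind components in m/s, resolved along three directions.
    for angle in (0.0, 45.0, 90.0):
        print(angle, round(along_direction(3.0, 4.0, angle), 3))
```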
(However, not all the data was published, and I suspect the raw data is gone.)
I also learnt where the off switch was in the radar tower, as a safety measure to cover the absence of other staff. Andreij Berson was studying dry cold fronts, which showed up on radar for unknown reasons: I can remember his scanning an approaching dry cold front with binoculars and shouting “birds!”. See https://publications.csiro.au/publications/publication/PIprocite:43a9d45f-bc88-4d4b-af4c-801773391cff/BTauthor/BVberson,%20f.%20a./RP1/RS25/RORECENT/RFAuthor=berson%2C%20f.%20a./STsearch-by-keyword/LIBRO/RI7/RT38
I also started reading from the library at Aspendale A Guide to FORTRAN Programming 1961 by Daniel D. McCracken, and tried writing a simple program to be run on the DCR CDC 3200 at Clayton – the program didn’t work, because I didn’t understand array dimensioning.
Fortunately in 1968, I studied numerical analysis and Fortran programming at Monash, with John O. Murphy lecturing, and gained more understanding.
The next year, I was again successful in gaining vacation employment at Aspendale. This time, Reg Clarke assigned me to write a program to model a sea breeze, using the equations derived in a paper (Estoque, M. A. A theoretical investigation of the sea breeze https://doi.org/10.1002/qj.49708737203), and Reg’s own boundary layer parameterisations. I tried to make progress, but had no idea of programmability, and foolishly programmed everything with explicit constants in a premature attempt for speed. Despite a few attempts later in 1969 to get the program working, it never did. I did generate successful routines to solve some of the parameterisations.
In 1969, only one of my units at Monash involved computing – Astrophysics.
I applied to CSIRO to work at Aspendale in the next vacation, but was instead offered a position at the Commonwealth Meteorology Research Centre – a joint CSIRO/Bureau of Meteorology centre. I commenced there working for Doug Gauntlett in the IOOF building in McKenzie St Melbourne. Doug asked me to investigate solvers for Helmholtz equations, which were required at each time-step of the current weather forecasting models. In particular, to investigate the alternating-direction implicit (ADI) method, which was developed by the nuclear research community. I built a framework to test various algorithms, including successive over-relaxation, a method developed by Ross Maine, and ADI. ADI proved to be the fastest in the cases under investigation, where a good first guess was available, such as in a time-stepping model where the field from the previous time-step was likely to be close to the required solution for the current time step, and absolute accuracy was not needed. (It turned out that a fellow student, David Bover, on the same vacation was working on ADI for the ARL.) During this vacation, I used the Bureau’s IBM 360/65 systems, and so learnt some JCL.
I did no computing during my honours year, which probably helped, and graduated with first class honours in Applied Maths.
I commenced a PhD in Applied Mathematics at Monash University in early 1971, with Roger K. G. Smith as supervisor. I wanted to do something meteorological with computing, but Roger suggested doing work on quasi-geostrophic models of ocean circulation. Quasi-geostrophic equations were an earlier (successful) simplification of the equations governing the flow of the atmosphere, when the earth’s rotation dominated the forces acting on the atmosphere. I was not happy about oceans rather than atmosphere, but started the work, and did build a successful model, which did show the main features of large-scale oceanic flow. I used the Monash CDC 3200 for this work. Unfortunately for me, Roger Smith left for the University of Edinburgh, and I did not follow. Bruce Morton took over my supervision, and suggested looking at stromatolites in Shark Bay WA, but I did not get far with this. Roger Smith returned to Monash, and I wrote a more accurate quasi-geostrophic model which I ran on the Monash Burroughs 6700. But the lure of computation distracted me from the PhD.
I did learn a lot about computing, and was appointed as post-graduate representative on the Monash Computing Advisory Committee, at a time when replacements for the CDC 3200 were being considered. There were proposals from CDC, Burroughs, IBM, UNIVAC and one other (Honeywell?), but it became clear that the Computing Centre was committed to buying the Burroughs 6700 as a successor to the Burroughs 5500. One of the Applied Maths professors likened it to a university library, on discovering that it could earn money from lending romantic novels to the community, threw out all the science journals and texts and bought more novels! The Burroughs machines acted as backups for the Hospital computing services. I developed my first benchmark, program helm, solving a Helmholtz equation by the method of successive over-relaxation, which was run on many machines over subsequent years.
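The source of helm is not shown here, so the following is only a minimal Python sketch of the technique it is described as using: successive over-relaxation applied to a Helmholtz equation, here taken as laplacian(phi) - lam*phi = f on a square grid with phi = 0 on the boundary. The grid size, the relaxation factor omega = 1.7, the tolerance and the test problem are all assumptions made for illustration.

```python
import math

def solve_helmholtz_sor(f, lam, h, omega=1.7, tol=1e-10, max_sweeps=10000, guess=None):
    """Solve laplacian(phi) - lam*phi = f with phi = 0 on the boundary.

    f is an (n+1) x (n+1) list of lists and h is the grid spacing.
    Returns (phi, number_of_sweeps).
    """
    n = len(f) - 1
    if guess is not None:
        phi = [row[:] for row in guess]
    else:
        phi = [[0.0] * (n + 1) for _ in range(n + 1)]
    diag = 4.0 + lam * h * h
    for sweep in range(1, max_sweeps + 1):
        biggest_change = 0.0
        for i in range(1, n):
            for j in range(1, n):
                # Gauss-Seidel value from the five-point stencil ...
                gs = (phi[i + 1][j] + phi[i - 1][j] + phi[i][j + 1] + phi[i][j - 1]
                      - h * h * f[i][j]) / diag
                # ... over-relaxed by the factor omega.
                change = omega * (gs - phi[i][j])
                phi[i][j] += change
                biggest_change = max(biggest_change, abs(change))
        if biggest_change < tol:
            return phi, sweep
    raise RuntimeError("SOR did not converge")

def demo(n=16, lam=1.0):
    """Manufactured test: phi = sin(pi x) sin(pi y), so f = -(2 pi^2 + lam) phi."""
    h = 1.0 / n
    exact = [[math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
              for j in range(n + 1)] for i in range(n + 1)]
    f = [[-(2.0 * math.pi ** 2 + lam) * value for value in row] for row in exact]
    phi, cold_sweeps = solve_helmholtz_sor(f, lam, h)
    # Restarting from an already good field needs far fewer sweeps.
    _, warm_sweeps = solve_helmholtz_sor(f, lam, h, guess=phi)
    error = max(abs(phi[i][j] - exact[i][j])
                for i in range(n + 1) for j in range(n + 1))
    return error, cold_sweeps, warm_sweeps
```

The warm restart in demo illustrates why a good first guess, such as the field from the previous time-step of a model, matters so much for iterative solvers of this kind.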
I accepted a 3-year appointment at Aspendale, starting in late 1974, and there commenced developing a model of airflow in support of the Latrobe Valley Study. I eventually did finish my PhD, but was not making much progress with the airflow model. People like Peter Manins tried to help me in my research career, but I think I was too proud to accept advice. In 1977, the Chief, Brian Tucker, offered me an indefinite appointment as an Experimental Officer, which I was grateful to accept – a position allowing me to support researchers by doing computation for them, and I had found my niche.
In October 1976, the Division hosted the International Turbulence Comparison Experiment (ITCE) at Conargo, NSW, one of the flattest areas on earth! Three weeks before the start, I was asked to help with the computing side, working with Neil Bacon and Graham Rutter. This involved writing programs to deal with the acquisition and calibration of the data, and was to be run on an HP 21MX mini-computer. I spent about 10 days in Deniliquin/Conargo helping to set up the computing services in a caravan.
C WA IS THE FIRST OF THREE PROGRAMS TO ANALYSE I. T. C. E. CORE
C DATE FROM MAGNETIC TAPE
C THE MAIN STAGES OF WA ARE:
C 1. SETUP AND INITIALIZATION
C 2. INPUT OF FUNCTION SPECIFICATIONS FROM PAPER TAPE FROM THE
C 3. INPUT OF SPECTRA SPECIFICATIONS FROM PAPER TAPE FROM THE
C 4. PROCESSING OF BLOCKS OF DATA FROM MAGNETIC TAPE. THIS STAGE
C CONSISTS OF –
C A. INPUT FROM MAGNETIC TAPE.
C B. CONVERSION TO VOLTAGES.
C C. SELECTING THE CORRECT SUBROUTINE FOR CALIBRATION.
C D. COLLECTING SUMS FOR AVERAGING, ETC.
C E. OUTPUTTING REQUIRED CALIBRATED DATA TO DISC FOR SPECTRA.
C 5. CALCULATION AND PRINTING OF AVERAGES, ETC.
C 6. OUTPUT OF CONTROLLING DATA AND AVERAGES, ETC. FOR WB AND WC.
C NOTE. THROUGHOUT THIS PROGRAM, THE WORDS FUNCTION AND
C SUBROUTINE ARE BOTH USED TO DESCRIBE THE EXPERIMENTER-
C SUPPLIED SUBROUTINES.
One of the surprises to me was when I ran a program to calculate means and variances from the data, to find that I had negative variances! I used a well-known formula for variances which allowed a single pass through the data:
\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \]
instead of the mathematically equivalent:
\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 \]
The 32-bit floating-point arithmetic on the HP 21MX did not have enough precision to avoid the catastrophic cancellation to which the first formula is prone. I later researched summation algorithms (Kahan and others), and developed block algorithms which provided high accuracy for the calculation of means and variances in a single pass (unpublished).
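The effect is easy to reproduce. The Python sketch below rounds every intermediate result to IEEE 754 single precision, which is an assumption: the HP 21MX used its own 32-bit format. It compares the single-pass sum-of-squares form with Welford's running-update method. Welford's method is a well-known stable single-pass alternative, swapped in here for illustration; it is not the unpublished block algorithm mentioned above. The sample readings are made up.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def variance_sum_of_squares_f32(data):
    """Single pass: mean of squares minus square of mean, each step rounded to 32 bits."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        x = f32(x)
        n += 1
        total = f32(total + x)
        total_sq = f32(total_sq + f32(x * x))
    mean = f32(total / n)
    return f32(f32(total_sq / n) - f32(mean * mean))

def variance_welford_f32(data):
    """Single pass with running updates of the mean and the sum of squared deviations."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in data:
        x = f32(x)
        n += 1
        delta = f32(x - mean)
        mean = f32(mean + f32(delta / n))
        m2 = f32(m2 + f32(delta * f32(x - mean)))
    return f32(m2 / n)

def variance_two_pass(data):
    """Reference value: two passes in double precision over the same 32-bit inputs."""
    xs = [f32(x) for x in data]
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

if __name__ == "__main__":
    readings = [10000.0, 10000.1, 10000.2]  # large mean, small spread
    print("sum of squares:", variance_sum_of_squares_f32(readings))
    print("Welford:       ", variance_welford_f32(readings))
    print("reference:     ", variance_two_pass(readings))
```

With a mean near 10000, the squares are near 10^8, where adjacent single-precision values are 8 apart, so the difference of two such numbers cannot resolve a variance of order 0.01 and can just as easily come out negative.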
When I returned from ITCE, I found Rory Thompson sitting at my desk. I worked with Rory Thompson (my worst time in CSIRO – I feared him, for good reason as we found out later), Angus McEwan, Allan Plumb (I programmed the first successful model of the Quasi-Biennial Oscillation for Allan), Peter Webster, Peter Baines and latterly Jorgen Frederiksen, with whom I had a productive partnership over several years during the time he won the David Rivett medal. He kindly made me joint author on several papers.
UK Met Office visit
In 1983-84, I visited the UK Meteorological Office for a period of six months to gain early experience with a Cyber 205, and to begin the porting of CSIRO codes to it. More details are given here. See also Csironet News no. 178, August 1984 – Cyber 205 experiences – R. Bell.
One of the projects with Jorgen involved trying to improve code that he had that looked at atmospheric stability – fastest growing modes, blocking, etc. I found that over 90% of the run time was in setting up interaction coefficients, and less than 10% of the time was spent in solving the eigenvalue problem. Furthermore, I found that the interaction coefficients could be calculated separately, and once only, and saved. This led to a huge speed-up, and allowed much larger problems to be tackled.
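The compute-once-and-save idea can be sketched as follows. This is a generic Python illustration, not the original Fortran: the function name, the pickle file format and the dummy coefficient table are all assumptions.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object saved at path, computing and saving it first if the file is absent."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result

def demo():
    """Call the expensive setup twice and count how often it really runs."""
    calls = []

    def interaction_coefficients():
        calls.append(1)  # stands in for the costly setup step
        return {(0, 1, 2): 0.5, (1, 2, 3): -0.25}  # dummy values

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "coefficients.pkl")
        first = load_or_compute(path, interaction_coefficients)
        second = load_or_compute(path, interaction_coefficients)
    return first, second, len(calls)
```

The second call reads the saved table instead of recomputing it, which is where the saving comes from when the setup dominates the run time.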
Another problem involved computing ensembles, and I was able to vectorise the code for the Cyber 205 over the ensemble members, to get great speed-up.
During these years, I tried to take advantage of every useful facility that DCR/Csironet provided to support the scientific effort. I used and promoted the use of the source code control system UPDATE, I could write Ed box programs, I promoted the use of standard Fortran, I built libraries of useful code (a set of routines for solving block tri-diagonal systems, used in the QBO work, and by Peter Webster) and wrote utilities to help manage data holdings. I had two stints working in the User Assistance Section in Canberra. I started writing an anonymous column for Csironet News (Stings and Things by Scorpio.)
DAR Computing Group
In about 1986, the Chief asked me to consider taking on the role of Computing Group Leader, which I had done on a temporary basis in June-August 1985. I accepted the position, and started in March 1987. Tony Eccleston joined the group as well, with the existing staff of Graham Rutter, Jill Walker and Hartmut Rabich. Staff issues dominated, as we sought to establish a new UNIX-based local computing environment, and to ban smoking in the computing building including the machine room! After going out to tender, running benchmarks, and evaluating proposals, Silicon Graphics won over Sun and HP (and maybe others) with a clear performance advantage. A UNIX server was installed for general computing use.
With the privatisation of Csironet underway, and no clear path for a successor to the Cyber 205 for scientific computing work, in 1989 the CSIRO Policy Committee on Computing set up the Supercomputing Facilities Task Force (SFTF), to decide on follow-on facilities from the Cyber 205. See Chapter 5.
I was heavily involved and managed the benchmarks that were assembled from codes from several CSIRO Divisions, along with some specific benchmarks to test key areas such as memory performance. I travelled with Bob Smart to the USA for two weeks to undertake benchmarking and to explore options. This was our first visit to the USA.
When decision-time came in August 1989 at the PCC, my Chief, Brian Tucker, insisted that I should be present along with Mike Coulthard, who chaired the SFTF. The PCC decided on the Cray Research/Leading Edge Technologies shared Cray Y-MP proposal.
JSF and SSG
I was then heavily involved in setting up the partnership (Joint Supercomputing Facility) with LET in Port Melbourne, establishing the service, and had sole responsibility for running the acceptance tests in March 1990 – 16 hours per day re-running the benchmarks for about a week on cherax, the name we gave the system (SN1409) and subsequent platforms. I was not present all the time, but relied on Cray Research staff to start the benchmarks at 8 AM each day, and terminate them at midnight.
I continued to help with the setting up of the service, on one occasion accompanying 3 staff from Aspendale to visit LET with a magnetic tape to set up their programs, prior to acceptable networking facilities being set up by Bob Smart.
The position of Supercomputing Support Group leader was advertised, to be based at the Division of Information Technology at 55 Barry St Carlton, and I was successful in gaining the job, starting (initially for 3 days per week on secondment from DAR) in May 1990. I had by then relinquished the Computing Group Leader position at Aspendale, to concentrate on the establishment of the Joint Supercomputing Facility. I was joined by Marek Michalewicz, Simon McClenahan, and Len Makin to form the group of four.
In the second half of 1990 I was involved (with Peter Boek from LET and Peter Grimes of Cray Research) on a roadshow to all the major CSIRO sites (all capitals, and Townsville) to publicise the new service. The uptake was good in several Divisions of CSIRO, but those with computing needs which could be met with existing PCs, workstations and Divisional facilities (including mini-supercomputers), did not make great use of the JSF.
At the end of 1990, I presented the paper Benchmarking to Buy at the Third Australian Supercomputer Conference in Melbourne, based on our experiences.
CUG and DMF
In April-May 1991, I was fortunate to be able to attend my first Cray User Group meeting – in London, and then visit several other supercomputing sites, including the UK Met Office, ECMWF, NCSA, NCAR and SDSC. At CUG, I had fruitful meetings with Charles Grassl and others, as I presented results from the benchmarking of the memory subsystems of various computers. These results illustrated the large memory bandwidth of the Cray Research vector systems of the time, compared with cache-based systems. I also learnt about Cray Research’s Data Migration Facility, which would become pivotal in CSIRO’s subsequent scientific computing storage services.
I later served two terms on the CUG Board of Directors as Asia/Pacific Representative, and presented two papers: “Seven Years and Seven Lessons with DMF”, and a joint paper with Guy Robinson comparing the Cray and NEC vector systems (Cray was marketing the NEC SX-6 as the Cray SX-6 at the time).
We quickly found that the Cray Y-MP turned a compute problem into a data storage problem – the original system had 1 Gbyte of disc storage (DD-49s) for the CSIRO home area, and the only option for more storage was manually mounted 9-track magnetic tapes. LET wished to acquire cartridge tape drives for its seismic data processing business, and CSIRO assisted in a joint purchase of such drives from StorageTek. This set up minimal requirements to invoke DMF on the CSIRO /home area, which was done on 14th November 1991, so that more dormant files would be copied to two tapes, and subsequently have their data removed from disc, but able to be restored from tape when referenced. This took some getting used to for users, but in the end the illusion of near-infinite storage capacity was compelling, and skilled users learnt how to drive pipelines of recall and process. Thus, I had (unwittingly at the time) re-created the DAD Document Region functionality on the CDC 3600, with automatic migration to tape, and recall when required.
At the end of 1991, economic circumstances put LET under threat – see Chapter 5. DMF allowed us to institute an off-site backup regime, just in case. Cray Research put a proposal to CSIRO to establish a new service, in conjunction with and situated at the University of Melbourne, with a Cray Research Y-MP 3/464, and service started there on 1st August 1992, with the data being transferred from the previous Y-MP using the DMF off-site backup. This commenced what we called the CSIRO Supercomputing Facility (CSF).
Cost write-back, the Share Scheme and the Development Fund: STK Tape library.
Back in 1990, funding for the Supercomputing Facility was constrained, and senior management was keen to have the costs attributed to Divisions. Two mechanisms were put in place. One, called the write-back, was applied at the end of each financial year. The total costs of the facility were apportioned to Divisions based on their usage, an extra appropriation amount equal to the cost was given to each Division (from the Institute Funds for the Supercomputing Facility), and then taken away from Divisions as expenditure. This achieved the costs of the facility being attributed to Divisions, but changed (for the worse) Divisions’ ratio of external earnings to appropriation funds, thus making it harder to meet the target (which was about 30% at this time).
The second scheme was called the Share Scheme. The idea came from a report by Trevor Hales of DIT of a funding mechanism used for a European network, where each contributor received a share of the resources proportional to their contribution. I set up a share scheme, inviting Divisions to contribute monthly, with a minimum contribution of $100 and a ‘floor-price’ from the Division of Atmospheric Research which contributed $10,000 per month (re-directing its spending on Csironet to this share scheme). The contributions went into a Development Fund, which was used to buy items to enhance the facility, e.g. commercial software, tape drives, and, in June 1993, a StorageTek Automatic Tape Library holding up to 6000 tape cartridges. We set shares in the Fair Share Scheduler on the Crays for the CSIRO Divisions proportional to the contributions. Later, the batch scheduler was enhanced to consider the shares when deciding which jobs to start. There was a problem with Divisions with small needs and contributions getting access, but this was solved following a suggestion from the Institute Director Bob Frater, who reported that some international bodies set voting rights for countries proportional to the square root of the population. This was implemented, to allow reasonable access for Divisions with low shares.
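The effect of the square-root rule can be seen with a small calculation. The Python sketch below is an illustration only: how the shares were actually entered into the Fair Share Scheduler is not described here, and the Division names and the second contribution are made up (only the $10,000 and $100 monthly figures come from the text).

```python
import math

def shares(contributions, damped=True):
    """Map {division: monthly contribution} to fractions of the machine that sum to 1.

    With damped=True the weights are the square roots of the contributions.
    """
    weights = {name: (math.sqrt(amount) if damped else amount)
               for name, amount in contributions.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

if __name__ == "__main__":
    monthly = {"large": 10000.0, "small": 100.0}  # dollars per month
    print("proportional:", shares(monthly, damped=False))
    print("square root: ", shares(monthly, damped=True))
```

With these two contributors, a strictly proportional split gives the small Division about 1% of the machine, while the square-root split gives it about 9%.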
The CSF seemed to work: CSIRO provided the bulk of the funding and support staff, Cray Research managed the maintenance, and provided a systems administrator (Peter Edwards) and a Help Desk person (Eva Hatzi from LET). The University of Melbourne hosted the system and provided operators for two-shifts per weekday (and maybe some on weekends), etc. There were regular meetings between the parties, made easier by the fact that my brother Alan headed the University’s computing services at the time. A utilisation of 98.5% was achieved over the life of the system, with the utilisation being boosted after the installation of the tape library – my analysis showed that the automation paid for itself in reduced idle time over a year or so.
Utilities – the tardir family
In March 1992 as users were starting to exercise DMF on the /home filesystem on cherax, it was apparent that recalling many little files took a long time (especially with manual tape mounts) and over-loaded the system. I started a set of utilities, tardir, untardir and gettardir, to allow users to consolidate the contents of a directory into a tar (“Tape ARchive”) file on disc, which would be likely to be migrated to tape, but also save a listing of the directory contents in a smaller file which would be more likely to stay on-line, as very small files were not being removed from the disc. This provided order-of-magnitude speedups for some workflows, and allowed users to scan the contents of an off-line file before requesting recall. The untardir reversed the process, while gettardir allowed selective recalls. The tardir utilities remain in use today (2021), particularly in the “external backups” procedures developed by CSIRO Scientific Computing.
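The idea behind the utilities can be sketched in a few lines of Python. This is not the real tardir code, whose interface and file naming are not shown here: the function signatures, the .tar and .list suffixes and the demo directory are all hypothetical.

```python
import os
import tarfile
import tempfile

def tardir(directory):
    """Pack directory into <directory>.tar and write a small <directory>.list index beside it."""
    directory = directory.rstrip(os.sep)
    tar_path = directory + ".tar"
    list_path = directory + ".list"
    with tarfile.open(tar_path, "w") as tar:
        tar.add(directory, arcname=os.path.basename(directory))
    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()
    with open(list_path, "w") as fh:
        fh.write("\n".join(names) + "\n")
    return tar_path, list_path

def listing(list_path):
    """Read the index only, so the (possibly off-line) tar file is never touched."""
    with open(list_path) as fh:
        return fh.read().splitlines()

def gettardir(tar_path, member):
    """Return the bytes of one member, analogous to a selective recall."""
    with tarfile.open(tar_path, "r") as tar:
        return tar.extractfile(member).read()

def demo():
    """Build a tiny directory, pack it, then inspect it through the index."""
    with tempfile.TemporaryDirectory() as tmp:
        run = os.path.join(tmp, "run01")
        os.mkdir(run)
        with open(os.path.join(run, "out.dat"), "wb") as fh:
            fh.write(b"42\n")
        tar_path, list_path = tardir(run)
        return listing(list_path), gettardir(tar_path, "run01/out.dat")
```

The design point is that the small index can be read without touching the tar file, so a user can decide whether a recall from tape is worthwhile before asking for it.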
Around 1993-95, the CSF with Cray Research hosted development work on cherax by the designer of the America’s Cup syndicate. The designer, who had to be based in Australia, was offered time on Sun systems, but insisted on access to a Cray system. With the money that came from this, a fourth processor was acquired, worth about $A250k.
Bureau of Meteorology – HPCCC
The Bureau had also acquired a Cray Y-MP. In about 1996, the incoming CSIRO CEO, Malcolm McIntosh, reportedly asked, “What are we doing about supercomputing: I’m prepared to sign off on a joint facility with the Bureau.” This was enough to get the management and support staff of both organisations working together to bring this about. The technical team drew up specifications for a joint system, and went to tender: three companies responded: Fujitsu, NEC and Cray Research. One of the contentious parts was that I specified Fortran90-compliant compilers for the CSIRO benchmarks, and the Cray T90 outperformed the NEC SX-4 on these tests, but the Bureau didn’t specify Fortran90-compliance, and the NEC bid was better on the Bureau’s tests. Software quality was always difficult to measure, and the things we could measure came to dominate the evaluation, as often happens. In the end, NEC won the contract. (Some years later, a Cray Research employee noted that we had dodged a bullet with the T90 – it was unreliable. I remember a colleague from CEA France, Claude Lecouvre, reporting seeing Cray engineers in full PPE in CEA’s machine room, diagnosing an uncontrolled leak of Fluorinert, which released poisonous gases if over-heated.)
In parallel with the tender evaluation, work was underway to draw up an agreement between CSIRO and the Bureau, which became the HPCCC (High Performance Computing and Communications Centre) allowing for the Bureau to be the owner of the shared equipment, for the Bureau to host the joint support staff on its premises, and for auxiliary systems to be co-located. Steve Munro from the Bureau was the initial manager, and I was appointed deputy manager (although I couldn’t act as manager, as I did not have Bureau financial delegations).
Staff moved into newly fitted-out premises on the 24th Floor of the existing Bureau Head Office at 150 Lonsdale St Melbourne in September 1997, with 8 staff members initially.
The SX-4 arrived in September 1997, and was installed in the Bureau’s Central Computing Facility (CCF) on the first floor, requiring some tricky crane-work.
Although the HPCCC awarded the contract to NEC, there were two aspects of its proposal that were considered deficient, and NEC agreed to undertake developments to cover these aspects: scheduling and data management. Rob Thurling of the Bureau and I drew up specifications for enhancements.
The first problem was the lack of a ‘political’ fair-share scheduler. The HPCCC needed the system to respond rapidly to operational work, but allow background work to fill the machine, and also to ensure that each party received its 50% share of resources. NEC set to work and wrote the Advanced Resource Scheduler (ARS), but after John O’Callaghan pointed out what the abbreviation ARS led to, the name was changed to Enhanced Resource Scheduler (ERS). An early version was available by the end of 1997, and this grew into a product which was later enhanced by NEC to support multi-node operation for the SX-6, allowing for preemption by high priority jobs, with checkpointing, migration to other nodes and restart for lower priority work. Other NEC SX sites used the product. There were over a hundred tunable parameters, and NEC continued to enhance the product to meet our suggestions through the life of the systems. (Jeroen van den Muyzenberg wrote one addition to implement a request from me. CSIRO liked to over-commit its nodes that weren’t running multi-CPU or multi-node jobs with single-CPU jobs, to maximise utilisation – otherwise, idle CPU time would accumulate when jobs were doing i/o for example. The addition was to tweak the process priorities for jobs (about every 5 minutes), giving higher priority to the jobs which were proportionally closest to their finishing time, and giving lower priority to jobs just starting. This resulted in jobs starting slowly, but accelerating as they neared completion.) The HPCCC ran ERS on the NEC SX systems until their end in 2010.
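The priority adjustment can be pictured with a small Python sketch. It is purely illustrative: the internals of ERS and of the addition are not described here, and the use of Unix-style nice values, the 0 to 19 range, the job names and the CPU times are all assumptions.

```python
def renice_by_progress(jobs, best=0, worst=19):
    """Map {job: (cpu_seconds_used, cpu_seconds_requested)} to nice values.

    Jobs proportionally closest to finishing get the most favourable (lowest) value.
    """
    nice = {}
    for name, (used, requested) in jobs.items():
        fraction = min(max(used / requested, 0.0), 1.0)
        nice[name] = worst - round(fraction * (worst - best))
    return nice

if __name__ == "__main__":
    # Hypothetical single-CPU jobs sharing an over-committed node.
    running = {"just_started": (0, 3600), "halfway": (1800, 3600), "nearly_done": (3420, 3600)}
    print(renice_by_progress(running))
```

Re-running such a pass every few minutes would produce the behaviour described: a job starts at the back of the queue for CPU time and moves forward as it nears completion.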
The second problem was data management. Both CSIRO and the Bureau were running DMF on Cray Research systems – a J90 for the Bureau. NEC proposed the SX-Backstore product as a replacement to provide an integrated compute and data solution. There followed a development process by NEC to meet the specifications that we gave for a workable production HSM.
However, when testing was undertaken on site, a serious issue arose. One of the key requirements for a HSM is protection of the data, including restoration of all the files and information in the event of a crash and loss of the underlying filesystem (there was such a crash around that time on CSIRO’s Cray J916se system, with recovery being provided by the CSIRO systems administrator at the time, Virginia Norling, and taking 30 hours for millions of files). Ann Eblen set up a test file system on the SX-4 with about 30,000 files managed by SX-Backstore, took a dump to disc (about 5 minutes) and to tape (about 6 minutes), wiped the disc, and then set SX-Backstore to restore the filesystem. This took 46 hours, a totally unacceptable time – it looked like there was an n-squared dependency in the restore process. NEC found that a complete re-engineering would be needed to solve the problem, and the HPCCC agreed to accept from NEC compensation for the failure to deliver.
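A rough extrapolation shows why this result was disqualifying. The Python sketch below scales the measured 46 hours for 30,000 files under two assumptions: linear growth and the suspected n-squared growth. The one-million-file target is an illustrative figure, and the quadratic law is only the suspicion reported above, not a measured fact.

```python
def restore_hours(n_files, exponent, measured_files=30_000, measured_hours=46.0):
    """Scale the measured restore time as (n_files / measured_files) ** exponent."""
    return measured_hours * (n_files / measured_files) ** exponent

if __name__ == "__main__":
    for exponent in (1, 2):
        hours = restore_hours(1_000_000, exponent)
        print(f"exponent {exponent}: {hours:,.0f} hours ({hours / 24:,.0f} days)")
```

Even linear scaling gives about 1,500 hours (roughly two months) for a million files, and quadratic scaling gives about 51,000 hours, which is nearly six years, against the 30 hours taken to recover millions of files under DMF.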
The Bureau had by this stage moved from an Epoch to a SAM-FS HSM, while CSIRO continued with DMF on a Cray J916se, which was acquired in September 1997 and installed in the Bureau’s CCF as an associated facility. This system was acquired at my insistence. The J916 had a HiPPI connection to the NEC SX-4, giving far higher bandwidth than the Bureau provided for its system with just Ethernet.
The naming of the SX-4 caused contention – the Bureau staff wanted to continue the Bureau’s naming scheme based on Aboriginal names, but that was seen by CSIRO staff as cementing the systems as being part of the Bureau, not part of the new joint entity, the HPCCC. Eventually, the system was named bragg after eminent Australian scientists, and this convention continued in the HPCCC to florey, russell, eccles, mawson, and in CSIRO to burnet, bracewell, ruby, bowen.
In 1999, the HPCCC had to consider options for the second stage of the contract with NEC – more SX-4 equipment, or an SX-5. A team from the HPCCC (including me) and Bureau Research Centre travelled to Japan to test the SX-5, and this option was chosen to replace the SX-4. Around this time, the Bureau was concerned about reliably giving accurate forecasts for the Sydney Olympics, and wanted to have redundancy by acquiring a second SX-5. A brief was put to CSIRO to support this, and it was signed off to the tune of several million by the CEO, much to the reported annoyance of other senior CSIRO staff. So, there were two SX-5s, named florey and russell.
In early 2002, I was contacted by colleagues at the Arctic Region Supercomputing Center, whom I had met at Cray and CUG meetings – Barbara Horner-Miller, Virginia Bedford and Guy Robinson. The ARSC was considering acquiring a Cray SX-6 (a re-badged NEC SX-6), and knew that I had had experience with NEC vector systems. Subsequently, I spent three short terms as a consultant at ARSC in Fairbanks, Alaska – April-May 2002, July 2002 and September 2003.
HPCCC – SX-6 era, 700 Collins St, SGI storage management, clusters
In 2002, as the SX-5s approached their end of life and the contract with NEC was to terminate, the HPCCC went out to tender for replacement systems. By this stage, Steve Munro had left the HPCCC, and a new manager, Phil Tannenbaum, had been appointed. Phil had worked for NEC in the USA, and was familiar with the company and its workings. The HPCCC prepared specifications, including a workload benchmark that I devised, using Bureau and CSIRO applications. The task was for the vendors to demonstrate that their systems could be filled with applications, but that when operational jobs arrived, they would be started promptly (within seconds), preempting some of the background work. When the operational jobs had finished, the background jobs were to be resumed.
There were three main contenders for the contract: IBM, NEC and SGI. SGI did not attempt the workload benchmark but instead tendered a system to help transition away from vector architectures. IBM failed to demonstrate the workload benchmark, but NEC did! Just after NEC submitted its tender to the Bureau, one of the NEC staff members, Ed Habjan, walked past where I was sitting, and uttered the word, “sadist”!
Phil Tannenbaum also wanted a benchmark to explore the i/o capabilities of the systems. In half a day, I adapted some old benchmarks from 1989 to produce the CLOG benchmark, which measured write and read performance from an application with varying request sizes. This was subsequently used to monitor the performance of various systems through their life. Here’s an example, showing the results of running clog on a Global File System and a memory-resident file system, probably on an SX-6.
The performance climbs as the record size increases, as the buffer size is increased, and when switching from the disc-based GFS to the memory-based filesystems.
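The kind of measurement described above can be illustrated with a minimal Python sketch. This is not the CLOG benchmark itself: the function names, the record sizes and the file name are assumptions made for illustration, and on a modern system the read figures will largely reflect the operating system's cache rather than the storage device.

```python
import os
import tempfile
import time


def time_io(path, record_size, total_bytes):
    """Write then read a file in record_size requests; return (write, read) MB/s."""
    record = b"\0" * record_size
    count = total_bytes // record_size

    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:  # unbuffered: one write() per record
        for _ in range(count):
            f.write(record)
        os.fsync(f.fileno())  # include the time taken to push the data out
    write_secs = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(record_size):  # likely served from cache (see lead-in)
            pass
    read_secs = max(time.perf_counter() - start, 1e-9)

    megabytes = count * record_size / 1e6
    return megabytes / write_secs, megabytes / read_secs


def run(directory, record_sizes=(4096, 65536, 1048576), total_bytes=8 * 1048576):
    """Measure each record size in turn, using a scratch file in directory."""
    results = {}
    path = os.path.join(directory, "clog_sketch.dat")  # hypothetical file name
    try:
        for size in record_sizes:
            results[size] = time_io(path, size, total_bytes)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for size, (write_rate, read_rate) in run(scratch).items():
            print(f"{size:>8} bytes/request: "
                  f"write {write_rate:8.1f} MB/s, read {read_rate:8.1f} MB/s")
```

Pointing the scratch directory at different filesystems gives a rough comparison of the kind shown in the plot, although a single run on a shared system is a noisy measurement.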
So, NEC was the leading contender, and the HPCCC organised a team to visit NEC in May 2003 for a live test demonstration. Phil managed to fail NEC on the first day, but subsequent attempts succeeded, and NEC won the contract.
By this stage, CSIRO had made the decision to diversify its central scientific computing platforms, and had negotiated with the Bureau to contribute only 25% of the cost of the new system, leaving the Bureau to fund the other 75%. The initial system of 18 SX-6 nodes was split 5 for CSIRO and 13 for the Bureau, and was installed in the new CCF in the new building at 700 Collins St in December 2003 while it was still being completed. When the additional 10 nodes for the upgrade arrived, they were all assigned to the Bureau, owing to a quirk in the pricing schedule from NEC. There were two front-end NEC TX7 systems, based on Itanium processors.
The new building provided opportunities, with staging of StorageTek Powderhorn tape libraries between the old and new sites with the Bureau and CSIRO cooperating in the transition. CSIRO went out to tender for a storage management solution to replace DMF on the Cray J916. NEC, Cray and SGI bid, with the SGI tender being successful, providing DMF on an IRIX/MIPS platform. CSIRO was already on the path to Linux, and so contracted to run DMF on an Altix running Itanium processors and Linux. This was one of the first such installations in the world, and came with some risks. The NEC and Cray bids failed to match the SGI bid because of the cost of the licences for the software based on the amount stored – in one case, exceeding the cost of storage media. The Altix was installed in early 2004, and was upgraded in June 2003 to provide a base for data-intensive computing – large memory, multiple processors, and access to the DMF-managed storage as closely as possible. The DMF-managed filesystem was the /home filesystem, as it had been on the preceding Cray Research vector systems.
CSIRO went out to tender for general-purpose cluster systems, and also for a cluster system to be a development platform for the ROAM application that was being developed by CSIRO Marine and Atmospheric Research under the Bluelink project with the Bureau and the RAN. IBM won the tender with a blade-based system, which we called burnet, with the ROAM platform being named nelson. These systems, along with the Altix and tape library were installed in the CCF under the associated facilities clause in the HPCCC agreement.
This section highlights some of the software I designed or developed in the 21st Century, to support the users and the systems for CSIRO Scientific Computing. The tardir family and the clog benchmark were mentioned above.
Time Create Delete
In 2007, when developing the clog benchmark (see above), I also developed a simple test of the performance of filesystems on metadata operations. This simple test timed the creation of about 10,000 nested directories in a file system, and timed the deletion of them. (A similar test was run in 1997 during acceptance tests on the NEC SX-4, and was abandoned incomplete after about 10 days.) Many operations on files do not involve bulk data movement, but do involve scanning filesystems to retrieve metadata, e.g. for backups or for scanning for files to be flushed (see below). I ran the tests on several systems around 23:00 each day, to allow for monitoring of the performance over time. Of course, there was a lot of variation in performance from day to day, because the tests were run on non-dedicated systems. Also, the test results depend on not just the underlying storage performance, but on the operating system – caching could have come into play.
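A test of this kind can be sketched in a few lines of Python. The layout of the original test is not recorded here, so the tree shape (a fan-out of 10 over 4 levels, giving 11,110 directories) and all the names below are assumptions for illustration only.

```python
import os
import shutil
import tempfile
import time


def make_tree(root, fanout, depth):
    """Create a directory tree below root; return the number of directories made."""
    if depth == 0:
        return 0
    made = 0
    for i in range(fanout):
        child = os.path.join(root, f"d{i}")
        os.mkdir(child)
        made += 1 + make_tree(child, fanout, depth - 1)
    return made


def time_create_delete(parent, fanout=10, depth=4):
    """Time creating and then deleting a tree; return (count, create_s, delete_s)."""
    root = os.path.join(parent, "tcd_sketch")  # hypothetical working directory
    os.mkdir(root)

    start = time.perf_counter()
    count = make_tree(root, fanout, depth)
    create_secs = time.perf_counter() - start

    start = time.perf_counter()
    shutil.rmtree(root)
    delete_secs = time.perf_counter() - start
    return count, create_secs, delete_secs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        count, create_secs, delete_secs = time_create_delete(scratch)
        print(f"{count} directories: create {create_secs:.2f} s, "
              f"delete {delete_secs:.2f} s")
```

Run daily from a scheduler against a directory on each filesystem of interest, the two timings give a simple trend of metadata performance, subject to the caching effects noted above.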
Here’s an example of the performance of several filesystems, run from one of the SC cluster nodes.
There was a reconfiguration in September 2020 which led to reduced performance of the /home filesystem, and the /datastore filesystem (NFS mounted from ruby).
Phil Tannenbaum had suggested a system where users of the HPCCC services could go to a web site and see the status of the systems: other sites had such subsystems. There was software to do this for systems administrators (e.g. Nagios), but no obvious ones available for a user-facing service. When Phil was away in May-June 2006 for about two weeks, I set about creating such a system, with the generous help of Justyna Lubkowski of the Bureau who provided the web infrastructure. This system, dubbed the traffic lights, was able to be demonstrated to Phil on his return, and was subsequently enhanced to cover more services (HPCCC, CSIRO, National HPC Facilities), and to monitor items such as floating software licences. Here are some partial snapshots from 2021.
The first shows the groups view: services were put into groups, to give a quick overview (there were 71 services being monitored at this stage). The status was shown by red, green or grey indicators.
The next snapshot shows all the services for the group ruby_datastore. The traffic lights could provide reports on incidents, downtimes and more (such as the current batch queue) for services. The notes provided a brief summary of the service, including recent downtimes or slowness.
The next snapshot shows the start of the downtimes records for one service. These were not particularly accurate, since the probing interval ranged from about 3 minutes to 30 minutes, depending on the service.
The next snapshot shows part of the report on the software licences. The ‘more’ button provides access to the licence logs, so that users waiting for a licence can find who is using the licences. The Access information link sends the user to the Science Software Map, which provides details on the licence conditions.
This service is still available to users, after nearly 15 years. The software is about 5500 lines of code.
Backups, data protection, data management, the Scientific Computing Data Store
When the HPCCC SX-4 service was being set up, and I was encouraging potential users to switch from the Crays, one of the users (Julie Noonan) said to me that they wouldn’t start using the SX-4 systems until backups were being done for the /home file systems. In the absence of a system product, I developed scripts using the rsh, rdist and cpio utilities to make backups onto the Cray J916se, and I used a Tower of Hanoi management scheme.
These scripts, started in April 1998, ran through to the end of the life of the SX-5s in 2004.
In February 1997, I organised a one-day workshop on Large Scale Data Management for CSIRO. I spoke on the topic: Storage Management – the forgotten nightmare. This period was a time when I was increasingly focussed on data storage and management for HPC users as much as on the HPC facilities and services. Around that time, I wrote:
Users typically want every file kept and backed-up, and would be happy to use only one file system, globally visible across all the systems they use, with high-performance everywhere, and infinite capacity! A user added that they want all of the above, at zero cost!
When I was acting as a consultant at the Arctic Region Supercomputing Center in 2002, I conducted a review of its storage plans, and argued for two models of service: for those concentrating on HPC, they would be based on the HPC servers, and would have to explicitly access storage servers. For those concentrating on working with data, they would be based as close to the storage as possible, i.e. directly using an HSM-managed filesystem, and would have to explicitly access HPC servers, e.g. by using batch job submission. This model continued through to 2021, with the cherax and ruby SGI systems providing a platform for data-intensive processing, in tandem with HPC systems. Part of the inspiration for this model and these systems was a comment from one of the climate modellers, that the modelling was done (on the HPC system of the time), but he was so far behind in the data analysis.
These systems, and the closely-associated Data Store, became one of my flagship endeavours, in attempting to provide users with a single large space for storing and working with data. Although the migrating filesystem for the /home area took some getting used to (because inevitably, the file you wanted was off-line), users with large data holdings valued the system for its unitary nature, and coded workflows to build pipelines allowing for efficient file recalls and file processing. Peter Edwards improved this experience by enhancing the dmget command to allow large recalls to be broken into carefully crafted batches, one for each tape needing to be accessed (stopping denial of service from a user requesting huge recalls), and allowing the interlacing of recalls and processing of batches of files. He also enhanced the dmput command, to allow one user to work with another user’s data and not cause problems with space management for the owner of the data.
One day in about 2007, Jeroen reported to me that there was a new facility in the rsync utility which might be of interest. Jeroen had taken over management of the backups on the CSIRO systems. The rsync utility, written by Andrew Tridgell at ANU, allowed efficient mirroring of files from one location to another, avoiding unnecessary transfers. The new feature was the --link-dest= option: this allowed an rsync transfer from a source to a destination to consider a third location (such as the previous day’s backup), and instead of transferring an identical file, just make a hard-link. The backups then became a series of directories (perhaps one per day), with there being only one copy of files common to multiple backups, but each directory appearing to be a complete or full backup (which it is). This has the advantage of providing a full backup every day, for the cost of an incremental backup – i.e. transferring only changed or new files.
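The hard-link idea can be shown without rsync at all. The following Python sketch is an illustration of the technique only, not the backup suite or rsync's own logic: the function name and directory layout are hypothetical, and it assumes the destination lies outside the source tree and on the same filesystem as the previous backup (a requirement for hard links).

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy source to dest, hard-linking files that are unchanged since previous.

    Returns (linked, copied) counts.
    """
    linked = copied = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            # With shallow=True, files with the same type, size and
            # modification time are taken to be equal; otherwise filecmp
            # falls back to comparing sizes and contents.
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=True):
                os.link(old, dst)  # second name for the same data: no new copy
                linked += 1
            else:
                shutil.copy2(src, dst)  # copy2 preserves the modification time
                copied += 1
    return linked, copied
```

Calling snapshot(src, "day1") and later snapshot(src, "day2", previous="day1") leaves two directories that each look like a full backup, while unchanged files occupy space only once.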
Jeroen coded this into the backup suite, and he and I also developed Tower of Hanoi management of backup holdings. We used a DMF-managed filesystem as the target for the backups, taking advantage of the in-built tape management. After Jeroen left, Peter Edwards took over as systems administrator. He and I continued to develop the capabilities of the backup suite, including the work by Peter to develop a directive-based front-end to specify filesystems to be backed up. Peter also found that a filesystem could be mounted twice onto a system, with the second mount being read-only. This allowed the backups of the users’ home filesystems to be made available on-line to the users, allowing for inspection and restoration. We did consider patenting some of the ideas, but instead made the ideas freely available. Here’s a picture of a Tower of Hanoi puzzle.
I gave several talks on the backup suite: the first one, given to the DMF User Group in 2009, was entitled DMF as a target for backup. The techniques in use provide:
1. coverage back in time adjusting to the likelihood of recovery being needed
2. full backups every time, for the cost of incrementals
3. simple visibility of the backup holdings
4. simple recovery for individual files and complete file systems
5. no vendor dependency
6. centralised tape management
7. space saving of about a factor of five compared with conventional backups
8. directive-driven configuration
9. user visibility and recovery of files from the backups
The key utility for doing the Tower of Hanoi and other management is in a script called purge_backups.pl, started by Jeroen, and stretching now to 5740 lines. A note I wrote about some of the extensions to the original Tower of Hanoi management is at the Wikipedia page Backup rotation scheme under the heading Extensions and example.
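For readers unfamiliar with the scheme, the classic Tower of Hanoi rotation can be sketched as follows. This shows only the textbook scheme, not the extended logic in purge_backups.pl; the function names are hypothetical. Each backup session is assigned a level from the number of trailing zero bits in its session number, and a new backup at a level replaces the previous backup held at that level.

```python
def hanoi_level(session, levels):
    """Return the level (0 = most frequently reused) for a 1-based session number."""
    if session < 1:
        raise ValueError("session numbers start at 1")
    # (session & -session) isolates the lowest set bit; its position is the
    # number of trailing zero bits in the session number.
    level = (session & -session).bit_length() - 1
    return min(level, levels - 1)  # the top level absorbs all higher values


def retained_sessions(latest, levels):
    """Return the sessions still held after `latest` sessions, newest first."""
    held = {}
    for session in range(1, latest + 1):
        held[hanoi_level(session, levels)] = session  # newer replaces older
    return sorted(held.values(), reverse=True)


if __name__ == "__main__":
    print([hanoi_level(n, 4) for n in range(1, 9)])  # [0, 1, 0, 2, 0, 1, 0, 3]
    print(retained_sessions(15, 4))                  # [15, 14, 12, 8]
```

With four levels and daily sessions, the holdings after day 15 are 0, 1, 3 and 7 days old, which illustrates the first feature in the list above: coverage is dense for recent days and sparser further back in time.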
In 2009, I gave a poster presentation at the eResearch conference entitled, Your Data, Our Responsibility.
The poster outlined some storage dilemmas for HPC centres, and then advocated the use of HSM, the use of the rsync utility and Tower of Hanoi management scheme for backups, using CSIRO’s experience as an example.
This backup suite continued to protect systems and user filesystems until 1st March 2021, when backups of the cluster home filesystems were switched to use the in-built snapshots capability, and Commvault.
Jeroen van den Muyzenberg started with CSIRO as systems administrator in 1999. He successfully wrote scripts to handle flushing of temporary filesystems in a rational way from December 2000. One script monitored the target filesystem and, if it was more than a threshold (such as 95%) full, triggered another script. This used the large memory on the SGI systems to slurp in details of the entire holdings on the target filesystem. This list was then sorted (based on the newer of access and modify times), and the oldest files (and directories) were removed. This worked successfully for several years.
However, in 2016, flushing was needed for the CSIRO cluster systems, which did not have a big memory, and indeed the filesystems were hosted on servers without a large memory, which were the most suitable hosts for such housekeeping operations. Around that time, I read an article It Probably Works by Tyler McMullen in the Communications of the ACM (November 2015, Vol. 58 No. 11, Pages 50-54).
I realised that we don’t need to know the oldest file to start flushing – we just need to know a collection of the older files. This led to an algorithm which scanned a filesystem, and registered the files in ‘buckets’ according to their age. This removed the need for a sort. Then it became apparent that the scan could be performed in advance and the results saved, ready for a flushing process when needed. This separation of scanning and flushing meant that the system was always ready when a flush was needed, and in practice the flushing could be started within a few seconds from when a monitoring process signalled that flushing should be started. The only extra step for the flushing was to recheck for the existence of a candidate file, and whether its access or modify times were still within the range of the bucket.
The bucket boundaries were determined from the date of the start or last flush of a filesystem at one end, and a cut-off period, e.g. 14 days, at the other; we guaranteed to the users that we would not invoke the flushing on files less than 14 days old.
The implementation was done by Steve McMahon from March 2016, with additions by Ahmed Arefin and Peter Edwards who added some production hardening.
Here is a list of features:
- New scalable flushing algorithm
- Separates scanning from flushing
- Eliminates the sort
- Can have lists ready for action in advance
- 2 second response time!
- Technical report available
- Open Source
- Allows increased default quotas
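The separation of scanning from flushing can be illustrated with a short Python sketch. This is not the CSIRO implementation: the function names, the fixed-width buckets and the default values are assumptions for illustration (the production code derived its bucket boundaries from the date of the last flush, as described above).

```python
import os

DAY = 86400  # seconds


def scan(root, now, cutoff_days=14, bucket_days=7, buckets=8):
    """Place files in age buckets (no sort); a higher bucket index means older."""
    table = [[] for _ in range(buckets)]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished during the scan
            age_days = (now - max(st.st_atime, st.st_mtime)) / DAY
            if age_days < cutoff_days:
                continue  # protected: younger than the cut-off
            index = min(int((age_days - cutoff_days) // bucket_days), buckets - 1)
            table[index].append(path)
    return table


def flush(table, bytes_to_free, now, cutoff_days=14, bucket_days=7):
    """Remove files from the oldest bucket downwards until enough is freed."""
    freed = 0
    for index in range(len(table) - 1, -1, -1):
        lower_age = cutoff_days + index * bucket_days
        for path in table[index]:
            if freed >= bytes_to_free:
                return freed
            try:
                st = os.stat(path)  # recheck: the scan may be stale
            except OSError:
                continue  # already removed by its owner
            age_days = (now - max(st.st_atime, st.st_mtime)) / DAY
            if age_days < lower_age:
                continue  # used since the scan, so it no longer belongs here
            os.remove(path)
            freed += st.st_size
    return freed
```

Because scan() can be run ahead of time and its table saved, flush() only has to walk the prepared lists when the monitoring process raises the alarm, which is what makes a response within seconds plausible.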
The scalable flushing code is in use on the CSIRO SC systems for 4 filesystems (March 2021). The graph below shows the action of flushing on a file system, showing the date of the oldest surviving files – flushes are marked by the vertical lines, reducing the age of the oldest surviving file.
The next graph shows the total space occupied by the files in each bucket. In this case, a flush has recently occurred, and the buckets marked in red have been processed.
(The abscissa labels are of the form YY-MM, the last two digits of the year, and the digits of the month.)
In early 2018, CSIRO Scientific Computing started the process of removing a /data area from its systems. Here is a table of just some of the available filesystems on the cluster.
The /data area ($DATADIR) was subject to quotas, but when it filled, there was no good way to manage the holdings – no migration (HSM) nor flushing, and the old way of sending users a list of the biggest users (“name and shame”) was akin to bullying, in trying to use peer pressure to get users to remove old files. However, I wanted users to be able to maintain a collection of files on the systems bigger than the home file system could support, to be able to protect the files, but also not to lose files through flushing (when the only available space was a filesystem subject to flushing). In April 2018, I started on a utility to deal with mirrors of a working area. So, a user could set up an area on a flushable filesystem, then run the utility with
and a mirror of all the files would be created on the HSM storage ($STOREDIR) above, with intelligent consolidation of small files into tardir archives. If the user was away for a while, and some of the collection had been flushed, the user could restore the collection with the utility
Other options allowed for the updating of the mirror. The utility was installed to run on ruby, the clusters, and the NCI systems. Later versions supported multi-host operations, such as providing a mirror on the CSIRO systems of a collection held at NCI. Here is a list of all the operations supported.
create sync update cleanse delete check status flush list help kill moveto explain dev_history release recall removetmp man restore auditw auditm verify config getremote putremote discover rebuild
Utilities were also enhanced to allow profiling of the contents of an area – size, access time and modify time. Here is an example: when produced interactively, the plot could be rotated and zoomed.