Sidebar 6: Robert Bell personal computing history

February 8th, 2021

These pages attempt to give some of the history of CSIRO’s use of computing in its research, focussing mainly on the large shared systems and services.


Last updated: 17 Jun 2024.
Robert C. Bell

Largely complete.
2021-09-20 – Added note about colleagues.
2021-10-25 – Expanded the quote from Paul Davies.
2023-08-12 – Spelling correction, and note about QBO.
2024-05-17 – Added material about DCR interactions – computing costs.
2024-05-20 – Added material about resource allocation, scheduling and job progress.
2024-06-17 – Fixed some typos.

Rob Bell with CSIR Mk 1 (CSIRAC) at the Melbourne Museum.

Contents

Introduction
Early life
Maths and School, and first computing
University and Vacation Employment
Post-graduate
CSIRO Aspendale
ITCE
Collaborations
UK Met Office visit
Jorgen Frederiksen
DCR/Csironet interactions and charging
DAR Computing Group
SFTF
JSF and SSG
CUG and DMF
DMF
CSF
Cost write-back, the Share Scheme and the Development Fund: STK Tape library
CSF Collaboration
Utilities – the tardir family
America's Cup
Bureau of Meteorology – HPCCC
HPCCC – SX-6 era, 700 Collins St, SGI storage management, clusters
National Facilities
Software
Time Create Delete
Job progress
STREAM2
STORE/RECALL
Traffic Lights
Backups, data protection, data management, the Scientific Computing Data Store
Flushing
Mirrors
Tenders, purchasing, acquisitions
CSIR Mk I
CDC 3200
Aspendale
The Cray Research Era
The HPCCC era
The Aspendale Cray J90 system (mid 1990s)
HPCCC SX-6 procurement (2002-03) – see above
The Sun Constellation systems for APAC and the Bureau (c. 2008)
The NCI raijin system (2012-2013)
Multiple StorageTek tape drives and libraries
The Pawsey magnus and storage systems (2010-2014)
The CSIRO cluster systems
CSIRO Data Store hosts, software and disc storage, MAID
Bureau Cray system (Aurora) (2013-2014)
The NCI gadi system
Resource Allocation and Scheduling
Colleagues
Personal Reflections

Introduction

In 2003, I gave a talk at a dinner hosted by RMIT Engineering, speaking about my career in terms of Technology: Mathematics, Meteorology, Machines.

Early life

I was born in Hobart in 1949 (the same year that CSIR Mk1 came to life), and lived there until September 1951 when our family (Father Powell, Mother Gwenyth, brothers Alan and John, and sister Margaret) moved to Melbourne to the suburb of Bentleigh.  My family had originally come from NSW – my Mother from Terara near Nowra, and my Father from Sydney.

My Father was the Accountant at the main Melbourne branch of the Commonwealth Savings Bank at 8 Elizabeth St, and was promoted to Inspector before retiring in 1973.  My Mother was never in paid employment, having worked on the family dairy farm until she was married.  But there was a history of scholarship in her family: one uncle was a school inspector, and another (Robert Boyd) was a civil engineer who became Chief Engineer of the NSW Railways and a president of the Institute of Engineers.

Maths and School, and first computing

Maths and numbers seemed to be around us.  Prior to my going to school, I could tell you the number of each of the animal cards in my collection from the cereal packets.  I attended Bentleigh West primary school, and then Brighton High School.  In grade 4, after parent/teacher interviews with Mr Curlis, I was encouraged to learn more about Maths, and my parents bought an additional textbook.

While at primary school, we acquired an old typewriter from my Uncle Albert’s business (C. Bell & Son).  My Father did the monthly accounts for the business, and he was rewarded by ‘presents’ from Uncle Albert.  I set to work typing up a multiplication table.  I don’t remember how many rows it had, but I think it went to at least 20.  I typed up successive pages to the right as the table was extended, and I do remember that I reached 60.  Pages were sticky-taped together, and rolled up like a scroll.  (Of course, the typing was done on the back of old paper: paper was not easy to come by, but my Father brought home some surplus paper from the Commonwealth Savings Bank, seemingly only once per year.)

I won a Mothers’ Club scholarship from Bentleigh West.

I started at Brighton High school in 1961, and soon became interested in the weather through the Geography syllabus.  I set up my own weather station at home in September 1961, and started taking twice-daily observations, which continued until I left home in 1974 (the records are on paper at home).  Here is an article I wrote which was published in the school magazine Voyager in 1962.
I continued to be good at Maths.  In years 11 and 12, I benefitted from having Mrs Frietag as Maths teacher.  She had a laconic style: I remember her handing back our exam papers one time and reluctantly saying that she couldn’t find anything wrong with my effort!  In the year 12 mid-year exams, in one of the Maths subjects, she failed nearly everyone – I think she had used an old end-of-year exam paper, and we hadn’t covered all the topics yet.  There were protests, and the results were scaled so more students passed.  I was a bit annoyed that she arbitrarily gave me a score of 95 when the scaling-up produced a result for me of around 110 out of 100.

At the end of year 11 in 1965, I was called in by the Principal to encourage me to attend a summer Maths camp at Somers.  He said that I could win an exhibition.  I attended the camp, and subsequently did gain a general exhibition, awarded for the top 30 students in the state.  I also won a state Senior Scholarship, which provided money!

My brother Alan joined the Department of Defence in early 1960 after completing a science degree at the University of Melbourne.  (I found out years later that he had done a subject on computing, and had used CSIRAC.)  In September 1961, he left home to work for a year at GCHQ in Cheltenham, UK.  We (and especially I) had no idea at the time of what he worked on.  We knew he was in Defence Signals.  Part way through his year, his aerogramme letters came for a few weeks from Bletchley Park instead of Cheltenham, and on his way home in November/December 1962, we received letters from Silver Spring, Maryland.

In about 1963-1964, he started to teach me about computers, with a ‘model’ showing pigeon holes as places where numbers and instructions could be stored.

In 1965, I attended an ISCF Science Camp (Inter-School Christian Fellowship) at Belgrave Heights; the camp was run by the Graduates Fellowship, including my brother Alan.  We were taught the beginnings of Fortran programming, and he took the programs we wrote to get them punched onto cards, and compiled and ran the programs (on a CDC 3400 at DSD that was under acceptance testing, as I found out later).  Thus I wrote my first computer program in 1965: the program calculated the period of a pendulum for various lengths using the well-known formula.
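For the record, the formula is T = 2*pi*sqrt(L/g).  A minimal sketch of such a program in modern free-form Fortran (my own illustration, not the 1965 original, which was written in the Fortran of the day):

program pendulum
  ! Period of a simple pendulum, T = 2*pi*sqrt(L/g), for a range of lengths.
  ! (Illustrative sketch only - not the original 1965 program.)
  implicit none
  real, parameter :: pi = 3.1415927, g = 9.81   ! m/s**2
  real :: length, period
  integer :: i

  do i = 1, 20
     length = 0.1 * i                            ! lengths from 0.1 m to 2.0 m
     period = 2.0 * pi * sqrt(length / g)
     print '(a,f5.2,a,f6.3,a)', 'L = ', length, ' m   T = ', period, ' s'
  end do
end program pendulum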

In 1966, my sister Margaret commenced work with the Bureau of Meteorology as a programmer in training, having completed a BSc(Hons) majoring in Applied Maths at Monash University.  The Bureau used the CSIRO computing network from then until it acquired the first of its twin IBM 360/65 systems in 1968.

University and Vacation Employment

I was keen on a career in Meteorology, and enrolled in a BSc at Monash University, commencing in 1967.  During 1967, I wrote to the CSIRO Division of Meteorological Physics at Aspendale, and was subsequently employed as a vacation student for 1967-68.  I worked for Reg Clarke.  I spent the first hour of each day punching data from the Wangara Expedition onto cards, and the subsequent hours with a large mechanical desk calculator (the sort with ten rows of ten buttons), a log book, pencil and paper, calculating u.cos(theta) + v.sin(theta) from more Wangara observations.  It’s great to see that data from Wangara is now publicly accessible through ARDC – I thought it might be lost!  https://researchdata.edu.au/wangara-experiment-boundary-layer/681872
(However, not all the data was published, and I suspect the raw data is gone.)

I also learnt where the off switch was in the radar tower, as a safety measure to cover the absence of other staff.  Andreij Berson was studying dry cold fronts, which showed up on radar for unknown reasons: I can remember his scanning an approaching dry cold front with binoculars and shouting “birds!”. See https://publications.csiro.au/publications/publication/PIprocite:43a9d45f-bc88-4d4b-af4c-801773391cff/BTauthor/BVberson,%20f.%20a./RP1/RS25/RORECENT/RFAuthor=berson%2C%20f.%20a./STsearch-by-keyword/LIBRO/RI7/RT38

I also started reading, from the library at Aspendale, A Guide to FORTRAN Programming (1961) by Daniel D. McCracken, and tried writing a simple program to be run on the DCR CDC 3200 at Clayton – the program didn’t work, because I didn’t understand array dimensioning.
Fortunately in 1968, I studied numerical analysis and Fortran programming at Monash, with John O. Murphy lecturing, and gained more understanding.

The next year, I was again successful in gaining vacation employment at Aspendale.  This time, Reg Clarke assigned me to write a program to model a sea breeze, using the equations derived in a paper (Estoque, M. A., A theoretical investigation of the sea breeze, https://doi.org/10.1002/qj.49708737203), and Reg’s own boundary layer parameterisations.  I tried to make progress, but had little idea of how to structure a program, and foolishly programmed everything with explicit constants in a premature attempt at speed.  Despite a few attempts later in 1969 to get the program working, it never did.  I did generate successful routines to solve some of the parameterisations.

In 1969, only one of my units at Monash involved computing – Astrophysics.

I applied to CSIRO to work at Aspendale in the next vacation, but was instead offered a position at the Commonwealth Meteorology Research Centre – a joint CSIRO/Bureau of Meteorology centre.  I commenced there working for Doug Gauntlett in the IOOF building in McKenzie St Melbourne.  Doug asked me to investigate solvers for Helmholtz equations, which were required at each time-step of the weather forecasting models of the day – in particular, the alternating-direction implicit (ADI) method, which had been developed by the nuclear research community.  I built a framework to test various algorithms, including successive over-relaxation, a method developed by Ross Maine, and ADI.  ADI proved to be the fastest in the cases under investigation, where a good first guess was available – such as in a time-stepping model, where the field from the previous time-step was likely to be close to the required solution for the current time-step – and absolute accuracy was not needed.  (It turned out that a fellow student, David Bover, was working on ADI for the ARL during the same vacation.)  During this vacation, I used the Bureau’s IBM 360/65 systems, and so learnt some JCL.

I did no computing during my honours year, which probably helped, and graduated with first class honours in Applied Maths.

I again worked at CMRC in 1970-71, and wrote a report on the work: Comparisons between explicit and semi-implicit time differencing schemes for simple atmospheric models.

Post-graduate

I commenced a PhD in Applied Mathematics at Monash University in early 1971, with Roger K. G. Smith as supervisor.  I wanted to do something meteorological with computing, but Roger suggested working on quasi-geostrophic models of ocean circulation.  The quasi-geostrophic equations were an earlier (successful) simplification of the equations governing the flow of the atmosphere, applicable when the earth’s rotation dominated the forces acting on it.  I was not happy about oceans rather than atmosphere, but started the work, and did build a successful model, which showed the main features of large-scale oceanic flow.  I used the Monash CDC 3200 for this work.  Unfortunately for me, Roger Smith left for the University of Edinburgh, and I did not follow.  Bruce Morton took over my supervision, and suggested looking at stromatolites in Shark Bay WA, but I did not get far with this.  When Roger Smith returned to Monash, I wrote a more accurate quasi-geostrophic model, which I ran on the Monash Burroughs 6700.  But the lure of computation distracted me from the PhD.

I did learn a lot about computing, and was appointed as post-graduate representative on the Monash Computing Advisory Committee, at a time when replacements for the CDC 3200 were being considered.  There were proposals from CDC, Burroughs, IBM, UNIVAC and one other (Honeywell?), but it became clear that the Computing Centre was committed to buying the Burroughs 6700 as a successor to the Burroughs 5500.  One of the Applied Maths professors likened it to a university library which, on discovering that it could earn money by lending romantic novels to the community, threw out all the science journals and texts and bought more novels!  The Burroughs machines acted as backups for the Hospital computing services.  I developed my first benchmark, program helm, solving a Helmholtz equation by the method of successive over-relaxation, which was run on many machines over subsequent years.  A table of results can be seen here.
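The structure of helm can be sketched as follows – a free-form reconstruction of the idea only (the grid size, forcing and relaxation factor are illustrative), not the original program:

program helm_sketch
  ! Sketch of a Helmholtz solver using successive over-relaxation (SOR):
  ! solve laplacian(u) - lambda*u = f on the unit square, with u = 0 on the
  ! boundary.  (A reconstruction of the idea only, not the original helm.)
  implicit none
  integer, parameter :: n = 129          ! grid points in each direction
  real(8), parameter :: lambda = 1.0d0, omega = 1.8d0, tol = 1.0d-6
  real(8) :: u(n,n), f(n,n), h, resid, unew
  integer :: i, j, iter

  h = 1.0d0 / (n - 1)
  u = 0.0d0
  f = 1.0d0                              ! simple constant forcing

  do iter = 1, 10000
     resid = 0.0d0
     do j = 2, n - 1
        do i = 2, n - 1
           unew = (u(i+1,j) + u(i-1,j) + u(i,j+1) + u(i,j-1) &
                   - h*h*f(i,j)) / (4.0d0 + lambda*h*h)
           resid = max(resid, abs(unew - u(i,j)))
           u(i,j) = u(i,j) + omega * (unew - u(i,j))   ! SOR update
        end do
     end do
     if (resid < tol) exit
  end do
  print *, 'iterations =', iter, '  max change =', resid
end program helm_sketch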

CSIRO Aspendale

I accepted a 3-year appointment at Aspendale, starting in late 1974, and there commenced developing a model of airflow in support of the Latrobe Valley Study.  I eventually did finish my PhD, but was not making much progress with the airflow model.  People like Peter Manins tried to help me in my research career, but I think I was too proud to accept advice.  In 1977, the Chief, Brian Tucker, offered me an indefinite appointment as an Experimental Officer, which I was grateful to accept – a position allowing me to support researchers by doing computation for them, and I had found my niche.

ITCE

In October 1976, the Division hosted the International Turbulence Comparison Experiment (ITCE) at Conargo, NSW, one of the flattest areas on earth!  Three weeks before the start, I was asked to help with the computing side, working with Neil Bacon and Graham Rutter.  This involved writing programs to deal with the acquisition and calibration of the data, and was to be run on an HP 21MX mini-computer.  I spent about 10 days in Deniliquin/Conargo helping to set up the computing services in a caravan.
         PROGRAM WA
C-----------------------------------------------------------------------
C
C        WA IS THE FIRST OF THREE PROGRAMS TO ANALYSE I. T. C. E. CORE
C        DATA FROM MAGNETIC TAPE
C        THE MAIN STAGES OF WA ARE:
C        1. SETUP AND INITIALIZATION
C        2. INPUT OF FUNCTION SPECIFICATIONS FROM PAPER TAPE FROM THE
C           21MX.
C        3. INPUT OF SPECTRA  SPECIFICATIONS FROM PAPER TAPE FROM THE
C           21MX.
C        4. PROCESSING OF BLOCKS OF DATA FROM MAGNETIC TAPE. THIS STAGE
C           CONSISTS OF –
C           A. INPUT FROM MAGNETIC TAPE.
C           B. CONVERSION TO VOLTAGES.
C           C. SELECTING THE CORRECT SUBROUTINE FOR CALIBRATION.
C           D. COLLECTING SUMS FOR AVERAGING, ETC.
C           E. OUTPUTTING REQUIRED CALIBRATED DATA TO DISC FOR SPECTRA.
C        5. CALCULATION AND PRINTING OF AVERAGES, ETC.
C        6. OUTPUT OF CONTROLLING DATA AND AVERAGES, ETC. FOR WB AND WC.
C
C        NOTE. THROUGHOUT THIS PROGRAM, THE WORDS FUNCTION AND
C              SUBROUTINE ARE BOTH USED TO DESCRIBE THE EXPERIMENTER-
C              SUPPLIED SUBROUTINES.
One of the surprises to me was when I ran a program to calculate means and variances from the data, to find that I had negative variances!  I used a well-known formula for variances which allowed a single pass through the data:

     variance = ( sum of x(i)**2 )/N - mean**2,   where  mean = ( sum of x(i) )/N

instead of the mathematically equivalent:

     variance = ( sum of (x(i) - mean)**2 )/N

The 32-bit floating-point arithmetic on the HP 21MX did not have enough precision to avoid the catastrophic cancellation that the first formula allowed.  I later researched summation algorithms (Kahan and others), and developed block algorithms which provided high accuracy for the calculation of means and variances in a single pass (unpublished).
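The effect is easy to reproduce.  Here is a small sketch (my own, not the ITCE code; the data are synthetic) showing the one-pass formula failing in 32-bit arithmetic while the two-pass form behaves:

program variance_demo
  ! Demonstrates catastrophic cancellation in the one-pass variance formula
  ! using 32-bit reals, compared with the mathematically equivalent two-pass
  ! form.  (Illustrative sketch only.)
  implicit none
  integer, parameter :: n = 10000
  real :: x(n), mean, sumsq, var1, var2
  integer :: i

  ! Data with a large mean (about 1000.0) and a small spread (about +/- 0.5)
  do i = 1, n
     x(i) = 1000.0 + 0.001 * mod(i, 1000) - 0.5
  end do

  mean  = sum(x) / n
  sumsq = sum(x**2)

  var1 = sumsq / n - mean**2          ! one-pass formula: cancellation
  var2 = sum((x - mean)**2) / n       ! two-pass formula: stable

  print *, 'one-pass variance: ', var1   ! unreliable, may even be negative
  print *, 'two-pass variance: ', var2   ! close to the true value (~0.083)
end program variance_demo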

Collaborations

When I returned from ITCE, I found Rory Thompson sitting at my desk.  I worked for Rory Thompson (my worst time in CSIRO – I feared him, for good reason as we found out later), Angus McEwan, Alan Plumb (I programmed the first successful model of the Quasi-Biennial Oscillation for Alan, and there were two papers in the Quarterly Journal of the Royal Meteorological Society, one of which was highlighted in the 150 years of QJ), Peter Webster, Peter Baines and latterly Jorgen Frederiksen, with whom I had a productive partnership over several years, during which he won the David Rivett Medal.  He kindly made me joint author on several papers.

UK Met Office visit

In 1983-84, I visited the UK Meteorological Office for a period of six months to gain early experience with a Cyber 205, and to begin the porting of CSIRO codes to it.  More details are given here.  See also Csironet News no. 178, August 1984 – Cyber 205 experiences – R. Bell.

Jorgen Frederiksen

One of the projects with Jorgen involved trying to improve code of his that examined atmospheric stability – fastest growing modes, blocking, etc.  I found that over 90% of the run time was spent in setting up interaction coefficients, and less than 10% in solving the eigenvalue problem.  Furthermore, I found that the interaction coefficients could be calculated separately, and once only, and saved.  This led to a huge speed-up, and allowed much larger problems to be tackled.

Another problem involved computing ensembles, and I was able to vectorise the code for the Cyber 205 over the ensemble members, to get great speed-up.

DCR/Csironet interactions and charging

During these years, I tried to take advantage of every useful facility that DCR/Csironet provided to support the scientific effort.  I used and promoted the use of the source code control system UPDATE, I could write Ed box programs, I promoted the use of standard Fortran, I built libraries of useful code (a set of routines for solving block tri-diagonal systems, used in the QBO work, and by Peter Webster) and wrote utilities to help manage data holdings.  I had two stints working in the User Assistance Section in Canberra.  I started writing an anonymous column for Csironet News (Stings and Things by Scorpio – 23 columns – bi-monthly from 1979 to 1983.)
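The block tri-diagonal routines themselves are not reproduced here, but the scalar analogue (the Thomas algorithm) gives the flavour; in the block version, each scalar division becomes the factorisation and solution of a small dense system.  A sketch under that simplification (my own illustration, not the library routines):

program tridiag_demo
  ! Scalar tri-diagonal solve (Thomas algorithm) on a small test system.
  implicit none
  integer, parameter :: n = 5
  real(8) :: a(n), b(n), c(n), d(n), x(n)

  a = -1.0d0;  b = 2.0d0;  c = -1.0d0;  d = 1.0d0   ! a(1) and c(n) play no part
  call tridiag(a, b, c, d, x, n)
  print '(a,5f8.3)', 'x = ', x                      ! expected: 2.5 4.0 4.5 4.0 2.5

contains

  subroutine tridiag(a, b, c, d, x, n)
    integer, intent(in)  :: n
    real(8), intent(in)  :: a(n), b(n), c(n), d(n)  ! sub-, main-, super-diagonal, RHS
    real(8), intent(out) :: x(n)
    real(8) :: cp(n), dp(n), m
    integer :: i

    cp(1) = c(1) / b(1)                ! forward elimination
    dp(1) = d(1) / b(1)
    do i = 2, n
       m = b(i) - a(i) * cp(i-1)
       cp(i) = c(i) / m
       dp(i) = (d(i) - a(i) * dp(i-1)) / m
    end do

    x(n) = dp(n)                       ! back substitution
    do i = n - 1, 1, -1
       x(i) = dp(i) - cp(i) * x(i+1)
    end do
  end subroutine tridiag

end program tridiag_demo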

My first Letter to the Editor of the DCR Newsletter appears to be in October 1978, where I queried the DCR 25% increase in computing charges, in the light of much lower increases in Divisional funding.
In retrospect, the reply given by Terry Holden, Technical Secretary, seemed unconvincing, with him merely highlighting a newly available low-priority job class.

This price increase was one of many over the years.  As users acquired mini-computers, and later, personal computers, the range of users of DCR/Csironet systems narrowed.  Owning their own systems allowed experimentation, and as one scientist (Art Raiche) told me once, we could afford to make mistakes!  This flight from Csironet left the Division to cover its fixed costs from a shrinking user base, which entailed higher charging rates, which led to under-utilised systems, and so on.  It was also hard to budget operational funds when the un-forecast charges for Csironet usage were simply extracted from Divisional funds each month.  This constituted, to me, a death spiral, and ultimately CSIRO walked away from using the privatised Csironet by 1990.  From then onwards, whenever I had the chance, I opposed end-user charging for computing access, arguing, for example, that if a division bought an electron microscope for its work, it didn’t then charge users to use it, and computing should be treated similarly.  This then led to questions of how to allocate resources on a ‘free’ service.  These were largely solved through the “share scheme” (see below), the fair-share scheduler and the like.

DAR Computing Group

In about 1986, the Chief asked me to consider taking on the role of Computing Group Leader, which I had done on a temporary basis in June-August 1985.  I accepted the position, and started in March 1987.  Tony Eccleston joined the group as well, with the existing staff of Graham Rutter, Jill Walker and Hartmut Rabich.  Staff issues dominated, as we sought to establish a new UNIX-based local computing environment, and to ban smoking in the computing building including the machine room!  After going out to tender, running benchmarks, and evaluating proposals, Silicon Graphics won over Sun and HP (and maybe others) with a clear performance advantage.  A UNIX server was installed for general computing use.

SFTF

With the privatisation of Csironet underway, and no clear path for a successor to the Cyber 205 for scientific computing work, in 1989 the CSIRO Policy Committee on Computing set up the Supercomputing Facilities Task Force (SFTF), to decide on follow-on facilities from the Cyber 205.  See Chapter 5 .

I was heavily involved and managed the benchmarks that were assembled from codes from several CSIRO Divisions, along with some specific benchmarks to test key areas such as memory performance.  I travelled with Bob Smart to the USA for two weeks to undertake benchmarking and to explore options.  This was our first visit to the USA.  We visited Purdue University, which was running UNIX on an ETA-10, and still running a CDC 6500 system.  We visited Control Data in Minneapolis, then Cray Research at Mendota Heights and Chippewa Falls.  We visited Convex in Richardson, Texas, and finally the National Center for Atmospheric Research in Boulder, Colorado for the Users Conference.  One of my benchmarks measured the performance of code using varying strides through memory, and showed dramatic decreases in performance with large strides on systems without ‘supercomputer’ memory architecture.
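The kernel of that stride benchmark was essentially a timed sweep like the following free-form sketch (my own reconstruction of the idea, with an illustrative array size; not the original code):

program stride_sketch
  ! Measures the rate of a simple vector update for varying strides through
  ! memory.  On cache-based machines the rate drops sharply as the stride
  ! grows; 'supercomputer' memory systems were far flatter.
  ! (For large strides the sweep should be repeated to get a measurable time.)
  implicit none
  integer, parameter :: n = 8 * 1024 * 1024
  real(8), allocatable :: a(:)
  integer :: stride, i, count0, count1, rate
  real(8) :: seconds, mwords

  allocate(a(n))
  a = 1.0d0

  do stride = 1, 1024
     if (iand(stride, stride - 1) /= 0) cycle     ! powers of two only
     call system_clock(count0, rate)
     do i = 1, n, stride
        a(i) = a(i) * 1.000001d0 + 0.000001d0
     end do
     call system_clock(count1)
     seconds = dble(count1 - count0) / dble(rate)
     mwords  = dble((n + stride - 1) / stride) / 1.0d6
     print '(a,i5,a,f10.2,a)', 'stride ', stride, ' : ', &
           mwords / max(seconds, 1.0d-9), ' Mword/s'
  end do
end program stride_sketch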

When decision-time came in August 1989 at the PCC, my Chief, Brian Tucker, insisted that I should be present along with Mike Coulthard, who chaired the SFTF.  The PCC decided on the Cray Research/Leading Edge Technologies shared Cray Y-MP proposal.

JSF and SSG

I was then heavily involved in setting up the partnership (Joint Supercomputing Facility) with LET in Port Melbourne, establishing the service, and had sole responsibility for running the acceptance tests in March 1990 – 16 hours per day re-running the benchmarks for about a week on cherax, the name we gave the system (SN1409) and subsequent platforms.  I was not present all the time, but relied on Cray Research staff to start the benchmarks at 8 AM each day, and terminate them at midnight.

I continued to help with the setting up of the service, on one occasion accompanying 3 staff from Aspendale to visit LET with a magnetic tape to set up their programs, prior to acceptable networking facilities being set up by Bob Smart.
The position of Supercomputing Support Group leader was advertised, to be based at the Division of Information Technology at 55 Barry St Carlton, and I was successful in gaining the job, starting (initially for 3 days per week on secondment from DAR) in May 1990.  I had by then relinquished the Computing Group Leader position at Aspendale, to concentrate on the establishment of the Joint Supercomputing Facility.  I was joined by Marek Michalewicz, Simon McClenahan, and Len Makin to form the group of four.

In the second half of 1990, I was involved (with Peter Boek from LET and Peter Grimes of Cray Research) in a roadshow to all the major CSIRO sites (all capitals, and Townsville) to publicise the new service.  The uptake was good in several Divisions of CSIRO, but those whose computing needs could be met with existing PCs, workstations and Divisional facilities (including mini-supercomputers) did not make great use of the JSF.

At the end of 1990, I presented the paper Benchmarking to Buy at the Third Australian Supercomputer Conference in Melbourne, based on our experiences.

CUG and DMF

In April-May 1991, I was fortunate to be able to attend my first Cray User Group meeting – in London – and then visit several other supercomputing sites, including the UK Met Office, ECMWF, NCSA, NCAR and SDSC.  At CUG, I had fruitful meetings with Charles Grassl and others, as I presented results from the benchmarking of the memory subsystems of various computers.  These results illustrated the large memory bandwidth of the Cray Research vector systems of the time, compared with cache-based systems.  I also learnt about Cray Research’s Data Migration Facility (DMF), which would become pivotal in CSIRO’s subsequent scientific computing storage services.

I later served two terms on the CUG Board of Directors as Asia/Pacific Representative, and presented two papers: “Seven Years and Seven Lessons with DMF”, and a joint paper with Guy Robinson comparing the Cray and NEC vector systems (Cray was marketing the NEC SX-6 as the Cray SX-6 at the time).

DMF

We quickly found that the Cray Y-MP turned a compute problem into a data storage problem – the original system had 1 Gbyte of disc storage (DD-49s) for the CSIRO home area, and the only option for more storage was manually mounted 9-track magnetic tapes.  LET wished to acquire cartridge tape drives for its seismic data processing business, and CSIRO assisted in a joint purchase of such drives from StorageTek.  This met the minimal requirements for invoking DMF on the CSIRO /home area, which was done on 14th November 1991, so that the more dormant files would be copied to two tapes, and subsequently have their data removed from disc, but could be restored from tape when referenced.  This took some getting used to for users, but in the end the illusion of near-infinite storage capacity was compelling, and skilled users learnt how to drive pipelines of recall and process.  Thus, I had (unwittingly at the time) re-created the DAD Document Region functionality of the CDC 3600, with automatic migration to tape, and recall when required.

CSF

At the end of 1991, economic circumstances put LET under threat – see Chapter 5.  DMF allowed us to institute an off-site backup regime, just in case.  Cray Research put a proposal to CSIRO to establish a new service, in conjunction with and situated at the University of Melbourne, with a Cray Research Y-MP 3/464, and service started there on 1st August 1992, with the data being transferred from the previous Y-MP using the DMF off-site backup.  This commenced what we called the CSIRO Supercomputing Facility (CSF).

Cost write-back, the Share Scheme and the Development Fund: STK Tape library

Back in 1990, funding for the Supercomputing Facility was constrained, and senior management was keen to have the costs attributed to Divisions.  Two mechanisms were put in place.  One, called the write-back, was applied at the end of each financial year.  The total costs of the facility were apportioned to Divisions based on their usage, an extra appropriation amount equal to the cost was given to each Division (from the Institute Funds for the Supercomputing Facility), and then taken away from Divisions as expenditure.  This achieved the costs of the facility being attributed to Divisions, but changed (for the worse) Divisions’ ratio of external earnings to appropriation funds, thus making it harder to meet the target (which was about 30% at this time).

The second scheme was called the Share Scheme.  The idea came from a report by Trevor Hales of DIT on a funding mechanism used for a European network, where each contributor received a share of the resources proportional to their contribution.  I set up a share scheme, inviting Divisions to contribute monthly, with a minimum contribution of $100 and a ‘floor-price’ set by the Division of Atmospheric Research, which contributed $10,000 per month (re-directing its spending on Csironet to this share scheme).  The contributions went into a Development Fund, which was used to buy items to enhance the facility, e.g. commercial software, tape drives, and, in June 1993, a StorageTek Automatic Tape Library holding up to 6000 tape cartridges.  We set shares in the Fair Share Scheduler on the Crays for the CSIRO Divisions proportional to the contributions.  Later, the batch scheduler was enhanced to consider the shares when deciding which jobs to start.  There was a problem with Divisions with small needs and contributions struggling to get access, but this was solved following a suggestion from the Institute Director Bob Frater, who reported that some international bodies set voting rights for countries proportional to the square root of the population.  This was implemented, to allow reasonable access for Divisions with low shares.
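As an illustration of the square-root approach (my own arithmetic; the exact formula used is not recorded above): with shares set proportional to the square root of the monthly contribution, a Division contributing $10,000 per month received sqrt(10000)/sqrt(100) = 10 times the share of one contributing $100, rather than 100 times, giving small contributors a workable slice of the machine.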

CSF Collaboration

The CSF seemed to work: CSIRO provided the bulk of the funding and support staff, Cray Research managed the maintenance and provided a systems administrator (Peter Edwards) and a Help Desk person (Eva Hatzi from LET), and the University of Melbourne hosted the system and provided operators for two shifts per weekday (and maybe some on weekends), etc.  There were regular meetings between the parties, made easier by the fact that my brother Alan headed the University’s computing services at the time.  A utilisation of 98.5% was achieved over the life of the system, with the utilisation being boosted after the installation of the tape library – my analysis showed that the automation paid for itself in reduced idle time over a year or so.

Utilities – the tardir family

In March 1992, as users were starting to exercise DMF on the /home filesystem on cherax, it became apparent that recalling many little files took a long time (especially with manual tape mounts) and over-loaded the system.  I started a set of utilities, tardir, untardir and gettardir, to allow users to consolidate the contents of a directory into a tar (“Tape ARchive”) file on disc, which would be likely to be migrated to tape, while also saving a listing of the directory contents in a smaller file which would be more likely to stay on-line, as very small files were not being removed from the disc.  This provided order-of-magnitude speedups for some workflows, and allowed users to scan the contents of an off-line file before requesting recall.  The untardir utility reversed the process, while gettardir allowed selective recalls.  The tardir utilities remain in use today (2021), particularly in the “external backups” procedures developed by CSIRO Scientific Computing.

America's Cup

Around 1993-95, the CSF with Cray Research hosted development work on cherax by the designer for the Australian America's Cup syndicate, Bruce Rosen.  The designer, who had to be based in Australia, was offered time on Sun systems, but insisted on access to a Cray system.  With the money that came from this, a fourth processor was acquired, worth about $A250k.  Here is a letter of appreciation received by the team – CSIRO SSG and the Cray Research staff.

Bureau of Meteorology – HPCCC

The Bureau had also acquired a Cray Y-MP.  In about 1996, the incoming CSIRO CEO, Malcolm McIntosh, reportedly asked, “What are we doing about supercomputing?  I’m prepared to sign off on a joint facility with the Bureau.”  This was enough to get the management and support staff of both organisations working together to bring this about.  The technical team drew up specifications for a joint system, and went to tender: three companies responded: Fujitsu, NEC and Cray Research.  One of the contentious parts was that I specified Fortran90-compliant compilers for the CSIRO benchmarks, and the Cray T90 outperformed the NEC SX-4 on these tests, but the Bureau didn’t specify Fortran90-compliance, and the NEC bid was better on the Bureau’s tests.  Software quality was always difficult to measure, and the things we could measure came to dominate the evaluation, as often happens.  In the end, NEC won the contract.  (Some years later, a Cray Research employee noted that we had dodged a bullet with the T90 – it was unreliable.  I remember a colleague from CEA France, Claude Lecouvre, reporting seeing Cray engineers in full PPE in CEA’s machine room, diagnosing an uncontrolled leak of fluorinert, which releases poisonous gases if over-heated.)
In parallel with the tender evaluation, work was underway to draw up an agreement between CSIRO and the Bureau, which became the HPCCC (High Performance Computing and Communications Centre) allowing for the Bureau to be the owner of the shared equipment, for the Bureau to host the joint support staff on its premises, and for auxiliary systems to be co-located.  Steve Munro from the Bureau was the initial manager, and I was appointed deputy manager (although I couldn’t act as manager, as I did not have Bureau financial delegations).
Staff moved into newly fitted-out premises on the 24th Floor of the existing Bureau Head Office at 150 Lonsdale St Melbourne in September 1997, with 8 staff members initially.
The SX-4 arrived in September 1997, and was installed in the Bureau’s Central Computing Facility (CCF) on the first floor, requiring some tricky crane-work.
Although the HPCCC awarded the contract to NEC, there were two aspects of its proposal that were considered deficient, and NEC agreed to undertake developments to cover these aspects: scheduling and data management.  Rob Thurling of the Bureau and I drew up specifications for enhancements.

The first problem was the lack of a ‘political’ fair-share scheduler.  The HPCCC needed the system to respond rapidly to operational work, but allow background work to fill the machine, and also to ensure that each party received its 50% share of resources.  NEC set to work and wrote the Advanced Resource Scheduler (ARS), but after John O’Callaghan pointed out what the abbreviation ARS led to, the name was changed to Enhanced Resource Scheduler (ERS).  An early version was available by the end of 1997, and this grew into a product which was later enhanced by NEC to support multi-node operation for the SX-6, allowing for preemption by high-priority jobs, with checkpointing, migration to other nodes and restart for lower-priority work.  Other NEC SX sites used the product.  There were over a hundred tunable parameters, and NEC continued to enhance the product to meet our suggestions through the life of the systems.  (Jeroen van den Muyzenberg wrote one addition to implement a request from me.  CSIRO liked to over-commit its nodes that weren’t running multi-CPU or multi-node jobs with single-CPU jobs, to maximise utilisation – otherwise, idle CPU time would accumulate when jobs were doing i/o, for example.  The addition tweaked the process priorities for jobs (about every 5 minutes), giving higher priority to the jobs which were proportionally closest to their finishing time, and lower priority to jobs just starting.  This resulted in jobs starting slowly, but accelerating as they neared completion.)  The HPCCC ran ERS on the NEC SX systems until their end in 2010.

The second problem was data management.  Both CSIRO and the Bureau were running DMF on Cray Research systems – a J90 for the Bureau.  NEC proposed the SX-Backstore product as a replacement to provide an integrated compute and data solution.  There followed a development process by NEC to meet the specifications that we gave for a workable production HSM.

However, when testing was undertaken on site, a serious issue arose.  One of the key requirements for an HSM is protection of the data, including restoration of all the files and information in the event of a crash and loss of the underlying filesystem (there was such a crash around that time on CSIRO’s Cray J916se system, with recovery being provided by the CSIRO systems administrator at the time, Virginia Norling, and taking 30 hours for millions of files).  Ann Eblen set up a test file system on the SX-4 with about 30,000 files managed by SX-Backstore, took a dump to disc (about 5 minutes) and to tape (about 6 minutes), wiped the disc, and then set SX-Backstore to restore the filesystem.  This took 46 hours, a totally unacceptable time – it looked like there was an n-squared dependency in the restore process.  NEC found that a complete re-engineering would be needed to solve the problem, and the HPCCC agreed to accept compensation from NEC for the failure to deliver.

The Bureau had by this stage moved from an Epoch to a SAM-FS HSM, while CSIRO continued with DMF on a Cray J916se, which was acquired in September 1997 and installed in the Bureau’s CCF as an associated facility.  This system was acquired at my insistence.  The J916 had a HiPPI connection to the NEC SX-4, giving far higher bandwidth than the Bureau provided for its system with just Ethernet.

The naming of the SX-4 caused contention – the Bureau staff wanted to continue the Bureau’s naming scheme based on Aboriginal names, but that was seen by CSIRO staff as cementing the systems as part of the Bureau, not part of the new joint entity, the HPCCC.  Eventually, the system was named bragg, beginning a convention of naming systems after eminent Australian scientists, which continued in the HPCCC with florey, russell, eccles and mawson, and in CSIRO with burnet, bracewell, ruby, bowen and pearcey.

In 1999, the HPCCC had to consider options for the second stage of the contract with NEC – more SX-4 equipment, or an SX-5.  A team from the HPCCC (including me) and Bureau Research Centre travelled to Japan to test the SX-5, and this option was chosen to replace the SX-4.  Around this time, the Bureau was concerned about reliably giving accurate forecasts for the Sydney Olympics, and wanted to have redundancy by acquiring a second SX-5.  A brief was put to CSIRO to support this, and it was signed off to the tune of several million by the CEO, much to the reported annoyance of other senior CSIRO staff.  So, there were two SX-5s, named florey and russell.
In early 2002, I was contacted by colleagues at the Arctic Region Supercomputing Center, who I had met at Cray and CUG meetings – Barbara Horner-Miller, Virginia Bedford and Guy Robinson.  The ARSC was considering acquiring a Cray SX-6 (a re-badged NEC SX-6), and knew that I had had experience with NEC vector systems.  Subsequently, I spent three short terms as a consultant at ARSC in Fairbanks, Alaska – April-May 2002, July 2002 and September 2003.

HPCCC – SX-6 era, 700 Collins St, SGI storage management, clusters

In 2002, as the SX-5s approached their end of life and the contract with NEC was to terminate, the HPCCC went out to tender for replacement systems.  By this stage, Steve Munro had left the HPCCC, and a new manager, Phil Tannenbaum had been appointed.  Phil had worked for NEC in the USA, and was familiar with the company and its workings.  The HPCCC prepared specifications, including a workload benchmark that I devised, using Bureau and CSIRO applications.  The task was for the vendors to demonstrate that their systems could be filled with applications, but when operational jobs arrived, they would be started promptly (within seconds), preempting some of the background work.  When the operational jobs had finished, the background jobs were to be resumed.
There were three main contenders for the contract: IBM, NEC and SGI.  SGI did not attempt the workload benchmark but instead tendered a system to help transition away from vector architectures.  IBM failed to demonstrate the workload benchmark, but NEC did!  Just after NEC submitted its tender to the Bureau, one of the NEC staff members, Ed Habjan, walked past where I was sitting, and uttered the word, “sadist”!

Phil Tannenbaum also wanted a benchmark to explore the i/o capabilities of the systems.  In half a day, I adapted some old benchmarks from 1989 to produce the CLOG benchmark, which measured write and read performance from an application with varying request sizes.  This was subsequently used to monitor the performance of various systems through their life.  Here’s an example, showing the results of running clog on a Global File System and a memory-resident file system, probably on an SX-6.

The performance climbs as the record size increases, as the buffer size is increased, and when switching from the disc-based GFS to the memory-based filesystems.
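The core of clog can be sketched as timed unformatted writes of varying record sizes – a free-form reconstruction of the idea only (the total size, record sizes and filename are illustrative, and the read half is omitted):

program clog_sketch
  ! Writes a fixed total amount of data using varying record (request)
  ! sizes, and reports the apparent write bandwidth for each size.
  ! (Reconstruction of the idea behind the clog benchmark, not the original.)
  implicit none
  integer, parameter :: total_mb = 256
  integer :: recwords, nrec, i, j, count0, count1, rate, iu
  real(8), allocatable :: buf(:)
  real(8) :: seconds, mbytes

  do j = 10, 20                       ! record sizes from 2**10 to 2**20 words
     recwords = 2**j
     nrec = (total_mb * 1024 * 1024) / (8 * recwords)
     allocate(buf(recwords))
     buf = 1.0d0

     open(newunit=iu, file='clog_test.dat', form='unformatted', &
          access='sequential', status='replace')
     call system_clock(count0, rate)
     do i = 1, nrec
        write(iu) buf
     end do
     call system_clock(count1)
     close(iu, status='delete')

     seconds = dble(count1 - count0) / dble(rate)
     mbytes  = dble(nrec) * 8.0d0 * recwords / (1024.0d0 * 1024.0d0)
     print '(a,i9,a,f9.1,a)', 'record size ', 8*recwords, ' bytes: ', &
           mbytes / max(seconds, 1.0d-9), ' Mbyte/s'
     deallocate(buf)
  end do
end program clog_sketch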

So, NEC was the leading contender, and the HPCCC organised a team to visit NEC in May 2003 for a live test demonstration.  Phil managed to fail NEC on the first day, but subsequent attempts succeeded, and NEC won the contract.

By this stage, CSIRO had made the decision to diversify its central scientific computing platforms, and had negotiated with the Bureau to contribute only 25% of the cost of the new system, leaving the Bureau to fund the other 75%.   The initial system of 18 SX-6 nodes was split 5 for CSIRO and 13 for the Bureau, and was installed in the new CCF in the new building at 700 Collins St in December 2003 while it was still being completed, and when the additional 10 nodes for the upgrade arrived, they were all assigned to the Bureau, owing to a quirk in the pricing schedule from NEC.  There were two front-end NEC TX7 systems, based on Itanium processors.

The new building provided opportunities, with staging of StorageTek Powderhorn tape libraries between the old and new sites, the Bureau and CSIRO cooperating in the transition.  CSIRO went out to tender for a storage management solution to replace DMF on the Cray J916.  NEC, Cray and SGI bid, with the SGI tender being successful, offering DMF on an IRIX/MIPS platform.  CSIRO was already on the path to Linux, and so contracted to run DMF on an Altix running Itanium processors and Linux.  This was one of the first such installations in the world, and came with some risks.  The NEC and Cray bids failed to match the SGI bid because of the cost of the licences for their software, which was based on the amount stored – in one case, exceeding the cost of the storage media.  The Altix was installed in early 2004, and was upgraded in June 2004 to provide a base for data-intensive computing – large memory, multiple processors, and access to the DMF-managed storage as closely as possible.  The DMF-managed filesystem was the /home filesystem, as it had been on the preceding Cray Research vector systems.

CSIRO went out to tender for general-purpose cluster systems, and also for a cluster system to be a development platform for the ROAM application that was being developed by CSIRO Marine and Atmospheric Research under the Bluelink project with the Bureau and the RAN.  IBM won the tender with a blade-based system, which we called burnet, with the ROAM platform being named nelson.  These systems, along with the Altix and tape library were installed in the CCF under the associated facilities clause in the HPCCC agreement.

National Facilities

From 2014, I was tasked with managing CSIRO’s use of and interaction with National HPC facilities, such as NCI, the Pawsey Centre and MASSIVE.

For NCI, I managed the allocation process with the aim of maximising CSIRO’s use of the facilities.  This involved setting initial allocations each quarter, but increasingly I was able to adjust allocations through each quarter – giving more allocation to projects that were running ahead of their pro-rata allocation, and taking allocation away from dormant projects.  This was not part of the original allocation model, but was important to avoid wasted resources towards the end of each quarter, when allocations would otherwise have sat in projects that were not going to use them.  With the aid of the ‘bonus’ facility (allowing projects whose allocation was exhausted to still have jobs started when otherwise-idle conditions arose – subsequently withdrawn by NCI), CSIRO was able to drive utilisation of its share to high levels.

Software

This section highlights some of the software I designed or developed in the 21st Century, to support the users and the systems for CSIRO Scientific Computing.  The tardir family and the clog benchmark were mentioned above.

Time Create Delete

In 2007, when developing the clog benchmark (see above), I also developed a simple test of the performance of filesystems on metadata operations.  This simple test timed the creation of about 10,000 nested directories in a file system, and timed the deletion of them. (A similar test was run in 1997 during acceptance tests on the NEC SX-4, and was abandoned incomplete after about 10 days.)  Many operations on files do not involve bulk data movement, but do involve scanning filesystems to retrieve metadata, e.g. for backups or for scanning for files to be flushed (see below).  I ran the tests on several systems around 23:00 each day, to allow for monitoring of the performance over time.  Of course, there was a lot of variation in performance from day to day, because the tests were run on non-dedicated systems.  Also, the test results depend on not just the underlying storage performance, but on the operating system – caching could have come into play.

Here’s an example of the performance of several filesystems, run from one of the SC cluster nodes.

There was a reconfiguration in September 2020 which led to reduced performance of the /home filesystem, and the /datastore filesystem (NFS mounted from ruby).

Job progress

I developed scripts to monitor the progress of jobs on a compute system.  This was especially important on the HPCCC systems where operational jobs were deployed.  Below are two samples of non-operational jobs running on over-committed nodes, where initial progress is slow, then the jobs speed up and finally reach full speed towards the end.  This type of workload benefitted from the varying process priorities implemented by Jeroen, as described above.

Over-commitment (allocating jobs to a node so that the sum of the number of requested CPUs was more than the number of CPUs on a node), allowed jobs in their early stages to soak up small amounts of otherwise idle time.  Over-commitment was not done for large multi-CPU and multi-node jobs, that required dedicated resources for maximum throughput.

STREAM2

John McCalpin developed the STREAM benchmark to measure the bandwidth between processors and memory, since this aspect of computer performance was often of more importance than pure processing speed for real-world applications.  He started developing the STREAM2 benchmark, which measured memory bandwidth for a range of data or problem sizes, and I developed the benchmark further.  Later, Aaron McDonough took over the management of running the benchmark on the systems available to CSIRO users, to show some of their characteristics.

STREAM2 exposed the memory hierarchy on cache-based processors, and showed the strength of vector processors which were supported by  cross-bar switches to many-banked memory systems, such as the Cray and NEC vector systems.  Here is an example of the results of running the DAXPY test of STREAM2 on a range of systems available to CSIRO users in 2004.

(The scale is in Mflop/s – for the DAXPY operation that requires 12 bytes/flop, 1 Mflop/s corresponds to 12 Mbyte/s memory traffic.)

The cache architecture is evident for the microprocessors, with an order of magnitude lower bandwidth than the NEC vector system, whose memory bandwidth is uniformly good after a threshold is reached.
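The DAXPY kernel itself is tiny; the essence of STREAM2 is timing it over a range of vector lengths so that the working set moves down through the cache levels into main memory.  A free-form sketch of the idea (my own, not the actual STREAM2 code; the lengths and repetition counts are illustrative):

program daxpy_sweep
  ! Times the DAXPY kernel y = y + a*x over a range of vector lengths,
  ! reporting Mflop/s.  Small lengths fit in cache; large lengths expose
  ! main-memory bandwidth.  (Sketch of the STREAM2 idea, not the original.)
  implicit none
  integer :: n, i, rep, nrep, p, count0, count1, rate
  real(8), allocatable :: x(:), y(:)
  real(8) :: a, seconds, mflops, check

  a = 3.0d0
  check = 0.0d0
  do p = 9, 23, 2                       ! lengths from 2**9 to 2**23 words
     n = 2**p
     nrep = max(1, 2**26 / n)           ! keep total work roughly constant
     allocate(x(n), y(n))
     x = 1.0d0
     y = 2.0d0

     call system_clock(count0, rate)
     do rep = 1, nrep
        do i = 1, n
           y(i) = y(i) + a * x(i)
        end do
        a = -a                          ! stop the compiler hoisting the loop
     end do
     call system_clock(count1)

     seconds = dble(count1 - count0) / dble(rate)
     mflops  = 2.0d0 * dble(n) * dble(nrep) / (max(seconds, 1.0d-9) * 1.0d6)
     print '(a,i9,f12.1,a)', 'n = ', n, mflops, ' Mflop/s'
     check = check + y(1)               ! defeat dead-code elimination
     deallocate(x, y)
  end do
  print *, 'check value: ', check
end program daxpy_sweep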

STORE/RECALL

CSIRO Scientific Computing and its predecessors have run the DMF Hierarchical Storage Management system since November 1991.

At various stages, other products and configurations were evaluated.  At the time, there appeared to be no benchmark tests available for HSMs, so I started developing one in 1996.

The idea was to simulate and measure the data transfer rates for a notional user (such as a climate modeller) who ran a Fortran program to produce a stream of data, starting in memory, then on fast disc, then on slower disc, and then on a secondary storage medium such as magnetic tape.  That was the store part of the benchmark.  The recall part was to recall the data for processing, reversing the flow through the layers above.  Of course, the data sizes I used in the late 1990s were far smaller than current computing allows.
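The store half can be sketched as a loop that keeps appending fixed-size unformatted records to a file and reports the cumulative rate as the total grows, so that successive tiers (memory buffers, fast disc, slower disc, and tape on an HSM-managed filesystem) reveal themselves.  A free-form sketch of the idea (my own reconstruction; the record size, total size and filename are illustrative):

program store_sketch
  ! Appends fixed-size records to a file and reports the cumulative storage
  ! rate as the total amount written grows, so that successive storage tiers
  ! reveal themselves.  (Reconstruction of the idea only; sizes illustrative.)
  implicit none
  integer, parameter :: recwords = 131072          ! 1 Mbyte records
  real(8) :: buf(recwords)
  integer :: i, nrec, iu, count0, count1, rate
  real(8) :: seconds, total_mb

  buf = 1.0d0
  nrec = 4096                                      ! 4 Gbyte in total
  open(newunit=iu, file='store_test.dat', form='unformatted', &
       access='sequential', status='replace')

  call system_clock(count0, rate)
  do i = 1, nrec
     write(iu) buf
     flush(iu)
     if (iand(i, i - 1) == 0) then                 ! report at powers of two
        call system_clock(count1)
        seconds  = dble(count1 - count0) / dble(rate)
        total_mb = dble(i)                         ! 1 Mbyte per record
        print '(a,f10.1,a,f9.1,a)', 'stored ', total_mb, ' Mbyte at ', &
              total_mb / max(seconds, 1.0d-9), ' Mbyte/s'
     end if
  end do
  close(iu)
end program store_sketch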

Some of the ideas for this benchmark came from the Hint benchmark.

Here’s a graph showing the speed of storage (Mbyte/s) as a function of the progressive amount of data stored.  This was run on an SGI Altix in 2012 using DMF.

The 4 tiers of storage can be seen clearly, with their corresponding transfer rates.  Tiers 2 and 3 appear to have reached their asymptotic speeds, but tiers 1 (memory) and 4 (tape storage) were not fully pressured.

Here’s the corresponding graph for the recall.

Again, the 4 tiers are visible, but the data sizes were not large enough to push each tier to its maximum performance.  It seems that the tape recall was as fast as the disc reading.

This benchmark could have been a useful tool to evaluate storage systems from the point of view of a user with ‘infinite’ demands.

Traffic Lights

Phil Tannenbaum had suggested a system where users of the HPCCC services could go to a web site and see the status of the systems: other sites had such subsystems.  There was software to do this for systems administrators (e.g. Nagios), but no obvious ones available for a user-facing service.  When Phil was away in May-June 2006 for about two weeks, I set about creating such a system, with the generous help of Justyna Lubkowski of the Bureau who provided the web infrastructure.  This system, dubbed the traffic lights, was able to be demonstrated to Phil on his return, and was subsequently enhanced to cover more services (HPCCC, CSIRO, National HPC Facilities), and to monitor items such as floating software licences.  Here are some partial snapshots from 2021.
The first shows the groups view: services were put into groups, to give a quick overview (there were 71 services being monitored at this stage.)  The status was shown by red, green or grey indicators.
The next snapshot shows all the services for the group ruby_datastore.  The traffic lights could provide reports on incidents, downtimes and more (such as the current batch queue) for services.  The notes provided a brief summary of the service, including recent downtimes or slowness.
The next snapshot shows the start of the downtimes records for one service.  These were not particularly accurate, since the probing interval ranged from about 3 minutes to 30 minutes, depending on the service.
The next snapshot shows part of the report on the software licences.  The ‘more’ button provides access to the licence logs, so that users waiting for a licence can find who is using the licences.  The Access information link sends the user to the Science Software Map, which provides details on the licence conditions.
This service is still available to users, after nearly 15 years.  The software is about 5500 lines of code.

Backups, data protection, data management, the Scientific Computing Data Store

When the HPCCC SX-4 service was being set up, and I was encouraging potential users to switch from the Crays, one of the users (Julie Noonan) said to me that they wouldn’t start using the SX-4 systems until backups were being done for the /home file systems.  In the absence of a system product, I developed scripts using the rsh, rdist and cpio utilities to make backups onto the Cray J916se, and I used a Tower of Hanoi management scheme.
These scripts, started in April 1998, ran through to the end of the life of the SX-5s in 2004.
In February 1997, I organised a one-day workshop on Large Scale Data Management for CSIRO.  I spoke on the topic: Storage Management – the forgotten nightmare.  This period was a time when I was increasingly focussed on data storage and management for HPC users as much as on the HPC facilities and services.  Around that time, I wrote:

Users typically want every file kept and backed-up, and would be happy to use only one file system, globally visible across all the systems they use, with high performance everywhere, and infinite capacity!  A user added that they want all of the above, at zero cost!

When I was acting as a consultant at the Arctic Region Supercomputing Centre in 2002, I conducted a review of its storage plans, and argued for two models of service: those concentrating on HPC would be based on the HPC servers, and would have to explicitly access storage servers; those concentrating on working with data would be based as close to the storage as possible, i.e. directly using an HSM-managed filesystem, and would have to explicitly access HPC servers, e.g. by using batch job submission.  This model continued through to 2021, with the cherax and ruby SGI systems providing a platform for data-intensive processing, in tandem with HPC systems.  Part of the inspiration for this model and these systems was a comment from one of the climate modellers: the modelling was done (on the HPC system of the time), but he was so far behind in the data analysis.

These systems, and the closely-associated Data Store, became one of my flagship endeavours, in attempting to provide users with a single large space for storing and working with data.  Although the migrating filesystem for the /home area took some getting used to (because inevitably, the file you wanted was off-line), users with large data holdings valued the system for its unitary nature, and coded workflows to build pipelines allowing for efficient file recalls and file processing.  Peter Edwards improved this experience by enhancing the dmget command to allow large recalls to be broken into carefully crafted batches, one for each tape needing to be accessed (stopping denial of service from a user requesting huge recalls), and allowing the interlacing of recalls and processing of batches of files.  He also enhanced the dmput command, to allow one user to work with another user’s data without causing problems with space management for the owner of the data.

One day in about 2007, Jeroen reported to me that there was a new facility in the rsync utility which might be of interest.  Jeroen had taken over management of the backups on the CSIRO systems.  The rsync utility, written by Andrew Tridgell at ANU, allowed efficient mirroring of files from one location to another, avoiding unnecessary transfers.  The new feature was the --link-dest= option: this allowed an rsync transfer from a source to a destination to consider a third location (such as the previous day’s backup), and instead of transferring an identical file, just make a hard link.  The backups then became a series of directories (perhaps one per day), with only one copy of files common to multiple backups, but with each directory appearing to be a complete or full backup (which it is).  This has the advantage of providing a full backup every day, for the cost of an incremental backup – i.e. transferring only changed or new files.

Jeroen coded this into the backup suite, and he and I also developed Tower of Hanoi management of the backup holdings.  We used a DMF-managed filesystem as the target for the backups, taking advantage of the in-built tape management.  After Jeroen left, Peter Edwards took over as systems administrator.  He and I continued to develop the capabilities of the backup suite, including work by Peter to develop a directive-based front-end to specify filesystems to be backed up.  Peter also found that a filesystem could be mounted twice onto a system, with the second mount being read-only.  This allowed the backups of the users’ home filesystems to be made available on-line to the users, allowing for inspection and restoration.  We did consider patenting some of the ideas, but instead made the ideas freely available.  Here’s a picture of a Tower of Hanoi puzzle.

Tower of Hanoi puzzle
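In the basic Tower of Hanoi rotation (before the extensions described at the Wikipedia page mentioned below), backup number n goes to level k, where 2**k is the largest power of two dividing n: level 0 is reused every second backup, level 1 every fourth, and so on, so retained backups are spaced roughly exponentially back in time.  A small sketch of that assignment (my own illustration, not an excerpt from purge_backups.pl):

program hanoi_rotation
  ! Prints which Tower-of-Hanoi level (A, B, C, ...) each backup in a
  ! sequence would be written to: level k is used when 2**k is the largest
  ! power of two dividing the backup number, giving the familiar
  ! A B A C A B A D ... pattern.  (Illustration of the basic scheme only.)
  implicit none
  integer :: n, k, m
  character(len=1), parameter :: levels(0:7) = &
       ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

  do n = 1, 16
     m = n
     k = 0
     do while (mod(m, 2) == 0)        ! count factors of two in n
        m = m / 2
        k = k + 1
     end do
     print '(a,i3,a,a)', 'backup ', n, ' -> level ', levels(min(k, 7))
  end do
end program hanoi_rotation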

I gave several talks on the backup suite: the first, given to the DMF User Group in 2009, was entitled DMF as a target for backup.

The techniques in use provide:

1. coverage back in time adjusting to the likelihood of recovery being needed

2. full backups every time, for the cost of incrementals

3. simple visibility of the backup holdings

4. simple recovery for individual files and complete file systems

5. no vendor dependency

6. centralised tape management

7. space saving of about a factor of five compared with conventional backups

8. directive-driven configuration

9. user visibility and recovery of files from the backups

The key utility for doing the Tower of Hanoi and other management is in a script called purge_backups.pl, started by Jeroen, and stretching now to 5740 lines.  A note I wrote about some of the extensions to the original Tower of Hanoi management is at the Wikipedia page Backup rotation scheme under the heading Extensions and example.  

In 2009, I gave a poster presentation at the eResearch conference entitled Your Data, Our Responsibility.

The poster outlined some storage dilemmas for HPC centres, and then advocated the use of HSM, the use of the rsync utility and Tower of Hanoi management scheme for backups, using CSIRO’s experience as an example.

This backup suite continued to protect systems and user filesystems until 1st March 2021, when backups of the cluster home filesystems were switched to use the in-built snapshots capability, and Commvault.

The graph below supports the assertion that full backups were achieved at the cost of incrementals.  It shows the ratio of files moved in a month to total files being considered.

The ratio is about 0.1, or 10%.  The ratio of data moved in a month to total data being considered was about 0.4.  For daily backups, this means a reduction by a factor of about 75 in data moved compared with full backups every day: thirty daily full backups would move about 30 times the total holdings each month, whereas the suite moved only about 0.4 times the holdings, and 30/0.4 ≈ 75.

The collage shows various other statistics from the backups.

(New storage for the backups with front-end SSD was utilised from 2014 onwards, providing better performance.)

Flushing

Jeroen van den Muyzenberg started with CSIRO as systems administrator in 1999.  He successfully wrote scripts to handle flushing of temporary filesystems in a rational way from December 2000.  One script monitored the target filesystem and, if it was more than a threshold (such as 95%) full, triggered another script.  This used the large memory on the SGI systems to slurp in details of the entire holdings on the target filesystem.  The list was then sorted (on the newer of the access and modify times), and the oldest files (and directories) were removed.  This worked successfully for several years.
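
In outline (a sketch only, not Jeroen’s scripts; the filesystem path and candidate count are illustrative, and paths containing spaces are ignored for brevity), the sort-based approach looks like this:

  # Sketch only: list every file with the newer of its access and modify
  # times, sort oldest-first, and take the head as removal candidates.
  FS=/flush                                    # hypothetical target filesystem
  find "$FS" -xdev -type f -printf '%A@ %T@ %p\n' |
    awk '{ t = ($1 > $2) ? $1 : $2; print t, $3 }' |
    sort -n |
    head -n 10000 > /tmp/flush_candidates      # oldest files first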

However, in 2016, flushing was needed for the CSIRO cluster systems, which did not have a large memory; the filesystems were hosted on servers which also lacked a large memory, yet were the most suitable hosts for such housekeeping operations.  Around that time, I read an article, It Probably Works, by Tyler McMullen in the Communications of the ACM (November 2015, Vol. 58, No. 11, pages 50-54).

I realised that we don’t need to know the oldest file to start flushing – we just need to know a collection of the older files.  This led to an algorithm which scanned a filesystem, and registered the files in ‘buckets’ according to their age.  This removed the need for a sort.  Then it became apparent that the scan could be performed in advance and the results saved, ready for a flushing process when needed.  This separation of scanning and flushing meant that the system was always ready when a flush was needed, and in practice the flushing could be started within a few seconds from when a monitoring process signalled that flushing should be started.  The only extra step for the flushing was to recheck for the existence of a candidate file, and whether its access or modify times were still within the range of the bucket.

The bucket boundaries ran from the date of the start or last flush of the filesystem at one end to a cut-off period (e.g. 14 days) at the other; we guaranteed to the users that we would not invoke the flushing on files younger than 14 days old.
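
A minimal sketch of the scan phase follows (illustrative only: the bucket width, cut-off and paths are assumptions, and the production implementation described below does far more, including saving the scan results for later flushing and rechecking candidates before removal).

  # Sketch only: register each file in an age bucket; no global sort is needed.
  FS=/flush                          # hypothetical target filesystem
  CUTOFF_DAYS=14                     # never flush files younger than this
  BUCKET_DAYS=7                      # hypothetical bucket width
  now=$(date +%s)
  find "$FS" -xdev -type f -printf '%A@ %T@ %p\n' |
    awk -v now="$now" -v cutoff="$CUTOFF_DAYS" -v width="$BUCKET_DAYS" '{
      t = ($1 > $2) ? $1 : $2              # newer of access and modify time
      age = (now - t) / 86400              # age in days
      if (age < cutoff) next               # protected by the cut-off guarantee
      bucket = int(age / width)            # coarse age bucket
      print $3 >> ("bucket." bucket)       # save, ready for a later flush
    }'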

The implementation was done by Steve McMahon from March 2016, with production hardening added by Ahmed Arefin and Peter Edwards.

Here is a list of features:

  • New scalable flushing algorithm
  • Separates scanning from flushing
  • Eliminates the sort
  • Can have lists ready for action in advance
  • 2 second response time!
  • Technical report available
  • Open Source
  • Allows increased default quotas

The scalable flushing code is in use on the CSIRO SC systems for 4 filesystems (March 2021).  The graph below shows the action of flushing on a filesystem, plotting the date of the oldest surviving files – flushes are marked by the vertical lines, each reducing the age of the oldest surviving file.

The next graph shows the total space occupied by the files in each bucket.  In this case, a flush has recently occurred, and the buckets marked in red have been processed.

(The abscissa labels are of the form YY-MM, the last two digits of the year, and the digits of the month.)

Mirrors

In early 2018, CSIRO Scientific Computing started the process of removing a /data area from its systems.  Here is a table of just some of the available filesystems on the cluster.

The /data area ($DATADIR) was subject to quotas, but when it filled there was no good way to manage the holdings – no migration (HSM) and no flushing – and the old way of sending users a list of the biggest users (“name and shame”) was akin to bullying, trying to use peer pressure to get users to remove old files.  However, I wanted users to be able to maintain a collection of files on the systems bigger than the home filesystem could support, to be able to protect those files, and not to lose them through flushing (when the only available space was a filesystem subject to flushing).  In April 2018, I started on a utility to deal with mirrors of a working area.  So, a user could set up an area on a flushable filesystem, then run the utility with

mirror.sh create

and a mirror of all the files would be created on the HSM storage ($STOREDIR) above, with intelligent consolidation of small files into tardir archives.  If the user was away for a while, and some of the collection had been flushed, the user could restore the collection with the utility

mirror.sh sync

Other options allowed for the updating of the mirror.  The utility was installed to run on ruby, the clusters, and the NCI systems.  Later versions supported multi-host operations, such as providing a mirror on the CSIRO systems of a collection held at NCI.  Here is a list of all the operations supported.

create sync update cleanse delete check status flush list help kill moveto explain dev_history release recall removetmp man restore auditw auditm verify config getremote putremote discover rebuild
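
A possible day-to-day cycle looks like the following (the sub-command names are taken from the list above, but their exact options and behaviour are as documented by the utility itself; the working directory is hypothetical):

  cd /flush/myproject       # hypothetical working area on a flushable filesystem
  mirror.sh create          # build the initial mirror on the HSM store ($STOREDIR)
  # ... work on the files, add new ones ...
  mirror.sh update          # push changes to the mirror
  mirror.sh status          # report the state of the mirror (assumed behaviour)
  mirror.sh sync            # restore any files lost to flushing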

Utilities were also enhanced to allow profiling of the contents of an area – size, access time and modify time.  Here is an example: when produced interactively, the plot could be rotated and zoomed.

Tenders, purchasing, acquisitions

A lot could be written about the major CSIRO computing acquisitions and the tender processes.  The first one was about the responses to requests to build commercial versions of CSIRAC, and then the acquisition of the CDC 3600 and CDC 3200s.  Later came the Csironet CDC machines, including the Cyber 205, and the Fujitsu systems.

Since the mid-1980s, I have been involved in acquisitions of the Cray Y-MPs and J90se, the HPCCC NEC SX-4, SX-5s and SX-6s and subsequent Bureau systems, the StorageTek tape drives and libraries, the APAC and NCI systems, the Pawsey magnus and storage systems, the Aspendale SGI and Cray J90 systems, the CSIRO cluster systems, the CSIRO Data Store hosts, software and disc storage, etc.

There is a story to tell for many of these.  Some of the HPCCC NEC SX-6 tendering is documented in Chapter 7.

CSIR Mk I

The book The Last of the First, p. ix, notes the attempt to commercialise CSIR Mk 1.

Attempts were made to interest Australian electronics firms in the commercial production of computers based on the CSIRAC design. In October 1952 Philips, EMI, AWA and STC were invited to tender for the construction of up to three machines. AWA and STC responded (there appears to be no record of interest from Philips or EMI), however, nothing eventuated from this exercise.

CDC 3200

There is a story behind the CDC 3600 and 3200 acquisitions by CSIRO and the Bureau of Census and Statistics, documented in an unpublished note taken from an address by E.T. Robinson, then manager of the CDC agent in Australia.  Suffice it to say that although the 3600 was a real machine, the tendered auxiliary systems (called the 160-Z) did not exist at all at the time of the tendering, and the US head office did not know that the systems had been tendered.  The Australian agent subsequently had to put it to CDC Head Office that it could and should design and build the systems, which became the 3200.  The systems were installed but were underdone, particularly in the software area.  The 3200 was the first in what became a very successful line of lower 3000-series machines.

I’ve previously told the story of the CDA salesman who secured the sale of the Cyber 76, reportedly walking into a suburban bank branch in Canberra to deposit a $5M cheque into his personal bank account, and then transferring all except his commission to CDA the next day.

Reportedly, Peter Claringbold as Chief of DCR and head of Csironet arranged a major purchase or long lease of equipment without consulting the CSIRO Executive.

Aspendale

My first involvement started around 1986 with my appointment as Leader of the Computing Group in the CSIRO Division of Atmospheric Research at Aspendale.  It was clear that the Division needed to move to having its own UNIX servers to support local processing. Already the CSIDA project had acquired Sun Microsystems workstations to support its image processing work, and the Division had a long history of using HP minicomputers running RTE for data acquisition and other tasks.

I set about drawing up requirements, and preparing a set of benchmarks to test contenders – these benchmarks included code such as Jorgen Frederiksen’s eigenvalue code which was running on the large Csironet machines.  We expected Sun as the leading UNIX workstation vendor at the time (SPARC), or the incumbent HP (PA-RISC), to provide the best solutions.  I remember testing the codes on HP-UX systems at HP’s office in Joseph St Blackburn, and finding that the compilers had difficulty compiling the code.  SGI also submitted a tender, and we finished up testing the codes on an SGI system (MIPS) in Mike Gigante’s office at RMIT.  To most people’s surprise, SGI won the contract, providing significantly higher performance than the other bids.  This led to an SGI server (atmos) being installed for general UNIX processing at Aspendale.

This result showed two lessons, often not learnt.

  1. There is value in going out to tender, even when you think you know the best solution: the SGI proposal was not anticipated.  SGI had embraced the value of a server rather than just one-person workstations.
  2. We can all get emotionally attached to what we are comfortable with, or to what we perceive to be the best solution.  The CSIDA group had its own solution (Sun), and did not embrace the SGI solution; nor did some of the Computing Group staff who had a long history with the HP minicomputers.
The Cray Research Era

Chapter 5 and the personal history above contain details about the process that led to CSIRO acquiring a Y-MP in the Joint Supercomputer Facility, and Chapter 6 contains details on the next Y-MP in the CSIRO Supercomputing Facility.  The SFTF experience raised two lessons.

3. Benchmarks can provide performance information, but also expose the ability of the tenderers to demonstrate the capabilities of their systems, and show the strength (or otherwise) of their support capabilities.

4.  Political considerations can often outweigh technical considerations.  Despite the clear strength of the JSF Cray Y-MP proposal, there was resistance at a high level to the cost and risks involved, and the Convex solution was favoured.  In the end, a chance circumstance led to the JSF Cray Y-MP proposal being accepted.

The HPCCC era

Chapter 7 and the personal history above contain details of the tender process for the first shared system for the HPCCC.  The crucial issue for me was that although we had a scoring system for the responses from the vendors, we had not developed an overall methodology to cover the different rankings obtained by the Bureau and CSIRO.  This led to two more lessons.

5. An overall ranking method was needed which gave weights to different factors.

6. Methods were needed to rank issues such as software quality, where Cray Research’s offering clearly outranked NEC’s.  In the end, what could be easily measured, e.g. feeds and speeds, came to dominate the evaluation at the expense of harder-to-quantify qualities.

There was another issue at the time.  The NEC solution was clearly not able to immediately deal with the management of data being carried out on the CSF Cray Y-MP with DMF.  I insisted that a solution was needed to carry forward the (then) large quantities of data in a transparent way for the users.  The only obvious way was to buy another Cray Research system, with a J90 being the obvious (cheaper) solution than the Y-MP, C90 or T90 systems.  With the help of Ken Redford and John O’Callaghan, CSIRO was able to gain approval to acquire a J916se, another StorageTek tape library and tape drives to allow the transition of the Data Store.  This led to another lesson.

7. It is possible to acquire significant systems without going to tender, but the case needs to be strong to gain approval levels (possibly up to ministerial levels).

This turned out to be a good decision, with the failure of the development work on the NEC SX-Backstore product to achieve production quality (see above).  However, the J916se did have reliability problems, mainly around the GigaRing I/O architecture.

In 1999, the HPCCC decided for the final phase of the NEC contract to select an SX-5 rather than another SX-4.  A group from the HPCCC and the Bureau’s Research group visited Tokyo to evaluate the options, and came back and recommended the SX-5.  This went ahead, and “florey” replaced “bragg”, the SX-4.

In 1999 with the 2000 Olympics looming in Sydney, the Bureau was keen to increase its resiliency by having a second SX-5.  It received a proposal from NEC.  CSIRO needed to make the choice of either matching the Bureau’s funding of the second SX-5, or go into an asymmetric funding arrangement for the HPCCC.  Supporters of the HPCCC gained the  approval of the Chief Executive, Malcolm McIntosh, for the multi-million dollar boost to funding for the HPCCC.  Other senior members of CSIRO management were not happy that the money had been spent, particularly as the HPCCC systems were predominantly used by only a few Divisions of CSIRO – Atmospheric Research and Fisheries and Oceanography being the main users.  Thus “russell” was acquired, installed and brought into production in time for the Olympics.

The Aspendale Cray J90 system (mid 1990s)

This was acquired to offload some of the climate modelling computations from the CSF Cray Y-MP.  It also ran DMF – the value of the integrated storage solution was realised.

HPCCC SX-6 procurement  (2002-03) – see above
The Sun Constellation systems for APAC and the Bureau (c. 2008)

See the NCI history here.

A decision was made in 2007-2008 for CSIRO not to continue the HPCCC, but to divert the funding into a system at ANU under NCI, with the Bureau acquiring a system solely for operations and operational development, but also having a share of the NCI system for research computation.  A joint NCI/Bureau procurement in 2008, in which I participated, resulted in a contract with Sun for two Sun Constellation systems – vayu at NCI and solar at the Bureau.  Unfortunately, Sun was acquired by Oracle during this procurement, and Oracle did not have a focus on scientific computation.

8. RFQs should have the ability to consider and rate changes of company ownership when considering proposals.

The NCI raijin system (2012-2013)

NCI received funding for a new system, and proceeded to tender and then to the acquisition of a Fujitsu system – raijin.  I was not directly involved with this one.

Towards the end of the process, three of the key staff at NCI left.  One of the departures was David Singleton, who was principally responsible for the ANU PBS, a version of the PBS batch system which notably supported pre-emption by the suspend/resume mechanism.  This allowed large and urgent jobs to start at the expense of smaller and less urgent jobs, by suspending the latter: when the large and urgent jobs were done, the smaller and less urgent ones resumed.  (The ERS scheduler on the NEC SX systems had similar capabilities, but the HPCCC used the checkpoint/restart capability for pre-emption.)  The ANU PBS system provided excellent responsiveness and high utilisation, since there was less need to drain nodes to allow large jobs to have enough resources available to start.  Utilisation on the systems prior to raijin was more than 90%.  With the Fujitsu system, NCI made the decision to drop use of the ANU PBS, relied on the commercial product PBS Pro, and never again brought suspend/resume into production.  Utilisation thus dropped.

Multiple StorageTek tape drives and libraries

I was responsible for the decision to support the JSF to acquire StorageTek cartridge tape drives in 1991, which led to DMF being enabled.  I championed the acquisition of the first Automated Tape Library in 1993 using the Development Fund.  I later calculated that the increased utilisation of the system paid for the tape library in about a year.  Subsequently, many generations of tape drives and media were acquired, as the technology marched onwards.  In both 1997 and 2003-2004, CSIRO purchased an additional tape library at new sites to allow the transition from the old sites to be accomplished over weekends.  See Sidebar 7.  I made at least one mistake in these upgrades. I arranged to order 4 tape drives to attach to one of the Crays, but found afterwards that the Cray had only one channel.  Fortunately, STK staff helped me find an ESCON Director which was able to do the fibre channel multiplexing.

The Pawsey magnus and storage systems (2010-2014)

I was involved in the initial preparations for the acquisition of the supercomputer and storage for what became the Pawsey Supercomputing Centre from 2010.  I was seconded part-time to Pawsey from September 2011, having to relinquish an earlier secondment to RDSI.  I acted as HPC Architect for a period when Guy Robinson was taken ill suddenly, and then acted again as HPC Architect after his departure to oversee acceptance testing.

The CSIRO cluster systems

CSIRO HPSC decided to acquire a cluster system in 2004, to enable it to offer a service on such a system and allow researchers to concentrate on research rather than on systems administration.  At the same time, CSIRO Marine and Atmospheric Research had joined the Bluelink project with the Bureau of Meteorology and the Royal Australian Navy to develop, amongst other projects, a forecast system able to be targeted to areas of interest (ROAM).  CSIRO HPSC undertook to acquire a prototype cluster for this application, and arranged to house it in the Bureau’s Central Computing Facility.  Tenders were let for both systems, and I had to push back against a desire to specify the hardware explicitly instead of asking the vendors to deliver the best performance on a range of applications.

This was done, and IBM won the contract for clusters which became burnet (for general use) and nelson (for the ROAM development).  IBM’s response included two architectures, and we chose the blade one.  However, the benchmarks had been run on the other architecture, and did not perform as well on the blade architecture (because of the reduced interconnect between nodes), and so IBM had to supply extra blades to meet the targets.  There were two more lessons.

9. RFQs should specify the desired applications and their performance rather than explicit hardware.

10. Acceptance testing to targets is important.

CSIRO HPSC managed the replacement of the nelson cluster by a production cluster called salacia for the ROAM application in about 2007.  In 2013, another replacement was needed, and the Defence Materiel Organisation tasked CSIRO ASC to carry out the procurement and installation.

CSIRO Data Store hosts, software and disc storage, MAID

From 1997, the Data Store was considered a separate facility from the supercomputers.  At the end of 2003, as the Bureau and the HPCCC moved from 150 Lonsdale Street Melbourne to 700 Collins Street Docklands, CSIRO issued an RFQ for a replacement system.  SGI tendered DMF (having inherited the rights when it acquired Cray Research, and retained them when it sold Cray); NEC and Cray also tendered.  SGI won the evaluation, principally because the licence fees were not based solely on volume stored, and were less than the media costs – this became very important over subsequent years as the demand for storage grew.  SGI tendered a system based on MIPS and IRIX, but CSIRO chose to adopt the recent port of DMF to Linux, as this was seen as the way of the future.  DMF worked on the Altix Itanium and later Intel processor platforms.  The biggest issue was the unreliability of the NUMA systems’ memory management.  Various disc upgrades were undertaken to cope with the growing demand.

At some stage around 2008, CSIRO IM&T arranged a panel contract for the supply of all disc storage, which made it difficult to enhance the storage on the SGI hosts.  In 2011, CSIRO acquired a MAID – a Massive Array of Idle Discs.  This provided storage intermediate in performance and cost between enterprise fibre-channel high-speed disc and tape.  The discs in the MAID were treated like tape volumes, and were switched off when not in use, but could be brought back to life in under 20 seconds.  The MAID was a successful ‘green’ device, saving power compared with always-spinning disc.

11. Panel contracts are useful for reducing the work involved in full tenders, but should not be mandatory.

12. The full costs over the life-time of a system need to be considered, especially any costs based on the capacity used.

Bureau Cray system (Aurora) (2013-2014)

I was again involved in the Bureau’s Supercomputer RFT, which led to the acquisition of the Cray XC40 system (aurora).

The NCI gadi system

In March 2018 I was appointed to a Consultation Group for the tender for the replacement of raijin.  It appears the group never met.

Resource Allocation and Scheduling

Resource allocation for shared resources (such as corporate HPC systems) has been a thorny issue for decades.  I’ve written above about the charging regime in the Csironet era, which largely contributed to its downfall.  I’ve written above about share schemes and the fair share concept which arose in the 1990s. Here’s a paper from 1991 (presented to the Australian Supercomputer Conference): Resource Allocation on Supercomputers.

Here is a talk given in 2017 on Resource Allocation.  The talk covers allocation of compute and storage resources, with examples of good and bad.  I’ve mis-quoted Tennyson’s “The Brook” with

“Flops may come and flops may go, But bytes go on for ever.”

John Mashey once wrote: “Disks are binary devices: new or full”

Scheduling of compute loads was and maybe still is an important issue.  I’ve written about some of the issues in the context of the CSF and HPCCC.  The larger the system, and the more parties involved, the more important scheduling becomes: it is part of the process for ensuring that the resources are delivered in accordance with whatever policies are laid down.  The workload benchmark from 2003 was an important tool in getting a good result for the 2003 HPCCC procurement.  Only one vendor, NEC, was able to demonstrate compliance with the workload benchmark, which attempted to simulate a system running as much background work as possible while still responding rapidly to urgent tasks such as running operational weather forecast models.  The scheduler was important for the CSF Cray reaching an average of 98.5% CPU utilisation.

It is disappointing to see that some HPC centres still show large amounts of idle time, and still have large areas of storage under-utilised.

Colleagues

It has been a privilege to work with many colleagues who have given me strong support in my career.  I will single out Alan Plumb, who gave me a start in my support role; Jorgen Frederiksen, for his confidence in me and his support; Brian Tucker, for employing me and giving me opportunities; Len Makin, as a wonderful deputy for about 15 years; Jeroen van den Muyzenberg as systems administrator; and Peter Edwards, as a dedicated systems administrator of CSIRO’s Data Store for about 18 years and a great supporter of the DMF User Group.  Gareth Williams and Tim Ho in more recent times have been part of my story.

When I first arrived at DIT Carlton in May 1990, I was amazed at the level of support given to me by the administrative staff – led by Sue Wilson, but including Marita O’Dowd, Teresa Curcio, and later Jim Taylor.  Teresa went on to support CSIRO HPSC, ASC, and IMT SC for many years as the sole administrator.

Personal Reflections

During my university education and early career, I became captivated by the power and potential of computation.  I was inspired by the possibilities of computation to model the real world, and by the benefits flowing from this, for example in weather forecasting, as I envisaged in 1962 (see above).  Computation enabled scientists to gain greater understanding of the world around us.

My career as a research scientist never eventuated, but I found my niche at Aspendale in doing the programming in support of the scientists, such as Alan Plumb, Peter Baines, Peter Webster and Jorgen Frederiksen.

Having used the available computational facilities to the fullest extent, I became motivated to improve the services in the service of science, becoming Computing Group Leader at Aspendale, and then moving into corporate CSIRO HPC from 1990.  It was not just the facilities but the policy framework that I wanted to provide, so that users could get the maximum benefit – this meant policies that allowed open access and promoted high utilisation of the limited CPU resources available at the time (98.5% on the CSF Cray Y-MP).  Providing good storage facilities became a major focus for me, and so I jumped at the opportunity to initiate the DMF service on cherax, which provided virtually ‘infinite’ capacity for users to store data from 14 November 1991.

There is a Christmas carol entitled “A Boy was Born” set to music by Benjamin Britten.  It contains the line: “He let himself a servant be”, and that became my guide in the service to science.

Clive James wrote something like: “if you can’t be an artist, then at least serve the arts”.  I resonated with this for the science and scientists I worked with, seeking to provide the best environment – processors, storage, software, services and support.

I have been interested in the intersection of science and faith for many years, and have been helped in this journey by several sources.  I read “The Mind of God” by Paul Davies, who gives an overview of the universe we live in, the development of life, and the complexity which leads to human beings, who, he remarks, are the only conscious beings capable of understanding this universe.  He concludes with the lines:

“I cannot believe that our existence in this universe is a mere quirk of fate, an accident of history, an incidental blip in the great cosmic drama.  Our involvement is too intimate.  The physical species Homo may count for nothing, but the existence of mind in some organism on some planet in the universe is surely a fact of fundamental significance.  Through conscious beings the universe has generated self-awareness.  This can be no trivial detail, no minor byproduct of mindless, purposeless forces.  We are truly meant to be here”.

He and others write about the remarkable world we live in, in which everything is just right for the development of carbon, life, etc.

In 2018, Professor Stephen Fletcher was visiting CSIRO Clayton from Loughborough University, and gave a series of 8 lectures entitled: “Life, The Universe and Everything”.  I was able to attend some of these.  In the last lecture, he addressed Philosophical Questions about science, knowledge, truth, the big bang, etc.  “Both space and time came into existence at the start of the Big Bang”.  “After the Big Bang… A very weird thing is…the apparent design of the Universe to allow the evolution of life”  “Some Physical Constants are very ‘Finely-tuned’ for Life to Evolve..”.

He concluded, on the basis of the special nature of our universe, that there were four possible explanations.  These were:

  1. A Divine Creator exists
  2. We are Living in a Computer Simulation
  3. Our Universe is just one member of a multitude of Universes that together comprise a “Multiverse”.
  4. The Inevitability Hypothesis – A future physical theory (a “Theory of Everything”) might still be developed that has no free parameters.
The first of these seems to me to be the one.
At the beginning of this section, I reported on a talk I gave on the themes of my career: Mathematics, Meteorology, Machines.  To this I now add Maker, who I acknowledge.

Back to contents