DELTA taxonomic computer programs

By February 21st, 2011

The DELTA system, developed by CSIRO’s Dr Michael Dallwitz, is a set of computer programs for manipulating descriptive taxonomic information; these programs have revolutionised the way taxonomists carry out their work, maintain their information, and make their findings available to others.

DELTA stands for Description Language for Taxonomy and was developed by Dr Dallwitz to overcome the limitations and difficulties inherent in other methods of recording taxonomic descriptions for computer processing. The scientific achievement and strength of DELTA is the flexibility with which diverse taxonomic products can be made from a single database. The interactive-key program INTKEY is the most powerful of its kind, and benefits both researchers who gather taxonomic data, and end users of the data.

A review in the prestigious international journal Nature stated that the DELTA-based publication The Families of Flowering Plants by Drs Watson and Dallwitz may represent a stage in plant taxonomy as important as the publication of Linnaeus’s Genera Plantarum in 1737.

The DELTA system has been adopted by several international botanical organisations as a preferred format, and it has attracted significant investments from the Australian Nature Conservation Agency and the National Science Foundation in America. International demand has been so strong that there are now versions of the software and its documentation in at least seven languages. For his work in the development of DELTA, Michael Dallwitz was awarded a CSIRO Medal for Research Achievement in 1996.

The origins of DELTA

In 1970, Michael Dallwitz was appointed to CSIRO’s Division of Entomology to provide mathematical expertise in ecological projects. He contributed to such projects as modelling of cattle tick populations and tick-borne diseases, and insect development rates. However, insect taxonomy was also a concern of the Division of Entomology, and in 1971 he started writing a program for generating identification keys (KEY: Dallwitz, 1974). A few years later he met Leslie Watson of the Australian National University, who was using Richard Pankhurst’s KEYGEN program and had, with Peter Milne of CSIRO Division of Computing Research, devised a more flexible format for preparing data for this program (Watson L, Milne P, 1972).

Dallwitz and Watson discussed data formats and decided the initial approach could be improved. This was the origin of the DELTA format (DEscription Language for TAxonomy). Dallwitz has worked closely with Watson ever since, commenting that ‘the development of the programs would not have been possible without his collaboration’.

Since its inception there has also been considerable correspondence, both paper and electronic, about the programs and the DELTA data format, and constant planning for future developments. Mike’s philosophy behind the development of the programs was to produce practical tools, not just to develop methods.

The programs have evolved through feedback from users, the aim being to provide pathways for the data from one program to another to avoid manual manipulation of data wherever possible. As Michael Dallwitz commented back in 1993:

We want the programs and the data to have depth and flexibility, without ad hoc restrictions built in, so that people can use them in ways we did not anticipate. We want the programs to be able to benefit both the compiler of the data and the end user. Perhaps these aims are rather ambitious, but I think we are succeeding to some extent.

The DELTA system

The DELTA format (DEscription Language for TAxonomy) is a flexible and powerful method of recording taxonomic descriptions for computer processing. It was adopted as a standard for data exchange by Biodiversity Information Standards (TDWG).

The DELTA System is an integrated set of programs based on the DELTA format. The DELTA format is a multi-purpose format for taxonomic data, and has been improved and extended over the years. It is intended for general use in recording and exchanging data, and as a data source for any taxonomic program. Mike and his collaborators have themselves written CONFOR, a conversion program that turns DELTA-format data into natural language and into the formats required by a number of other programs, as well as programs such as KEY, INTKEY, TYPSET and DIST, which perform some of the tasks required by taxonomists. Over the years new features have been added to these programs, including optional pop-up menus and the ability to display images.

INTKEY windows showing an identification in progress. Here the abdomen characteristics of adult insects are being interrogated. [Photo: Mike Dallwitz, CSIRO]

The System was developed by Michael Dallwitz at the CSIRO Division of Entomology during the period 1971 to 2000. It is in use worldwide for diverse kinds of organisms, including viruses, corals, crustaceans, insects, fish, fungi, plants, and wood. The programs are free for non-commercial use.

Over the years, in response to requests from users world-wide, Mike Dallwitz travelled extensively to attend conferences, make working visits to collaborators, provide demonstrations and training workshops, and participate in projects such as the International Organisation for Plant Information (IOPI) and the International Working Group on Taxonomic Databases (TDWG).

Definition of the DELTA format

A detailed description of the DELTA format by MJ Dallwitz and TA Paine can be found by following the link in the Sources below. The description is primarily for the benefit of programmers, and contains more detail than would usually be required by users of the DELTA format. Topics covered include:

  • general introduction
  • introduction to the definition of the DELTA format
  • changes to the format
  • the character list used to describe taxa
  • range of character numbers
  • number of characters
  • maximum number of states
  • numbers of states
  • character types
  • taxon descriptions
  • number of taxa
  • implicit values
  • character dependencies.
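
Two of the topics above, implicit values and character dependencies, can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: it does not reproduce the DELTA file syntax, and the names `Character`, `DEPENDENCIES` and `value_of`, along with the sample characters, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Character:
    feature: str
    states: Dict[int, str]
    implicit: Optional[int] = None  # state assumed when nothing is recorded


# Hypothetical character list (not taken from any real DELTA data set).
CHARACTERS = {
    1: Character("wings", {1: "present", 2: "absent"}, implicit=1),
    2: Character("wing colour", {1: "clear", 2: "tinted"}),
}

# (controlling character, controlling state) -> characters made inapplicable.
DEPENDENCIES = {(1, 2): {2}}


def value_of(item, number):
    """Resolve one character for a taxon description.

    `item` maps character numbers to sets of recorded state numbers.
    Returns a set of states, "inapplicable", or "unknown".
    Assumes the dependency table contains no cycles.
    """
    for (controller, state), dependents in DEPENDENCIES.items():
        if number in dependents and value_of(item, controller) == {state}:
            return "inapplicable"
    if number in item:
        return item[number]
    implicit = CHARACTERS[number].implicit
    return {implicit} if implicit is not None else "unknown"


winged = {2: {1}}    # wings not recorded, so the implicit value applies
wingless = {1: {2}}  # wings absent, so wing colour cannot apply

print(value_of(winged, 1))    # {1}
print(value_of(wingless, 2))  # inapplicable
```

Keeping the implicit value on the character rather than on each taxon is what lets common features be left out of the data while still being available to programs that read it.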

Capabilities of DELTA

The DELTA System is capable of producing high-quality printed descriptions. DELTA data can include any amount of text to qualify or amplify the coded information, and this text can be carried through into the descriptions. Common features can be omitted from the data and the descriptions, while remaining available for identification and analysis. There is extensive control over the combination of attributes into sentences and paragraphs, the omission of repeated words, and the insertion of headings. The most important or diagnostic attributes (derived automatically or manually) for each taxon can be emphasised in full descriptions, or short descriptions containing only these attributes can be produced. The descriptions can be fully typeset without the requirement for any manual editing. These features are exemplified in books such as The Grass Genera of the World (CABI International: Wallingford), which was generated automatically from a DELTA database, and contains descriptions of about 800 genera in terms of more than 500 characters.
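
The way coded attributes, free-text comments and repeated words might be combined into sentences can be sketched as follows. This is an illustration only, not the algorithm used by the DELTA programs; the character list, the taxon name and the `describe` function are hypothetical.

```python
# Hypothetical character list: number -> (feature, {state number: wording}).
CHARACTERS = {
    1: ("leaves", {1: "simple", 2: "compound"}),
    2: ("leaves", {1: "glabrous", 2: "hairy"}),
    3: ("flowers", {1: "red", 2: "white"}),
}


def describe(name, item):
    """Turn coded attributes into a short natural-language description.

    `item` maps character numbers to (state numbers, optional comment).
    Consecutive characters that share a feature are merged into one
    sentence, so the feature word is not repeated.
    """
    sentences = []
    last_feature = None
    for number in sorted(item):
        feature, states = CHARACTERS[number]
        codes, comment = item[number]
        text = " or ".join(states[code] for code in codes)
        if comment:
            text += f" ({comment})"  # free text carried into the output
        if feature == last_feature:
            sentences[-1] += ", " + text
        else:
            sentences.append(f"{feature.capitalize()} {text}")
        last_feature = feature
    return f"{name}. " + ". ".join(sentences) + "."


item = {
    1: ([1], None),
    2: ([2], "especially beneath"),
    3: ([1, 2], None),
}
print(describe("Taxon X", item))
# Taxon X. Leaves simple, hairy (especially beneath). Flowers red or white.
```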

The program KEY generates conventional identification keys. In selecting characters for inclusion in the key, the program determines how well the characters divide the remaining taxa, and balances this information against subjectively determined weights which specify the ease of use and reliability of the characters. Keys can be tailored for specific purposes by adjusting the weights, restricting the keys to subsets of the characters and taxa, and changing the values of parameters that control various aspects of the key generation. For example, keys could be produced for particular countries or climates; using only vegetative, floral, or fruit characters; starting with important characters; or biased towards common species.
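
The trade-off between how well a character divides the taxa and its subjective weight can be sketched with a toy scoring rule. The rule below (weight multiplied by the fraction of taxa eliminated in the worst case) is an assumption chosen for illustration and is not the formula used by the KEY program; the data, weights and function names are hypothetical, and every taxon is assumed to be recorded for every character.

```python
# Hypothetical data: taxon -> character -> set of states it can show.
TAXA = {
    "A": {"leaf": {"simple"}, "flower": {"red"}},
    "B": {"leaf": {"simple"}, "flower": {"white"}},
    "C": {"leaf": {"compound"}, "flower": {"red"}},
    "D": {"leaf": {"compound"}, "flower": {"red", "white"}},
}
# Hypothetical subjective weights (ease of use and reliability).
WEIGHTS = {"leaf": 1.0, "flower": 0.5}


def groups(char, names):
    """Map each state of `char` to the taxa in `names` that can show it."""
    out = {}
    for name in names:
        for state in TAXA[name][char]:
            out.setdefault(state, []).append(name)
    return out


def score(char, names):
    """Weight times the fraction of taxa eliminated in the worst case."""
    largest = max(len(members) for members in groups(char, names).values())
    return WEIGHTS[char] * (len(names) - largest) / len(names)


def best_character(names, chars):
    return max(chars, key=lambda char: score(char, names))


def build_key(names, chars, depth=0):
    """Return the key as indented lines; leaves list the remaining taxa."""
    usable = [char for char in chars if score(char, names) > 0]
    if len(names) <= 1 or not usable:
        return ["  " * depth + "-> " + ", ".join(names)]
    char = best_character(names, usable)
    rest = [c for c in chars if c != char]
    lines = []
    for state, members in sorted(groups(char, names).items()):
        lines.append("  " * depth + f"{char} {state}")
        lines.extend(build_key(members, rest, depth + 1))
    return lines


print("\n".join(build_key(sorted(TAXA), list(WEIGHTS))))
```

In this toy data set taxa C and D cannot be separated, so they end up together at one leaf of the key. Raising the weight of "flower" far enough would make it the first character used, which is the kind of tailoring the paragraph above describes.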

DELTA data can easily be converted to the forms required by programs for phylogenetic analysis, e.g. PAUP, Hennig86, and MacClade. The characters and taxa required for these analyses can be selected from the full data set. Numeric characters, which cannot be handled by these programs, are converted to multistate characters. Printed descriptions can be generated to facilitate checking of the data, and INTKEY can be used for further data checking, and for finding differences, similarities, and correlations among the taxa.
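
One simple way to turn a numeric character into a multistate one is to cut its scale at fixed boundaries and record every bin that a taxon's range touches. The text above does not say how the DELTA programs choose the bins, so the boundaries and the `to_multistate` function here are assumptions for illustration.

```python
from bisect import bisect_right


def to_multistate(value_range, boundaries):
    """Return the 1-based state numbers whose bins overlap a numeric range.

    `boundaries` is a sorted list of cut points. With [10, 20] the bins are
    state 1: below 10, state 2: 10 up to (not including) 20, state 3: 20 and above.
    """
    low, high = value_range
    first = bisect_right(boundaries, low)
    last = bisect_right(boundaries, high)
    return set(range(first + 1, last + 2))


print(to_multistate((8, 12), [10, 20]))   # {1, 2}
print(to_multistate((15, 15), [10, 20]))  # {2}
```

A taxon whose range spans a boundary is coded as variable for the new character, which keeps the conversion from discarding overlap between taxa.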

The interactive key program, INTKEY

The interactive key program, INTKEY, is easy to use and has powerful features, including:

  • entry and deletion of attributes in any order during an identification
  • calculation of the ‘best’ characters for use in identification
  • automatic recovery from errors made by the user, or errors in the data
  • allowing the user to express variability or uncertainty in the attributes of a specimen
  • direct handling of numeric values, including ranges of values and non-contiguous sets of values
  • retrieving free-text information
  • automatic handling of characters that become inapplicable when other characters take certain values
  • character illustrations and notes, and selection of character states from the illustrations
  • illustrations of taxa, and simultaneous viewing of several illustrations
  • finding similarities or differences between taxa
  • describing taxa in terms of nominated sets of characters
  • generating diagnostic descriptions
  • altering the handling of unknown, inapplicable, and overlapping values, as required for different applications
  • restricting operations to subsets of characters or taxa
  • keywords (definable by the user or the author) to represent subsets of characters and taxa
  • finding characters by included words, and taxa by name, synonyms, or common names
  • ‘character reliabilities’ to guide the selection of characters
  • obtaining lists of taxa possessing or lacking particular attributes or combinations of attributes
  • obtaining lists of taxa unrecorded for particular characters or sets of characters
  • coalescing descriptions (e.g. to generate generic descriptions from species descriptions)
  • user-definable toolbar buttons to represent any command or sequence of commands
  • input of complex or lengthy sequences of commands from files
  • selective output of results to files
  • normal and advanced modes of operation
  • short response times with large sets of data.
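
Several of the features listed above (attributes entered in any order, uncertainty in the specimen, numeric ranges, and recovery from errors) can be sketched as a simple elimination procedure. This is a hedged illustration, not INTKEY's implementation: the taxa, the `tolerance` parameter and the function names are hypothetical.

```python
# Hypothetical data set: taxon -> character -> set of states or (low, high).
TAXA = {
    "sp1": {"wings": {"clear"}, "length_mm": (30, 36), "spots": {"absent"}},
    "sp2": {"wings": {"clear"}, "length_mm": (33, 38), "spots": {"present"}},
    "sp3": {"wings": {"tinted"}, "length_mm": (40, 48)},  # spots unrecorded
}


def matches(recorded, observed):
    """True when the taxon's recorded value is compatible with the specimen."""
    if recorded is None:
        return True  # an unrecorded character can never eliminate a taxon
    if isinstance(recorded, tuple):
        low, high = recorded
        obs_low, obs_high = observed
        return obs_low <= high and low <= obs_high  # numeric ranges overlap
    return bool(recorded & observed)  # at least one state in common


def remaining(specimen, tolerance=0):
    """Taxa that differ from the specimen in at most `tolerance` characters."""
    kept = []
    for name, description in TAXA.items():
        mismatches = sum(
            not matches(description.get(char), observed)
            for char, observed in specimen.items()
        )
        if mismatches <= tolerance:
            kept.append(name)
    return kept


specimen = {"length_mm": (34, 35), "spots": {"present"}}
print(remaining(specimen))               # ['sp2']
print(remaining(specimen, tolerance=1))  # ['sp1', 'sp2', 'sp3']

# Uncertainty: the user cannot tell whether spots are present.
print(remaining({"wings": {"clear"}, "spots": {"absent", "present"}}))
# ['sp1', 'sp2']
```

Because the specimen is just a mapping that is re-evaluated on every call, attributes can be added or deleted in any order, and a non-zero tolerance keeps a taxon in play despite a single wrong observation or data error.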

INTKEY windows describing the differences between two species of British insects, Coenagrion puella and Coenagrion pulchellum. [Photo: Mike Dallwitz, CSIRO]

An example: the description of flowering plants

A good example is the DELTA-based publication ‘The Families of Flowering Plants’ by Drs Watson and Dallwitz. A review of this publication in the prestigious international journal Nature stated that it may represent a stage in plant taxonomy as important as the publication of Linnaeus’s Genera Plantarum in 1737. Carl Linnaeus (1707-1778) was the Father of Taxonomy: his system for naming, ranking, and classifying organisms is still in wide use today (with many changes). His ideas on classification have influenced generations of biologists during and after his own lifetime, even those opposed to the philosophical and theological roots of his work. More information can be found by following the link in the Sources below.

Descriptions in foreign languages

To produce descriptions, keys, and INTKEY packages in different languages, it is only necessary to translate the character list. Data sets have been produced in Chinese, Dutch, French, German, Greek, Indonesian, Italian, Portuguese, and Spanish. The INTKEY program itself can readily be translated into other languages, as all of the program text (menus, commands, prompts, diagnostic messages, and help) is in simple text files separate from the program files. English, French, German, Italian, and Spanish versions are currently available.
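
Why translating only the character list is enough can be seen in a small sketch: the taxon descriptions are stored as language-neutral codes, and the wording is looked up at output time. The character lists, the sample item and the `describe` function below are hypothetical and do not reflect the DELTA file layout.

```python
# One character list per language; the coded item data is shared.
CHARACTER_LISTS = {
    "en": {
        1: ("leaves", {1: "simple", 2: "compound"}),
        2: ("flowers", {1: "red", 2: "white"}),
    },
    "fr": {
        1: ("feuilles", {1: "simples", 2: "composées"}),
        2: ("fleurs", {1: "rouges", 2: "blanches"}),
    },
}

# Language-neutral description: character number -> state numbers.
ITEM = {1: [2], 2: [1]}


def describe(item, language):
    """Render the same coded description using the chosen character list."""
    characters = CHARACTER_LISTS[language]
    parts = []
    for number in sorted(item):
        feature, states = characters[number]
        wording = ", ".join(states[code] for code in item[number])
        parts.append(f"{feature} {wording}")
    return "; ".join(parts).capitalize() + "."


print(describe(ITEM, "en"))  # Leaves compound; flowers red.
print(describe(ITEM, "fr"))  # Feuilles composées; fleurs rouges.
```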

Availability of DELTA programs

In February 1988 Mike Dallwitz and Richard Pankhurst initiated The DELTA Newsletter, which can be found by following the link in the Sources below. This was intended to promote communication among scientists developing and applying computer technology in the collection, storage, analysis, and presentation of descriptive taxonomic data. It was not restricted to software or applications supporting or implementing the DELTA Standard.

Topics included: computer programs for taxonomy, data formats, data interchange standards, data capture, data analysis, database design, description printing, expert systems, information retrieval, interactive identification, key making, mapping systems and taxonomic characters.

The latest versions of the DELTA programs and several data sets can be found by following the link in the Sources below.

Other applications

DELTA has great potential in other areas of science, industry and the community through the use of interactive identification and information retrieval packages in areas such as biodiversity, ecology, archaeology, molecular biology, plant breeding, agronomy, farming, horticulture, quarantine, pest control and forensic science.

Sources