Deciphering the cattle and sheep genomes

By Colin Ward, February 14th, 2011

The turn of the millennium heralded the remarkable achievement of the sequencing of the human genome. This was a major scientific and historical landmark that opened the way for genome sequencing of other mammalian species. Genome sequences are necessary for a full genetic description of an animal and underpin a wide range of genetic and post-genomic technologies. In 2001, the year the human genome sequence was published, a CSIRO workshop identified key limitations and opportunities in Australian livestock industries that could be addressed if the livestock genome sequences were available. A group at this workshop accepted the challenge to link with like-minded groups worldwide and identify strategies to progress their ambition.

In 2007, an assembled draft bovine genome sequence was released, and in 2009, the complete sequence of the bovine genome was published along with a study of global cattle genetic diversity in two high impact papers in the journal Science. These were accompanied by more than 30 companion papers. The CSIRO livestock genomics group played an integral role in the immense task of sequence generation and assembly, and finally, detailed gene annotation and analysis.

Completion of the sheep genome sequence is in hot pursuit. In 2007, a framework map of the sheep genome based on the human genome was published, and last year, a high-density DNA array tool was developed to speed up production of the sheep genome sequence. The completed genome sequence will be published this year. Ensuing generations of new genetic and genomic tools will not only reinvigorate the study of ruminant physiology and behaviour, but will also have profound impacts on livestock production for decades to come.

Historical livestock genetics research by CSIRO

CSIRO has a long history of research involving the breeding of livestock animals with desirable production traits. In the 1930s, work on cattle genetics began in the Division of Animal Health when geneticist RB Kelley began to investigate the potential of zebu cattle for northern Australia. This research culminated in the development of the Belmont Red breed of cattle by Jim Rendel and Greig Turner from the Division of Animal Genetics, who crossed Afrikaner cattle, imported from America, with Herefords and Shorthorns in a research program that extended over the 1950s and 1960s (see also Cattle for the tropics). Other genetic work on cattle was that of Harry Wharton, from the Division of Entomology, who in the 1970s believed that the best way to control the cattle tick problem was to select tick-resistant lines. This was done by selectively breeding European (Bos taurus) cattle showing tick resistance, as well as by introducing tick-resistant zebu (Bos indicus) cattle into the herd. By 1982 it had been shown that a definite improvement in tick resistance in the Bos taurus herd had been achieved by yearly introductions into the herd of tick-resistant individuals and by culling tick-susceptible individuals. It was also demonstrated that genetic improvement for resistance to ticks was possible without affecting other factors such as milk yield.

In 1982 the CSIRO Division of Tropical Animal Science was established with a focus on the northern Australian beef industry and the unique challenges of its production environment. The research was primarily targeted at improving livestock genetics by continuing selective breeding programs, and also at developing vaccines against ectoparasites. The breeding programs were initially focused on enhancing meat production and disease resistance. While these selective breeding programs made significant advances, they were suitable only for easy-to-measure traits and typically resulted in incremental herd improvements.

Molecular genetics – a new approach

In 1988 Jay Hetzel initiated research at CSIRO Rockhampton to use modern molecular genetics to markedly accelerate the process of selective breeding, especially for production traits that are: (i) difficult and expensive to measure, (ii) sex limited, or (iii) measurable only late in life or after slaughter. The traits most likely to benefit from DNA marker assisted breeding included:

simply inherited genetic defects
meat quantity and yield
meat quality
disease resistance
reproductive efficiency.

By following the segregation of a trait within a large family structure, DNA genetic variations or genetic markers in the cattle genome could be identified that are associated with the trait of interest. These could then be used in DNA marker assisted selective breeding programs to markedly accelerate ‘genetic gain’ in a livestock population.

Within the space of two years, the Livestock Genomics group initiated by Jay Hetzel had begun:

the production of a genetic map of cattle
the mapping of the Poll (horn) gene
the search for the mutation causing Pompe’s disease in Brahman cattle, and importantly,
some of the first studies to map quantitative trait loci (QTL) i.e. regions within the genome that are associated with a range of production traits.

In a preview of the international cooperation that was to follow, the first ‘low’ density genetic linkage map of the cow was published by CSIRO (Barendse et al., 1994, Nature Genetics, 6: 227-235). Subsequently, a number of QTL were identified, ultimately leading to a CSIRO spin-out company, Genetic Solutions P/L.

Limitations

The commercial and scientific exploitation of this new genetic information was, however, restricted by the large genetic intervals spanned by these QTL, which typically covered genome segments as large as 30-50 centimorgans and could include many hundreds of genes. This realisation, although a disappointment at the time, emphasised the urgent need for better and denser genetic maps.
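
As a rough back-of-envelope check on why such intervals hold hundreds of genes, the sketch below converts an interval length in centimorgans into an approximate gene count. The total genetic map length (about 3 000 centimorgans) and the total gene count (about 22 000) are round-number assumptions chosen for illustration, not figures from this article, and the calculation assumes genes and recombination are spread evenly along the genome, which is only approximately true.

```python
# Back-of-envelope estimate of how many genes fall inside a QTL interval.
# Both constants are illustrative assumptions, not values from the article.
MAP_LENGTH_CM = 3000.0   # assumed total genetic map length in centimorgans
GENE_COUNT = 22000       # assumed number of genes in the genome


def genes_in_interval(interval_cm, map_length_cm=MAP_LENGTH_CM, gene_count=GENE_COUNT):
    """Estimate the genes inside an interval, assuming a uniform gene density."""
    if interval_cm < 0 or map_length_cm <= 0:
        raise ValueError("interval must be non-negative and map length positive")
    fraction_of_genome = interval_cm / map_length_cm
    return fraction_of_genome * gene_count


if __name__ == "__main__":
    for interval in (30, 50):
        print(f"{interval} cM interval: about {genes_in_interval(interval):.0f} genes")
```

Under these assumptions a 30-50 centimorgan interval corresponds to a few hundred genes, which is why a QTL of that size could not point to a single causal gene.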

Bill Barendse from the CSIRO group continued to pursue candidate genes underlying these QTL and ultimately identified the first commercial genetic markers associated with marbling and tenderness in beef cattle. However, it was soon recognised that many genes often contribute to complex production traits and hence the candidate gene approach would need to be supplemented by whole genome approaches.

The bovine genome project is born

The advent of the Human Genome Sequencing Project, which gathered pace during the latter part of the 1990s and culminated in two historic publications in February 2001, one in Nature (the international public consortium – Lander et al., 2001, Nature, 409: 860-921) and one in Science (the American private venture – Venter et al., 2001, Science, 291: 1304-1351), precipitated great interest in all of biology and especially in the livestock research community. The complete sequence of the cattle genome, in combination with the discovery of a very large number of genetic variants (Single Nucleotide Polymorphisms or SNP), was recognised as a powerful and efficient tool for identifying relatively small genetic intervals associated with livestock production traits sampled from large commercial herds. Moreover, the sequence would also allow the development of functional genomics and hence a better understanding of how genes contribute to phenotypes.

A watershed meeting

In a watershed meeting held in Victoria in December 2001, an enthusiastic group of CSIRO Livestock Industries scientists, including Ross Tellam and Brian Dalrymple, committed to the enormous task of facilitating the sequencing of the bovine and ovine genomes. The inevitable dawn brought great uncertainties. Was it possible? After all, it had taken a decade to sequence the human genome at an estimated cost of US $2.7 billion (National Human Genome Research Institute)!

The Bovine Genome Sequencing and Analysis Consortium

The translation of enthusiasm into action was daunting. However, two important enabling factors made the Bovine Genome Sequencing and Analysis Project a reality. First, parallel conversations were being held in several livestock research groups throughout the world and an informal international network was established to promote the common aim of sequencing the 2.7 billion nucleotide base pairs (bp) of the cow genome. Second, the cost of genome sequencing was decreasing at a rapid rate due to technological and methodological developments. Through an initiative in the USA, a White Paper advocating the sequencing of the cow was submitted to the National Human Genome Research Institute (NHGRI) for their consideration. The cow sequence was made a priority by the NHGRI initiative in late 2002, but with the caveat that they would provide half of the necessary funds and the remainder would need to be raised by the livestock community.

In a remarkable example of initiative, international cooperation and hard work, the international Bovine Genome Sequencing and Analysis Consortium was formed and the funds for the remainder of the bovine sequence were raised by late 2003. CSIRO was represented by Ross Tellam in this consortium, which was initially led by Steve Kappes (USDA) and Richard Gibbs (Baylor College of Medicine). Sequencing began at the Baylor College of Medicine in Houston later the same year. The animal sequenced, L1 Dominette, was an inbred Hereford. In addition, low-coverage sequence was obtained from animals representing six other breeds of cattle. The latter information was used to identify about 100 000 genetic polymorphisms (SNP), i.e. differences in DNA sequence between individuals. Many large-scale resources supporting the genome sequence were also generated over the next few years by the international livestock research community. These included full-length cDNAs, ESTs, highly specific BAC (bacterial artificial chromosome) end sequence markers, dense genetic linkage maps, physical maps and cell-based resources. ESTs, or Expressed Sequence Tags, are small pieces of DNA sequence, usually 200 to 500 nucleotides long, that provide researchers with a quick and inexpensive route for discovering new genes, for obtaining data on gene expression and regulation, and for constructing genome maps by matching base pairs to locate the corresponding portion of chromosomal DNA.

Genome assembly, annotation and analysis

By 2005 much of the sequence was available but not assembled i.e. the pieces of the enormous puzzle (~20 million sequence reads) had been gathered but the task of assembling them into the correct linear order had not commenced. The assembly process utilised many of the additional resources generated by the research community and proved to be a significant technical challenge taking about two years to complete. (In 2010, even with the advent of second and third generation sequencing technologies, autonomous assembly remains a rate limiting step in all mammalian genome projects.) During this period additional sequence using a minimal tiling path of BAC clones was generated at Baylor College of Medicine. Brian Dalrymple from CSIRO contributed information on the order of BAC clones in the process of assembly.

Annotation and analysis of the assembled bovine genome sequence took place in 2007 and 2008. This complex process was co-led by Chris Elsik (Georgetown University), Kim Worley (Baylor College of Medicine), Ross Tellam (CSIRO) and several others from the international consortium. The analysis was facilitated by hundreds of hours of pre-dawn and late night phone conferences for the Australian participants. Many people with an amazing diversity of expertise contributed during this phase and they provided unique insights into mammalian evolution with particular emphasis on the genes involved in innate immune defence, lactation, metabolism and reproduction. In parallel, the Bovine Hapmap Consortium, including Bill Barendse from CSIRO, analysed SNP data for a population of cattle representing a wide diversity of breeds.

Cover of the April 24th 2009 edition of Science which contained two papers from the Bovine Genome Sequencing and Analysis Consortium and the Bovine Hapmap Consortium. [Source: Reprinted with permission from AAAS]

Discoveries and applications

Collectively, a large number of scientific discoveries were made and these were published as a series of papers in 2009 (see publications). The scientific discoveries were rapidly followed by an acceleration of potential commercial applications with CSIRO’s Bill Barendse contributing strongly in this area. The bovine genome sequence is transformational as it:

provides new scientific insights into mammalian evolution
provides a better understanding of ruminant biology
has precipitated the development of new genetic tools which can be used to discover regions in the genome where there are genetic polymorphisms contributing to production traits of interest.

The latter can also be used to discover underlying molecular mechanisms. These genetic tools are in widespread use by researchers throughout the world and they have since led to many exciting new scientific discoveries and commercial applications.

The Bovine Genome Sequencing and Analysis Project is a tribute to what can be achieved by international scientific cooperation, a common purpose and good will. CSIRO personnel were involved in nearly all aspects of the project and they continue research that will ultimately improve scientific knowledge and generate new commercial products that benefit both the livestock industry and consumers whilst minimising the environmental footprint of the industry.

This is just the beginning!

In addition to the suite of publications in 2009 documenting the bovine genome sequencing and hapmap (haplotype mapping) efforts, there has been evolution of the bovine SNP chip by several international initiatives toward greater and more informative SNP densities (10 000, 50 000, 600 000 per slide) as well as a trend toward sequencing the genomes of additional elite performing animals. This has led to:

the discovery of genes causing genetic diseases in dairy cattle which have had immediate translation into breeding practice
the identification of commercially relevant genetic markers associated with a range of beef and dairy cattle production traits
the introduction of ‘genomic selection’, initiated in 2008 in the dairy industry to accelerate genetic gain (genomic selection is a form of marker-assisted selection in which genetic markers covering the whole genome are used so that all quantitative trait loci (QTL) are in linkage disequilibrium with at least one DNA marker; the process captures the contributions of a large number of genes, each of small effect on a phenotype)
the use of the SNP chip for parentage testing and identity tracking, and to quantify the genetic variation present in endangered populations of rare cattle breeds.
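
The idea behind genomic selection in the list above, many markers each capturing a small effect, can be sketched in a few lines. This is a minimal illustration, not the method used by any particular breeding program: it assumes the per-marker effects have already been estimated in a reference population (that estimation step is not shown), codes each genotype as the number of copies (0, 1 or 2) of one allele, and uses hypothetical function names and simulated data throughout.

```python
import random


def genomic_breeding_value(genotype, marker_effects):
    """Sum the effect of every marker, weighted by allele count (0, 1 or 2)."""
    if len(genotype) != len(marker_effects):
        raise ValueError("genotype and marker_effects must have the same length")
    return sum(count * effect for count, effect in zip(genotype, marker_effects))


def rank_candidates(genotypes, marker_effects):
    """Return animal IDs sorted from highest to lowest estimated breeding value."""
    scores = {
        animal: genomic_breeding_value(genotype, marker_effects)
        for animal, genotype in genotypes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulated example is repeatable
    n_markers = 1000        # hypothetical marker count, far fewer than a real chip

    # Simulated marker effects: many markers, each with a small effect.
    effects = [rng.gauss(0.0, 0.01) for _ in range(n_markers)]

    # Simulated genotypes for five hypothetical selection candidates.
    animals = {
        f"animal_{i}": [rng.choice((0, 1, 2)) for _ in range(n_markers)]
        for i in range(5)
    }

    for animal in rank_candidates(animals, effects):
        value = genomic_breeding_value(animals[animal], effects)
        print(animal, round(value, 3))
```

Because the score is a sum over the whole genome, no single marker needs to have a large effect for the ranking to be informative, which is what distinguishes this approach from selecting on a handful of candidate-gene markers.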

Perhaps the greatest future challenge is to better understand the mechanistic relationship between genotype and phenotype. To achieve this, improvements in the current genome assembly, including targeted re-sequencing of ‘problem’ regions in the 2.7 billion bp genome, are required. Undoubtedly the genome sequences will underpin livestock genetics research and many new commercial applications over the next decade and beyond.

An example of increased anti-microbial capacity in the cow. The cathelicidin genes (red) encode antimicrobial peptides in mammals which help to defend against infections at mucosal surfaces. The cow has greatly expanded the number of genes in this family. This may be an evolutionary adaptation to increased microbial challenge in the local environment of the cow. In particular, the cow contains very large populations of microorganisms in its rumen. These microorganisms digest forage, but the cow also needs a strong and robust innate immune defence system to guard against opportunistic infections. [Source: Ross Tellam, CSIRO]

The International Sheep Genomics Consortium and the development of the SNP50 Beadchip

The International Sheep Genomics Consortium (ISGC) was founded in 2005 by a group of Australian and New Zealand scientists with a shared interest in the genetics of sheep. The group, led by CSIRO’s James Kijas and Brian Dalrymple, concluded that working in a coordinated manner would assist in the development of genomic tools of use to everyone with an interest in improving the production and health of sheep flocks.

A central focus for the consortium has been the identification and application of SNP markers. In 2006, the group recognised the need to identify tens of thousands of SNPs and set about their discovery using traditional Sanger-based sequencing. This yielded over 6 000 SNPs and coincided with the emergence of next generation sequencing technologies. Next, the CSIRO team performed a reduced representation experiment which, when coupled with next generation sequencing, identified over 77 000 SNPs. In parallel, a team led by John McEwan (AgResearch, NZ) used 454 sequencing technology to identify another 273 000 high-quality SNP markers. Subsequently, during 2008, the International Sheep Genomics Consortium designed the first high-density SNP assay for investigation of the sheep genome. In partnership with Illumina, the commercial SNP50 Beadchip was launched in January 2009. It has proven to be a high-quality research tool that is currently underpinning the implementation of genomic selection and the dissection of complex traits via genome-wide association studies in sheep.

The sheep genome project

The emergence of next generation sequencing also prompted the consortium to plan a Reference Genome Project. The suitability of using short sequence reads (< 100 bp) for the assembly of large and complex eukaryotic genomes was initially unproven. However, this changed in early 2009 with the publication, by a group led by Chinese researchers, of the draft sequence of the giant panda genome constructed solely from paired-end next generation sequence reads. This showed such a venture was technically possible, so at the ISGC workshop held in January 2009 in San Diego, the consortium agreed to initiate the sequence of the reference sheep genome. The data generation phase of the sheep reference genome project commenced in late 2009 at two sequencing facilities, the Kunming Institute of Zoology and Beijing Genome Institute Shenzhen, led by Wen Wang, Jun Wang and Xu Xun. Simultaneously, work at the Genomics Center for Comparative and Functional Genomics at The Roslin Institute, led by Alan Archibald, produced additional sequence data. Preliminary analysis has been completed, and the ISGC plans to launch the reference sheep genome sequence in January 2011.

Honours and awards

The CSIRO team, led by Ross Tellam, Bill Barendse, James Kijas and Brian Dalrymple, was awarded the CSIRO Chairman’s Medal for 2010 for its contributions to the bovine and ovine genome projects. The group led by Bill Barendse was awarded a CSIRO Medal for Research Achievement in 2003 for work on the use of quantitative trait loci (QTL) to identify the first commercial genetic markers associated with marbling and tenderness in beef cattle.

The Livestock Genomics Team receiving the CSIRO Chairman’s Medal. Front row left to right: James Kijas, Bill Barendse, Megan Clark (CSIRO Chief Executive), Senator the Honourable Kim Carr, Ross Tellam, Brian Dalrymple, Simon McKeon (Chairman CSIRO Board); second row left to right: Russell McCullock, Rowan Bunch, Wes Barris, Warren Sim, Nick Corbet; third row left to right: Juca Porto-Neto, Blair Harrison, Rachel Hawken, Paul Williams, Sean McWilliam. Absent: David Townley, Abhi Ratnakumar, Ashley Waardenberg, Ylva Strandberg, Dave Tang, Lillian Sando, Aaron Ingham, Vicki Whan, Evgeny Glazov. [Source: CSIRO]