To the Editor

Advances in genomics over the past 20 years have enhanced the precision and efficiency of breeding programs1 in many temperate cereal crops2,3. One of the first applications of genomics-assisted breeding has been the introgression of loci for resistance to biotic stresses or major quantitative trait loci (QTLs) for tolerance to abiotic stresses into elite genotypes through marker-assisted backcrossing (MABC)4. For instance, introgression of a major QTL for submergence tolerance (Sub1) into widely grown rice varieties has substantially improved yield in >15 million hectares of rain-fed lowland rice in South and Southeast Asia5. Despite this success story, the overall adoption of genomics-assisted breeding in developing countries is still limited in several other crops, especially for complex traits such as yield under environmental stress6,7.

Although maize, rice and wheat dominate global food production, several other crops are of great importance for some communities in developing countries (Supplementary Table 1). This group includes sorghum and millets, groundnut, cowpea, common bean, chickpea, pigeonpea, cassava, yam and sweet potato (Table 1). As they are not extensively traded and receive little attention from researchers compared to the main crops, these important crops for marginal environments of Africa, Asia and South America are often referred to as 'orphan crops'. Breeding for orphan crops lags behind that for major crops, although they are key staple crops in many low-income countries where small-holder farmers cannot afford to buy improved seed. The magnitude of the breeding effort for these orphan crops and the capacity to adopt modern technologies are extremely variable across developing countries and are generally directly related to the health of the national economy.

Table 1: Details on production of the world's three major food crops and selected orphan crop species

In any crop, many of the key traits affecting crop performance are under complex genetic control and show quantitative variation8. These traits are of growing importance in breeding as favorable alleles for most simply inherited traits are already prevalent and often fixed in elite germplasm, suggesting that genetic gain for such traits may be difficult to achieve. The overall objective for most breeders is to improve crop productivity in a target environment. Yield is a direct reflection of biomass or the proportion of biomass that is converted to the harvestable commodity. Yield is a result of the integration and interaction of many physiological and metabolic processes over time and in environments that are increasingly variable due to climate change9.

The prediction of phenotype on the basis of genotypic composition is challenging, as crop performance can be profoundly influenced by weather conditions, soil composition, pathogens and pests and trial management (e.g., fertilizer input, weed control, water supply). Interaction of genotype with the environment can be evaluated through suitable experimental design, multiple-site evaluation and careful measurement of the environmental variables. The difficulty of accurately predicting plant phenotype from genotype through genomics-assisted breeding is exacerbated when landraces or wild germplasm, representing different gene pools, are used as sources of favorable alleles for target traits. These sources have been important in improving disease and pest resistance, some traits influencing nutritional and sensory quality, and in some cases polygenic traits such as yield10. Such success, however, is yet to be achieved in routine practice in the above-mentioned orphan crops.

Here we provide a short overview of both the potential and the challenges in implementing genomics-assisted breeding in orphan crops. Additionally, we present a critical appraisal of the application of association mapping or genome-wide association studies (GWAS)11, marker-assisted recurrent selection (MARS)7 and genomic selection (GS)12,13 in improving grain yield of orphan crops in developing countries.

Until recently, only major commercial crop species have benefited from the application of next-generation sequencing and high-throughput tools in molecular biology. Today, however, the use of such high-throughput technologies to assay molecular markers and transcript sequences, genome structural variation, gene space or even genome sequences is feasible for orphan crops14,15,16. The low cost of sequencing motivated Elshire et al.17 to develop a robust, cost-effective, highly multiplexed sequencing approach known as 'genotyping by sequencing', which is expected to replace high-density-marker genotyping (Supplementary Fig. 1). Importantly, these new sequencing technologies for characterizing germplasm collections will help overcome 'ascertainment bias', which has hampered genotypic evaluation of diverse collections (Supplementary Fig. 2).

One of the prerequisites of molecular breeding has been the identification of molecular markers associated with traits of interest18. Linkage mapping to detect QTLs generally focuses on the analysis of families from crosses between two inbred lines. Although the statistical power for QTL detection is high, the genetic resolution of these QTLs is poor. Conversely, GWAS or association mapping takes advantage of historic recombinations that reveal chromosomal regions of linkage disequilibrium (LD) where markers remain associated with traits of interest over many generations. GWAS, widely used in human genetics, is being increasingly adopted in crop species like maize19,20,21,22,23 and rice24,25,26,27. The best features of linkage and association approaches can be combined through nested association mapping populations28,29 that enable high-power and high-resolution analysis through joint linkage mapping and association mapping. However, many breeders feel that the association mapping approach has failed to deliver beneficial alleles or haplotypes for crop improvement. Limited population size, low heritability, lack of candidate genes, low marker density and the difficulty in identifying beneficial alleles are the main limiting factors. Furthermore, variation in phenology greatly reduces the effectiveness of association mapping in singling out the effects of loci that influence yield potential per se as opposed to those affecting plant development or phenology. Clearly, many or, indeed, most, functional genes in the genome will contribute directly or indirectly to yield. Therefore, it is unreasonable to expect that any genetic approach will unequivocally resolve the plethora of polymorphisms that contribute directly or indirectly to yield, even with proper experimental design30 and the use of appropriate GWAS statistical models31,32.
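The core idea of an association scan, testing each marker in turn for a statistical relationship with the trait, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are simulated, the function name `single_marker_scan` is hypothetical, and the scan deliberately omits the corrections for population structure and kinship that appropriate GWAS statistical models apply.

```python
import numpy as np


def single_marker_scan(genotypes, phenotype):
    """Return the squared correlation (r^2) of each marker with the phenotype.

    genotypes: (n_lines, n_markers) array of allele dosages (0, 1 or 2).
    phenotype: (n_lines,) array of trait values.
    """
    g = genotypes - genotypes.mean(axis=0)  # center each marker
    y = phenotype - phenotype.mean()        # center the trait
    cov = g.T @ y
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    # Monomorphic markers have zero variance; report r^2 = 0 for them.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, cov / denom, 0.0)
    return r ** 2


# Simulated panel (assumption): 200 lines, 50 unlinked markers,
# one causal marker with a large additive effect.
rng = np.random.default_rng(0)
n_lines, n_markers, causal = 200, 50, 7
genotypes = rng.binomial(2, 0.5, size=(n_lines, n_markers))
phenotype = 1.5 * genotypes[:, causal] + rng.normal(0.0, 1.0, n_lines)

r2 = single_marker_scan(genotypes, phenotype)
top_marker = int(np.argmax(r2))
print("strongest association at marker", top_marker)
```

In this toy setting the causal marker is easy to find because its effect is large and the markers are independent. With many small-effect loci, low heritability or a small panel, the signal of any single marker shrinks toward the noise level, which is the limitation described above.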

Cloning of QTLs is becoming increasingly feasible for manipulating quantitative traits by means of marker-assisted selection or genetic engineering33. Over 50 major QTLs of agronomic interest have been cloned34. QTL cloning has also revealed the important role of noncoding sequences in modulating gene expression35. In particular, the availability of gene sequences allows a direct and targeted search of novel alleles, hence expanding the pool of variability available for breeding purposes.

One of the difficulties for developing superior genotypes for complex traits such as 'yield in stressful environments' is that these traits are often controlled by multiple, small-effect QTLs and/or several epistatic QTLs. When selecting for those complex traits, MARS or GS appears to be best suited for stacking beneficial QTL alleles (Fig. 1). MARS7 allows the improvement of polygenic traits by stacking favorable alleles at a large number (10–40) of the most significant loci involved in the expression of the target traits. Although MARS is being used routinely in private sector breeding programs3, no published reports that discuss its use by the public sector are available. In GS12,13 the breeding value of genotypes is calculated by incorporating data for all marker loci. Implementation of GS indicates that the quality of prediction decays with generational distance from the training population36. Part of this decay is the result of basic GS models accurately capturing kinship or broad-scale pedigree relatedness, but failing to identify markers in extremely high LD with the causal loci. Such loci are likely to remain informative through many generations, particularly when only a moderate number of markers are used. This is exactly what GWAS excels at. The challenge is now to bridge these models and approaches (GWAS and genomic selection) to accelerate genetic gains in orphan crops.
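To make the GS idea of using all marker loci at once more concrete, the following is a minimal sketch based on ridge regression, one common basic model for estimating marker effects. It is not the method of any cited study: the data are simulated, the function names `fit_marker_effects` and `predict_gebv` are hypothetical, and the shrinkage parameter is simply assumed to equal the ratio of residual variance to marker-effect variance used in the simulation.

```python
import numpy as np


def fit_marker_effects(X_train, y_train, ridge_lambda):
    """Estimate effects for all markers jointly by ridge regression."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    n_markers = Xc.shape[1]
    # Solve (X'X + lambda * I) b = X'y for the shrunken marker effects.
    lhs = Xc.T @ Xc + ridge_lambda * np.eye(n_markers)
    effects = np.linalg.solve(lhs, Xc.T @ (y_train - y_mean))
    return effects, x_mean, y_mean


def predict_gebv(X_new, effects, x_mean, y_mean):
    """Genomic-estimated breeding values for genotyped, unphenotyped lines."""
    return y_mean + (X_new - x_mean) @ effects


# Simulated data (assumption): 300 training lines, 100 candidates,
# 200 unlinked markers, each with a small additive effect.
rng = np.random.default_rng(1)
n_train, n_cand, n_markers = 300, 100, 200
true_effects = rng.normal(0.0, 0.1, n_markers)
X_train = rng.binomial(2, 0.5, size=(n_train, n_markers)).astype(float)
X_cand = rng.binomial(2, 0.5, size=(n_cand, n_markers)).astype(float)
y_train = X_train @ true_effects + rng.normal(0.0, 1.0, n_train)

# lambda = residual variance / marker-effect variance = 1.0 / 0.1**2
effects, x_mean, y_mean = fit_marker_effects(X_train, y_train, ridge_lambda=100.0)
gebv = predict_gebv(X_cand, effects, x_mean, y_mean)

true_bv = X_cand @ true_effects
accuracy = float(np.corrcoef(gebv, true_bv)[0, 1])
selected = np.argsort(gebv)[::-1][:10]  # ten candidates with highest GEBV
print("prediction accuracy:", round(accuracy, 2))
```

Because the simulated markers are unlinked and the lines unrelated, this sketch does not reproduce the decay of prediction quality with generational distance discussed above. Showing that would require simulating pedigrees and LD.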

Figure 1: Schematic representation of three molecular breeding approaches for crop improvement.

(a) MABC is the most extensively used molecular breeding approach, largely deployed for introgressing a transgene or major loci for resistance to biotic stress, grain quality traits or a major QTL explaining higher phenotypic variance for abiotic stress tolerance. In the absence of a predictive gene-based marker, the first step in this approach is identification of loci based on analysis of genotypic and phenotypic data from populations segregating for the traits of interest. Subsequently, the elite variety lacking the desired allele(s) at the target locus or loci is used for backcrossing with the donor genotype. Molecular markers associated with the QTL are used for screening the backcross populations for identification of the superior lines that possess the favorable QTL allele at each cycle of backcrossing. Depending on the population size and considering one or two target loci, two to three backcrosses are usually sufficient for recovering most of the recipient genome4. Progress can be monitored with the help of randomly selected molecular markers used for background selection. After generating the backcross progenies with the desired genome coverage from the elite genotype, one cycle of selfing is done and homozygous MABC lines are used for selecting the superior lines for field evaluation.

(b) The MARS approach is deployed to accumulate favorable alleles for 10–40 loci in a set of complex and simple traits including yield. In the first step, segregating populations are generated by crossing elite varieties that are superior for the targeted traits but that carry favorable alleles at different loci. These populations are genotyped and phenotyped at a suitable inbreeding level in the targeted environments and these data are used for identification of QTLs for the target traits. Looking at allelic complementarity at target loci, selected genotypes are crossed to stack favorable alleles across successive cycles of recurrent selection. After completing two to three recombination cycles, superior lines selected based on ideal genotypes are self-fertilized for field evaluation.

(c) The GS approach, rather than relying on mapped loci, uses the breeding values that are calculated based on high-density genotypic data and historical phenotypic data from a 'training population' usually made up of breeding lines. Based on genomic-estimated breeding values (GEBVs), superior lines are selected for new crosses. The progeny lines from these crosses are genotyped and, based on the model developed for calculating GEBVs with the training populations, GEBVs are calculated for the progeny lines. Subsequently, the progeny lines having higher GEBVs are used for the next cycle of crossing. At any point in the cycle, progeny lines with higher GEBVs can be extracted for the field evaluation.
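As a point of reference for the backcrossing step in (a), the expected share of the recipient (recurrent parent) genome in the absence of any marker-based background selection follows a simple halving rule. The sketch below computes it. The function name `expected_recurrent_fraction` is hypothetical, and the calculation is a genome-wide average that assumes unlinked loci and ignores donor segments dragged along with the target locus.

```python
def expected_recurrent_fraction(n_backcrosses):
    """Expected recurrent-parent genome fraction after an F1 and n backcrosses.

    Each backcross halves the expected remaining donor contribution, so the
    recurrent-parent share is 1 - (1/2)^(n + 1) when no selection is applied.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


for n in range(1, 4):
    print(f"BC{n}: {expected_recurrent_fraction(n):.4f}")
```

Without selection, the expectation is 0.75 after one backcross, 0.875 after two and 0.9375 after three. Background selection with markers picks individuals above this average in each generation, which is consistent with two to three backcrosses usually being sufficient.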

Although several success stories have been published, mainly based on the MABC approach4, the overall impact of genomics-assisted breeding on crop development programs in developing countries remains very limited6,7, especially for complex traits. In the short term, the adoption and successful application of this approach outside of developed countries requires the following: first, scientists trained in modern breeding technologies; second, improved local infrastructure capacity for accurate and relevant phenotyping; third, local access to marker technologies for efficient genotyping; and fourth, the deployment of suitable data-management systems (Box 1).

High-throughput genotyping and sequencing facilities could benefit from economies of scale, as large-scale service laboratories are also preferred by large seed companies. With the availability of sequencing service laboratories, the same approach can be adopted for breeding in developing countries, although sample shipment across national borders sometimes poses logistical and quarantine challenges, especially in the developing world, and raises concerns about germplasm protection. Even so, the availability of some medium-scale genotyping and sequencing facilities in developing countries that have stable economies will be useful from a capacity-building perspective, because it will expose scientists to modern technologies while allowing some of the projects to be run locally. Physically accessible facilities are often also important for building breeders' confidence in new technologies.

An essential component of molecular breeding infrastructure is facile access to computational tools used in every step of the process, from sample handling to phenotype prediction. Some steps of the process require high-power computers and associated services. Such facilities could be shared, and perhaps located in genotyping centers, as long as high bandwidth network access is easily available. Computational tools needed at every breeding station are much more modest and a good laptop and internet access are usually sufficient.

Facilitated access to high-throughput genotyping and sequencing technologies, as mentioned above, is expected to enhance breeding progress in developing countries in at least three ways: first, high-resolution germplasm fingerprinting by low-coverage sequencing or single-nucleotide polymorphism genotyping will facilitate selection of parents for new crosses and introgression of novel alleles from exotic germplasm; second, accurate mapping of loci associated with traits of interest using either bi- or multiparental mapping or GWAS methodology will enhance identification of marker-trait associations of interest to the breeders; third, high-throughput genotyping or sequencing approaches will enable genome-assisted breeding, including for orphan crops. All of these advances will require enhanced capacity for precision phenotyping in target environments (Box 1). The phenotyping capacity could be enhanced by training and collaborative programs with Consultative Group on International Agricultural Research (CGIAR) institutions and advanced research organizations in tight collaboration with partners from developing countries.

The above approaches have the potential to convert orphan crops into genomic resource–rich crops. Even the definition of an 'orphan crop' is rapidly changing. For instance, a few decades ago access to mutants was critical for genetic studies and peas, maize, barley and tomato were regarded as model crops, whereas rice was considered an orphan crop. In fewer than 20 years, however, rice has moved from being an orphan crop to become a core model for the cereals. Similarly, as a result of investment from several public initiatives, many orphan crops and legumes in particular are presently becoming genomic resource-rich37. For example, a genome sequence has become available for pigeonpea38, whereas pea, the model legume crop from Mendel's time, is yet to be sequenced.

The remaining 'orphan crops' are thus those that have genomes too complex to sequence or characterize. Complexity can be due to polyploidy or large genome size, as is the case for wheat. In wheat, many of the genomic resources taken for granted in other species are only now being developed (http://www.wheatgenome.org/). Similar problems hamper resource development for sugarcane, onion, faba bean and many other important crops, horticultural and forestry species with large and complex genomes.

The ultimate aim of genomics-assisted breeding is to increase the rate of genetic gain across target environments, in less time and at lower cost compared with conventional selection based exclusively on phenotype. Nonetheless, it is critical to remember that even with the availability of the best genotyping resources, genomics-assisted breeding may not be successful in the absence of quality phenotypic data.

In our opinion, development and availability of genomic resources, due to advances in technology, should not be an issue anymore in the orphan crops. Centralized service facilities for high-throughput sequencing and genotyping, together with access to genomics and analytical breeding tools, should enhance implementation and adoption of molecular breeding in staple crops in developing countries. Continued training of breeders and geneticists in modern genomics and molecular breeding approaches and their retention in developing countries coupled with adequate institutional and governmental support will be critical for the sustainable and effective integration of genomics-assisted breeding in crop improvement programs for ensuring food security in developing countries.