Sugarcane Genomics and Transcriptomics Resources
Sugarcane is a widely cultivated member of the Poaceae that fixes CO2 via C4 photosynthesis. It is one of the most important crops worldwide, as it is the main source of sugar, bioenergy and other bioproducts. Modern sugarcane cultivars are the result of classical breeding approaches that involved interspecific crosses between members of the Saccharum complex. Their genomes are among the most complex of any crop: these hybrids are polyploid, highly polymorphic and can be aneuploid. Recent developments in sequencing technologies now allow access to sugarcane genetic information at a genome-wide level. In Brazil, two research groups used long-read sequencing technologies to sequence the genome of the hybrid SP80-3280 at shallow depth. Here we present a comparison of these two genome versions and a comprehensive gene set for this cultivar. Besides the Brazilian cultivar, genome sequences are available for a French and a Colombian cultivar, and genomic information is available for some of the parental species. In contrast, many studies have examined the transcriptomes of diverse cultivars from around the world. Nevertheless, these transcriptomic data cannot easily be exploited because there is no commonly accepted reference. We have used publicly available transcriptomic data for 48 cultivars from around the world to create a sugarcane pan-transcriptome. In total we detected over five million protein-coding transcripts, which can be clustered into similarity groups representing genes and closely related paralogues. We identified approximately twelve thousand groups of transcripts that tend to appear in all cultivars, which we call the core set. We show that a probable origin (S. spontaneum, S. officinarum or S. barberi) can be attributed to most of these transcripts. We are making this resource available to the public, and we are developing a platform to ease mining of these data.
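As an illustration of the core-set idea described above, the following minimal Python sketch selects the similarity groups that are present in every cultivar, or in at least a chosen fraction of them. It is not the procedure used to build this resource: the function name `core_groups`, the `min_fraction` parameter, the input layout (pairs of group and cultivar identifiers) and the toy identifiers are all assumptions made for this example.

```python
from collections import defaultdict


def core_groups(assignments, min_fraction=1.0):
    """Return the sorted IDs of groups seen in at least `min_fraction` of cultivars.

    `assignments` is an iterable of (group_id, cultivar) pairs, one pair per
    transcript assigned to a similarity group (hypothetical input layout).
    """
    cultivars_by_group = defaultdict(set)
    all_cultivars = set()
    for group_id, cultivar in assignments:
        cultivars_by_group[group_id].add(cultivar)
        all_cultivars.add(cultivar)

    # Minimum number of distinct cultivars a group must cover to count as core.
    threshold = min_fraction * len(all_cultivars)
    return sorted(
        group_id
        for group_id, cultivars in cultivars_by_group.items()
        if len(cultivars) >= threshold
    )


# Toy data with made-up group and cultivar labels.
toy_assignments = [
    ("group_1", "cultivar_A"), ("group_1", "cultivar_B"), ("group_1", "cultivar_C"),
    ("group_2", "cultivar_A"), ("group_2", "cultivar_B"),
    ("group_3", "cultivar_B"),
]

strict_core = core_groups(toy_assignments)        # groups present in all cultivars
relaxed_core = core_groups(toy_assignments, 0.5)  # groups present in at least half
```

The `min_fraction` parameter is one way to express "tend to appear in all cultivars": lowering it below 1.0 tolerates groups that are missing from a few assemblies, for example because a gene was not expressed in the sampled tissue.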
Data availability
Transcriptome assemblies (FASTA): 48 genotype-specific transcriptome assemblies built from public RNA-Seq data.
Quality of our 48 transcriptome assemblies: Genotype-specific transcriptome evaluations generated with BUSCO, TransRate and Salmon.
CDS (FASTA): CDS files from our 48 genotype-specific transcriptome assemblies.
PEP (FASTA): PEP files from our 48 genotype-specific transcriptome assemblies (more than 5.2 million protein-coding transcripts).
Local BLAST server (temp): Temporarily available BLAST server to query our transcriptome assemblies.
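Once downloaded, the FASTA files listed above can be checked quickly by counting their records. The sketch below is only an illustration: the function name `count_fasta_records` and the in-memory example sequences are assumptions, and real files would be opened with `open(path)` using whatever file names the download provides.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in handle if line.startswith(">"))


# In-memory stand-in for a small PEP file; the sequences are made up.
example_pep = io.StringIO(
    ">transcript_1\nMKTLLVA\n"
    ">transcript_2\nMSSPQ\nLLKA\n"
)
n_records = count_fasta_records(example_pep)
```

Summing this count over the PEP files of all assemblies gives the total number of protein sequences in the downloaded set.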