Sugarcane Genomics and Transcriptomics Resources

Sugarcane is a widely cultivated member of the Poaceae that fixes CO2 via C4 photosynthesis. It is one of the most important crops in the world, as it is the main source of table sugar, bioenergy and other bioproducts. Modern sugarcane cultivars are the result of classical breeding approaches that involved interspecific crosses between members of the Saccharum complex. These hybrids are polyploid, highly polymorphic and can be aneuploid, and their genomes are among the most complex of any crop. Recent developments in sequencing technologies now give access to sugarcane genetic information at the genome-wide level. In Brazil, two research groups used long-read sequencing technologies to sequence the genome of the hybrid SP80-3280 at shallow depth. Here we present a comparison of these two genome versions and a comprehensive gene set for this cultivar. Besides the Brazilian cultivar, genome sequences are available for a French and a Colombian cultivar, as well as genomic information for some of the parental species. In contrast, many studies have profiled the transcriptomes of diverse cultivars from around the world. Even so, these transcriptomics data cannot be exploited easily because there is no commonly accepted reference. We used publicly available transcriptomics data for 48 cultivars from around the world to create a sugarcane pan-transcriptome. In total we detected over five million protein-coding transcripts, which can be clustered into similarity groups representing genes and closely related paralogues. We identified approximately twelve thousand groups of transcripts that tend to appear in all cultivars, which we call the core set. We show that we can assign a probable origin to most of these transcripts (S. spontaneum, S. officinarum or S. barberi). We are making this resource available to the public, and we are developing a platform to ease mining of these data.
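
As a rough illustration of how a core set could be derived from similarity groups, the sketch below counts the cultivars in which each group is observed and keeps the groups that reach a chosen fraction of all cultivars. This is not the pipeline used for the poster: the function name `core_groups`, the `(group_id, cultivar)` input layout, the `min_fraction` threshold and the toy cultivar labels are all assumptions made for the example.

```python
from collections import defaultdict


def core_groups(assignments, n_cultivars, min_fraction=1.0):
    """Return sorted ids of groups seen in at least min_fraction of cultivars.

    assignments: iterable of (group_id, cultivar) pairs, one per transcript.
    All names and the threshold are hypothetical, for illustration only.
    """
    cultivars_per_group = defaultdict(set)
    for group_id, cultivar in assignments:
        # A set ignores repeated transcripts from the same cultivar.
        cultivars_per_group[group_id].add(cultivar)
    threshold = min_fraction * n_cultivars
    return sorted(
        group_id
        for group_id, cultivars in cultivars_per_group.items()
        if len(cultivars) >= threshold
    )


# Toy data: three made-up cultivars and three made-up similarity groups.
toy = [
    ("G1", "cultivar_A"), ("G1", "cultivar_B"), ("G1", "cultivar_C"),
    ("G1", "cultivar_A"),  # second transcript of G1 in the same cultivar
    ("G2", "cultivar_A"), ("G2", "cultivar_B"),
    ("G3", "cultivar_C"),
]

strict_core = core_groups(toy, n_cultivars=3)
relaxed_core = core_groups(toy, n_cultivars=3, min_fraction=0.6)
print(strict_core)
print(relaxed_core)
```

A `min_fraction` below 1.0 mimics a relaxed definition in which groups only "tend to appear" in all cultivars, which can be useful when some assemblies are incomplete.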

Poster presented at the XVIII Brazilian Congress of Plant Physiology

Poster PDF

Data availability

Transcriptome assemblies (FASTA): 48 genotype-specific transcriptome assemblies built from public RNA-Seq data.

Quality of our 48 transcriptome assemblies: Genotype-specific transcriptome evaluation generated with BUSCO, Transrate and Salmon.

CDS (FASTA): CDS files from our 48 genotype-specific transcriptome assemblies.

PEP (FASTA): PEP files from our 48 genotype-specific transcriptome assemblies (more than 5.2 million protein-coding transcripts).

Local BLAST server (temp): Temporarily available BLAST server to query our transcriptome assemblies.

Diego Mauricio Riaño-Pachón
Assistant Professor (MS3.2) in Computational, Evolutionary and Systems Biology

I am a computational biologist/bioinformatician at the University of São Paulo, Campus Luiz de Queiroz (Piracicaba/SP, Brazil).

Felipe Vaz Peres
Master's student - PPG - Ciências CENA/USP - Multi-genotype analysis of long non-coding RNAs in sugarcane

Decoding the data-driven secrets of life.

Jorge Mario Muñoz Pérez
PhD student - PPG - Ciências CENA/USP - Sugarcane Co-expression networks

I am a PhD student at PPG - Ciências at the Center for Nuclear Energy in Agriculture, University of São Paulo. I am interested in the intersection of plant biology, programming and mathematics. I have worked in plant biotechnology, population genomics, and machine learning applied to the discovery of antimicrobial peptides. I am currently working on sugarcane omics, focusing on genomic resources and gene co-expression network analysis.

Verusca Semmler Rossi
Master's student

I am a computational biologist/bioinformatician at the University of São Paulo, Campus Luiz de Queiroz.
