Expressed centromere-specific histone 3 (CENH3) variants in cultivated triploid and wild diploid bananas (Musa spp.)

Centromeres are specified by a centromere-specific histone 3 (CENH3) protein. CENH3 exists in a complex environment, where it interacts with conserved proteins and with rapidly evolving satellite DNA sequences. These interactions may become more complicated when multiple CENH3 versions are introduced into the zygote, because this can affect post-zygotic mitosis and, ultimately, sexual reproduction. Here, we characterize CENH3 variant transcripts expressed in cultivated triploid bananas and their wild diploid progenitors. We describe both splice variants and allelic single-nucleotide polymorphism (SNP) variants, together with their predicted effects on protein secondary structure. Expressed CENH3 transcripts from six banana genotypes were characterized and clustered by sequence similarity into three groups (MusaCENH-1A, MusaCENH-1B, and MusaCENH-2). The CENH3 groups differed in their SNPs and in the presence of indels resulting from retained and/or skipped exons. Depending on the genotype, CENH3 transcripts were spliced into 7/6, 5/4, or 6/5 exon/intron structures. The 7/6 and 5/4 structures were found in both diploids and triploids; however, the 7/6 structure predominated. The 6/5 structure resulted from a failure to splice the 7/6 structure correctly. The various transcripts were predicted to encode highly variable N-terminal tails and a relatively conserved C-terminal histone fold domain (HFD). In some cases, the SNPs were predicted to alter protein secondary structure by lengthening or shortening the affected domains. Sequencing of banana CENH3 transcripts thus revealed SNP variants predicted to change amino acid sequences, as well as alternatively spliced transcripts. Most of these changes affect the N-terminal tail of CENH3.