Eight additional genomes have been assembled to varying levels of
completeness (Table 1). Amongst these are the genomes of a
father-mother-daughter trio from the Jamaican Lion (JL) cultivar,
sequenced using PacBio long-read technology (McKernan et al., 2020). The
parental genome assemblies, including gene annotations, are available
from NCBI, while all three genome assemblies are available from the
Medicinal Genomics website
(https://www.medicinalgenomics.com/jamaican-lion-data-release/).
In addition to these three genomes, 40 genomes from a diverse range of
cultivars were sequenced with Illumina short-read technology as part of
the Medicinal Genomics ‘Cannabis Pan-Genome Project’ (McKernan et al.,
2020). The whole-genome sequencing (WGS) data generated in this project
are available from the NCBI Sequence Read Archive (Supplementary Table
2).
These genome sequences will be an invaluable resource for characterising
the genetic basis of the wide phenotypic diversity observed within
Cannabis. Specifically, they will facilitate the construction of a
Cannabis pan-genome (the full complement of genes present across
cultivars), from which gene sets unique to specific cultivars could be
defined. Such cultivar-specific genes often reflect niche phenotypic
adaptations that have evolved in response to specific environmental
conditions (Montenegro et al., 2017; Tao et al., 2019). Cultivar-specific
genes could therefore be key targets for breeding new cultivars with
desirable traits for specific production purposes (Tao et al., 2019).
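
To make the pan-genome idea concrete, the sketch below partitions gene families by presence or absence across cultivars into core genes (present in every cultivar), cultivar-specific genes (present in exactly one) and the remaining variable genes. It is a minimal illustration only: it assumes gene families have already been clustered across assemblies (for example by orthology), and the cultivar names and gene IDs are hypothetical. Real pan-genome analyses must also account for annotation errors and incomplete assemblies, which can make a gene appear absent when it is not.

```python
from collections import Counter


def partition_pan_genome(gene_sets: dict[str, set[str]]):
    """Split gene families into core, cultivar-specific and variable sets.

    gene_sets maps a (hypothetical) cultivar name to the set of gene-family
    IDs detected in that cultivar; orthology clustering is assumed done.
    """
    n_cultivars = len(gene_sets)
    # Number of cultivars in which each gene family is present.
    occurrence = Counter(g for genes in gene_sets.values() for g in genes)

    core = {g for g, c in occurrence.items() if c == n_cultivars}
    # Genes found in exactly one cultivar, grouped by that cultivar.
    specific = {
        cultivar: {g for g in genes if occurrence[g] == 1}
        for cultivar, genes in gene_sets.items()
    }
    # Present in more than one cultivar but not in all of them.
    variable = {g for g, c in occurrence.items() if 1 < c < n_cultivars}
    return core, specific, variable


# Hypothetical presence/absence data for three cultivars.
example = {
    "cultivar_A": {"g1", "g2", "g3", "g4"},
    "cultivar_B": {"g1", "g2", "g5"},
    "cultivar_C": {"g1", "g2", "g3", "g6"},
}
core, specific, variable = partition_pan_genome(example)
print(sorted(core))            # ['g1', 'g2']
print(specific["cultivar_B"])  # {'g5'}
print(sorted(variable))        # ['g3']
```

Because each cultivar's genes are held in a set, a gene family is counted at most once per cultivar, so its count in `occurrence` is simply the number of cultivars that carry it.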
There is also a wealth of additional genomics data available, including
organellar genome sequences: seven mitochondrial and nine chloroplast
genome assemblies are available (Supplementary Table 3). Organellar
genomes are particularly useful for resolving phylogenetic
relationships. Mitochondrial coding sequences have a lower rate of
nucleotide substitution than the nuclear and plastid genomes, making
them useful molecular markers for resolving deep taxonomic
relationships (Knoop, 2004; Wolfe et al., 1987). Despite this high
sequence conservation within genes, angiosperm mitochondrial genomes can
vary considerably in organisation both within and between species (Cole
et al., 2018; Davila et al., 2011; Palmer and Herbon, 1988). A
comparative genomics approach investigating variation in mitochondrial
genome organisation among Cannabis cultivars may therefore help resolve
relationships within the genus. In contrast, the chloroplast genome is
characterised by both stable genome organisation and sequence
conservation between species (Palmer and Herbon, 1988). Hence, the
chloroplast genome is often used to resolve phylogenies at the ordinal
and familial taxonomic levels (Oh et al., 2016; Vergara et al., 2015; H.
Zhang et al., 2018).
Furthermore, genotyping-by-sequencing (GBS), amplicon sequencing,
bisulfite sequencing and Hi-C data are available for many hemp and
marijuana varieties (Supplementary Table 2). GBS is an efficient and
cost-effective method for genotyping large numbers of samples, providing
insight into the population structure and genetic diversity within a
species (He et al., 2014).
At least three population-based studies have generated GBS data for
~400 samples representing both hemp and marijuana lines (Lynch et al.,
2016; Sawler et al., 2015; Soorni et al., 2017). These studies found
that hemp and marijuana often form distinct populations, differentiated
not only at the BT and BD loci but across the genome (Lynch et al., 2016;
Sawler et al., 2015; Soorni et al., 2017). Bisulfite sequencing detects
DNA methylation and is useful for understanding epigenetic gene
regulation (Elhamamsy, 2016; Li et al., 2020). Two bisulfite sequencing
datasets are available for analysis (McKernan et al., 2020; Niederhuth
et al., 2016). Given that economically important traits such as sex
expression and flowering time are under strong environmental control, it
will be interesting to explore to what extent these traits are
epigenetically regulated. This may open the possibility of breeding
‘climate-smart’ Cannabis plants, as in other crops where epigenetically
regulated adaptation to heat, drought or cold is being explored for crop
improvement (Varotto et al., 2020).
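
The principle behind bisulfite sequencing can be shown with a small calculation. Sodium bisulfite converts unmethylated cytosines to uracil, which is read as thymine after amplification, whereas methylated cytosines are protected and still read as cytosine. The methylation level at a reference cytosine can therefore be estimated as the fraction of C calls among C and T calls at that position. The sketch below assumes forward-strand, ungapped alignments, complete conversion and hypothetical toy reads; dedicated tools such as Bismark additionally handle both strands, sequence context (CG, CHG and CHH, all relevant in plants) and incomplete conversion.

```python
from collections import Counter


def methylation_levels(reference: str, reads: list[tuple[int, str]]) -> dict[int, float]:
    """Estimate the methylated fraction at each reference cytosine.

    reads are (start offset, sequence) pairs, assumed to be ungapped
    forward-strand alignments against `reference`.
    """
    protected = Counter()  # C calls: cytosine resisted conversion -> methylated
    converted = Counter()  # T calls: cytosine was converted -> unmethylated
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Only reference cytosines are informative for methylation.
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base == "C":
                protected[pos] += 1
            elif base == "T":
                converted[pos] += 1
            # Any other base (e.g. a sequencing error) is ignored.
    positions = sorted(set(protected) | set(converted))
    return {p: protected[p] / (protected[p] + converted[p]) for p in positions}


# Hypothetical toy data: cytosines at reference positions 1 and 5.
reference = "ACGTTCGA"
reads = [(0, "ACGTTTGA"), (0, "ATGTTTGA"), (1, "CGTTCG")]
levels = methylation_levels(reference, reads)
print({p: round(v, 2) for p, v in levels.items()})  # {1: 0.67, 5: 0.33}
```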
Lastly, the 3D organisation of the genome within the nucleus can be
mapped using Hi-C data (Rodriguez-Granados et al., 2016). One Hi-C
dataset exists for the JL cultivar and is available on NCBI (Gao et al.,
2020). Additional Hi-C datasets for the Jamaican Lion genomes are
available through the Medicinal Genomics website
(https://www.medicinalgenomics.com/jamaican-lion-data-release/). The 3D
organisation of the genome and its implications for gene regulation are
currently under intensive investigation in plants (Santos et al.,
2020). The available Cannabis Hi-C data are useful both for scaffolding
genome assemblies and for understanding epigenetic regulation of gene
expression (Burton et al., 2013; Lieberman-Aiden et al., 2009; Xie et
al., 2015).
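
Each Hi-C read pair records two genomic loci that were in physical proximity in the nucleus. Counting read pairs in fixed-size genomic bins gives a contact matrix; because contact frequency generally falls with linear distance along a chromosome, counts concentrate near the diagonal, which is the property exploited when Hi-C data are used to order and orient contigs (Burton et al., 2013). The minimal sketch below builds such a matrix for a single chromosome from hypothetical read-pair positions; it omits the mapping, filtering and normalisation steps performed by established pipelines such as Juicer or HiC-Pro.

```python
def contact_matrix(pairs: list[tuple[int, int]], chrom_length: int,
                   bin_size: int) -> list[list[int]]:
    """Bin Hi-C read pairs from one chromosome into a symmetric contact matrix.

    pairs holds the (0-based) mapped positions of the two ends of each read pair.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts; diagonal counted once
    return matrix


# Hypothetical read pairs on a 3 kb chromosome, binned at 1 kb.
pairs = [(100, 200), (150, 1500), (2500, 1200), (900, 1100)]
for row in contact_matrix(pairs, chrom_length=3000, bin_size=1000):
    print(row)
# [1, 2, 0]
# [2, 0, 1]
# [0, 1, 0]
```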
There have also been many studies characterising Cannabis
transcriptomes (Supplementary Table 2). Perhaps most notably, an
extensive ‘transcriptome atlas’ was generated for Cannabis in 2019
(Braich et al., 2019). This study involved RNA sequencing of 71 samples
taken from multiple tissues of the Cannbio-2 (CN2) cultivar at various
developmental stages. These transcriptome data will be useful for
annotating new genome assemblies, as well as for inferring gene
functions from spatiotemporal gene expression patterns. Other studies
have characterised the transcriptomes of hemp lines grown under salinity
and drought stress (Gao et al., 2018; Liu et al., 2016), as well as
during bast fibre development (Behr et al., 2016; Guerriero et al.,
2017). Three further studies have sequenced the transcriptomes of
glandular trichomes, with the aim of profiling the expression of genes
involved in terpene and phytocannabinoid biosynthesis (Booth et al.,
2020; Livingston et al., 2020; Zager et al., 2019). Furthermore, two
recent studies have sought to identify the sex chromosomes by
characterising the expression of sex-linked genes in male and female
plants (McKernan et al., 2020; Prentout et al., 2020). The
transcriptomes of the PK and FN cultivars, whose genomes were sequenced
in 2011, are also available (van Bakel et al., 2011).
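
One common way of inferring gene function from expression patterns such as those in a transcriptome atlas is "guilt by association": genes whose expression profiles correlate strongly across tissues and developmental stages are candidates for shared roles. The sketch below ranks genes by Pearson correlation with a query gene. The gene names and expression values are hypothetical and not taken from Braich et al. (2019); real co-expression analyses (for example with WGCNA) also normalise the data and typically use many more samples.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpressed_with(query: str, expression: dict[str, list[float]],
                     top_n: int = 2) -> list[str]:
    """Return the top_n genes whose profiles correlate most with `query`."""
    profile = expression[query]
    scores = {gene: pearson(profile, values)
              for gene, values in expression.items() if gene != query}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical expression values for four genes across five samples.
expression = {
    "unknown_gene": [1, 5, 10, 2, 8],
    "trichome_gene": [2, 9, 21, 3, 17],
    "root_gene": [9, 5, 1, 8, 2],
    "housekeeping_gene": [5, 5, 6, 5, 5],
}
print(coexpressed_with("unknown_gene", expression, top_n=1))  # ['trichome_gene']
```

Ranking by the signed correlation, as here, favours genes that rise and fall together with the query; strongly anti-correlated genes such as the hypothetical `root_gene` end up last.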
While widespread prohibition of Cannabis has stunted genomics research
in the past, there have clearly been major advances in this field in
recent years. With chromosome-level genome assemblies now available,
together with genome-wide annotations and abundant transcriptome data,
the resources for future research are plentiful.
8. More than the sum of its parts: Medical applications of phytocannabinoids
Cannabis plants are a rich source of biologically active compounds,
including more than 100 plant-derived cannabinoids (phytocannabinoids)
and more than 200 terpenoids (Russo, 2011). Thus far, research into the
medicinal effects of Cannabis has largely focused on phytocannabinoids.
Among these, the best-studied are the psychoactive THC and the
non-psychoactive CBD, though other phytocannabinoids such as CBG and CBC
also show therapeutic potential (Russo, 2011; see chapter 3 for details
on phytocannabinoid synthesis and genetics).