Results and Discussion
Genome Quality Metrics
The H. glycines genome assembly comprises 2,109 contigs, all of
which were incorporated into the expected 9 pseudomolecule scaffolds
using the Juicer pipeline, in agreement with cytological observations
[41, 94]. The genome size of the new assembly is 157,982,452 bp,
within the expected range for this clade of species (Table 1). However,
this increased genome size may be due to the incorporation of repetitive
haplotigs, an assumption supported by the inflation of repeats (61.4 Mb)
compared to the previous TN10 H. glycines draft (42.1 Mb). Still,
total repeat content (38.9%) is within the published range (34-47.7%)
[40, 41], and yet maintains a lower repeat content than the X12
assembly (67.3 Mb) (Supplementary Table 1).
To assess quality and completeness, the input sequences were aligned to
the assembly. High rates of alignment: 97.3%, 97.2%, and 73.5%, were
observed for Pacbio subreads, Pacbio CCS reads, and 260bp PE Illumina
reads, respectively. To evaluate the genic complement of the annotation,
we ran BUSCO3 (Benchmarking Universal Single Copy Orthologs) [86] on
both the genome and its predicted proteins. Of the 982 possible Nematoda
BUSCO genes, 634 (64.6%) and 674 (68.6%) were complete, 46 (4.7%) and
122 (12.4%) were duplicated, and 86 (8.8%) and 88 (9%) were
fragmented, respectively. A stringent BLAST of missing BUSCO proteins on
the predicted proteins found 141 of the missing BUSCO proteins (median
e-value of 5.8e-18), achieving a possible complete rate of 83%
[95]. Overall, the high proportion of input read mapping, high BUSCO
scores, and complete incorporation of all contigs, suggests this latestH. glycines assembly is of high quality.
Improvements over existing soybean cyst nematode
assemblies
This pseudomolecule assembly is a massive step forward in the genomics
of plant-parasitic nematodes, increasing the ability of interspecies
comparisons. To assess the contiguity and accuracy of our new assembly
we used gene-based synteny with BLAST and i-ADHoRe [93] as well as
direct chromosome alignments using Mummer3 (Figure 1). With the
gene-based approach, 67Mb and 31.7Mb of synteny was found to the TN10
draft and the X12 genome assemblies, respectively. Using Mummer, these
assessments rose to 92.4Mb and 43.4Mb, respectively. Considering that
more than 61Mb of repeats are in the genome, synteny to 42-58% and
20-27% of the genome in the TN10 draft and X12 assemblies,
respectively, is high.
Assignment of contigs to chromosomes was improved in this assembly
compared to existing X12 assembly. These differences resulted in the
identification of a number of large chromosomal misjoins in the X12
assemby: including multiple interchromosomal translocations and the
misassignment of chromosome 9 (Figure 1; Supplemental Table 2).
Surprisingly, after adjusting for these large chromosomal misjoins in
X12, there were very few chromosomal rearrangements between the two
lines of a highly adaptable species (Figure 1).
Gene Annotation
The gene annotation resulted in 22,465 gene models, encoding 23,933
transcripts with an average gene length of 4,569bp, values that are
comparable to related species (Table 2). While the frequency of genes is
substantially larger than the previously published X12 annotation
(11,882), the propensity for parasites to duplicate genes involved in
host-parasite interactions requires a novel approach to gene prediction.
To prevent the obliteration of parasitism genes thought to be maintained
at high copies in the nematode, we developed an annotation approach was
taken to predict all transcribed elements in the genome, including
repetitive elements. A genome without repeat masking was used to allow
highly similar, high-copy number genes to be identified. However,
because repetitive elements frequently reside in noncoding regions of
genes, multiple genome-guided transcriptomes and gene predictions
enabled the dissection of high-confidence gene models (Supplemental
Table 3). This improvement in gene prediction is indicated by our total
gene count (22,265). Our analyses included known parasitism genes and
repeats missing from X12 (11,882) and produced a more highly contiguous
genome than the previous TN10 assembly (29,769). Our average and median
gene and transcript lengths are the largest among the compared species,
while exon count per transcript has also increased relative to earlier
annotations of H. glycines and other related species. Another
line of evidence to support these gene predictions lies with the high
proportion of genes that have functional annotations with 85.2% of
predicted proteins or transcripts having homology to sequences in
Interpro, Swissprot, NCBI NR, or NCBI NT databases (Supplemental Table
4; Supplemental Table 5).
Effector gene prediction
With a complete genome, we can now better understand the molecules that
are exchanged between the parasite and host. The first step to
identifying these molecules lies in characterizing transcripts that
produce proteins with signal peptides and without transmembrane domains,
which partitioned 1,514 transcripts from the 23,933 total transcripts,
and which were attributed to 1,421/22,465 genes (Table 3). A second
elementary step in identifying these molecules lies with attributing the
previously published effectors to genes in the genome [11, 19].
Using DNA sequence similarity, 125 potential effector genes were
identified with a minimum query alignment and sequence identity of 50%.
Using the same parameters with protein query length and identity with
Diamond, 362 effector-like protein-coding genes were identified.
However, only 44/125 and 117/362 of these putative effectors encode
secreted proteins, indicating that genes may be variable among SCN lines
in their propensity to be secreted, a variability that may contribute to
SCN virulence.
Expression of Effector
genes
In the hopes of further resolving
the genes important to the host-parasite exchange, we leveraged existingH. glycines RNA-seq (SRP122521). All possible comparisons were
made between pre-parasitic (i.e., before root penetration) second-stage
juvenile nematodes (PP), second-stage parasitic (i.e., after root
penetration) nematodes on a susceptible host (C for compatible), and
second-stage parasitic nematodes on a resistant host (IC for
incompatible) (Supplemental Data 1). These data were integrated into a
tabular database of genes, functional annotations, sequences, and
differential expression. Using this database, we filtered genes in theH. glycines genome for key traits of genes involved in
parasitism, including differential expression at a p-value of 0.05,
>1 log2 fold change, signal peptide presence, and the
absence of a transmembrane domain. Of the 1,421 genes we assessed, 61
genes were differentially expressed between the PP and C, 392 genes
between PP and IC, and 609 in novel comparisons between C and IC samples
(Table 3). Among these comparisons, we assessed which genes may be
involved in host nucleus reprogramming by annotating NLS signals in
these differentially expressed genes. We only found four and fourteen
genes encoding proteins with NLS signals in the C and IC vs PP
comparisons, respectively. However, comparisons between C and IC
revealed 113 differentially expressed genes that were also secreted and
nuclear targeted.
While effector genes upregulated in the preparasitic stages are likely
to be associated with the migratory phase of parasitism, getting a list
of candidate effectors for parasitic stages was less complete. Only four
published effectors were upregulated in C vs PP, and three were
upregulated in IC vs PP (Table 3). However, by comparing effector genes
that were downregulated in parasitic IC samples but also upregulated in
C samples vs PP, we discovered a number of effector genes that were
downregulated when encountering resistance: 6E07[11], 4G06
(ubiquitin extension)[11], 4D06[11], three versions 45D07 type
effectors (chorismate mutase) [14], 30C02 (defense suppressor)
[29], four 2D01 type effectors (interacts with plant LRR)[96],
20E03[11], 12H04[11], 5A08 (RAN-binding, interacts with soybean
LRRs [97]), and Gland14 (endopeptidase) [19]. While interesting,
these decreases in expression could also be interpreted as the early
stages of nematode death on a resistant host (IC), warranting further
investigation into the mechanisms for these expression changes.