Genome sequencing, assembly and protein-coding gene predictions
To obtain high-quality genome assemblies, the genomes were sequenced
using a whole-genome shotgun strategy that combined the Pacific
Biosciences RS II (Pacific Biosciences, Menlo Park, CA, USA) and
Illumina MiSeq (Illumina Inc., San Diego, CA, USA) platforms. Sequencing
was performed by Biomarker Technologies Co., Ltd. (Beijing, China). The
PacBio long reads were corrected and assembled using Canu
to produce draft genomes (Koren et al., 2017). FinisherSC was used to
improve the contiguity of the draft genomes (Lam, Labutti, Khalak, & Tse,
2015), and Pilon was used to polish them with Illumina reads that had
been error-corrected by Musket (Liu, Jan, & Bertil, 2013; Walker et al., 2014).
HaploMerger2 was used to separate the two haploid sub-assemblies from
the assembly (Huang et al., 2012; Huang, Kang, & Xu, 2017). Ab initio
predictions were carried out using the reference protein domains of
Peniophora sp. and Armillaria ostoyae (Sipos et al., 2017; Varga et al.,
2019) for Auriscalpium and Strobilurus, respectively. Based on the
two high-quality sequencing datasets described above, the protein-coding
gene sets of the genomes were refined following the GETA gene annotation
method (Li et al., 2020). The completeness of the gene predictions was
assessed with BUSCO using the fungi_odb9 database.
Identification of carbohydrate-active enzymes (CAZymes) and lignocellulolytic genes and Swiss-Prot annotation
Carbohydrate-active enzymes (CAZymes) were identified using a
combination of pipelines based on the HMM and BLASTP algorithms, as
described by Chen et al. (2016). CAZyme annotation by BLASTP used an
e-value cutoff of < 1e-5 and coverage > 20%. CAZyme annotation by HMM
used an e-value cutoff of < 1e-5 for alignments of > 80 amino acids;
for alignments of < 80 amino acids, we used an e-value of < 1e-3 and
coverage > 25%. A Perl script was used to retain the annotations
supported by both methods as the final result.
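
The two-method filter described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl program used in the study. The `Hit` record, the function names, and the choice to match hits on (protein, family) pairs are assumptions made for the example. The sketch also assumes that the 25% coverage requirement applies only to short HMM alignments and that an alignment of exactly 80 amino acids falls under the short-alignment rule, since the text does not specify either point.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One candidate CAZyme hit (hypothetical record layout)."""
    protein: str     # query protein identifier
    family: str      # CAZyme family assigned by the search
    evalue: float    # e-value reported by BLASTP or the HMM search
    aln_len: int     # alignment length in amino acids
    coverage: float  # covered fraction, expressed as 0.0-1.0


def passes_blastp(hit: Hit) -> bool:
    # BLASTP rule from the text: e-value < 1e-5 and coverage > 20%.
    return hit.evalue < 1e-5 and hit.coverage > 0.20


def passes_hmm(hit: Hit) -> bool:
    # HMM rule from the text: long alignments (> 80 aa) need e-value < 1e-5.
    if hit.aln_len > 80:
        return hit.evalue < 1e-5
    # Assumption: everything else (including exactly 80 aa) uses the
    # short-alignment rule: e-value < 1e-3 and coverage > 25%.
    return hit.evalue < 1e-3 and hit.coverage > 0.25


def consensus_cazymes(
    blastp_hits: Iterable[Hit], hmm_hits: Iterable[Hit]
) -> Set[Tuple[str, str]]:
    """Keep (protein, family) pairs that pass the cutoffs in both methods."""
    blast_ok = {(h.protein, h.family) for h in blastp_hits if passes_blastp(h)}
    hmm_ok = {(h.protein, h.family) for h in hmm_hits if passes_hmm(h)}
    return blast_ok & hmm_ok
```

Taking the set intersection means that a protein is only reported as a CAZyme when both searches assign it to the same family. If agreement at the protein level alone were intended, the family would be dropped from the key.
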
For Swiss-Prot annotation, the BLASTP algorithm was used to align the
protein sequences to Swiss-Prot (Bairoch, Boeckmann, Ferro, &
Gasteiger, 2004) with an e-value cutoff of < 1e-5. Lignocellulolytic
genes (encoding cellulases, hemicellulases, pectinases, lignin oxidases
and lignin-degrading auxiliary enzymes) were identified mainly from the
Swiss-Prot annotation, using the keywords described by Chen et al. (2016)
in their Table S20. In the following analyses, the lignin oxidases and
lignin-degrading auxiliary enzymes encoded by lignocellulolytic genes were
combined into one category called ligninases (Chen et al., 2016).
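
The keyword-based classification and the merging of the two lignin-related classes into ligninases might look roughly like the Python sketch below. The keyword map is purely illustrative: the keywords actually used are those listed in Table S20 of Chen et al. (2016) and are not reproduced here. The function names and the case-insensitive substring matching are likewise assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical example keywords only; NOT the keyword list of Table S20.
EXAMPLE_KEYWORDS: Dict[str, List[str]] = {
    "cellulase": ["endoglucanase", "cellobiohydrolase"],
    "hemicellulase": ["xylanase"],
    "pectinase": ["polygalacturonase"],
    "lignin oxidase": ["laccase", "manganese peroxidase"],
    "lignin-degrading auxiliary enzyme": ["glyoxal oxidase"],
}

# The two classes that the text combines into a single "ligninase" category.
LIGNINASE_CLASSES = {"lignin oxidase", "lignin-degrading auxiliary enzyme"}


def classify(description: str, keywords: Dict[str, List[str]]) -> Optional[str]:
    """Return the enzyme category for a Swiss-Prot description, or None."""
    text = description.lower()
    for category, words in keywords.items():
        # Assumption: a simple case-insensitive substring match per keyword.
        if any(word in text for word in words):
            return "ligninase" if category in LIGNINASE_CLASSES else category
    return None


def count_categories(
    descriptions: Iterable[str], keywords: Dict[str, List[str]]
) -> Counter:
    """Count classified genes per category, ignoring unclassified ones."""
    labels = (classify(d, keywords) for d in descriptions)
    return Counter(label for label in labels if label is not None)
```

Passing the keyword map as a parameter keeps the matching logic separate from the keyword list, so the list from Table S20 could be supplied without changing the code.
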