Genome sequencing, assembly and protein-coding gene predictions
To obtain high-quality genome assemblies, the genomes were sequenced by Biomarker Technologies Co., Ltd. (Beijing, China) using a whole-genome shotgun strategy that combined the Pacific Biosciences RS II (Pacific Biosciences, Menlo Park, CA, USA) and Illumina MiSeq (Illumina Inc., San Diego, CA, USA) platforms. The PacBio long reads were corrected and assembled into draft genomes using Canu (Koren et al., 2017). FinisherSC was used to improve the contiguity of the draft genomes (Lam, Labutti, Khalak, & Tse, 2015), and Pilon was used to polish them with Illumina data corrected by Musket (Liu, Jan, & Bertil, 2013; Walker et al., 2014). HaploMerger2 was used to separate the two haploid sub-assemblies from each assembly (Huang et al., 2012; Huang, Kang, & Xu, 2017). Ab initio predictions were carried out using the reference protein domains of Peniophora sp. and Armillaria ostoyae (Sipos et al., 2017; Varga et al., 2019) for Auriscalpium and Strobilurus, respectively. Based on the two high-quality sequencing datasets described above, the protein-coding gene sets of the genomes were refined following the GETA gene annotation method (Li et al., 2020). BUSCO analysis with the fungi_odb9 database was applied to our gene predictions.
Identification of carbohydrate-active enzymes (CAZymes) and lignocellulolytic genes, and Swiss-Prot annotation
Carbohydrate-active enzymes (CAZymes) were identified using a combination of HMM- and BLASTP-based pipelines, as in Chen et al. (2016). BLASTP-based CAZyme annotation used cutoffs of e-value < 1e-5 and coverage > 20%. HMM-based CAZyme annotation used a cutoff of e-value < 1e-5 for alignments of > 80 amino acids; for alignments of < 80 amino acids, we used e-value < 1e-3 and coverage > 25%. A Perl program was used to extract the annotations supported by both methods as the final result. For Swiss-Prot annotation, the BLASTP algorithm was used to align the protein sequences to Swiss-Prot (Bairoch, Boeckmann, Ferro, & Gasteiger, 2004) with e-value < 1e-5. Lignocellulolytic genes (encoding cellulases, hemicellulases, pectinases, lignin oxidases and lignin-degrading auxiliary enzymes) were identified mainly from the Swiss-Prot annotation, using the keywords listed in Table S20 of Chen et al. (2016). In the following analyses, lignin oxidases and lignin-degrading auxiliary enzymes were combined into one category called ligninases (Chen et al., 2016).
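The CAZyme cutoffs and the two-method consensus step can be illustrated with a minimal Python sketch. It is not the Perl program used in this study. The `Hit` record, the function names, and the representation of coverage as a fraction are hypothetical. It also assumes that the 25% coverage cutoff applies only to short HMM alignments, that an alignment of exactly 80 amino acids is treated as short (the text does not specify this case), and that "supported by both methods" means the same protein is assigned to the same CAZyme family by both searches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit (hypothetical record layout, for illustration only)."""
    protein_id: str
    family: str      # CAZyme family label, e.g. "GH5"
    evalue: float
    aln_len: int     # alignment length in amino acids
    coverage: float  # covered fraction, in [0, 1]


def passes_blastp(hit: Hit) -> bool:
    # BLASTP cutoffs from the text: e-value < 1e-5 and coverage > 20%.
    return hit.evalue < 1e-5 and hit.coverage > 0.20


def passes_hmm(hit: Hit) -> bool:
    # Long alignments (> 80 aa): e-value < 1e-5.
    if hit.aln_len > 80:
        return hit.evalue < 1e-5
    # Short alignments: e-value < 1e-3 and coverage > 25%.
    # Assumption: exactly 80 aa is handled here as a short alignment.
    return hit.evalue < 1e-3 and hit.coverage > 0.25


def consensus_cazymes(blastp_hits, hmm_hits):
    """Return (protein_id, family) pairs that pass the cutoffs in both searches."""
    blastp_calls = {(h.protein_id, h.family) for h in blastp_hits if passes_blastp(h)}
    hmm_calls = {(h.protein_id, h.family) for h in hmm_hits if passes_hmm(h)}
    # Assumption: agreement is required at the protein-plus-family level.
    return blastp_calls & hmm_calls
```

In practice the hit records would be parsed from the tabular BLASTP output and the HMM search output; the parsing is omitted here because the exact output formats used are not stated in the text.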