2.3 Data processing methods
After basecalling with Albacore (version 2.3.4), the MinION reads were demultiplexed with Epi2me. The total yield for LBA9402 was 298,712 reads totaling 1,027,720,149 bp, with a mean read length of 3441 bp. Nanopore reads were end-trimmed and filtered on average quality (>Q10) and length (>5000 bp) with NanoFilt, giving 64-fold coverage after filtering. A total of 4,518,191 paired-end Illumina reads of 99 nucleotides were quality- and adapter-trimmed using Cutadapt (70-fold coverage). Hybrid assembly was performed using Unicycler version 0.4.7. Besides three contigs representing the two chromosomes and the Ri plasmid, a fourth contig of 5386 bp was identified. This contig represented the genome sequence of bacteriophage ΦX174, which is spiked in at low concentration during Illumina library preparation, and it was therefore removed from the assembly.
The assembly was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). In addition, PHASTER was used to annotate prophage sequences (Arndt et al. 2016), and eggNOG-mapper was used for the functional characterization of the encoded proteins (Huerta-Cepas et al. 2017). Insertion sequence (IS) elements were identified using ISEScan (Xie and Tang 2017); Figure 2 and Supplementary Figure S5 show only complete insertion sequences, i.e. those including their inverted repeats. IslandViewer was used to predict genomic islands (Dhillon et al. 2015), and CGView was used to generate a circular map of pRi1855 (Stothard and Wishart 2005).
Mauve (progressiveMauve) was used to align the LBA9402 and K84 genomes (Darling et al. 2010). BRIG was used to compare pRi1855 with other Ri and Ti plasmids (using BLASTn with an e-value cut-off of 1e-10) and to visualize the hits as concentric rings (Alikhan et al. 2011). For the comparisons between erythritol catabolism regions and between pRi1855 and Rhizobium lusitanum strain 629, BLASTn was run locally with BLAST version 2.9.0+. Protein alignments were performed with MAFFT version 7.471 using the L-INS-i method (Katoh 2013) and visualized with Jalview version 2.11.1.2 (Waterhouse et al. 2009) and Adobe Illustrator. The percentage identities shown in Table 1 were calculated with the R package seqinr.
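The NanoFilt step keeps reads whose average quality exceeds Q10 and whose length exceeds 5000 bp. The following is a minimal Python sketch of such a filter, not NanoFilt's own code. It assumes Sanger-encoded (offset 33) FASTQ qualities and assumes the average quality is computed by converting Phred scores to error probabilities, averaging them, and converting back, which is our understanding of NanoFilt's convention. The function names are hypothetical.

```python
import math
from typing import Iterator, List, Tuple

# Thresholds quoted in the methods: mean quality > Q10, length > 5000 bp.
MIN_MEAN_Q = 10.0
MIN_LENGTH = 5000


def mean_read_quality(phred: List[int]) -> float:
    """Average Phred score computed in error-probability space (assumed convention)."""
    if not phred:
        return 0.0
    # Convert each Phred score to an error probability, average, then convert back.
    mean_error = sum(10 ** (-q / 10) for q in phred) / len(phred)
    return -10 * math.log10(mean_error)


def fastq_records(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a list of 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].rstrip(), lines[i + 1].rstrip(), lines[i + 3].rstrip()


def passes_filter(seq: str, qual: str, offset: int = 33) -> bool:
    """True if the read is longer than MIN_LENGTH and its mean quality exceeds MIN_MEAN_Q."""
    scores = [ord(c) - offset for c in qual]
    return len(seq) > MIN_LENGTH and mean_read_quality(scores) > MIN_MEAN_Q
```

Under this convention, a read with half its bases at Q10 and half at Q30 has a mean quality of about 13.0, not the arithmetic mean of 20. Probability-space averaging is therefore noticeably stricter for reads that contain low-quality stretches.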
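Removing the ΦX174 contig amounts to dropping one record from the assembly FASTA. The following is a minimal sketch. It assumes a plain FASTA file and assumes the contig to remove has already been identified, for example by its 5386 bp length together with a BLAST match to the ΦX174 reference. The helper names and the example header layout are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def contig_id(header: str) -> str:
    """First whitespace-separated token of a FASTA header (empty if none)."""
    parts = header.split()
    return parts[0] if parts else ""


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def contig_lengths(records: Iterable[Record]) -> Dict[str, int]:
    """Map contig ID to sequence length, e.g. to spot a 5386 bp candidate."""
    return {contig_id(h): len(s) for h, s in records}


def drop_contigs(records: Iterable[Record], ids_to_drop: Set[str]) -> List[Record]:
    """Return the records whose contig ID is not in ids_to_drop."""
    return [(h, s) for h, s in records if contig_id(h) not in ids_to_drop]


def to_fasta(records: Iterable[Record], width: int = 80) -> str:
    """Serialize records back to FASTA with a fixed line width."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Dropping by ID rather than by length alone keeps the decision explicit. A length match is only a hint that should be confirmed against the ΦX174 reference sequence.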
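BLASTn hits can be filtered with an e-value cut-off such as the 1e-10 quoted for the BRIG comparison. The following sketch assumes local BLAST+ output in the default 12-column tabular format, for example from `blastn -query a.fasta -subject b.fasta -evalue 1e-10 -outfmt 6`. Applying the same cut-off to the local runs is an assumption, and so are the helper names. The sketch also counts how many query positions are covered by the retained hits, which is one way to summarize the regions shared between two plasmids.

```python
import csv
import io
from typing import List, NamedTuple

EVALUE_CUTOFF = 1e-10  # cut-off quoted for the BRIG comparison


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse default -outfmt 6 rows (qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore), skipping blank and comment lines."""
    hits: List[Hit] = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hits.append(Hit(row[0], row[1], float(row[2]), int(row[3]),
                        int(row[6]), int(row[7]), float(row[10]), float(row[11])))
    return hits


def significant(hits: List[Hit], cutoff: float = EVALUE_CUTOFF) -> List[Hit]:
    """Keep hits with an e-value at or below the cut-off."""
    return [h for h in hits if h.evalue <= cutoff]


def covered_bases(hits: List[Hit]) -> int:
    """Query positions covered by at least one hit, merging overlapping intervals."""
    intervals = sorted((min(h.qstart, h.qend), max(h.qstart, h.qend)) for h in hits)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total
```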
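Pairwise percentage identities between aligned proteins can be computed directly from an alignment such as MAFFT output. The following sketch uses one common definition: identical residues divided by alignment columns, skipping columns that are gaps in both sequences and counting a gap opposite a residue as a mismatch. This definition is an assumption, and the formula implemented in seqinr may differ. The function names are hypothetical.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical columns between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # column is a gap in both sequences: not compared
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All-against-all percent identities for a multiple alignment {name: aligned_seq}."""
    names = list(alignment)
    return {(p, q): percent_identity(alignment[p], alignment[q])
            for p in names for q in names}
```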
2.4 Data availability
The complete genome sequence of R. rhizogenes LBA9402 was deposited in GenBank under accession numbers CP044122, CP044123 and CP044124. The raw reads have been deposited in the Sequence Read Archive under accession numbers SRR10177303 and SRR10177304.
Results