Bioinformatics Analysis
Base call (BCL) files were converted into FASTQ files with the Illumina
bcl2fastq2 Conversion Software (Illumina, San Diego, CA). Programs that
are included in the SSV-Conta package
(https://github.com/emlec/SSV-Conta)
were then used to quantify and characterize all DNA species present in
a rAAV vector lot: Quade, a FASTQ file demultiplexer; Sekator, an
adapter trimmer; RefMasker, which masks sequence homologies; and
ContaVect, which analyzes residual DNA (Lecomte et al., 2019). Briefly,
FASTQ files were demultiplexed with Quade according to their barcodes.
Paired-end reads were assigned to a sample only if the combination of
the two barcodes (index read 1 and index read 2) matched an expected
pair and each base of both barcodes had a PHRED quality score of at
least 25. Reads that passed these filters were trimmed with Sekator
according to sequence quality, and adapters were removed, as described
previously (Lecomte et al., 2019). The distribution of residual DNA was
determined using the RefMasker and ContaVect programs. The reference
sequences were listed
in the ContaVect configuration files in the following order: the phage
φX174 genome (GenBank accession number J02482.1), the phage λ genome
(J02459.1), the rAAV genome, the plasmid backbone sequence, the plasmid
helper sequence, the adenovirus 5 (Ad5) sequence (nucleotides 1 to 4344
of the Human adenovirus 5 complete genome, AC_000008) and the human
genome (GRCh38 primary assembly). Using RefMasker, homologies between
any two reference sequences were masked in whichever of the two came
later in the list, by replacing the homologous nucleotides with N.
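
The masking rule can be illustrated with a minimal Python sketch. It is
not RefMasker's implementation: the Hit record, the mask_homologies
function and the toy sequences are hypothetical, and the homologous
regions are assumed to have been found beforehand by an alignment step
that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record of one homologous region shared by two
    distinct references (coordinates are 0-based, end-exclusive)."""
    ref_a: str
    start_a: int
    end_a: int
    ref_b: str
    start_b: int
    end_b: int


def mask_homologies(references, hits):
    """Mask each homology on the reference that comes later in the list.

    `references` maps name -> sequence; its insertion order is taken as
    the priority order (earlier references are left untouched).
    """
    rank = {name: i for i, name in enumerate(references)}
    masked = {name: list(seq) for name, seq in references.items()}
    for hit in hits:
        # Pick whichever side of the hit belongs to the later reference.
        if rank[hit.ref_a] <= rank[hit.ref_b]:
            target, start, end = hit.ref_b, hit.start_b, hit.end_b
        else:
            target, start, end = hit.ref_a, hit.start_a, hit.end_a
        for pos in range(start, end):
            masked[target][pos] = "N"
    return {name: "".join(chars) for name, chars in masked.items()}


# Toy example: the backbone shares six bases with the rAAV genome,
# which is listed first and therefore stays unmasked.
refs = {"rAAV": "ACGTACGTAA", "backbone": "TTACGTACGG"}
hits = [Hit("backbone", 2, 8, "rAAV", 0, 6)]
masked = mask_homologies(refs, hits)
print(masked["backbone"])
```

Because only the later reference is masked, a read from a shared region
can map only to the earlier reference in the list.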
ContaVect was run with the following main parameters: minimum mean
read quality, 30; minimum mapping quality for read validation, 20;
minimum mapping size, 25 bases. Unmapped reads, and mapped reads that
did not fulfill these criteria, were excluded. Per-base sequencing
coverage of the vector plasmid was computed with SSV-Coverage, a
program included in the SSV-Conta package. Sequencing data have been
deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under the
accession number PRJEB38306
(https://www.ebi.ac.uk/ena/data/view/PRJEB38306).
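
The read-level thresholds stated in this section (barcode base quality
of at least 25; mean read quality, 30; mapping quality, 20; mapping
size, 25 bases) can be summarized in a short Python sketch. It
restates the criteria only and is not the code of Quade or ContaVect.
The function names, the Alignment record and the sample table are
hypothetical, and treating the ContaVect minimums as inclusive is an
assumption.

```python
from dataclasses import dataclass

MIN_BARCODE_BASE_Q = 25   # per-base PHRED threshold for both index reads
MIN_MEAN_READ_Q = 30      # minimum mean read quality
MIN_MAPQ = 20             # minimum mapping quality for read validation
MIN_MAPPING_SIZE = 25     # minimum mapping size, in bases


def assign_sample(index1, qual1, index2, qual2, sample_by_barcodes):
    """Return the sample name for a read pair, or None if it is rejected.

    `qual1` and `qual2` are lists of PHRED scores for the two index reads;
    `sample_by_barcodes` maps (index1, index2) -> sample name.
    """
    if any(q < MIN_BARCODE_BASE_Q for q in (*qual1, *qual2)):
        return None
    return sample_by_barcodes.get((index1, index2))


@dataclass
class Alignment:
    """Hypothetical summary of one aligned read."""
    mean_read_quality: float
    mapq: int
    aligned_length: int
    is_mapped: bool = True


def keep_alignment(aln):
    """Apply the three stated thresholds (assumed inclusive) to a read."""
    return (
        aln.is_mapped
        and aln.mean_read_quality >= MIN_MEAN_READ_Q
        and aln.mapq >= MIN_MAPQ
        and aln.aligned_length >= MIN_MAPPING_SIZE
    )


# Illustrative values only; barcodes and sample name are made up.
samples = {("ACGT", "TGCA"): "lot1"}
print(assign_sample("ACGT", [30, 30, 25, 40], "TGCA", [38] * 4, samples))
print(keep_alignment(Alignment(mean_read_quality=32.0, mapq=20,
                               aligned_length=25)))
```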