Sequence data analysis
After removal of terminal adaptor sequences and low-quality data, reads
were aligned to the human reference genome (hg19) using BWA
(0.7.12-r1039). MuTect2 (Cibulskis et al., 2013) (3.4-46-gbc02625) was
employed to call somatic small insertions and deletions (InDels) and
single nucleotide variants (SNVs). A mutation was considered a
candidate somatic mutation only when (i) it was supported by at least five
high-quality reads (Phred score ≥30, mapping quality ≥30, and without
paired-end read bias) containing the particular base; (ii) it
was not present in >1% of the population in the 1000
Genomes Project or dbSNP (The Single Nucleotide Polymorphism
Database) databases; and (iii) it was not present in an in-house
database of normal samples. For somatic tumor mutations, a mutant allele
must be present in ≥3% of reads. Somatic non-synonymous mutations per
megabase of the panel region were used in tumor mutation burden (TMB)
analysis.
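
The filtering criteria and the TMB calculation described above can be illustrated with a minimal sketch. This is not the pipeline used in the study. The record layout, the field names, and the functions `passes_filters` and `tmb_per_mb` are hypothetical, the panel size is a placeholder parameter because it is not stated here, and the Phred and mapping quality cut-offs are assumed to have been applied upstream when the supporting reads were counted.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds taken from the text above.
MIN_SUPPORTING_READS = 5   # criterion (i): at least five high-quality reads
MAX_POPULATION_AF = 0.01   # criterion (ii): not present in >1% of the population
MIN_VAF = 0.03             # mutant allele present in >=3% of reads


@dataclass
class Candidate:
    """Hypothetical per-variant record; all field names are assumptions."""
    supporting_reads: int   # reads with Phred >=30 and MAPQ >=30 carrying the base
    total_reads: int        # total reads covering the position
    read_pair_bias: bool    # True if paired-end read bias was flagged
    population_af: float    # highest frequency in 1000 Genomes or dbSNP, 0.0 if absent
    in_normal_db: bool      # True if seen in the in-house normal database
    nonsynonymous: bool     # True if the variant is non-synonymous


def passes_filters(c: Candidate) -> bool:
    """Apply criteria (i) to (iii) and the 3% allele fraction cut-off."""
    if c.total_reads <= 0:
        return False
    vaf = c.supporting_reads / c.total_reads
    return (
        c.supporting_reads >= MIN_SUPPORTING_READS
        and not c.read_pair_bias
        and c.population_af <= MAX_POPULATION_AF
        and not c.in_normal_db
        and vaf >= MIN_VAF
    )


def tmb_per_mb(candidates: Iterable[Candidate], panel_size_bp: int) -> float:
    """Count filtered non-synonymous mutations per megabase of panel region."""
    if panel_size_bp <= 0:
        raise ValueError("panel_size_bp must be positive")
    count = sum(1 for c in candidates if passes_filters(c) and c.nonsynonymous)
    return count / (panel_size_bp / 1_000_000)
```

Under these assumptions, a call such as `tmb_per_mb(candidates, panel_size_bp=1_000_000)` returns the number of retained non-synonymous mutations directly, since the placeholder panel spans exactly one megabase.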
Contra (Li et al., 2012) (2.0.8) was used to detect copy number
variations, and the LOH HLA algorithm was used to identify loss of heterozygosity (LOH) based on
informative SNPs. For structural variations (SVs), baits were designed to
capture selected exons and introns of the RET, ALK, ROS1, and NTRK1
oncogenes based on previously reported SVs in these genes, and an
in-house algorithm was used to detect split reads and discordant
read pairs to identify SVs. All final candidate variants were manually
verified with the Integrative Genomics Viewer (IGV) browser.