2.3 | Variant filtering steps and data analysis
Variants were filtered and prioritized using a four-step strategy to generate a short list of candidate variants for experimental validation (Figure 1). First, we removed variants with less than 10× coverage. Second, we retained only variants with low population frequency: variants with a minor allele frequency (MAF) ≥1% in the Genome Aggregation Database (gnomAD) (http://gnomad.broadinstitute.org/) or the Korean Reference Genome Database (KRGDB) (http://coda.nih.go.kr/coda/KRGDB/index.jsp) were removed. Third, we prioritized missense, nonsense, frameshift, and in-frame insertion/deletion variants, as well as changes affecting consensus splice site sequences. Finally, we performed a gene-specific analysis with an in-silico panel of 903 genes, selected by the Human Phenotype Ontology (HPO) term for microcephaly (HP:0000252) or listed as Online Mendelian Inheritance in Man (OMIM) microcephaly phenotype genes (Supplementary Table 1). To delineate candidate genetic variants, an additional allele analysis was performed under the following conditions: 1) triplicate data with no pathogenic variants (PVs) or likely pathogenic variants (LPVs); 2) de novo, compound heterozygous, homozygous, or hemizygous variants; 3) ≤2 alleles in gnomAD, or ≤8 alleles if recessive; 4) a CADD score of ~15 or higher and deleterious predictions from all of SIFT (http://sift.jcvi.org), PolyPhen2 (http://genetics.bwh.harvard.edu/pph2), and MutationTaster (http://mutationtaster.org/) for missense variants; 5) affected genes with data from animal models and/or functional studies suggesting neurodevelopmental roles.
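The four filtering steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Variant` record, its field names, the consequence labels, and the placeholder gene symbols are all hypothetical, and the sketch assumes that a variant absent from a population database (frequency `None`) counts as rare. The additional allele analysis (conditions 1 to 5) is not covered.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical consequence labels; real annotation tools use their own vocabularies.
PRIORITIZED_CONSEQUENCES = {
    "missense", "nonsense", "frameshift", "inframe_indel", "splice_site",
}

MIN_DEPTH = 10   # step 1: variants with less than 10x coverage are removed
MAX_MAF = 0.01   # step 2: variants with MAF >= 1% are removed


@dataclass
class Variant:
    """Illustrative variant record (field names are assumptions)."""
    gene: str
    depth: int
    consequence: str
    gnomad_maf: Optional[float] = None  # None: not observed in gnomAD (assumption)
    krgdb_maf: Optional[float] = None   # None: not observed in KRGDB (assumption)


def is_rare(maf: Optional[float]) -> bool:
    """Treat a missing frequency as rare; otherwise require MAF below 1%."""
    return maf is None or maf < MAX_MAF


def passes_filters(v: Variant, panel_genes: Set[str]) -> bool:
    """Apply the four steps in order: coverage, frequency, consequence, gene panel."""
    return (
        v.depth >= MIN_DEPTH
        and is_rare(v.gnomad_maf)
        and is_rare(v.krgdb_maf)
        and v.consequence in PRIORITIZED_CONSEQUENCES
        and v.gene in panel_genes
    )


# Placeholder symbols standing in for the 903-gene panel (Supplementary Table 1).
panel = {"GENE_A", "GENE_B"}

variants = [
    Variant("GENE_A", 35, "missense", 0.0001, None),      # kept
    Variant("GENE_A", 8, "nonsense"),                     # removed: coverage below 10x
    Variant("GENE_B", 40, "frameshift", 0.02, 0.0),       # removed: gnomAD MAF >= 1%
    Variant("GENE_B", 40, "synonymous"),                  # removed: consequence not prioritized
    Variant("GENE_C", 50, "splice_site"),                 # removed: gene not in panel
    Variant("GENE_B", 10, "inframe_indel", None, 0.009),  # kept
]

candidates = [v for v in variants if passes_filters(v, panel)]
```

Applying the cheap coverage and frequency checks before the gene-panel lookup mirrors the order given in the text; because the four conditions are combined with a logical AND, the final candidate list does not depend on that order.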