Genetic variant classification
All genetic variants in the 10 candidate genes reported in the gnomAD database (v2.1.1) were classified into five categories (benign, likely benign, uncertain significance, likely pathogenic, and pathogenic) following the 2015 American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) standards and guidelines (Richards et al., 2015). Loss of function of these 10 candidate genes was presumed to be the disease mechanism of congenital hypothyroidism. The GRCh37/hg19 genome build was used for all position descriptions. All variants were described according to the HGVS variant nomenclature standards (http://varnomen.hgvs.org/) (den Dunnen et al., 2016) and analyzed based on the Matched Annotation from NCBI and EMBL-EBI (MANE) transcript (https://www.ncbi.nlm.nih.gov/refseq/MANE/) using the Mutalyzer program (https://mutalyzer.nl/). Among variants that were not known pathogenic or likely pathogenic variants (PLPV), null variants (stop-gain, splice site-disrupting, or frameshift variants) annotated as low-confidence predicted loss-of-function (pLoF) or carrying a pLoF flag by the loss-of-function transcript effect estimator (LOFTEE, https://github.com/konradjk/loftee) were filtered out. To predict variant pathogenicity, multiple in silico tools were used, including REVEL (https://sites.google.com/site/revelgenomics/) (Ghosh, Oak, & Plon, 2017; Ioannidis et al., 2016), GERP++ (http://mendel.stanford.edu/SidowLab/downloads/gerp/) (Davydov et al., 2010), and dbscSNV (https://sites.google.com/site/jpopgen/dbNSFP) (Ghosh et al., 2017; Jian, Boerwinkle, & Liu, 2014). In addition, ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) was used as the locus-specific database, and Pfam (https://pfam.xfam.org/) and InterPro (https://www.ebi.ac.uk/interpro/) were used as functional domain databases.
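The LOFTEE-based filtering step could be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that "filtered" means excluded from further analysis, and the record keys (`consequence`, `lof`, `lof_flags`, `known_plpv`) are hypothetical stand-ins for the corresponding annotations.

```python
from typing import Dict, List

# Consequence terms treated as null variants (stop-gain, splice site
# disrupting, frameshift). The selection of terms is an assumption.
NULL_CONSEQUENCES = {
    "stop_gained",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "frameshift_variant",
}


def keep_variant(record: Dict) -> bool:
    """Return False for null variants that LOFTEE marks as low confidence
    or flags, unless the variant is already a known PLPV."""
    if record.get("known_plpv", False):
        return True  # known PLPV are never removed by this filter
    if record["consequence"] not in NULL_CONSEQUENCES:
        return True  # the filter only applies to null variants
    low_confidence = record.get("lof") == "LC"  # LOFTEE low-confidence call
    flagged = bool(record.get("lof_flags"))     # any LOFTEE flag present
    return not (low_confidence or flagged)


# Hypothetical example records.
records: List[Dict] = [
    {"id": "v1", "consequence": "stop_gained", "lof": "HC", "lof_flags": ""},
    {"id": "v2", "consequence": "frameshift_variant", "lof": "LC", "lof_flags": ""},
    {"id": "v3", "consequence": "stop_gained", "lof": "HC", "lof_flags": "SINGLE_EXON"},
    {"id": "v4", "consequence": "frameshift_variant", "lof": "LC", "lof_flags": "",
     "known_plpv": True},
    {"id": "v5", "consequence": "missense_variant"},
]
kept_ids = [r["id"] for r in records if keep_variant(r)]
```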
Carrier frequency (CF) and predicted genetic prevalence (pGP) analysis
For CF and pGP analysis, only heterozygous PLPV (not homozygous PLPV) were considered (Hanany et al., 2018; Hanany, Rivolta, & Sharon, 2020). Therefore, the allele frequency of heterozygous PLPV (\(AF_{V}\)) and the carrier frequency (\(CF_{V}\)) for a variant V were calculated as follows:
\(AF_{V}\) = \(\frac{\text{allele count} - 2 \times \text{homozygous count}}{\text{allele number}}\)
\(CF_{V}\) = \(\frac{AF_{V} \times \text{allele number}}{\text{number of individuals}}\) = \(2AF_{V}\)
where the allele count (number of variant alleles), allele number (number of genotyped alleles = 2 × number of individuals), and homozygous count (number of homozygous individuals) for each variant were provided by gnomAD.
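The two variant-level formulas can be illustrated with a short sketch. The class and field names are hypothetical, and the counts in the example are invented purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCounts:
    """Per-variant counts of the kind gnomAD reports (names are illustrative)."""
    allele_count: int      # number of variant alleles
    allele_number: int     # number of genotyped alleles (2 * individuals)
    homozygous_count: int  # number of homozygous individuals


def heterozygous_allele_frequency(v: VariantCounts) -> float:
    """AF_V: alleles carried by heterozygotes divided by the allele number."""
    return (v.allele_count - 2 * v.homozygous_count) / v.allele_number


def carrier_frequency(v: VariantCounts) -> float:
    """CF_V = AF_V * allele number / number of individuals = 2 * AF_V."""
    n_individuals = v.allele_number / 2
    return heterozygous_allele_frequency(v) * v.allele_number / n_individuals


# Invented counts: 12 variant alleles, 1 homozygote, 10,000 individuals.
example = VariantCounts(allele_count=12, allele_number=20000, homozygous_count=1)
af_v = heterozygous_allele_frequency(example)  # (12 - 2) / 20000
cf_v = carrier_frequency(example)              # twice af_v
```

In this example, the single homozygous individual contributes two variant alleles, which are subtracted before the frequency is computed, so only the 10 heterozygous carriers count toward \(CF_{V}\).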
For CF and pGP at the gene level (\(CF_{G}\) and \(\text{pGP}_{G}\), respectively), two methods were applied. The first method (method 1) calculated \(CF_{G}\) and \(\text{pGP}_{G}\) as previously described (Hanany et al., 2020):
\(CF_{G}\) = \(1-\prod_{k=1}^{n}\left(1-CF_{V}(k)\right)\)
\(\text{pGP}_{G}\) = \(\frac{\sum_{k=1}^{n}(CF_{V})_{ik}(CF_{V})_{ik}}{4}\)
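As a minimal sketch of the method 1 carrier frequency, the product formula can be read as the probability that an individual carries at least one of the n variants, assuming the variants occur independently. Only \(CF_{G}\) is sketched here; the function name and example values are hypothetical.

```python
import math
from typing import Sequence


def gene_carrier_frequency_method1(variant_cfs: Sequence[float]) -> float:
    """CF_G = 1 - prod_k (1 - CF_V(k)): the chance of carrying at least
    one variant, treating the variants as independent."""
    return 1.0 - math.prod(1.0 - cf for cf in variant_cfs)


# Invented per-variant carrier frequencies for one gene.
variant_cfs = [0.001, 0.002, 0.0005]
cf_g_method1 = gene_carrier_frequency_method1(variant_cfs)
```

For rare variants, the product form is slightly smaller than the plain sum of the per-variant carrier frequencies, because carriers of more than one variant are counted once.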
The second method (method 2) was based on the Hardy–Weinberg equation.
\(CF_{G}\) was calculated as follows:
\(CF_{G}\) = \(\sum_{k=1}^{n}{CF_{V}(k)}\)
Using the Hardy–Weinberg equation (\(p^{2} + 2pq + q^{2} = 1\)), the predicted genetic prevalence of congenital hypothyroidism (\(q^{2}\)) as \(\text{pGP}_{G}\) was calculated as follows:
\begin{equation} CF_{G}=2pq=2\left(1-q\right)q\nonumber \end{equation}
\begin{equation} q=\frac{-(-2)-\sqrt{{(-2)}^{2}-4\left(2\times CF_{G}\right)}}{2\times 2}\nonumber \end{equation}
\begin{equation} q^{2}=\text{pGP}_{G}=\left(\frac{2-\sqrt{4-8\times CF_{G}}}{4}\right)^{2}\nonumber \end{equation}
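The method 2 calculation can be sketched as below. Rearranging \(CF_{G}=2(1-q)q\) gives the quadratic \(2q^{2}-2q+CF_{G}=0\), and the smaller root is taken as q. The function names and the example carrier frequencies are hypothetical.

```python
import math
from typing import Sequence


def gene_carrier_frequency_method2(variant_cfs: Sequence[float]) -> float:
    """CF_G as the plain sum of the per-variant carrier frequencies."""
    return sum(variant_cfs)


def pathogenic_allele_frequency(cf_g: float) -> float:
    """Smaller root q of 2q^2 - 2q + CF_G = 0 (from CF_G = 2(1 - q)q)."""
    if not 0.0 <= cf_g <= 0.5:
        # The discriminant 4 - 8 * CF_G is negative above 0.5.
        raise ValueError("CF_G must lie in [0, 0.5] for a real solution")
    return (2.0 - math.sqrt(4.0 - 8.0 * cf_g)) / 4.0


def predicted_genetic_prevalence(cf_g: float) -> float:
    """pGP_G = q^2 under Hardy-Weinberg equilibrium."""
    return pathogenic_allele_frequency(cf_g) ** 2


# Invented per-variant carrier frequencies for one gene.
cf_g = gene_carrier_frequency_method2([0.01, 0.006, 0.004])
q = pathogenic_allele_frequency(cf_g)
pgp_g = predicted_genetic_prevalence(cf_g)
```

Taking the smaller root keeps q below 0.5, which matches the expectation that the pathogenic allele is the rare one.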
Methods 1 and 2 were compared using linear regression (IBM SPSS Statistics version 25, Chicago, IL, USA).
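For readers without SPSS, an equivalent comparison could be sketched with ordinary least squares in plain Python. This is not the software used in the study, and the paired values below are arbitrary numbers chosen only to exercise the code.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Arbitrary paired estimates (method 1 on x, method 2 on y).
method1_values = [1.0, 2.0, 3.0, 4.0]
method2_values = [1.1, 2.1, 3.1, 4.1]
slope, intercept, r_squared = fit_line(method1_values, method2_values)
```

A slope near 1 with a high coefficient of determination would indicate that the two methods give closely agreeing estimates.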
The total pGP, defined as the proportion of individuals in each ethnic group predicted to be affected by PLPV in any of the 10 candidate genes, was calculated as the sum of the \(\text{pGP}_{G}\) values of all genes.
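The summation across genes is simple enough to show in a few lines. The gene labels and values are placeholders, not results from the study.

```python
from typing import Dict


def total_predicted_prevalence(pgp_by_gene: Dict[str, float]) -> float:
    """Total pGP for one population: the sum of the gene-level pGP_G values."""
    return sum(pgp_by_gene.values())


# Placeholder gene labels and invented pGP_G values for one ethnic group.
pgp_by_gene = {"GENE_A": 2.0e-6, "GENE_B": 5.0e-7, "GENE_C": 1.5e-6}
total_pgp = total_predicted_prevalence(pgp_by_gene)
```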