Genetic variant classification
All genetic variants in the 10
candidate genes reported in the gnomAD database (v.2.1.1) were
classified into five categories (benign, likely benign, uncertain
significance, likely pathogenic, and pathogenic) following the 2015
American College of Medical Genetics and Genomics (ACMG)/Association for
Molecular Pathology (AMP) standards and guidelines (Richards et al.,
2015). Loss of function in these 10 candidate genes was
presumed to be a disease mechanism of congenital hypothyroidism.
The GRCh37/hg19 genomic build was used for all position descriptions.
All variants were described according to the HGVS variant nomenclature
standards (http://varnomen.hgvs.org/) (den Dunnen et al., 2016) and
analyzed based on the transcript selected by the Matched Annotation from
NCBI and EMBL-EBI (MANE) project
(https://www.ncbi.nlm.nih.gov/refseq/MANE/) using the Mutalyzer program
(https://mutalyzer.nl/). If the
genetic variants were not known pathogenic or likely pathogenic variants
(PLPV), null variants (stop-gain, splice-site-disrupting, or
frameshift variants) annotated as low-confidence predicted
loss-of-function (pLoF), or carrying a pLoF flag from the loss-of-function
transcript effect estimator (LOFTEE,
https://github.com/konradjk/loftee), were filtered out. For the prediction of
variant pathogenicity, multiple in silico tools such as REVEL
(https://sites.google.com/site/revelgenomics/) (Ghosh, Oak, & Plon,
2017; Ioannidis et al., 2016), GERP++
(http://mendel.stanford.edu/SidowLab/downloads/gerp/) (Davydov et al.,
2010), and dbscSNV
(https://sites.google.com/site/jpopgen/dbNSFP) (Ghosh et al., 2017;
Jian, Boerwinkle, & Liu, 2014) were used. In addition, ClinVar
(https://www.ncbi.nlm.nih.gov/clinvar/) was used as a locus-specific
database, and Pfam (https://pfam.xfam.org/) and InterPro
(https://www.ebi.ac.uk/interpro/) were used as functional domain
databases.
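The pLoF filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record keys (`known_plpv`, `consequence`, `lof`, `lof_flags`), the consequence labels, and the example records are hypothetical stand-ins for whatever the annotated gnomAD export actually provides, not the authors' pipeline.

```python
# Hypothetical consequence labels standing in for stop-gain,
# splice-site-disrupting, and frameshift variants.
NULL_CONSEQUENCES = {
    "stop_gained",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "frameshift_variant",
}


def keep_variant(variant: dict) -> bool:
    """Return True if the variant survives the pLoF filter."""
    if variant["known_plpv"]:
        # Known pathogenic/likely pathogenic variants are always retained.
        return True
    if variant["consequence"] not in NULL_CONSEQUENCES:
        # The filter only concerns null variants; others pass through
        # to be assessed by the remaining criteria.
        return True
    low_confidence = variant.get("lof") == "LC"   # assumed confidence label
    flagged = bool(variant.get("lof_flags"))      # any non-empty flag string
    return not (low_confidence or flagged)


# Made-up records for illustration only.
variants = [
    {"id": "var1", "known_plpv": False, "consequence": "stop_gained",
     "lof": "HC", "lof_flags": ""},
    {"id": "var2", "known_plpv": False, "consequence": "frameshift_variant",
     "lof": "LC", "lof_flags": ""},
    {"id": "var3", "known_plpv": True, "consequence": "stop_gained",
     "lof": "LC", "lof_flags": ""},
    {"id": "var4", "known_plpv": False, "consequence": "splice_donor_variant",
     "lof": "HC", "lof_flags": "EXAMPLE_FLAG"},
]

kept_ids = [v["id"] for v in variants if keep_variant(v)]
```

In this sketch, a low-confidence or flagged null variant is dropped unless it is already a known PLPV, which mirrors the order of the conditions in the text.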
Carrier frequency (CF) and predicted genetic prevalence (pGP) analysis
For CF and pGP analysis, only heterozygous PLPV (not homozygous PLPV)
were considered (Hanany et al., 2018; Hanany, Rivolta, & Sharon, 2020).
Therefore, the allele frequency of heterozygous PLPV
(\(AF_{V}\)) and the carrier frequency (\(CF_{V}\)) for a variant V were
calculated as follows:
\(AF_{V} = \frac{\text{allele count} - 2 \times \text{homozygous count}}{\text{allele number}}\)
\(CF_{V} = \frac{AF_{V} \times \text{allele number}}{\text{number of individuals}} = 2AF_{V}\)
where the allele count (number of variant alleles), allele number
(number of genotyped alleles = 2 × number of individuals), and homozygous
count (number of homozygous individuals) for each variant were provided by
gnomAD.
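As a minimal sketch of the two per-variant formulas above, the following Python functions compute the heterozygous allele frequency and the carrier frequency from the three gnomAD counts. The function names and the example counts are illustrative assumptions, not values from the study.

```python
def heterozygous_allele_frequency(allele_count: int,
                                  allele_number: int,
                                  homozygous_count: int) -> float:
    """AF_V: variant alleles carried by heterozygotes / genotyped alleles."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    # Each homozygous individual contributes two variant alleles,
    # which are removed before dividing.
    return (allele_count - 2 * homozygous_count) / allele_number


def variant_carrier_frequency(allele_count: int,
                              allele_number: int,
                              homozygous_count: int) -> float:
    """CF_V = 2 * AF_V, because allele number = 2 * number of individuals."""
    return 2 * heterozygous_allele_frequency(
        allele_count, allele_number, homozygous_count
    )


# Hypothetical counts: 12 variant alleles, 250,000 genotyped alleles,
# 1 homozygous individual.
af_v = heterozygous_allele_frequency(12, 250_000, 1)
cf_v = variant_carrier_frequency(12, 250_000, 1)
```

With these made-up counts, 10 of the 12 variant alleles sit in heterozygotes, so the carrier frequency equals 10 carriers among 125,000 individuals.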
For the CF and pGP at the gene level (\(CF_{G}\) and
\(\text{pGP}_{G}\), respectively), two methods were applied. The first
method (method 1) followed the \(CF_{G}\) and \(\text{pGP}_{G}\) calculations previously described (Hanany et al., 2020):
\(CF_{G} = 1 - \prod_{k=1}^{n}\left(1 - CF_{V}(k)\right)\)
\(\text{pGP}_{G} = \frac{\sum_{k=1}^{n} (CF_{V})_{ik}\,(CF_{V})_{ik}}{4}\)
The second method (method 2) was based on the Hardy–Weinberg equation.
\(CF_{G}\) was calculated as follows:
\(CF_{G} = \sum_{k=1}^{n} CF_{V}(k)\)
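The two gene-level carrier frequency definitions can be compared directly in code. The sketch below assumes a plain list of per-variant carrier frequencies for one gene; the function names and the example values are hypothetical.

```python
import math
from typing import Sequence


def gene_cf_method1(variant_cfs: Sequence[float]) -> float:
    """Method 1: probability of carrying at least one variant,
    1 - prod(1 - CF_V(k))."""
    return 1 - math.prod(1 - cf for cf in variant_cfs)


def gene_cf_method2(variant_cfs: Sequence[float]) -> float:
    """Method 2: simple sum of the per-variant carrier frequencies."""
    return sum(variant_cfs)


# Made-up per-variant carrier frequencies for a single gene.
example_cfs = [8e-5, 2e-4, 5e-5]
cf_g_1 = gene_cf_method1(example_cfs)
cf_g_2 = gene_cf_method2(example_cfs)
```

Expanding the product shows that the sum is its first-order approximation, so for rare variants the two values differ only by terms involving products of carrier frequencies, and the method 1 value never exceeds the method 2 value.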
Using the Hardy–Weinberg equation (\(p^{2} + 2pq + q^{2} = 1\)), the
predicted genetic prevalence of congenital hypothyroidism (\(q^{2}\)),
taken as \(\text{pGP}_{G}\), was calculated as follows:
\begin{equation}
CF_{G}=2pq=2\left(1-q\right)q\nonumber
\end{equation}
\begin{equation}
q=\frac{-(-2)-\sqrt{(-2)^{2}-4\left(2\times CF_{G}\right)}}{2\times 2}\nonumber
\end{equation}
\begin{equation}
q^{2}=\text{pGP}_{G}=\left(\frac{2-\sqrt{4-8\,CF_{G}}}{4}\right)^{2}\nonumber
\end{equation}
Comparison analysis between method 1 and method 2 was performed using
linear regression (IBM SPSS Statistics version 25, Chicago, IL, USA).
The total pGP, defined as the proportion of individuals in each ethnic
group predicted to be affected by PLPV in any of the 10 candidate genes,
was calculated as the sum of the \(\text{pGP}_{G}\) values.
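The method 2 calculation and the final summation can be sketched as follows. The code solves \(2q(1-q) = CF_{G}\) for the smaller root \(q\), squares it to obtain the gene-level prevalence, and sums over genes. The function names, gene labels, and carrier frequencies are placeholders for illustration only.

```python
import math
from typing import Mapping


def pgp_hardy_weinberg(cf_g: float) -> float:
    """Gene-level predicted prevalence q^2, where q is the smaller root
    of 2q^2 - 2q + CF_G = 0."""
    if not 0 <= cf_g <= 0.5:
        # 2q(1 - q) cannot exceed 0.5, so larger inputs have no real root.
        raise ValueError("cf_g must lie in [0, 0.5]")
    q = (2 - math.sqrt(4 - 8 * cf_g)) / 4
    return q * q


def total_pgp(cf_by_gene: Mapping[str, float]) -> float:
    """Total predicted prevalence: sum of the gene-level values."""
    return sum(pgp_hardy_weinberg(cf) for cf in cf_by_gene.values())


# Hypothetical gene-level carrier frequencies for one population.
example_cf_by_gene = {"GENE_A": 0.02, "GENE_B": 0.005}
example_total = total_pgp(example_cf_by_gene)
```

Taking the smaller root keeps \(q\) below 0.5, which is the appropriate branch for a rare pathogenic allele frequency.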