Ordinal Logistic Regression

To estimate how likely any given AKAP11 variant is to have a statistically significant role in the diagnosis of BPD or SCZ, we run a logistic regression on every variant listed in the SCHEMA \citep{j2020} and BipEx \citep{d2021} browsers for SCZ and bipolar I, respectively. Based on the consequence terms defined for each variant by Sequence Ontology \citep{Cunningham2015}, every entry is assigned to one of four ordered classes, indexed by \(j\), ranging from statistically unlikely to cause diagnosis to statistically likely to be pathogenic. Variants without sufficient genome annotation (i.e., lacking CADD scores or protein sequence identification) were removed from the analysis, leaving a variant population of \(n = 745\) for SCZ and \(n = 380\) for BPD.
With the variants categorized, we perform ordinal logistic regression \citep{e2018}, in which the probability of a variant being placed into a given class is estimated from a set of independent variables. The regression was performed with the XLSTAT Version 2023.1.2 software package \citep{lumivero2023}. Five quantities were chosen as independent variables to determine collectively how likely each variant is to be placed in each class: (i) the CADD functional annotation \citep{Rentzsch2019}, the allele number for the (ii) control and (iii) case groups, and the allele frequency for the (iv) control and (v) case groups. Fifty iterations were performed to increase the precision of the regression coefficients. Using the automatic XLSTAT weightings for each independent variable, we find that variables (ii) through (v) have little significant influence on the classification of each variant; we therefore rely primarily on the CADD functional annotation to determine the probability that any given variant is placed in a given class.
The ordinal logistic regression model is a form of logistic regression in which the probability of an ordered categorical outcome occurring is estimated from a set of independent variables. Within the context of this study, we take the independent variables listed above (with the primary weighting given to the CADD annotation) to find the probability that a given variant is classified under a given class \(j\).
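For concreteness, one common way to write such a model is the cumulative-logit (proportional-odds) form sketched below. This is an illustrative assumption rather than a statement of the exact parameterization used by XLSTAT, whose sign and threshold conventions may differ. The symbols \(Y\) (the class of a variant), \(\mathbf{x}\) (the vector of the five independent variables), \(\alpha_j\) (class thresholds), and \(\boldsymbol{\beta}\) (regression coefficients) are notation introduced here for the sketch. With four ordered classes, the cumulative probabilities for \(j = 1, 2, 3\) would read
\begin{equation}
P(Y \le j \mid \mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\alpha_j - \boldsymbol{\beta}^{\top}\mathbf{x}\right)\right]},
\end{equation}
and the probability that a variant falls in class \(j\) follows by differencing,
\begin{equation}
P(Y = j \mid \mathbf{x}) = P(Y \le j \mid \mathbf{x}) - P(Y \le j - 1 \mid \mathbf{x}),
\end{equation}
with the conventions \(P(Y \le 0 \mid \mathbf{x}) = 0\) and \(P(Y \le 4 \mid \mathbf{x}) = 1\). Under this form, a single coefficient vector \(\boldsymbol{\beta}\) is shared across all classes and only the thresholds \(\alpha_j\) vary with \(j\), so an independent variable with a coefficient near zero (as found here for variables (ii) through (v)) shifts none of the class probabilities appreciably.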