\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage{natbib}
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\pdfoutput=1
\newcommand{\aastex}{AAS\TeX}
\newcommand{\latex}{La\TeX}
\newcommand{\textbfit}[1]{\textbf{\textit{#1}}}
\newcommand{\mathbfit}[1]{\textbf{\textit{#1}}}
\newcommand{\textbfss}[1]{\textbf{\textsf{#1}}}
\newcommand{\mathbfss}[1]{\textbf{\textsf{#1}}}
\newcommand{\cigma}{\varsigma}
% \usepackage{wasysym} % Not (yet) supported by LaTeXML.
% \usepackage[percent]{overpic} % Not (yet) supported by LaTeXML.
\makeatletter
\def\env@matrix{\hskip -\arraycolsep % taken from amsmath.sty lines 895ff
\let\@ifnextchar\new@ifnextchar
\array{*{\c@MaxMatrixCols}c}}
\makeatother
\newcommand{\vdag}{(v)^\dagger}
\newcommand{\cz}{\mathrm{c}}
\newcommand{\sz}{\mathrm{s}}
\newcommand{\tz}{\mathrm{t}}
\newcommand{\myemail}{zephyrpenoyre@gmail.com}
\usepackage{bm} % Bold maths symbols, including upright Greek
% AASTeX metadata commands, made safe no-ops under the article class:
\providecommand{\slugcomment}[1]{}
\providecommand{\shorttitle}[1]{}
\providecommand{\shortauthors}[1]{}
\slugcomment{Draft version}
\shorttitle{The Split Normal}
\shortauthors{Z. Penoyre}
\begin{document}
\title{Direct parameter finding of the split normal distribution}
\author[1]{Zephyr Penoyre}%
\affil[1]{Affiliation not available}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\selectlanguage{english}
\begin{abstract}
A common standard for reporting results in astrophysics (and likely much of science) is flawed. When reporting a distribution we often assume it is Gaussian and report the positions of the $16^{th}$, $50^{th}$ and $84^{th}$ percentiles of the data. This is a useful \textit{summary statistic} - it tells us concrete information about the general form of the distribution. However, it can be misleading if read as a \textit{result} - something that gives a representative reporting of the distribution.
If a distribution is asymmetric then the three stated percentiles have no direct relevance to the data, and it is not possible to recreate an approximation to the distribution from them. This is of particular concern when one author quotes \textit{results} from another (particularly if they are actually repurposing \textit{summary statistics}) to use for meta-analysis or as priors for their own work.
Asymmetric extensions of the Gaussian exist, but are rarely used, because their parameters have needed to be estimated via costly methods (whilst computing the mean or the median is a direct and efficient calculation). In this paper I show a method for calculating the three parameters that define the \textit{split normal} directly and robustly.
I suggest that this could become the new minimal standard for \textit{pragmatic} result reporting - encoding a more representative approximate distribution without any extra overhead and allowing authors to be more definite in separating \textit{summary statistics} and the reporting of usable \textit{results}.%
\end{abstract}%
\sloppy
\section{Introduction}
Whilst it has been rediscovered many times\footnote{Including by myself} \hyperref[csl:1]{({Wallis}, 2014)}, the simple extension of Gauss's normal distribution - the \textit{split normal} function - is poorly known and rarely used.
It has a clear merit - it is a simple extension of the normal distribution to a more general class of distributions. It encodes the entire distribution in just three parameters (assuming it is normalised) compared to two for the normal distribution.
Until now it has been believed that the only way to estimate these parameters is through expensive methods such as maximum likelihood estimation \hyperref[csl:2]{(John, 1982)} or Bayesian inference \hyperref[csl:3]{(Villani \& Larsson, 2006)}.
In this short letter I present a method for directly finding the parameters of a split normal distribution, which can easily and efficiently be applied to any 1D data set. It is comparable in speed and convenience to finding the mean or median of a data set, is a robust statistic, and encodes significantly more information via a single extra parameter.
In short I suggest it could and perhaps should become the new standard of estimating and reporting the parameters of many 1D data distributions.
\section{The split normal}
The split normal follows the form:
\begin{equation}
\begin{aligned}
f(x) \propto \ &e^\frac{{-\left(x-\mu\right)^2}}{2 \sigma^2} \ \mathrm{for} \ x<\mu \\
\propto \ &e^\frac{{-\left(x-\mu\right)^2}}{2 \varsigma^2} \ \mathrm{for} \ x>\mu.
\end{aligned}
\end{equation}
It peaks at $x=\mu$ (the mode of the dataset, rather than the mean) and is continuous and smooth at that point. It falls away from that point as two half normal distributions with individual widths (standard deviations) $\sigma$ and $\varsigma$\footnote{This is a lovely alternative Greek letter I've just discovered, a specific lowercase $\Sigma$ for use at the end of words. You can reproduce it in \LaTeX \ using \textbackslash varsigma. I call him ``cigma''.} to the left and right of the mode respectively.
For clarity we define the cumulative distribution function (CDF),
\begin{equation}
F(x) = \int_{-\infty}^{x} f(x') dx',
\end{equation}
which goes to unity as $x \rightarrow \infty$.
Letting $\epsilon = \frac{\varsigma}{\sigma}$ we can normalise to give the split normal function:
\begin{equation}
\label{function}
f(x) \ = \frac{1}{\sqrt{2 \pi \sigma^2}} \frac{2}{1+\epsilon} \ e^\frac{{-\left(x-\mu\right)^2}}{2 \alpha^2 \sigma^2}, \ \
\alpha=
\begin{cases}
1& x<\mu \\
\epsilon& x>\mu
\end{cases}
\end{equation}
and the cumulative distribution function:
\begin{equation}
\label{Function}
F(x) \ = \frac{1}{1+\epsilon} \left[ 1 + \alpha E\left( \frac{x-\mu}{\sqrt{2}\alpha\sigma}\right)\right],
\end{equation}
where $E(x)$ is the error function\footnote{Weisstein, Eric W. ``Erf.'' From MathWorld--A Wolfram Web Resource. \url{http://mathworld.wolfram.com/Erf.html}}, a normalised integral of the Gaussian function.
When $\varsigma = \sigma$ and $\epsilon = 1$ these results revert to the normal \textit{normal} distribution.
Using $E(0) = 0$ we can see that $F(\mu) = \frac{1}{1+\epsilon}$, thus the ratio of the cumulative distribution above and below $x=\mu$ is $\epsilon:1$ respectively. We see that $\mu$ is not pinned to the center of the distribution. As $\epsilon \rightarrow 0$ almost all the distribution lies below $x=\mu$ and vice-versa as $\epsilon \rightarrow \infty$.
The error function is odd, satisfying $E(-x) = - E(x)$ and thus we can find the cumulative distribution lying between the two ``standard deviations":
\begin{equation}
F(\mu + \varsigma) - F(\mu - \sigma) = E\left(\frac{1}{\sqrt{2}}\right) \approx 0.68.
\label{sixtyEight}
\end{equation}
This is the same fraction of the distribution contained within one standard deviation from the mean of the normal \textit{normal} distribution. The fact that this does not depend on $\mu$, $\sigma$ or $\varsigma$ is crucial for the direct parameter finding when applied to real data, as we discuss in the next section.
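This invariance is easy to check numerically by implementing the CDF of equation \ref{Function} directly. A minimal sketch in Python (standard library only; the function and variable names are my own):

```python
import math

def split_normal_cdf(x, mu, sigma, eps):
    """CDF of the split normal; eps = varsigma / sigma."""
    alpha = 1.0 if x < mu else eps
    return (1.0 + alpha * math.erf((x - mu) / (math.sqrt(2) * alpha * sigma))) / (1.0 + eps)

mu, sigma, eps = 3.0, 1.0, 2.5           # arbitrary test parameters
varsigma = eps * sigma
frac = (split_normal_cdf(mu + varsigma, mu, sigma, eps)
        - split_normal_cdf(mu - sigma, mu, sigma, eps))
print(frac, math.erf(1 / math.sqrt(2)))  # equal (~0.6827) whatever parameters are chosen
```

Changing $\mu$, $\sigma$ or $\epsilon$ leaves the computed fraction unchanged, as equation \ref{sixtyEight} promises.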
\section{Parameter finding}
Suppose we have $N$ data points, each with an index $i$ and position $x_i$, indexed in ascending order of $x_i$ (such that $x_0$ is the lowest value of $x$ and $x_{N-1}$ the highest). Also let
\begin{equation}
\Delta = \mathrm{int}\left[ N \, E\left(\frac{1}{\sqrt{2}}\right) \right]
\end{equation}
(for large $N$ it is unimportant whether we round up or down).
The first step is to find the $x_i$s corresponding to $\mu-\sigma$ and $\mu+\varsigma$:
\begin{enumerate}
\item Take the subset of points with indices $j$ ranging from $j=0$ to $j=N-1-\Delta$.
\item For each $j$ calculate $w_j = x_{j+\Delta} - x_{j}$, the width of the interval starting at $x_j$ that contains $\sim 68 \%$ of the data.
\item Let $J$ be the value of $j$ for which $w_j$ is minimised.
\item From this we know that $\mu-\sigma = x_J$, $\mu + \varsigma = x_{J+\Delta}$ and $\sigma + \varsigma = (1+\epsilon) \sigma = w_J$.
\end{enumerate}
To see why this works, imagine that for a split normal we had slightly overestimated $\mu - \sigma$. Now to include $\Delta$ data points we have to extend over a span slightly larger than $\sigma + \varsigma$. Similarly if we underestimated $\mu - \sigma$ we would also get a larger span. Thus the minimum span which includes $\Delta$ data points corresponds to $\sigma + \varsigma$.
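Steps 1-4 amount to a single pass over the sorted data. A sketch in Python (standard library only; the sampler and all names are mine, and the split normal is sampled by picking a side with probability $\frac{1}{1+\epsilon}$):

```python
import math, random

def sample_split_normal(n, mu, sigma, varsigma, rng):
    """Draw n points: left half-normal with probability sigma/(sigma+varsigma)."""
    return [mu - abs(rng.gauss(0.0, sigma)) if rng.random() < sigma / (sigma + varsigma)
            else mu + abs(rng.gauss(0.0, varsigma)) for _ in range(n)]

def minimal_span(data):
    """Steps 1-4: narrowest window holding a fraction erf(1/sqrt(2)) ~ 68% of the data."""
    xs = sorted(data)
    delta = int(len(xs) * math.erf(1 / math.sqrt(2)))
    J = min(range(len(xs) - delta), key=lambda j: xs[j + delta] - xs[j])
    return xs, J, delta, xs[J + delta] - xs[J]

rng = random.Random(42)
xs, J, delta, w = minimal_span(sample_split_normal(20000, 0.0, 1.0, 3.0, rng))
print(xs[J], xs[J + delta], w)  # close to mu - sigma = -1, mu + varsigma = 3, w = 4
```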
There is still an important quantity we need to find, $\mu$, which will allow us to explicitly find $\sigma$ and $\varsigma$. We will make use of the result $F(\mu) = \frac{1}{1+\epsilon}$ and perform the following steps:
\begin{enumerate}
\setcounter{enumi}{4}
\item Take the subset of points with indices $k$ ranging from $k = J$ to $k = J + \Delta$ (i.e. every point within the span $w_J$).
\item For each point we can ask what the distribution would look like if $x_k = \mu$, giving a corresponding $\epsilon_k$. Using the relation in step 4 we find $\frac{1}{1+\epsilon_k} =\frac{x_k-x_J}{w_J}$.
\item The fraction of points with $x<x_k$, which is simply $\frac{k}{N}$, gives an independent estimate of $F(x_k)$, and if $x_k$ were indeed the mode this would equal $\frac{1}{1+\epsilon_k}$.
\item Let $K$ be the value of $k$ for which $\left| \frac{K}{N} - \frac{1}{1+\epsilon_K} \right|$ is minimised. Then $\mu = x_K$, $\epsilon = \epsilon_K$, $\sigma = \frac{w_J}{1+\epsilon}$ and $\varsigma = \epsilon \sigma$.
\end{enumerate}
\section{Appendix A - Percentiles of the split normal}
\label{percentiles}
Inverting equation \ref{Function} gives the position of any percentile - the value $x_F$ below which a fraction $F$ of the distribution lies:
\begin{equation}
\label{noitcnuF}
x_F = \mu + \sqrt{2} \alpha \sigma E^{-1}\left( \frac{\left(1+\epsilon\right)F - 1}{\alpha} \right), \ \
\alpha =
\begin{cases}
1& F<\frac{1}{1+\epsilon}\\
\epsilon& F>\frac{1}{1+\epsilon}
\end{cases}
\end{equation}
where $E^{-1}(F)$ is the inverse of the error function\footnote{Weisstein, Eric W. ``Inverse Erf.'' From MathWorld--A Wolfram Web Resource. \url{http://mathworld.wolfram.com/InverseErf.html}}.
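The parameter-finding procedure can be sketched end to end in a few lines. This follows my reading of the steps above - each candidate $x_k$ inside the minimal span implies an $\epsilon_k$ through $\frac{x_k - x_J}{w_J} = \frac{1}{1+\epsilon_k}$, and the candidate whose empirical CDF $\frac{k}{N}$ best matches that implied $F(\mu)$ is taken as the mode (all names are mine):

```python
import math, random

def fit_split_normal(data):
    """Direct estimate of (mu, sigma, varsigma) for a split normal; a sketch only."""
    xs = sorted(data)
    n = len(xs)
    delta = int(n * math.erf(1 / math.sqrt(2)))
    # minimal ~68% span: x_J ~ mu - sigma, width w ~ sigma + varsigma
    J = min(range(n - delta), key=lambda j: xs[j + delta] - xs[j])
    w = xs[J + delta] - xs[J]
    # scan candidates inside the span for the mode
    best, mu, eps = float("inf"), xs[J], 1.0
    for k in range(J + 1, J + delta):
        implied_F = (xs[k] - xs[J]) / w        # = 1/(1+eps_k) if xs[k] were the mode
        mismatch = abs(k / n - implied_F)      # empirical CDF vs implied F(mu)
        if mismatch < best:
            best, mu, eps = mismatch, xs[k], 1.0 / implied_F - 1.0
    sigma = w / (1.0 + eps)
    return mu, sigma, eps * sigma

rng = random.Random(7)
data = [-abs(rng.gauss(0, 1.0)) if rng.random() < 1 / 3 else abs(rng.gauss(0, 2.0))
        for _ in range(50000)]                 # mu = 0, sigma = 1, varsigma = 2
mu, sigma, varsigma = fit_split_normal(data)
print(mu, sigma, varsigma)                     # close to 0, 1, 2
```

The cost is dominated by the initial sort, so in practice this is comparable to finding a median.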
\section{Appendix B - Truncating the split normal}
\label{finite}
Often we are dealing with data with some finite maximum and minimum value, and this can lead to a sharp truncation of the distribution function.
In the most basic extension of the split normal to a truncated distribution we can cut $f(x)$ off at some lower (upper) bound at $x=A$ ($=B$), which does not change the functional form of the distribution but does alter the normalisation.
Denoting the untruncated distribution as $f_0(x)$, with CDF $F_0(x)$ we can see that the truncated distribution follows
\begin{equation}
f(x) =
\begin{cases}
\kappa f_0(x)& A<x<B\\
0& x<A \ \mathrm{or} \ x>B
\end{cases}
\end{equation}
where $\kappa$ is a constant that fixes the normalisation, such that the new CDF, $F(x) = \kappa \left( F_0(x) - F_0(A) \right)$, goes to 1 as $x \rightarrow B$.
Thus
\begin{equation}
\kappa = \frac{1}{F_0(B) - F_0(A)} = \frac{1+\epsilon}{\alpha_A E\left(\frac{\mu-A}{\sqrt{2}\alpha_A\sigma} \right) + \alpha_B E\left(\frac{B-\mu}{\sqrt{2}\alpha_B\sigma} \right)}, \ \
\alpha_{A,B} =
\begin{cases}
1& A,B<\mu\\
\epsilon& A,B>\mu
\end{cases}
\end{equation}
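The normalisation can be sanity-checked by comparing $\kappa$ against a brute-force integration of the untruncated density $f_0$ over $[A, B]$ (a sketch; names are mine):

```python
import math

def split_normal_pdf(x, mu, sigma, eps):
    """Unit-normalised split normal density."""
    alpha = 1.0 if x < mu else eps
    return (2.0 / ((1.0 + eps) * math.sqrt(2 * math.pi) * sigma)
            * math.exp(-(x - mu) ** 2 / (2 * alpha ** 2 * sigma ** 2)))

def truncation_kappa(A, B, mu, sigma, eps):
    """kappa = 1 / (F0(B) - F0(A)), written out as in the text."""
    aA = 1.0 if A < mu else eps
    aB = 1.0 if B < mu else eps
    denom = (aA * math.erf((mu - A) / (math.sqrt(2) * aA * sigma))
             + aB * math.erf((B - mu) / (math.sqrt(2) * aB * sigma)))
    return (1.0 + eps) / denom

mu, sigma, eps = 0.0, 1.0, 2.0
A, B = -0.5, 1.5
kappa = truncation_kappa(A, B, mu, sigma, eps)
# midpoint-rule integral of kappa * f0 over [A, B]; should come out ~1
n = 20000
h = (B - A) / n
total = sum(split_normal_pdf(A + (i + 0.5) * h, mu, sigma, eps) for i in range(n)) * h
print(kappa * total)
```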
where $A$ and $B$ can straddle the mode ($\mu$) or both sit on one side of it. This can encode a wide range of distributions, including nearly flat, constant-gradient, or sharply decaying functions. If both bounds are on the same side of $x=\mu$ then $f(x)$ depends on only one width parameter, $\sigma$ or $\varsigma$, and the solution is identical to a truncated normal distribution.
It is worth noting that for distributions which peak close to a finite edge ($\mu \approx A$ or $\approx B$) we need not apply a truncation on this side, as a small $\sigma$ or $\varsigma$ will give a sharply decaying function regardless of boundaries.
Unfortunately, even for known $A$ and $B$ this complicates the functional form of $f(x)$ and $F(x)$ sufficiently to break the direct method for deriving approximate parameters. More complex methods are thus necessary for fitting a truncated split normal to data.
\section{Appendix C - Translating to other distributions}
\label{translate}
If we know the parameters of a certain split-normal distribution ($\mu, \sigma, \varsigma$ and $\epsilon=\frac{\varsigma}{\sigma}$), it is relatively simple to derive other metrics of the average and spread of the data.
Taking moments of equation \ref{function} we can find the mean:
\begin{equation}
\bar{x} = \mu + \sqrt{\frac{2}{\pi}} \sigma (\epsilon - 1),
\end{equation}
the variance:
\begin{equation}
Var(x) = \sigma^2 \left[ \epsilon + \left(1-\frac{2}{\pi}\right)(\epsilon - 1)^2 \right],
\end{equation}
and the third central moment (the unnormalised skewness):
\begin{equation}
\gamma = \sqrt{\frac{2}{\pi}} (\epsilon - 1) \sigma^3 \left[ \left(\frac{4}{\pi} - 1\right)(\epsilon - 1)^2 + \epsilon \right].
\end{equation}
See \hyperref[csl:3]{(Villani \& Larsson, 2006)} for a fuller derivation and higher moments.
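These closed-form expressions can be checked against a Monte Carlo draw from the distribution (a sketch; the sampler uses the fact that a fraction $\frac{1}{1+\epsilon}$ of the mass lies below $\mu$, and all names are mine):

```python
import math, random

mu, sigma, eps = 1.0, 0.8, 1.7
varsigma = eps * sigma

# the closed forms quoted above
mean = mu + math.sqrt(2 / math.pi) * sigma * (eps - 1)
var = sigma ** 2 * (eps + (1 - 2 / math.pi) * (eps - 1) ** 2)
mom3 = (math.sqrt(2 / math.pi) * (eps - 1) * sigma ** 3
        * ((4 / math.pi - 1) * (eps - 1) ** 2 + eps))

# Monte Carlo estimates from 200,000 draws
rng = random.Random(3)
xs = [mu - abs(rng.gauss(0, sigma)) if rng.random() < 1 / (1 + eps)
      else mu + abs(rng.gauss(0, varsigma)) for _ in range(200000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)
s3 = sum((x - m) ** 3 for x in xs) / len(xs)
print(mean, m)   # each pair should agree closely
print(var, v)
print(mom3, s3)
```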
Similarly, using equation \ref{noitcnuF} we can find the median (corresponding to $F=0.5$):
\begin{equation}
\label{median}
x_{0.5} = \mu + \sqrt{2} \alpha \sigma E^{-1}\left( \frac{\epsilon -1}{2 \alpha} \right), \ \ \alpha =
\begin{cases}
1& \epsilon<1\\
\epsilon& \epsilon>1
\end{cases}
\end{equation}
where $E^{-1}(F)$ is the inverse error function.
We can find any percentile of the data this way, by substituting different values of $F$ into equation \ref{noitcnuF} (e.g. $F=0.25$ and $0.75$ to find the quartiles and interquartile range).
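A convenient way to evaluate these percentiles without special-function libraries is the identity $E^{-1}(y) = \Phi^{-1}\left(\frac{1+y}{2}\right)/\sqrt{2}$, where $\Phi^{-1}$ is the standard normal quantile. A sketch (standard-library Python; names are mine):

```python
import math
from statistics import NormalDist

def erfinv(y):
    """Inverse error function via the standard normal quantile."""
    return NormalDist().inv_cdf((y + 1) / 2) / math.sqrt(2)

def split_normal_percentile(F, mu, sigma, eps):
    """Invert the split-normal CDF: the x below which a fraction F of the mass lies."""
    alpha = 1.0 if F < 1 / (1 + eps) else eps
    return mu + math.sqrt(2) * alpha * sigma * erfinv(((1 + eps) * F - 1) / alpha)

mu, sigma, eps = 0.0, 1.0, 2.0
median = split_normal_percentile(0.5, mu, sigma, eps)
q1 = split_normal_percentile(0.25, mu, sigma, eps)
q3 = split_normal_percentile(0.75, mu, sigma, eps)
print(median, q3 - q1)  # the median sits above mu when eps > 1
```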
Let us finish by finding $\sigma_\pm$, the asymmetric errors often quoted in the scientific literature, found by comparing the $16^{th}$ and $84^{th}$ percentiles of the data to the median\footnote{Some authors will also use $\sigma_\pm$ to denote the edges of some confidence interval, so keep a watchful eye out.}.
\begin{equation}
\sigma_\pm = \sqrt{2} \alpha \sigma \left[\pm E^{-1}\left(\frac{\epsilon-1}{2\alpha} \pm \frac{1+\epsilon}{2\alpha}E\left(\frac{1}{\sqrt{2}}\right) \right) \mp E^{-1}\left(\frac{\epsilon - 1}{2\alpha}\right) \right]
\end{equation}
where the conditions on $\alpha$ are like those in equation \ref{median} and depend on which side of $\mu$ the $16^{th}$ and $84^{th}$ percentiles of the data fall.
\subsection{Translating from other distributions}
\label{other}
Whilst it is possible to find the three parameters describing the split normal from the mean, variance and skewness, the equations are long and messy and require choosing the correct one of a family of roots.
They are currently sitting in a notebook of mine, but perhaps they will find their way into this document eventually.
As far as I can see, recovering the split normal parameters from the median and percentiles around it is a non-trivial operation and may have to be solved numerically.
\selectlanguage{english}
\FloatBarrier
\section*{References}\sloppy
\phantomsection
\label{csl:1}Wallis, K. F. (2014). {The Two-Piece Normal, Binormal, or Double Gaussian Distribution: Its Origin and Rediscoveries}. \textit{ArXiv e-Prints}, arXiv:1405.4995.
\phantomsection
\label{csl:2}John, S. (1982). {The three-parameter two-piece normal family of distributions and its fitting}. \textit{Communications in Statistics - Theory and Methods}, \textit{11}(8), 879–885. \url{https://doi.org/10.1080/03610928208828279}
\phantomsection
\label{csl:3}Villani, M., \& Larsson, R. (2006). {The Multivariate Split Normal Distribution and Asymmetric Principal Components Analysis}. \textit{Communications in Statistics - Theory and Methods}, \textit{35}(6), 1123–1140. \url{https://doi.org/10.1080/03610920600672252}
\end{document}