\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage{natbib}
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[ngerman,greek,english]{babel}
\usepackage{float}
\begin{document}
\title{Fitting Elephants in the Density Functionals Zoo: Statistical Criteria
for the Evaluation of DFT methods as a Suitable Replacement for Counting
Parameters}
\author[1]{Roberto Peverati}%
\affil[1]{IJQC Special Issue}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\selectlanguage{english}
\begin{abstract}
Counting parameters has become customary in the density functional
theory community as a way to infer the transferability of popular
approximations to the exchange--correlation functionals. Recent work in
data science, however, has demonstrated that the number of parameters of
a fitted model is not related to the complexity of the model itself, nor
to its eventual overfitting. Using similar arguments, we show here that
it is possible to represent every modern exchange--correlation
functional approximation using just one single parameter. This procedure
proves the futility of the number of parameters as a measure of
transferability. To counteract this shortcoming, we introduce and
analyze the performance of three statistical criteria for the evaluation
of the transferability of exchange--correlation functionals. The three
criteria are called Akaike information criterion
(AIC),~Vapnik--Chervonenkis criterion (VCC), and cross-validation
criterion (CVC) and are used in a preliminary assessment to rank 60
exchange--correlation functional approximations using the ASCDB database
of chemical data.%
\end{abstract}%
\sloppy
\section*{1. Introduction}
{\label{440252}}
The success of density functional theory (DFT) as the method of choice
for the calculation of the electronic structure of molecules is
undeniable, and is intertwined with the development of improved
approximations for the description of the exchange--correlation
functional (\emph{xc} functional, or just simply functional)
\textsuperscript{\hyperref[csl:1]{[1]}}.~ Such success is reflected by an indiscriminate
proliferation of approximations, calling for a rugged safari across the
``zoo of functionals''~\textsuperscript{\hyperref[csl:2]{[2]},\hyperref[csl:3]{[3]},\hyperref[csl:4]{[4]}} to select an appropriate
one~\textsuperscript{\hyperref[csl:5]{[5]}}.~
Two philosophies are at odds in the world of functional development: the
first one originated mainly within the chemistry community from the
pioneering work of Becke~\textsuperscript{\hyperref[csl:6]{[6]},\hyperref[csl:7]{[7]}}, which took the approach of
using flexible parametrized mathematical forms that are fitted to
chemical data, exact constraints, or a mix of both. The second
philosophy originated primarily within the physics community from the
ground-breaking work of Perdew~\textsuperscript{\hyperref[csl:8]{[8]},\hyperref[csl:9]{[9]},\hyperref[csl:10]{[10]}}, which expanded the
knowledge and application of exact conditions and advocated for DFT to
remain a purely~\emph{ab initio} method. These two philosophies have
been largely constructive with each other, sharing ideas, providing
criticisms, and validating results~\textsuperscript{\hyperref[csl:1]{[1]},\hyperref[csl:11]{[11]},\hyperref[csl:12]{[12]},\hyperref[csl:3]{[3]}}. A frequent
question used to navigate the~zoo of density functionals---perhaps
guided by John von Neumann's famous quote: ``with four parameters I
can fit an elephant, and with five I can make him wiggle his
trunk''~\textsuperscript{\hyperref[csl:13]{[13]}}---is: ``how many parameters does this
functional have?''. This question, in fact, underlies the more
fundamental assumption that the number of parameters is a reliable
criterion to evaluate the transferability of the results---but is it
really? As pointed out on several occasions~\textsuperscript{\hyperref[csl:14]{[14]},\hyperref[csl:15]{[15]},\hyperref[csl:11]{[11]},\hyperref[csl:16]{[16]}}, counting
the number of parameters is not always as straightforward as it might
initially appear, especially for functionals that are not directly
fitted to data. In fact, there is no such thing as a truly
parameter-free or ``zero-parameter''~\emph{xc} functional approximation,
since even functionals that are usually considered as such
have~mathematical forms that contain parameters that are then determined
based on theoretical arguments. Since the true functional is still
unknown, and potentially unknowable,\textsuperscript{\hyperref[csl:17]{[17]}} it seems clear
that every~\emph{xc} functional approximation must contain an empirical
element~\textsuperscript{\hyperref[csl:11]{[11]}}.
Instead of counting fitted parameters in ``parametrized functionals''
and comparing them to~hidden parameters in ``zero-parameter'' functionals,
the first portion of this article explores the somewhat opposite scenario
where every functional---regardless of its development philosophy---is
represented using a simple function containing one single parameter, as
presented in section~{\ref{816226}}.~ This new
representation is a direct adaptation of the recent works of
Piantadosi~\textsuperscript{\hyperref[csl:18]{[18]}} and Boué~\textsuperscript{\hyperref[csl:19]{[19]}}, where any
distribution of points in any dimension is represented by a well-behaved
scalar function with a single real-valued parameter. In other words,
quoting Piantadosi's paper title: ``One parameter is always enough'',
even for~\emph{xc} functionals. The result of this procedure is that
every single functional on the first three rungs of Perdew and Schmidt's
``Jacob's ladder''~\textsuperscript{\hyperref[csl:20]{[20]}} (corresponding to LDA, GGA, and
meta-GGA approximations) can be represented by just one single
parameter. Famous ``zero-parameter'' functionals, such as
PBE~\textsuperscript{\hyperref[csl:21]{[21]}} and SCAN~\textsuperscript{\hyperref[csl:22]{[22]}}, as well as popular
``parametrized functionals'', such as the Minnesota
family~\textsuperscript{\hyperref[csl:23]{[23]},\hyperref[csl:24]{[24]},\hyperref[csl:25]{[25]},\hyperref[csl:26]{[26]},\hyperref[csl:27]{[27]},\hyperref[csl:28]{[28]},\hyperref[csl:29]{[29]},\hyperref[csl:30]{[30]},\hyperref[csl:31]{[31]},\hyperref[csl:32]{[32]},\hyperref[csl:33]{[33]},\hyperref[csl:34]{[34]}}, are all defined by one number.~
Having proven the inadequacy of the ``number of parameters'' as a
measure of transferability of~\emph{xc} functionals, the focus of this
article shifts to developing a set of statistical~criteria that can be
appropriately used for this task. Since the exact functional is still
unknown, these criteria must rely on statistical analysis of data across
as many different chemical and physical properties as possible. Luckily,
several benchmark results with hundreds of functionals are already
available in the literature~\textsuperscript{\hyperref[csl:35]{[35]},\hyperref[csl:36]{[36]},\hyperref[csl:2]{[2]},\hyperref[csl:11]{[11]},\hyperref[csl:12]{[12]},\hyperref[csl:37]{[37]},\hyperref[csl:38]{[38]}}, but their analysis is
not unequivocal, and might even produce contrasting recommendations.
This is because the large amount of data in these studies can, in
principle, be sliced and grouped into any number of~\emph{ad hoc} subsets,
which can then be used to statistically validate almost any
hypothesis. Recent work from the Author's lab has introduced a new
unbiased subdivision of some of the most popular DFT databases generated
without human intervention by means of data-science
algorithms~\textsuperscript{\hyperref[csl:39]{[39]}}. Interestingly enough, concepts that can
be derived using simple chemical intuition have also been recovered
by~\emph{a posteriori} analysis of the machine-generated groups. This
reassuring fact validates the chemical-intuition--based approach that
was used by DFT developers to group and analyze the data, but the
data-science approach offers several other advantages nonetheless. One of
these advantages is demonstrated in
Section~{\ref{898228}}, where the unbiased subsets are
used as the basis for three new statistical criteria obtained adapting
the Akaike information criterion (AIC), the Vapnik--Chervonenkis
criterion (VCC), and a new cross-validation criterion (CVC) to the DFT
results. Preliminary rankings of 60 popular~\emph{xc} functionals are
also presented and briefly discussed.
\section*{2. Fitting elephants: One-parameter fit of
exchange--correlation
functionals.}
{\label{816226}}
This section briefly discusses the application of Piantadosi's encoding
procedure~\textsuperscript{\hyperref[csl:18]{[18]}}~ to describe any local~\emph{xc} functional
with a single real-valued parameter~\(\alpha\in[0,\ 1]\). The simplest
case of a generalized gradient approximation (GGA) exchange functional
is illustrated first, since it just requires a straightforward
one-dimensional fit. The more complex cases of meta-GGA
exchange functionals and GGA exchange--correlation functionals are
presented next.~Jupyter notebooks with the code developed for each of
these cases accompany the electronic version of this article and are
available for download using the interactive features of this special
issue and on the Author's GitHub page. These programs make it possible to obtain
single-parameter representations for the majority of the more than
300~\emph{xc} functionals that are included in the LibXC
library~\textsuperscript{\hyperref[csl:40]{[40]},\hyperref[csl:41]{[41]}}.
\subsection*{2.1 GGA exchange
functionals}
{\label{846150}}
The first step to encode a functional into a single parameter using
Piantadosi's procedure is to represent the functional as a series of
points. This task is straightforward for GGA functionals, since they
depend only on two variables: the electron density,~\(\rho\),
and its gradient,~\(\nabla\rho\). Restricting the discussion to the
exchange portion of a general GGA functional, a further simplification
can be introduced by decoupling the two variables. The resulting general
formula for every GGA exchange functional is thus a simple product of
the density-dependent local spin density approximation energy
density,~\(\varepsilon_{x}^{\text{LSDA}}\), and a gradient\selectlanguage{english}-dependent enhancement
factor,~\(F_{x}^{\text{GGA}}(s)\):
\begin{equation}E_{x}^{\text{GGA}}=\int\rho\,\varepsilon_{x}^{\text{LSDA}}\left(\rho\right)F_{x}^{\text{GGA}}\left(s\right)\,d\mathbf{r}, \end{equation}
with the first term simply obtained from the exchange energy density per
particle of the uniform electron gas (UEG):
\begin{equation}\varepsilon_{x}^{\text{LSDA}}=-\frac{3}{4}\left(\frac{3}{\pi}\right)^{\frac{1}{3}}\rho^{\frac{1}{3}}, \end{equation}
and the second term usually expressed using the dimensionless reduced
variable,~\(s\):
\begin{equation}\label{eqn:3}s=\frac{\left|\nabla\rho\right|}{2{(3\pi^{2})}^{\frac{1}{3}}\rho^{\frac{4}{3}}}. \end{equation}
Therefore, the shape of every GGA exchange functional is uniquely
determined by its enhancement factor, which can then be represented as a
set of~\(N\) equidistant points on a grid in the finite
variable~\(u\in\left[0,1\right]\), obtained from~\(s \in \left[0,\infty\right)\) using
Becke's transformation~\textsuperscript{\hyperref[csl:7]{[7]}}:
\begin{equation}\label{eqn:4}u=\frac{s^{2}}{1+s^{2}}.\end{equation}
This numerical representation becomes exact in the limit of an infinite
number of points,~\(N\to\infty\). As previously
demonstrated~\textsuperscript{\hyperref[csl:42]{[42]}}, a grid of just
\(N=20\) points is sufficient in practice to describe
the enhancement factors of most exchange GGA functionals (e.g.
PBE~\textsuperscript{\hyperref[csl:21]{[21]}} and B88~\textsuperscript{\hyperref[csl:43]{[43]}}) with
sub-millihartree precision, when used in conjunction with a
well-behaved interpolation between the points---such as a cubic or
univariate spline.~For a handful of more complicated functionals (e.g.
SOGGA11~\textsuperscript{\hyperref[csl:44]{[44]}}) a slightly finer grid of~\(N=100\)
points will suffice to achieve accuracies of
\(\sim 10^{-6}\) hartree.
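
As a quick check on the grid variable (a worked illustration added for orientation, not part of the original derivation), the transformation from~\(s\) to~\(u\) above can be inverted in closed form:
\begin{equation*}
s=\sqrt{\frac{u}{1-u}},\qquad u\in\left[0,1\right),
\end{equation*}
so that \(s=0\), \(1\), \(2\), and \(3\) map to \(u=0\), \(0.5\), \(0.8\), and \(0.9\), respectively, while \(s\to\infty\) corresponds to \(u\to1\). Assuming an equidistant grid \(u_{k}=k/(N-1)\) with \(k=0,\ldots,N-1\) (one possible indexing convention; the accompanying notebooks may use a different one), roughly half of the grid points fall in the region \(s\leq1\), and the spacing in~\(s\) grows rapidly as \(u\to1\).
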
Once the functional is defined on the grid, the simple sequence of
points~\(x \in \left[0,\ldots,N\right]\) can be represented using Piantadosi's
formula:
\begin{equation}\label{eqn:5} f_{\alpha} \left(x\right) = \sin^{2} \left(2^{\beta x} \arcsin \sqrt{\alpha} \right), \end{equation}
which is uniquely defined by a single parameter~\(\alpha \in \mathbb{R}\), and
a constant~\(\beta \in \mathbb{N}\) that controls the accuracy of the encoding
procedure. It is important to notice that
eq.~{\ref{eqn:5}} ~only reproduces the position of the
points, while the spline interpolation is still required to obtain a
continuous function over the considered interval (an exact fit would
require~\(N\to\infty\), and therefore an infinitely long encoding
parameter). The drawback of this procedure is that
eq.~{\ref{eqn:5}} is extremely sensitive to the value
of the parameter. Hence~\(\alpha\) has to be represented using a
huge number of significant digits. In fact, the entire point of this
exercise is to encode the full complexity of the GGA exchange
enhancement into the length of the single parameter. Such length (i.e.
the number of significant digits required to write~\(\alpha\))
depends on both the number of interpolation points that are used to
represent the functional on the grid, and the accuracy
parameter~\(\beta\). In general,~\(N=20\)
interpolation points and~\(\beta=8\) can be used to represent
simple GGA exchange functionals---such as PBE~\textsuperscript{\hyperref[csl:21]{[21]}}---with
relative errors in the description of the enhancement factor smaller
than 1\%, resulting in parameters that require \textasciitilde{} 60
digits. For functionals that have some oscillation over the entire
interval of~\(u\)---such as
SOGGA11~\textsuperscript{\hyperref[csl:44]{[44]}}---\(N=100\) interpolation points and
a value of~\(\beta=12\) are required for similar accuracies,
resulting in parameters with \textasciitilde{} 350 digits. The single
parameters for both the PBE and SOGGA11 functionals are reported in
Fig.~{\ref{468093}}, together with the corresponding
plots of the enhancement factors,~\(F_{\text{x}}\), as a function
of~\(u\) and~\(s\). A Jupyter notebook with
the details of the encoding procedure---as well as an algorithm to
evaluate the errors for both the spline implementation and the encoding
procedure---is also associated with the Figure and is available on
GitHub.\selectlanguage{english}
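
The mechanism behind the single-parameter formula can be sketched in a few lines. The following outline paraphrases Piantadosi's construction~\textsuperscript{\hyperref[csl:18]{[18]}} and assumes that the values to encode, \(y_{x}\), have first been rescaled to the interval \(\left[0,1\right]\); the rescaling and the symbols \(y_{x}\), \(z_{x}\), and \(\omega\) are introduced here for illustration only. Each value is mapped to \(z_{x}=\arcsin\left(\sqrt{y_{x}}\right)/2\pi\in\left[0,1/4\right]\) and truncated to its first \(\beta\) binary digits; the truncated words are then concatenated into one binary fraction \(\omega\), and the parameter is defined as \(\alpha=\sin^{2}\left(2\pi\omega\right)\). Since \(\sin^{2}\) is even and \(\pi\)-periodic, and multiplication by \(2^{\beta x}\) shifts the binary expansion of \(\omega\) by \(\beta x\) digits, it follows that:
\begin{equation*}
f_{\alpha}\left(x\right)=\sin^{2}\left(2\pi\,2^{\beta x}\omega\right)=\sin^{2}\left(2\pi\left\{2^{\beta x}\omega\right\}\right)\approx\sin^{2}\left(2\pi z_{x}\right)=y_{x},
\end{equation*}
where \(\left\{\cdot\right\}\) denotes the fractional part. The truncation shifts \(z_{x}\) by less than \(2^{-\beta}\), and the slope of \(\sin^{2}\left(2\pi z\right)\) never exceeds \(2\pi\) in magnitude, so that the decoding error on each rescaled point is bounded by:
\begin{equation*}
\left|f_{\alpha}\left(x\right)-y_{x}\right|\leq\frac{2\pi}{2^{\beta}}.
\end{equation*}
This is a worst-case bound; it explains why \(\beta\) acts as an accuracy parameter, and why all the information on the functional ends up in the digits of \(\alpha\).
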
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/GGA-PBE-X/GGA-X}
\caption{{Single parameter representation for the PBE (upper panel) and SOGGA11
(lower panel) GGA exchange functionals as a function of the reduced
density variables~\(u\), eq.~{\ref{eqn:4}}
(left plots) and~\(s\), eq. {\ref{eqn:3}}
(right plots). For both functionals, the black dots are the decoded
points, the orange solid curve is the original enhancement factor as
obtained directly from LibXC, and the dashed red curve is the result of
the decoding of the single parameter and the interpolation via
univariate cubic spline.~Results are obtained with~\(N=100\)~
points and~\(\beta=12\). A Jupyter notebook to encode every GGA
exchange functional in LibXC, as well as to reproduce the plots and to
calculate the encoding and interpolation errors is associated with the
Figure.~
{\label{468093}}%
}}
\end{center}
\end{figure}
\subsection*{2.2 Meta-GGA exchange
functionals}
{\label{981893}}
The next rung of Perdew's ``Jacob's ladder'' is that of meta-GGA functionals.
Restricting the discussion once again to exchange functionals only, the
enhancement factor for meta-GGA functionals depends only on two
variables, the gradient of the density and the orbital-dependent local
kinetic energy density:
\begin{equation} \tau=\frac{1}{2}\sum_{i}\left|\nabla\psi_{i}\right|^{2}. \end{equation}
The meta-GGA enhancement factor can be easily represented by points on a
two-dimensional grid using a simple extension of the code used in the
previous case. The steps in this extension include using the popular
transformation of~\(\tau\) into the finite
variable~\(w\in[-1,1]\)~\textsuperscript{\hyperref[csl:45]{[45]}}:
\begin{equation}\label{eqn:7}w=\frac{[\frac{3}{10}\left(3\pi^2\right)^{2/3}\rho^{5/3}]\tau^{-1}-1}{[\frac{3}{10}\left(3\pi^2\right)^{2/3}\rho^{5/3}]\tau^{-1}+1},\end{equation}
followed by the usage of a grid of~\(N\times N\) equidistant points
on~\(u\) and~\(w\). A two-dimensional spline
(either bicubic or univariate) is then used to interpolate between
points on the considered interval. The implementation of Piantadosi's
encoding procedure is then identical to the previous case, with the only
difference that the series of points are now constructed
as~\(x \in \left[(0,0),\text{...},(0,N),(1,0),\text{...},(1,N)\right]\). Once again, the accuracy of the procedure depends
only on two variables, the number of points used to interpolate the
enhancement factor,~\(N^{2}\), and the accuracy of the encoder
parameter,~\(\beta\). The major hurdle in the procedure is that
the number of digits required to represent the parameter is now much
higher than for the previous case.~Interpolations
with~\(N>20\)~become computationally expensive since they
require \textgreater{} 400 points, and result in parameters with more
than 1500 digits, regardless of the value of~\(\beta\). For
well-behaved functionals, however,~\(N=20\)
and~\(\beta=12\) result in parameters with \textasciitilde{}1500
digits, and overall errors \textless{} 1 \%, similarly to the GGA case.
Single parameters for the exchange enhancement factors of the
SCAN~\textsuperscript{\hyperref[csl:22]{[22]}} and the M11-L~\textsuperscript{\hyperref[csl:30]{[30]}} meta-GGA
functionals are reported in Fig.~{\ref{846623}} as a
three-dimensional surface and a corresponding slice
at~\(u=s=0\). A Jupyter notebook with the details of the
encoding procedure is also associated with the Figure and is available
on GitHub.\selectlanguage{english}
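
For orientation, the limiting values of the transformation of~\(\tau\) into~\(w\) introduced earlier in this subsection follow directly from its definition. Writing \(\tau^{\text{UEG}}=\frac{3}{10}\left(3\pi^{2}\right)^{2/3}\rho^{5/3}\) for the bracketed prefactor (the kinetic energy density of the uniform electron gas) and \(t=\tau^{\text{UEG}}/\tau\), where both symbols are shorthand introduced here for illustration, the transformation reads:
\begin{equation*}
w=\frac{t-1}{t+1},\qquad
w\to-1\ \text{for}\ \tau\to\infty,\qquad
w=0\ \text{for}\ \tau=\tau^{\text{UEG}},\qquad
w\to+1\ \text{for}\ \tau\to0.
\end{equation*}
The uniform-electron-gas limit therefore sits at the center of the interval, and an equidistant grid on~\(w\) samples regions with \(\tau>\tau^{\text{UEG}}\) and \(\tau<\tau^{\text{UEG}}\) with the same number of points.
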
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/mGGA-X/mGGA-X}
\caption{{Single parameter representation for the SCAN (upper panel) and M11-L
(lower panel) meta-GGA exchange functionals as a 3D function (left
plots) of the reduced density variables~\(u\),
eq.~{\ref{eqn:4}}, and the local kinetic energy density
variable~\(w\), eq. {\ref{eqn:7}}. The
right plots represent slices at constant~\(u=0\). For both
functionals, the black dots are the decoded points, the orange solid
curve is the original enhancement factor as obtained directly from
LibXC, and the dashed red curve is the result of the decoding of the
single parameter and the interpolation via univariate cubic spline.
Results are obtained on a grid of~~\(20\times20\)~ points
and~\(\beta=12\). A Jupyter notebook to encode every meta-GGA
exchange functional in LibXC, as well as to reproduce the plots and to
calculate the encoding and interpolation errors is associated with the
Figure.~
{\label{846623}}%
}}
\end{center}
\end{figure}
\subsection*{2.3 Exchange--correlation
functionals}
{\label{150795}}
The extension to include correlation functionals is trivial, especially
in the GGA case. The general shape of the enhancement factor of every
GGA~\emph{xc} functional can in fact be represented using just two variables
that depend on the density and its gradient: the Wigner--Seitz
radius,
\begin{equation}\label{eqn:8}r_{s}=\left(\frac{3}{4\pi\rho}\right)^{\frac{1}{3}}, \end{equation}
and one of the reduced density gradient variables introduced above
(either~\(s\) or~\(u\)). The implementation of
a two-dimensional interpolation and encoding procedure for GGA
exchange--correlation functionals is reported in
Fig.~{\ref{111001}}, using a grid of~\(N\times N\)
points on~\(r_{s}\) and~\(u\). Since a
two-dimensional interpolation is again necessary, the same numerical
complications of the previous case apply. In general, most GGA~\emph{xc}
functionals can be interpolated using~\(N=20\)~and encoded
into single parameters with \textasciitilde{}1500 digits
using~\(\beta=12\). In Fig.~{\ref{111001}} and
related Jupyter notebook, the encoding procedure is applied to the BLYP
GGA~\emph{xc} functional~\textsuperscript{\hyperref[csl:43]{[43]},\hyperref[csl:46]{[46]}} and to the GAM NGA~\emph{xc}
functional~\textsuperscript{\hyperref[csl:47]{[47]}}. Single parameters of
\textasciitilde{}1500 digits are obtained and reported. It is important
to recognize that the BLYP functional diverges at~\(u=1\)
(\(s=\infty\)), hence the interpolation error
for~\(N=20\) grows substantially in the region
where~\(u>0.8\) (\(s>2\)). The interpolation error
can be further reduced by increasing~\(N\), pushing it to
regions of~\(s\) that are not very significant for
chemical systems. Nevertheless, the~\(s\rightarrow u\) transformation
is not ideal for functionals that diverge at the extremes.\selectlanguage{english}
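
The parameter lengths quoted in this section can be rationalized with a rough count. The estimate below is added for illustration and assumes that each grid point contributes exactly \(\beta\) binary digits to the parameter, which may differ in detail from the implementation in the accompanying notebooks; the symbols \(M\) (total number of grid points) and \(D\) (number of decimal digits) are introduced here. A grid of \(M\) points then requires \(\beta M\) bits, corresponding to:
\begin{equation*}
D\approx\beta M\log_{10}2\approx0.301\,\beta M
\end{equation*}
decimal digits. This gives \(D\approx48\) for \(M=20\) and \(\beta=8\), \(D\approx361\) for \(M=100\) and \(\beta=12\), and \(D\approx1445\) for \(M=20\times20=400\) and \(\beta=12\), of the same order as the \(\sim\)60, \(\sim\)350, and \(\sim\)1500 digits reported above for the corresponding cases.
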
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/GGA-xc/GGA-xc}
\caption{{Single parameter representation for the BLYP (upper panel) and GAM
(lower panel) exchange--correlation functionals as a 3D function (left
plots) of the Wigner-Seitz radius~\(r_s\),
eq.~{\ref{eqn:8}} and reduced density
variables~\(u\), eq.~{\ref{eqn:4}}. The
right plots represent slices at constant~\(r_s=2.5\). For both
functionals, the black dots are the decoded points, the orange solid
curve is the original enhancement factor as obtained directly from
LibXC, and the dashed red curve is the result of the decoding of the
single parameter and the interpolation via univariate cubic spline.
Results are obtained on a grid of~~\(20\times20\)~ points
and~\(\beta=12\). A Jupyter notebook to encode every GGA or NGA
exchange--correlation functional in LibXC, as well as to reproduce the
plots and to calculate the encoding and interpolation errors is
associated with the Figure.
{\label{111001}}%
}}
\end{center}
\end{figure}
Extension to meta-GGA exchange--correlation functionals, as well as to
functionals with more complex forms sitting on higher rungs of Jacob's
ladder, could be achieved with various degrees of difficulty. For
example, meta-GGA functionals depend on at least three variables that
cannot be decoupled (e.g. the density, its gradient, and the kinetic
energy density), and therefore they require higher dimensional
interpolations. The interpolation using multi-dimensional grids and
appropriate functions is not problematic, especially using available
Python libraries. A slightly more complicated case is that of hybrid
functionals (i.e. functionals that include a fraction of Hartree--Fock
exchange), for which the parameter that represents the fraction of HF
exchange could be encoded in the~procedure, for example at the beginning
of the sequence. For range-separated hybrid functionals, a more
complicated~\emph{ad hoc} procedure must be designed. However, since
representing functionals with one parameter has no inherent benefit for
DFT as a method, going beyond the simple proof-of-principle described
above has very little scientific merit and is not explored further in
this context. A more rewarding endeavor is the search for a procedure
that does not rely on counting the number of parameters to evaluate the
transferability of functionals, as presented in the next section.
\section*{3. Statistical criteria of bias--variance tradeoff and analysis
of 60 exchange--correlation
functionals}
{\label{898228}}
The dispute between counting parameters and analytical fits is not a new
scene in statistics and machine learning, where the problem is generally
known as the bias/variance dilemma~\textsuperscript{\hyperref[csl:48]{[48]}}. Especially in
supervised learning, where a model is learned from (fitted to) some
training data, this dilemma translates to the necessity to strike a
balance between underfitting the data (bias error), resulting in methods
that do not incorporate all the relations between the data, and
overfitting them (variance), resulting in methods that are poorly
transferable. Several criteria for model selection are available in
this context, and they all generally include two components, one that
accounts for the performance of the model on the training data, and
another that accounts for the transferability of the model to unseen
data. For a good introduction, see~\textsuperscript{\hyperref[csl:49]{[49]}}. The goal of the
next section is to borrow some of the methodologies that have been
developed in the context of supervised learning and apply them to the
analysis of DFT approximations.
\subsection*{3.1 Statistical Criteria for functional
evaluation}
{\label{153197}}
In order to introduce appropriate bias--variance criteria for
\emph{xc~}functionals, well-established model validation techniques from
statistical analysis must be used. Several criteria are available in
statistics for model selection and validation, mostly belonging to three
main classes:
\begin{itemize}
\tightlist
\item
Methods based on information criteria \textsuperscript{\hyperref[csl:50]{[50]}}.
\item
Methods obtained from Vapnik--Chervonenkis theory \textsuperscript{\hyperref[csl:51]{[51]}}.
\item
Resampling methods \textsuperscript{\hyperref[csl:52]{[52]},\hyperref[csl:53]{[53]}}.
\end{itemize}
In general, the first two classes include analytic methods that evaluate
the overall uncertainty (risk) of the model by inflating the error of
the fitted model calculated on the training set (or some appropriate
data set) by a penalty factor that depends on the degrees of freedom
(DoF) of the model and the number of data in the set. These methods
usually have to rely on assumptions on both the type of function that is
estimated and the statistical distribution of the data. The third class
of methods requires external data sets for validation and is usually more
computationally demanding; however, it has the advantage of not relying on
any assumptions on the distribution of the errors, nor the training
data. Among the first class, the Akaike information criterion
(AIC)~\textsuperscript{\hyperref[csl:50]{[50]}} is the most widely used estimator of error
prediction. This coefficient is constructed from maximum likelihood
arguments, and it uses an additive formula to evaluate the overall
risk,~\emph{R} , as:
\begin{equation}\label{eqn:9}R=R_{\text{emp}}+f\left(n,p\right), \end{equation}
where the empirical risk, \(R_{\text{emp}}\), represents the error of
the fitted model calculated on the training set, and should not be
confused with the error associated with the comparison of DFT data and
empirical (experimental) results, in a chemical sense. In order to
evaluate \(R_{\text{emp}}\), the recent ASCDB database can be used,
since it was specifically created to evaluate the performance of DFT
functionals. To account for the large differences in the average of the
absolute reference energies of each subset of ASCDB, it is convenient to
introduce here an overall weighted mean unsigned error (\emph{w}MUE),
calculated from the mean unsigned errors of the individual subsets,
\(\text{MUE}_{i}\), using:
\begin{equation}w\text{MUE}=\sum_{i=1}^{16}{w_{i}\text{MUE}_{i}}, \end{equation}
where the individual weights are calculated from the ratio between the
average of the absolute reference energies for each
subset,~\(\left| \overline{\Delta \text{E}}\right|_{i}\), and that of the overall database (which is
6.988 kcal/mol for ASCDB; the weights for this database are provided within
the Jupyter notebook that accompanies the electronic version of this
article and on the Author's GitHub page):
\begin{equation}\label{eqn:11}w_{i}=\left(\left|\overline{\Delta \text{E}}\right|_{i}\sum_{i=1}^{16}\frac{1}{\left|\overline{\Delta \text{E}}\right|_{i}}\right)^{-1}=\frac{6.988\frac{\text{kcal}}{\text{mol}}}{\left|\overline{\Delta \text{E}}\right|_{i}}. \end{equation}
This quantity is a slightly simplified version of the WTMAD-2 indicator
introduced by Goerigk et al.~\textsuperscript{\hyperref[csl:2]{[2]}} to rank functionals
based on their performance on the GMTKN55 database. From a statistical
standpoint,~\emph{w}MUE is the average of a slightly modified
coefficient of variation for each subset~\(\left( m\text{CV}_i=\text{MUE}_i / \left|\overline{\Delta \text{E}}\right|_i \right)\), where the
standard deviation is replaced by the mean unsigned error of each subset:
\begin{equation}\label{eqn:12}w\text{MUE}=\frac{\sum_{i=1}^{16}{m\text{CV}}_i}{6.988\frac{\text{kcal}}{\text{mol}}}. \end{equation}
This replacement is justified by the fact that the MUE has similar
content to the root mean square error (which, up to a constant in the
definition, is the standard deviation of the distribution of the errors)
but it is usually preferred in the DFT literature as an indicator of
functional performance. While Goerigk et al. warned not to use their
weighted MUE indicators as an estimation of statistical error for
specific chemical problems,\textsuperscript{\hyperref[csl:2]{[2]}} the connection with the
coefficient of variation makes \emph{w}MUE useful beyond classification
purposes as a balanced measure of the empirical risk of a functional, as
demonstrated by the results presented below. It is important to keep in
mind though that---in accordance with Goerigk et al.'s
suggestion---weighted MUE values for different databases should never be
compared in absolute terms, because they intrinsically depend on the
molecules that are included in each database, and their main purpose is
to provide a basic criterion for the ranking of functionals.
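
The effect of the weights can be illustrated with two hypothetical subsets (the numbers below are invented for illustration and do not correspond to actual ASCDB subsets). Using the ratio between the database average and the subset average of the absolute reference energies, a subset~A of large energies with \(\left|\overline{\Delta\text{E}}\right|_{A}=69.88\) kcal/mol and a subset~B of small energies with \(\left|\overline{\Delta\text{E}}\right|_{B}=1.3976\) kcal/mol receive the weights:
\begin{equation*}
w_{A}=\frac{6.988}{69.88}=0.1,\qquad
w_{B}=\frac{6.988}{1.3976}=5.
\end{equation*}
If a functional gives \(\text{MUE}_{A}=2.0\) kcal/mol and \(\text{MUE}_{B}=0.2\) kcal/mol, the weighted contributions are \(w_{A}\text{MUE}_{A}=0.2\) kcal/mol and \(w_{B}\text{MUE}_{B}=1.0\) kcal/mol. The error that is ten times smaller in absolute terms thus counts five times more, because it is large relative to the energies in its own subset.
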
The term \(f(n,p)\) in Eq.~{\ref{eqn:9}} is an additive
penalty function that depends on the number of training
data,~\(n\), and the degrees of freedom (number of free
parameters) of the fitted function,~\(p\). Assuming a
Gaussian distribution of the errors, the penalty function can be
calculated for regression models as:
\begin{equation} f\left(n,p\right)=\frac{2p}{n}\sigma^{2}, \end{equation}
with:
\begin{equation}\sigma^{2}=\frac{n}{n-p}R_{\text{emp}}\ ,\end{equation}
resulting in a final formula for AIC that is:
\begin{equation}\text{AIC}=w\text{MUE}\cdot\left(1+\frac{2p}{n-p}\right). \end{equation}
Among the second class of methods the Vapnik--Chervonenkis criterion
(VCC)~\textsuperscript{\hyperref[csl:51]{[51]}} can be selected. This criterion~inflates the
empirical risk by a multiplicative penalty function related to Vapnik's
measure:
\begin{equation}\text{VCC}=w\text{MUE}\cdot\left(1-\sqrt{p-p\ln p+\frac{\ln n}{2n}\ }\right)^{-1}. \end{equation}
For both AIC and VCC,~\(n=200\) when evaluated using ASCDB,
and~\(p\) is an estimate of the degrees of freedom
(DoF): it is equal to the number of fitted parameters for fitted
functionals, and it is~set to 1 for non-fitted ones (Table
{\ref{182001}}).
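As a worked illustration, the two penalties can be evaluated by hand for two contrasting entries of Table~{\ref{182001}}: HF (\(p=1\), \emph{w}MUE~=~36.34) and MN15 (\(p=59\), the largest DoF in the table, \emph{w}MUE~=~3.68), both with \(n=200\). The VCC values assume that the degrees of freedom enter Vapnik's penalty through the ratio \(p/n\), together with the term \(\ln n/(2n)\); this is the reading that reproduces the tabulated numbers. The argument of the square root is then \(0.005+0.02649+0.01325\approx 0.04474\) for HF and \(0.295+0.36013+0.01325\approx 0.6684\) for MN15, so that:
\begin{align*}
\text{AIC}_{\text{HF}} &= 36.34\left(1+\tfrac{2}{199}\right)\approx 36.71, &
\text{AIC}_{\text{MN15}} &= 3.68\left(1+\tfrac{118}{141}\right)\approx 6.76,\\
\text{VCC}_{\text{HF}} &= \frac{36.34}{1-\sqrt{0.04474}}\approx 46.09, &
\text{VCC}_{\text{MN15}} &= \frac{3.68}{1-\sqrt{0.6684}}\approx 20.17.
\end{align*}
The table reports 20.16 for the last value; the difference is compatible with the two-decimal rounding of the \emph{w}MUE input. For MN15 the AIC penalty nearly doubles the \emph{w}MUE, while the VCC penalty multiplies it by more than five, which is consistent with its VCC ranking (44) being much lower than its AIC ranking (18).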
The definition of a resampling criterion for \emph{xc} functionals is
unfortunately not as straightforward, since in several cases it might be
difficult to find data that can serve as an external, unbiased
validation set for cross-validation methods. As such,
cross-validation criteria are intrinsically dependent on the data set
that is used to obtain them~\textsuperscript{\hyperref[csl:52]{[52]},\hyperref[csl:53]{[53]}}, and particular effort
has to be devoted to creating a criterion that is representative and
transferable across as many functionals and data sets as possible. The
purpose of cross-validation is, in practice, to highlight
inconsistencies in the treatment of external data, when compared to the
data that are used for the training of the parameters. Therefore, an
overfitted model presents a large difference between the errors for the
training set and those for the validation set. The major hurdle in the
evaluation of~\emph{xc} functionals from different sources and
development philosophies is to find two appropriate and independent sets
of data that can function as a ``training set'' and as a ``validation
set''. On the one hand, the first 12 subsets of ASCDB include chemical
systems that are conventionally used to train and evaluate computational
methods. While none of the existing functionals is specifically trained
on all molecules of these subsets, most of the modern fitted functionals
were trained on similar systems (e.g. the Minnesota and wB97 families),
and even non-fitted ones (e.g. revTPSS and SCAN) have been subject to
convergent evolution to provide at least reasonable results for those
basic chemical systems. On the other hand, the last four subsets of
ASCDB contain unconventional systems that are very far from what current
functionals have been designed or trained for and represent a good
dataset for validation. (The three main subsets in this category come
from the mindless benchmark database of Grimme and coworkers, while the
last one includes the energies of atoms on a per-electron basis.) A
simple cross-validation measurement of the overfitting of a functional
can then be obtained from the ratio between the MUE of the unbiased
calculation---used as a ``validation set''---and the overall~\emph{w}MUE
of ASCDB---used as the ``training set''. Interpreting this last quantity
as a cross-validation estimate of the unknown noise variance of the
distribution of the errors,~\(\sigma^{2}\), the cross-validation
criterion (CVC) can then be calculated by inflating the empirical risk
using eqs.~{\ref{eqn:9}}
and~{\ref{eqn:13}}, as:
\begin{equation}\text{CVC}=w\text{MUE}+\frac{2p}{n}\frac{\text{MUE}_{\text{UC}}}{w\text{MUE}}, \end{equation}
where the~\emph{w}MUE of the full ASCDB database is used in the
denominator in place of the MUE (or weighted MUE) of the first twelve
subsets of ASCDB because numerical evidence showed no significant
differences in the rankings when this transformation was performed.
Apart from a much simpler formula to calculate CVC, the main advantage
of using the weighted MUE of the entire database is that eq.
{\ref{eqn:17}} also becomes extensible to other
databases. For example, a straightforward extension of CVC to the
GMTKN55 database is obtained by using the overall WTMAD-2 in place of
the \emph{w}MUE, and the MUE (MAD in the notation of Goerigk et al.) for the
mindless benchmark subset in the numerator:
\begin{equation}\text{CVC}^{\text{GMTKN55}}=\text{WTMAD-2}+\frac{2p}{n}\frac{\text{MAD}_{\text{MB16-43}}}{\text{WTMAD-2}}.\end{equation}
As for the \emph{w}MUE case discussed before, it is important to keep in
mind that, despite providing very similar rankings, CVC values from
different databases are difficult to compare in absolute terms because
they intrinsically depend on the molecules that are included in each
database.
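Because the CVC penalty is additive, the ratio \(\text{MUE}_{\text{UC}}/w\text{MUE}\) implied by a tabulated CVC value can be recovered by inverting the formula. The following is only an illustrative back-calculation from the rounded entries of Table~{\ref{182001}} for MN15 (\emph{w}MUE~=~3.68, CVC~=~5.35, 59 degrees of freedom, \(n=200\)). It assumes that the quantity multiplying \(2/n\) in the CVC formula is the same DoF count, \(p\), used for AIC and VCC:
\begin{equation*}
\frac{\text{MUE}_{\text{UC}}}{w\text{MUE}}=\frac{n}{2p}\left(\text{CVC}-w\text{MUE}\right)=\frac{200}{118}\left(5.35-3.68\right)\approx 2.8 .
\end{equation*}
Taken at face value, this suggests that for this functional the error on the unconventional subsets is close to three times the overall weighted error. For functionals with \(p=1\) the same inversion is dominated by rounding and is not informative.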
\subsection*{3.2 Evaluation of 60 exchange--correlation
functionals}
{\label{111794}}
The usefulness of the three criteria described above can be evaluated on
the set of 60 popular exchange--correlation functional
approximations.\textsuperscript{\hyperref[csl:54]{[54]},\hyperref[csl:55]{[55]},\hyperref[csl:35]{[35]},\hyperref[csl:56]{[56]},\hyperref[csl:57]{[57]},\hyperref[csl:58]{[58]},\hyperref[csl:59]{[59]},\hyperref[csl:60]{[60]},\hyperref[csl:61]{[61]},\hyperref[csl:62]{[62]},\hyperref[csl:63]{[63]},\hyperref[csl:43]{[43]},\hyperref[csl:46]{[46]},\hyperref[csl:64]{[64]},\hyperref[csl:65]{[65]},\hyperref[csl:66]{[66]},\hyperref[csl:67]{[67]},\hyperref[csl:68]{[68]},\hyperref[csl:69]{[69]},\hyperref[csl:70]{[70]},\hyperref[csl:22]{[22]},\hyperref[csl:27]{[27]},\hyperref[csl:71]{[71]},\hyperref[csl:72]{[72]},\hyperref[csl:23]{[23]},\hyperref[csl:24]{[24]},\hyperref[csl:33]{[33]},\hyperref[csl:73]{[73]},\hyperref[csl:74]{[74]},\hyperref[csl:21]{[21]},\hyperref[csl:75]{[75]},\hyperref[csl:31]{[31]},\hyperref[csl:76]{[76]}}
\textsuperscript{\hyperref[csl:77]{[77]},\hyperref[csl:78]{[78]},\hyperref[csl:79]{[79]},\hyperref[csl:28]{[28]},\hyperref[csl:31]{[31]},\hyperref[csl:80]{[80]},\hyperref[csl:81]{[81]},\hyperref[csl:82]{[82]}}\textsuperscript{\hyperref[csl:34]{[34]},\hyperref[csl:32]{[32]}}\textsuperscript{\hyperref[csl:83]{[83]},\hyperref[csl:30]{[30]},\hyperref[csl:25]{[25]},\hyperref[csl:84]{[84]},\hyperref[csl:85]{[85]},\hyperref[csl:86]{[86]},\hyperref[csl:87]{[87]},\hyperref[csl:88]{[88]}} The functionals
are an expanded set of those that were originally selected to develop the
ASCDB database, and they include a broad set of approximations across
all different rungs of Jacob's ladder, as well as several decades of
functional development. All functionals used are listed in
Table~{\ref{182001}}, with references to their original publications in
its last column.
All calculations are stable broken-symmetry solutions close to the
complete basis set limit, and have been performed with quadruple-\selectlanguage{greek}ζ
\selectlanguage{english}quality basis sets using Q-Chem 5.1~\textsuperscript{\hyperref[csl:89]{[89]}}. Results for the
three statistical criteria, AIC, VCC, and CVC, are reported in
Table~{\ref{182001}}, together with the ranking of each
functional according to each specific criterion (in parentheses). The
average ranking of each functional across the three criteria is also
reported in the ``AVG ranking'' column of Table~{\ref{182001}},
and is used as the final indicator of the performance of a functional. It
is clear that the rankings obtained using the statistical criteria align
well with the Jacob's ladder picture of functional approximations.
According to all three criteria, for example, the three best functionals
are double-hybrid ``fifth rung'' approximations. ``Fourth rung'' hybrid
meta-GGA/NGA are the second-best class, followed by ``fourth rung''
hybrid GGA/NGA and ``third rung'' local meta-GGA, with similar average
performance. ``First and second rung'' local GGA/NGA are on average at
the bottom of the rankings. Interestingly enough, modern non-fitted
functionals such as PBE and SCAN-D3(BJ) sit in the middle of the
ranking, together with most of the parametrized Minnesota functionals.
Even more interesting than the general trends are some of the outliers.
For example, B3LYP-D3(BJ) ranks near the top according to all three
criteria, while its parent functional B3LYP is consistently ranked far
lower, between 19 and 38 positions below B3LYP-D3(BJ) depending on the
criterion, confirming the
trends observed in the literature. However, PBE0-D3(BJ) is slightly more
transferable (despite a slightly higher \emph{w}MUE) than B3LYP-D3(BJ),
confirming recent findings of transferability issues in the popular
B3LYP-D3(BJ) functional.
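The average ranking used in this discussion is the plain arithmetic mean of the three per-criterion rankings. As a quick check on the entries of Table~{\ref{182001}} for B3LYP-D3(BJ) and B3LYP (the symbol \(\overline{r}\) is introduced here only for illustration):
\begin{equation*}
\overline{r}_{\text{B3LYP-D3(BJ)}}=\frac{10+11+13}{3}\approx 11.33,\qquad
\overline{r}_{\text{B3LYP}}=\frac{45+30+51}{3}=42 .
\end{equation*}
Because the mean weights the three criteria equally, a functional that is heavily penalized by only one of them (typically VCC for highly parametrized functionals) can still retain an intermediate average ranking.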
\par\null\selectlanguage{english}
\begin{table}[H]
\centering
\normalsize\begin{tabulary}{1.0\textwidth}{CCCCCCCC}
& DoF & wMUE & AIC & VCC & CVC & AVG ranking & Ref \\
DSD-PBEP86-D3(BJ) & 7 & 2.14 & 2.29 (1) & 3.61 (1) & 2.3 (1) & 1 & 54 \\
PWPB95-D3(BJ) & 10 & 2.69 & 2.97 (2) & 4.99 (2) & 2.85 (2) & 2 & 34;55 \\
B2PLYP-D3(BJ) & 5 & 3.33 & 3.5 (3) & 5.21 (3) & 3.47 (3) & 3 & 55;56 \\
wB97M-V & 12 & 3.43 & 3.87 (4) & 6.76 (5) & 3.76 (4) & 4.33 & 57 \\
PW6B95-D3(BJ) & 9 & 3.75 & 4.11 (5) & 6.76 (4) & 3.98 (5) & 4.67 & 58;59 \\
PW6B95 & 6 & 5.27 & 5.6 (6) & 8.58 (9) & 5.43 (8) & 7.67 & 59 \\
PBE0-D3(BJ) & 4 & 5.45 & 5.68 (7) & 8.19 (8) & 5.53 (9) & 8 & 58;60 \\
HSE-HJS & 1 & 5.7 & 5.75 (9) & 7.23 (6) & 5.73 (15) & 10 & 61;62 \\
B3LYP-D3(BJ) & 6 & 5.42 & 5.76 (10) & 8.82 (11) & 5.63 (13) & 11.33 & 43;46;58;63;64;65 \\
B97M-rV & 12 & 5.05 & 5.69 (8) & 9.93 (16) & 5.53 (10) & 11.33 & 66;67 \\
PBE0 & 1 & 5.74 & 5.79 (12) & 7.28 (7) & 5.76 (16) & 11.67 & 60 \\
B97M-V & 12 & 5.12 & 5.77 (11) & 10.07 (17) & 5.61 (11) & 13 & 68;69 \\
wB97X-V & 10 & 5.33 & 5.89 (13) & 9.89 (15) & 5.7 (14) & 14 & 70 \\
SCAN-D3(BJ) & 2 & 6.32 & 6.45 (14) & 8.58 (10) & 6.35 (24) & 16 & 21;58 \\
M06-2X & 29 & 4.86 & 6.5 (15) & 14.37 (34) & 5.36 (7) & 18.67 & 26 \\
revPBE-D3(BJ) & 4 & 6.49 & 6.76 (17) & 9.74 (14) & 6.6 (26) & 19 & 58;71 \\
B97-1 & 10 & 5.95 & 6.58 (16) & 11.05 (21) & 6.34 (23) & 20 & 72 \\
M05 & 22 & 5.47 & 6.82 (19) & 13.84 (29) & 6.1 (18) & 22 & 22 \\
M05-2X & 19 & 5.82 & 7.05 (20) & 13.74 (28) & 6.26 (19) & 22.33 & 23 \\
MN15 & 59 & 3.68 & 6.76 (18) & 20.16 (44) & 5.35 (6) & 22.67 & 32 \\
BMK & 17 & 6.04 & 7.16 (23) & 13.57 (27) & 6.27 (20) & 23.33 & 73 \\
M06-2X-D3(0) & 35 & 5.03 & 7.16 (22) & 16.89 (38) & 5.62 (12) & 24 & 26;74 \\
PBE & 1 & 7.41 & 7.48 (25) & 9.39 (12) & 7.43 (35) & 24 & 20 \\
revTPSS-D3(BJ) & 6 & 6.76 & 7.17 (24) & 10.99 (20) & 6.99 (31) & 25 & 58;75 \\
N12-SX & 26 & 5.51 & 7.15 (21) & 15.26 (37) & 5.97 (17) & 25 & 30 \\
PW91 & 1 & 7.56 & 7.64 (28) & 9.59 (13) & 7.58 (37) & 26 & 76;77 \\
t-HCTHh & 17 & 6.36 & 7.54 (26) & 14.28 (32) & 6.98 (30) & 29.33 & 78 \\
TPSSh & 1 & 8.04 & 8.12 (33) & 10.2 (18) & 8.07 (41) & 30.67 & 79 \\
PBE-D3(BJ) & 3 & 7.81 & 8.05 (31) & 11.19 (23) & 7.87 (39) & 31 & 20;58 \\
B97-D3(0) & 9 & 7.23 & 7.91 (30) & 13.02 (26) & 7.61 (38) & 31.33 & 58;75 \\
M06-D3(0) & 39 & 5.09 & 7.56 (27) & 18.58 (41) & 6.63 (27) & 31.67 & 26;58 \\
B3PW91 & 3 & 7.82 & 8.06 (32) & 11.2 (24) & 7.9 (40) & 32 & 46;63;76;77 \\
M06 & 33 & 5.58 & 7.78 (29) & 17.98 (40) & 6.85 (29) & 32.67 & 26 \\
TPSS & 1 & 8.39 & 8.48 (37) & 10.64 (19) & 8.43 (44) & 33.33 & 78 \\
M11-D3(BJ) & 46 & 5.18 & 8.28 (34) & 21.81 (49) & 6.29 (22) & 35 & 68;79 \\
M08-HX & 47 & 5.24 & 8.46 (36) & 22.5 (50) & 6.29 (21) & 35.67 & 27 \\
revTPSS & 1 & 8.79 & 8.88 (39) & 11.15 (22) & 8.83 (47) & 36 & 75 \\
M11 & 40 & 5.6 & 8.4 (35) & 20.86 (46) & 6.64 (28) & 36.33 & 79 \\
N12 & 21 & 7.19 & 8.88 (38) & 17.78 (39) & 7.55 (36) & 37.67 & 30 \\
BP86 & 1 & 9.04 & 9.13 (41) & 11.47 (25) & 9.07 (48) & 38 & 63;80 \\
N12-SX-D3(BJ) & 32 & 6.61 & 9.13 (40) & 20.85 (45) & 7.2 (33) & 39.33 & 30;58 \\
B97-2 & 9 & 8.43 & 9.22 (42) & 15.17 (36) & 8.72 (46) & 41.33 & 81 \\
MN12-SX & 58 & 5.55 & 10.09 (46) & 29.83 (54) & 6.43 (25) & 41.67 & 30 \\
B3LYP & 3 & 9.79 & 10.09 (45) & 14.03 (30) & 9.91 (51) & 42 & 63;43;46;64;65 \\
SOGGA11-X & 21 & 7.69 & 9.49 (43) & 19.01 (43) & 8.35 (43) & 43 & 82 \\
N12-D3(0) & 27 & 7.57 & 9.93 (44) & 21.43 (48) & 8.11 (42) & 44.67 & 30;58 \\
revPBE & 1 & 11.22 & 11.33 (49) & 14.22 (31) & 11.24 (54) & 44.67 & 71 \\
MN15-L & 58 & 6.01 & 10.92 (47) & 32.27 (57) & 7.27 (34) & 46 & 33 \\
MN12-L & 58 & 6.23 & 11.33 (48) & 33.48 (59) & 7.14 (32) & 46.33 & 31 \\
RPBE & 1 & 11.3 & 11.41 (52) & 14.33 (33) & 11.33 (55) & 46.67 & 83 \\
BLYP & 1 & 11.8 & 11.92 (53) & 14.96 (35) & 11.83 (56) & 48 & 46;63 \\
M11-L-D3(0) & 50 & 6.83 & 11.38 (50) & 31.16 (56) & 8.66 (45) & 50.33 & 29;58 \\
t-HCTH & 16 & 9.7 & 11.39 (51) & 21.25 (47) & 10.3 (53) & 50.33 & 78 \\
M06-L-D3(0) & 35 & 8.56 & 12.19 (54) & 28.76 (52) & 9.86 (50) & 52 & 24;58 \\
OLYP & 1 & 14.97 & 15.12 (58) & 18.98 (42) & 14.99 (58) & 52.67 & 46;84 \\
M06-L & 34 & 8.85 & 12.47 (55) & 29.11 (53) & 10.13 (52) & 53.33 & 24 \\
M11-L & 44 & 8.27 & 12.93 (56) & 33.42 (58) & 9.84 (49) & 54.33 & 29 \\
HCTH/407 & 15 & 11.48 & 13.34 (57) & 24.5 (51) & 11.96 (57) & 55 & 85 \\
SVWN5 & 1 & 23.72 & 23.95 (59) & 30.08 (55) & 23.74 (59) & 57.67 & 43;86 \\
HF & 1 & 36.34 & 36.71 (60) & 46.09 (60) & 36.37 (60) & 60 & 87;88 \\
\end{tabulary}
\caption{{Statistical criteria for the ranking of~\emph{xc} functional
approximations based on bias--variance tradeoff using the ASCDB
database: Akaike information criterion (AIC),~Vapnik--Chervonenkis
criterion (VCC), and cross-validation criterion (CVC). The ranking of
each functional among the 60 considered is reported in parentheses next
to each result. In addition, the degrees of freedom (DoF) estimated from
the number of parameters, and the weighted mean unsigned error for ASCDB
are also reported. Results are sorted according to the average ranking
among the three criteria. A full reference for each functional is also
provided in the last column.
{\label{182001}}%
}}
\end{table}~It is important to highlight that this study is primarily intended to
establish the reliability of the statistical criteria for evaluation of
performance and transferability of functionals. While the reported
rankings can be used to establish some trends, the list of functionals
is not comprehensive enough to provide reliable final suggestions on
which functional to pick among the more than 300 available in the
literature. Some conclusions on the performance and transferability of
the considered functionals are still interesting to report, and are as
follows:
\begin{itemize}
\tightlist
\item
Best double-hybrid: DSD-PBEP86-D3(BJ)~\textsuperscript{\hyperref[csl:54]{[54]}}, alternate:
PWPB95-D3(BJ)\textsuperscript{\hyperref[csl:35]{[35]},\hyperref[csl:55]{[55]}}.
\item
Best hybrid meta-GGA: \selectlanguage{greek}ω\selectlanguage{english}B97M-V \textsuperscript{\hyperref[csl:57]{[57]}}, alternate:
PW6B95-D3(BJ) \textsuperscript{\hyperref[csl:59]{[59]},\hyperref[csl:58]{[58]}}.
\item
Best hybrid-GGA: PBE0-D3(BJ) \textsuperscript{\hyperref[csl:60]{[60]},\hyperref[csl:90]{[90]},\hyperref[csl:58]{[58]}}, alternate:
B3LYP-D3(BJ) \textsuperscript{\hyperref[csl:63]{[63]},\hyperref[csl:43]{[43]},\hyperref[csl:46]{[46]},\hyperref[csl:64]{[64]},\hyperref[csl:65]{[65]},\hyperref[csl:58]{[58]}}.
\item
Best local meta-GGA: B97M-rV\textsuperscript{\hyperref[csl:66]{[66]},\hyperref[csl:67]{[67]}}, alternate: SCAN-D3(BJ)
\textsuperscript{\hyperref[csl:22]{[22]},\hyperref[csl:58]{[58]}}.
\item
Best local GGA: PBE \textsuperscript{\hyperref[csl:21]{[21]}}, alternate: PW91
\textsuperscript{\hyperref[csl:77]{[77]}}.
\end{itemize}
These results are strengthened by the fact that the majority of the
highlighted functionals overlap with the top performers suggested in
recent reviews by Head-Gordon's~\textsuperscript{\hyperref[csl:12]{[12]}},
Goerigk's~\textsuperscript{\hyperref[csl:3]{[3]}}, and Grimme's~\textsuperscript{\hyperref[csl:2]{[2]}} groups,
obtained with larger databases and considering a broader spectrum of
functionals. Finally, connecting the transferability results to the
issue of counting the number of parameters presented in
Section~{\ref{816226}}, the summary of the results
plotted in Fig.~{\ref{898028}} demonstrates a clear
lack of correlation between the average ranking of each functional and
its number of degrees of freedom. This lack of correlation supports the
main message of this work: The number of fitted parameters does not
represent~an effective measure of the transferability of a functional.
More reliable statistical criteria---such as those developed in this
work, or alternatively, the probabilistic performance estimator recently
introduced by Pernot and Savin~\textsuperscript{\hyperref[csl:91]{[91]},\hyperref[csl:92]{[92]}}---should be used to
evaluate the reliability of new and existing~\emph{xc} functionals.\selectlanguage{english}
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/scatter/scatter}
\caption{{Scatter plot of the statistical analysis of 60 exchange--correlation
functionals. The results show no correlation between the degrees of
freedom of a functional---a loose count of the number of its fitted
parameters (see text)---and its transferability---measured as the
average ranking across the three new statistical criteria.
{\label{898028}}%
}}
\end{center}
\end{figure}
\section*{Conclusions}
{\label{731335}}
A simple encoding procedure borrowed from data science was used to show
that the number of parameters of a fitted exchange--correlation
functional (or in a general sense, its degrees of freedom) is not
representative of transferability across different chemical systems. In
Section~{\ref{816226}}, more than 300 functionals from
the LibXC DFT library were represented using one single parameter. This
exercise disentangles the arbitrary measurement ``number of parameters''
from the fundamental concept of transferability of the results, and
validates the proposition of Yu and Truhlar~\textsuperscript{\hyperref[csl:11]{[11]}} reading:~
``Counting parameters in a density functional is a little bit like
evaluating the quality of a research program by counting the
publications it produces---the number of publications is hardly
irrelevant, but it is far from the whole story, and usually it is not
the decisive measure of quality.''
To compensate for this lack of a ``decisive measure of quality'', three
new criteria~based on the statistical analysis of the recently proposed
ASCDB database of chemical data were developed in
Section~{\ref{898228}} for the assessment of
exchange--correlation functional approximations. These criteria are the
Akaike information criterion (AIC), the~Vapnik--Chervonenkis criterion
(VCC), and the cross-validation criterion (CVC). While the criteria
mostly provide similar rankings, some differences between them do exist,
and the average ranking across the three criteria is the most
unambiguous measurement for the evaluation of~\emph{xc} functionals.
Preliminary results of the average ranking with 60 functionals show that
the best ones are those that carefully use a flexible mathematical form
with a modest number of appropriately fitted parameters (5--12). In the
debate between different functional development philosophies, occupying
the middle ground seems to be the current winning strategy.
\selectlanguage{english}
\FloatBarrier
\section*{References}\sloppy
\phantomsection
\label{csl:1}[1]A. D. Becke, \textit{Journal of Chemical Physics} \textbf{2014}, \textit{140}, 18A301--19.
\phantomsection
\label{csl:2}[2]L. Goerigk, A. Hansen, C. Bauer, S. Ehrlich, A. Najibi, S. Grimme, \textit{Physical Chemistry Chemical Physics} \textbf{2017}, \textit{19}, 32184–32215.
\phantomsection
\label{csl:3}[3]L. Goerigk, N. Mehta, \textit{Australian Journal of Chemistry} \textbf{2019}, \textit{72}, 563.
\phantomsection
\label{csl:4}[4]H. Jacobsen, L. Cavallo, in \textit{Handbook of {Computational} {Chemistry}}, Springer International Publishing, Cham, \textbf{2017}, pp. 225–267.
\phantomsection
\label{csl:5}[5]G. Pacchioni, \textit{Catalysis Letters} \textbf{2015}, \textit{145}, 80–94.
\phantomsection
\label{csl:6}[6]A. D. Becke, \textit{International Journal of Quantum Chemistry} \textbf{1983}, \textit{23}, 1915–1922.
\phantomsection
\label{csl:7}[7]A. D. Becke, \textit{The Journal of Chemical Physics} \textbf{1997}, \textit{107}, 8554–8560.
\phantomsection
\label{csl:8}[8]D. C. Langreth, J. P. Perdew, \textit{Solid State Communications} \textbf{1979}, \textit{31}, 567–571.
\phantomsection
\label{csl:9}[9]D. C. Langreth, J. P. Perdew, \textit{Physical Review B} \textbf{1980}, \textit{21}, 5469–5493.
\phantomsection
\label{csl:10}[10]J. P. Perdew, \textit{Physical Review Letters} \textbf{1985}, \textit{55}, 1665–1668.
\phantomsection
\label{csl:11}[11]H. S. Yu, S. L. Li, D. G. Truhlar, \textit{Journal of Chemical Physics} \textbf{2016}, \textit{145}, 130901–24.
\phantomsection
\label{csl:12}[12]N. Mardirossian, M. Head-Gordon, \textit{Molecular Physics} \textbf{2017}, \textit{115}, 2315–2372.
\phantomsection
\label{csl:13}[13]F. Dyson, \textit{Nature} \textbf{2004}, \textit{427}, 297–297.
\phantomsection
\label{csl:14}[14]G. K.-L. Chan, N. C. Handy, \textit{Journal of Chemical Physics} \textbf{2000}, \textit{112}, 5639–5653.
\phantomsection
\label{csl:15}[15]R. Peverati, D. G. Truhlar, \textit{Philosophical Transactions Of The Royal Society Of London Series A-Mathematical Physical And Engineering Sciences} \textbf{2014}, \textit{372}, 20120476.
\phantomsection
\label{csl:16}[16]B. Civalleri, D. Presti, R. Dovesi, A. Savin, in \textit{Chemical {Modelling}}, \textbf{2012}, pp. 168–185.
\phantomsection
\label{csl:17}[17]N. Schuch, F. Verstraete, \textit{Nature Physics} \textbf{2009}, \textit{5}, 732–735.
\phantomsection
\label{csl:18}[18]S. T. Piantadosi, \textit{AIP Advances} \textbf{2018}, \textit{8}, 095118.
\phantomsection
\label{csl:19}[19]L. Boué, \textit{arXiv:1904.12320 [cs, stat]} \textbf{2019}.
\phantomsection
\label{csl:20}[20]J. P. Perdew, K. Schmidt, \textit{AIP Conference Proceedings} \textbf{2001}, \textit{577}, 1–20.
\phantomsection
\label{csl:21}[21]J. P. Perdew, K. Burke, M. Ernzerhof, \textit{Physical Review Letters} \textbf{1996}, \textit{77}, 3865–3868.
\phantomsection
\label{csl:22}[22]J. Sun, A. Ruzsinszky, J. P. Perdew, \textit{Physical Review Letters} \textbf{2015}, \textit{115}, 036402.
\phantomsection
\label{csl:23}[23]Y. Zhao, N. E. Schultz, D. G. Truhlar, \textit{The Journal of Chemical Physics} \textbf{2005}, \textit{123}, 161103.
\phantomsection
\label{csl:24}[24]Y. Zhao, N. E. Schultz, D. G. Truhlar, \textit{Journal of Chemical Theory And Computation} \textbf{2006}, \textit{2}, 364–382.
\phantomsection
\label{csl:25}[25]Y. Zhao, D. G. Truhlar, \textit{The Journal of Chemical Physics} \textbf{2006}, \textit{125}, 194101.
\phantomsection
\label{csl:26}[26]Y. Zhao, D. G. Truhlar, \textit{The Journal of Physical Chemistry A} \textbf{2006}, \textit{110}, 13126–13130.
\phantomsection
\label{csl:27}[27]Y. Zhao, D. G. Truhlar, \textit{Theoretical Chemistry Accounts} \textbf{2008}, \textit{120}, 215–241.
\phantomsection
\label{csl:28}[28]Y. Zhao, D. G. Truhlar, \textit{Journal of Chemical Theory And Computation} \textbf{2008}, \textit{4}, 1849–1868.
\phantomsection
\label{csl:29}[29]R. Peverati, D. G. Truhlar, \textit{The Journal of Physical Chemistry Letters} \textbf{2011}, \textit{2}, 2810–2817.
\phantomsection
\label{csl:30}[30]R. Peverati, D. G. Truhlar, \textit{The Journal of Physical Chemistry Letters} \textbf{2012}, \textit{3}, 117–124.
\phantomsection
\label{csl:31}[31]R. Peverati, D. G. Truhlar, \textit{Physical Chemistry Chemical Physics} \textbf{2012}, \textit{14}, 16187–16191.
\phantomsection
\label{csl:32}[32]R. Peverati, D. G. Truhlar, \textit{Physical Chemistry Chemical Physics} \textbf{2012}, \textit{14}, 13171–13174.
\phantomsection
\label{csl:33}[33]H. S. Yu, X. He, S. L. Li, D. G. Truhlar, \textit{Chemical Science} \textbf{2016}, \textit{7}, 5032–5051.
\phantomsection
\label{csl:34}[34]H. S. Yu, X. He, D. G. Truhlar, \textit{Journal of Chemical Theory And Computation} \textbf{2016}, \textit{12}, 1280–1293.
\phantomsection
\label{csl:35}[35]L. Goerigk, S. Grimme, \textit{Journal of Chemical Theory And Computation} \textbf{2010}, \textit{6}, 107–126.
\phantomsection
\label{csl:36}[36]L. Goerigk, S. Grimme, \textit{Journal of Chemical Theory And Computation} \textbf{2011}, \textit{7}, 291–309.
\phantomsection
\label{csl:37}[37]G. Santra, N. Sylvetsky, J. M. L. Martin, \textit{The Journal of Physical Chemistry A} \textbf{2019}, \textit{123}, 5129–5143.
\phantomsection
\label{csl:38}[38]J. M. L. Martin, G. Santra, \textit{Israel Journal of Chemistry} \textbf{n.d.}, \textit{n/a}, DOI 10.1002/ijch.201900114.
\phantomsection
\label{csl:39}[39]P. Morgante, R. Peverati, \textit{Physical Chemistry Chemical Physics} \textbf{2019}, \textit{21}, 19092–19103.
\phantomsection
\label{csl:40}[40]M. A. L. Marques, M. J. T. Oliveira, T. Burnus, \textit{Computer Physics Communications} \textbf{2012}, \textit{183}, 2272–2281.
\phantomsection
\label{csl:41}[41]S. Lehtola, C. Steigemann, M. J. T. Oliveira, M. A. L. Marques, \textit{SoftwareX} \textbf{2018}, \textit{7}, 1–5.
\phantomsection
\label{csl:42}[42]R. Peverati, D. G. Truhlar, \textit{Journal of Chemical Theory And Computation} \textbf{2011}, \textit{7}, 3983–3994.
\phantomsection
\label{csl:43}[43]A. D. Becke, \textit{Physical Review A} \textbf{1988}, \textit{38}, 3098–3100.
\phantomsection
\label{csl:44}[44]R. Peverati, Y. Zhao, D. G. Truhlar, \textit{The Journal of Physical Chemistry Letters} \textbf{2011}, \textit{2}, 1991–1997.
\phantomsection
\label{csl:45}[45]A. D. Becke, \textit{The Journal of Chemical Physics} \textbf{2000}, \textit{112}, 4020–4026.
\phantomsection
\label{csl:46}[46]C. Lee, W. Yang, R. G. Parr, \textit{Physical Review B} \textbf{1988}, \textit{37}, 785–789.
\phantomsection
\label{csl:47}[47]H. S. Yu, W. Zhang, P. Verma, X. He, D. G. Truhlar, \textit{Physical Chemistry Chemical Physics} \textbf{2015}, \textit{17}, 12146–12160.
\phantomsection
\label{csl:48}[48]S. Geman, E. Bienenstock, R. Doursat, \textit{Neural Computation} \textbf{1992}, \textit{4}, 1–58.
\phantomsection
\label{csl:49}[49]V. S. Cherkassky, F. Mulier, \textit{{Learning from Data: Concepts, Theory, and Methods}}, IEEE Press : Wiley-Interscience, Hoboken, N.J, \textbf{2007}.
\phantomsection
\label{csl:50}[50]H. Akaike, \textit{IEEE Transactions on Automatic Control} \textbf{1974}, \textit{19}, 716–723.
\phantomsection
\label{csl:51}[51]V. N. Vapnik, A. Y. Chervonenkis, \textit{Theory of Probability \& Its Applications} \textbf{1971}, \textit{16}, 264–280.
\phantomsection
\label{csl:52}[52]S. Geisser, \textit{{Predictive Inference: an Introduction}}, Chapman \& Hall, New York, \textbf{1993}.
\phantomsection
\label{csl:53}[53]P. A. Devijver, J. Kittler, \textit{{Pattern Recognition: a Statistical Approach}}, Prentice/Hall International, Englewood Cliffs, N.J, \textbf{1982}.
\phantomsection
\label{csl:54}[54]S. Kozuch, J. M. L. Martin, \textit{Physical Chemistry Chemical Physics} \textbf{2011}, \textit{13}, 20104.
\phantomsection
\label{csl:55}[55]L. Goerigk, S. Grimme, \textit{Physical Chemistry Chemical Physics} \textbf{2011}, \textit{13}, 6670–6688.
\phantomsection
\label{csl:56}[56]S. Grimme, \textit{The Journal of Chemical Physics} \textbf{2006}, \textit{124}, 034108.
\phantomsection
\label{csl:57}[57]N. Mardirossian, M. Head-Gordon, \textit{The Journal of Chemical Physics} \textbf{2016}, \textit{144}, 214110.
\phantomsection
\label{csl:58}[58]S. Grimme, S. Ehrlich, L. Goerigk, \textit{Journal of Computational Chemistry} \textbf{2011}, \textit{32}, 1456–1465.
\phantomsection
\label{csl:59}[59]Y. Zhao, D. G. Truhlar, \textit{The Journal of Physical Chemistry A} \textbf{2005}, \textit{109}, 5656–5667.
\phantomsection
\label{csl:60}[60]C. Adamo, V. Barone, \textit{The Journal of Chemical Physics} \textbf{1999}, \textit{110}, 6158–6170.
\phantomsection
\label{csl:61}[61]A. V. Krukau, O. A. Vydrov, A. F. Izmaylov, G. E. Scuseria, \textit{The Journal of Chemical Physics} \textbf{2006}, \textit{125}, 224106.
\phantomsection
\label{csl:62}[62]T. M. Henderson, B. G. Janesko, G. E. Scuseria, \textit{The Journal of Chemical Physics} \textbf{2008}, \textit{128}, 194105.
\phantomsection
\label{csl:63}[63]S. H. Vosko, L. Wilk, M. Nusair, \textit{Canadian Journal of Physics} \textbf{1980}, \textit{58}, 1200–1211.
\phantomsection
\label{csl:64}[64]A. D. Becke, \textit{The Journal of Chemical Physics} \textbf{1993}, \textit{98}, 5648–5652.
\phantomsection
\label{csl:65}[65]P. Stephens, F. J. Devlin, C. F. Chabalowski, M. J. Frisch, \textit{The Journal Of Physical Chemistry} \textbf{1994}, \textit{98}, 11623–11627.
\phantomsection
\label{csl:66}[66]N. Mardirossian, L. R. Pestana, J. C. Womack, C.-K. Skylaris, T. Head-Gordon, M. Head-Gordon, \textit{The Journal of Physical Chemistry Letters} \textbf{2016}, \textit{8}, 35–40.
\phantomsection
\label{csl:67}[67]R. Sabatini, T. Gorni, S. de Gironcoli, \textit{Physical Review B} \textbf{2013}, \textit{87}, 041108.
\phantomsection
\label{csl:68}[68]N. Mardirossian, M. Head-Gordon, \textit{Journal of Chemical Physics} \textbf{2015}, \textit{142}, 074111–32.
\phantomsection
\label{csl:69}[69]O. A. Vydrov, T. van Voorhis, \textit{The Journal of Chemical Physics} \textbf{2010}, \textit{133}, 244103.
\phantomsection
\label{csl:70}[70]N. Mardirossian, M. Head-Gordon, \textit{Physical Chemistry Chemical Physics} \textbf{2014}, 9904–9924.
\phantomsection
\label{csl:71}[71]Y. Zhang, W. Yang, \textit{Physical Review Letters} \textbf{1997}, \textit{80}, 890–890.
\phantomsection
\label{csl:72}[72]F. A. Hamprecht, A. J. Cohen, D. J. Tozer, N. C. Handy, \textit{The Journal of Chemical Physics} \textbf{1998}, \textit{109}, 6264–6271.
\phantomsection
\label{csl:73}[73]A. D. Boese, J. M. L. Martin, \textit{The Journal of Chemical Physics} \textbf{2004}, \textit{121}, 3405–3416.
\phantomsection
\label{csl:74}[74]S. Grimme, J. Antony, S. Ehrlich, H. Krieg, \textit{The Journal of Chemical Physics} \textbf{2010}, \textit{132}, 154104.
\phantomsection
\label{csl:75}[75]J. P. Perdew, A. Ruzsinszky, G. I. Csonka, L. A. Constantin, J. Sun, \textit{Physical Review Letters} \textbf{2009}, \textit{103}, 026403.
\phantomsection
\label{csl:76}[76]J. P. Perdew, in \textit{Electronic {Structure} of {Solids} '91}, \textbf{1991}.
\phantomsection
\label{csl:77}[77]J. P. Perdew, J. A. Chevary, S. H. Vosko, K. A. Jackson, M. R. Pederson, D. J. Singh, C. Fiolhais, \textit{Physical Review B} \textbf{1992}, \textit{46}, 6671–6687.
\phantomsection
\label{csl:78}[78]A. D. Boese, N. C. Handy, \textit{The Journal of Chemical Physics} \textbf{2002}, \textit{116}, 9559–9569.
\phantomsection
\label{csl:79}[79]V. N. Staroverov, G. E. Scuseria, J. Tao, J. P. Perdew, \textit{The Journal of Chemical Physics} \textbf{2002}, \textit{119}, 12129–12137.
\phantomsection
\label{csl:80}[80]J. P. Perdew, \textit{Physical Review B} \textbf{1986}, \textit{33}, 8822–8824.
\phantomsection
\label{csl:81}[81]P. J. Wilson, T. J. Bradley, D. J. Tozer, \textit{The Journal of Chemical Physics} \textbf{2001}, \textit{115}, 9233–9242.
\phantomsection
\label{csl:82}[82]R. Peverati, D. G. Truhlar, \textit{The Journal of Chemical Physics} \textbf{2011}, \textit{135}, 191102.
\phantomsection
\label{csl:83}[83]B. Hammer, L. B. Hansen, J. K. Norskov, \textit{Physical Review B} \textbf{1999}, \textit{59}, 7413–7421.
\phantomsection
\label{csl:84}[84]A. D. Boese, N. C. Handy, \textit{The Journal of Chemical Physics} \textbf{2000}, \textit{114}, 5497–5503.
\phantomsection
\label{csl:85}[85]N. C. Handy, A. J. Cohen, \textit{Molecular Physics} \textbf{2001}, \textit{99}, 403–412.
\phantomsection
\label{csl:86}[86]J. C. Slater, \textit{Physical Review} \textbf{1951}, \textit{81}, 385–390.
\phantomsection
\label{csl:87}[87]D. R. Hartree, \textit{Mathematical Proceedings of the Cambridge Philosophical Society} \textbf{1928}, \textit{24}, 89–110.
\phantomsection
\label{csl:88}[88]V. Fock, \textit{Zeitschrift für Physik} \textbf{1930}, \textit{61}, 126–148.
\phantomsection
\label{csl:89}[89]Y. Shao, Z. Gan, E. Epifanovsky, A. T. B. Gilbert, M. Wormit, J. Kussmann, A. W. Lange, A. Behn, J. Deng, X. Feng, et al., \textit{Molecular Physics} \textbf{2015}, \textit{113}, 184–215.
\phantomsection
\label{csl:90}[90]M. Ernzerhof, G. E. Scuseria, \textit{The Journal of Chemical Physics} \textbf{1999}, \textit{110}, 5029–5036.
\phantomsection
\label{csl:91}[91]P. Pernot, A. Savin, \textit{The Journal of Chemical Physics} \textbf{2018}, \textit{148}, 241707.
\phantomsection
\label{csl:92}[92]P. Pernot, A. Savin, \textit{The Journal of Chemical Physics} \textbf{2020}, \textit{152}, 164108.
\end{document}