Agnostic cosmology in the CAMEL framework

Published in arXiv
07/11/2016

Abstract

Cosmological parameter estimation is traditionally performed in the Bayesian context. By adopting an "agnostic" statistical point of view, we show the interest of confronting the Bayesian results with a frequentist approach based on profile-likelihoods.

To this purpose, we have developed the Cosmological Analysis with a Minuit Exploration of the Likelihood (CAMEL) software. Written from scratch in pure C++, emphasis was put on building a clean and carefully-designed project where new data and/or cosmological computations can be easily included.

CAMEL incorporates the latest cosmological likelihoods and gives access from the very same input file to several estimation methods:

- •
A high-quality Maximum Likelihood Estimate (a.k.a. "best fit") using MINUIT,

- •
profile likelihoods,

- •
a new implementation of an Adaptive Metropolis MCMC algorithm that relieves the burden of reconstructing the proposal distribution.

We present here those various statistical techniques and roll out a full use-case that can then be used as a tutorial. We revisit the \(\rm{\Lambda CDM}\) parameter determination with the latest Planck data and give results with both methodologies. Furthermore, by comparing the Bayesian and frequentist approaches, we discuss a "likelihood volume effect" that affects the reionization optical depth when analyzing the high-multipole part of the Planck data.

The software, used in several Planck data analyses, is available from http://camel.in2p3.fr. Using it does not require advanced C++ skills.

Since the 2000s (Christensen et al., 2001), the adoption of Markov Chain Monte Carlo (MCMC) techniques by the cosmological community has promoted the acceptance of Bayesian methodology for parameter estimation. Sampling a high-dimensional space, although inefficient, makes it possible, by invoking Bayes' theorem, to reconstruct posterior distributions assuming some prior degree of belief (rarely discussed). MCMC usage was popularized in cosmology by the CosmoMC package (Lewis et al., 2002), which uses an optimized version of the Metropolis-Hastings algorithm (see also MontePython (Audren et al., 2013)). There exist today several implementations of other algorithms, such as MultiNest (Feroz et al., 2009) or PolyChord (Handley et al., 2015) for Nested Sampling (Skilling, 2004), or emcee (Foreman-Mackey et al., 2013) for the affine-invariant ensemble sampler (Goodman et al., 2010). Several of them are packaged within CosmoSIS (Zuntz et al., 2015) and some were compared in Allison et al. (2014). They are essentially written in Fortran/Python and/or wrapped into multiple languages. All of them adopt the Bayesian paradigm (see Hobson et al. (2010) for a review).

In other communities, such as High Energy Physics, the frequentist approach is more traditional. The reason for its absence in cosmology may lie in the difficulty of building precise profile-likelihoods because of the numerical noise inherent to Boltzmann-solver computations (Sect. \ref{sec:precision}). It was however shown to be feasible on real Planck data using an accurate minimizer and tuning the precision of the Boltzmann computations (Planck Collaboration Int. XVI, 2014).

We wish to avoid the ideological debate about "who is right" and focus instead on the interest of confronting both methodologies. It is the goal of the CAMEL package to provide tools to compare Bayesian and frequentist analyses in order to better understand, for instance, the effect of Bayesian priors or the shape of the likelihood. We chose to write it in pure C++, which we consider a proper language for developing a robust, clean and long-term project. By C++ we mean an object-oriented code with some level of abstraction and well-defined design patterns, in order to build a clean interface that anticipates future developments. Fortunately, users do not need to know what is under the hood, and one can easily plug in any new Boltzmann solver and/or likelihood.

Certainly the best way to present CAMEL is by working out a full use-case. We will revisit the \(\rm{\Lambda CDM}\) cosmological parameter estimation with Planck data using the Hillipop likelihood (Planck Collaboration XI, 2015) and compare the Bayesian and frequentist results. Not only will we show how to obtain precise best fits and profile-likelihoods, we will also propose a new implementation of an adaptive scheme for the Metropolis MCMC algorithm that relieves the pain of reconstructing the proposal. Sect. \ref{sec:pipeline} is a quick overview of the building blocks of parameter estimation. Then Sect. \ref{sec:use_case} explains in minute detail how to obtain best fits (Sect. \ref{sec:MLE}), profile-likelihoods (Sect. \ref{sec:profile}) and produce MCMC chains with the new adaptive algorithm (Sect. \ref{sec:MCMC}). For each method, the basics will be reviewed. This part concretely shows how to produce all the results and can therefore be used as a tutorial. Sect. \ref{sec:comp} finally shows the interest of comparing both methods, focusing on the study of likelihood "volume effects". These results, partially available in Planck Collaboration Int. XVI (2014) and Planck Collaboration XIII (2015), were however never compiled and detailed. The results on the volume effects that affect Planck's high-\(\ell\) likelihoods are new and complement Couchot et al. (2015).
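The adaptive Metropolis scheme mentioned above, in which the Gaussian proposal covariance is periodically re-estimated from the chain history instead of being supplied by hand, can be sketched in a few lines. This is a minimal illustration on a toy 2D Gaussian target, not CAMEL's actual implementation; the tuning constants (`t0`, the adaptation interval, the `2.38²/d` scaling) are the usual textbook choices:

```python
import numpy as np

def adaptive_metropolis(log_like, x0, n_steps, t0=500, eps=1e-8, seed=0):
    """Metropolis sampler whose Gaussian proposal covariance is learned
    from the chain history (Haario-style adaptation)."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    cov = 0.1 * np.eye(d)            # initial guess for the proposal
    scale = 2.38**2 / d              # usual adaptive-scaling factor
    chain = [np.array(x0, dtype=float)]
    ll = log_like(chain[0])
    for t in range(1, n_steps):
        if t > t0 and t % 50 == 0:   # periodically re-estimate the proposal
            cov = np.cov(np.array(chain).T) + eps * np.eye(d)
        prop = rng.multivariate_normal(chain[-1], scale * cov)
        ll_prop = log_like(prop)
        if np.log(rng.random()) < ll_prop - ll:   # Metropolis accept/reject
            chain.append(prop)
            ll = ll_prop
        else:
            chain.append(chain[-1])
    return np.array(chain)

# toy target: a correlated 2D Gaussian log-likelihood
target_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
inv_cov = np.linalg.inv(target_cov)
chain = adaptive_metropolis(lambda x: -0.5 * x @ inv_cov @ x, [3.0, -3.0], 8000)
```

The point of the adaptation is precisely what the text describes: the user no longer has to reconstruct a good proposal distribution by trial and error, since the sampler learns it on the fly.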

A cosmological parameter estimation (CPE) analysis can be factorized into 3 distinct pieces (see Fig. \ref{fig:pipeline}):

- 1.
A code to compute some observables (such as a set of \(C_{\ell}\) values for the CMB or \(P(k,z)\) for the matter power spectrum) given a set of cosmological parameters (called \(\Omega\)) within a theoretical framework: \(X(\Omega)\). In cosmology, the numerical codes that solve the background+perturbation equations are often called "Boltzmann solvers". In CAMEL such a tool is called an Engine. Today, class (http://class-code.net/) and camb (http://camb.info) are widely used, but any analytic parametrization of some cosmological observable enters this category too.

- 2.
The experimental data, which in their most complete statistical form are given as a likelihood, i.e. a function of \(X\) plus, generally, some extra nuisance parameters (called \(\nu\)). There is an implicit conversion here to cosmological parameters through the Engine: \(\mathscr{L}(\Omega,\nu)=\mathscr{L}(X(\Omega),\nu)\). In the following we will also often use the \(\chi^{2}\) function, defined as \(\chi^{2}\equiv-2\ln\mathscr{L}\).

- 3.
One then wants to extract some information about the parameters themselves (including the nuisance ones), which requires some statistical methodology. In the realm of estimators, the Maximum Likelihood Estimator (MLE) is the king, since it enjoys several nice properties (such as being asymptotically unbiased and of minimal variance). It is also referred to as a "best fit", meaning the set of \((\hat{\Omega},\hat{\nu})\) values that gives the maximal likelihood value (or the minimal \(\chi^{2}\) one). This is a point estimate and is a classical statistical notion. Since it is not very practical to visualize what happens in a multi-dimensional space, we focus on individual (or pairs of) parameters: this is the domain of interval estimation. Bayesians will extract the p.d.f. of the true parameters (the posterior distribution), while frequentists will make statements about data covering the true (but unknown) values through the use of profile-likelihoods.
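For a Gaussian likelihood, the \(\chi^{2}\equiv-2\ln\mathscr{L}\) definition above reduces (up to an additive constant) to the familiar sum of squared residuals, so minimizing \(\chi^{2}\) and maximizing \(\mathscr{L}\) pick out the same parameter values. A minimal numerical check on toy data (illustrative only, not CAMEL code):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
data = rng.normal(5.0, sigma, size=100)   # toy data with true mean 5

def chi2(mu):
    """chi2 = -2 ln L for a Gaussian likelihood, dropping the mu-independent constant."""
    return np.sum((data - mu) ** 2) / sigma**2

# MLE of the mean: scan a grid and take the minimum of chi2
grid = np.linspace(4.0, 6.0, 2001)
mu_hat = grid[np.argmin([chi2(m) for m in grid])]
# for a Gaussian, the analytic MLE is the sample mean, which mu_hat should match
```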

CAMEL implements these 3 aspects through the abstraction mechanism allowed by Object-Oriented languages. Currently the following concrete implementations (all in C/C++, including the likelihoods) are provided:

- 1.
Engines:

- –
class (Blas et al., 2011)

- –
pico (Fendt et al., 2007)

- 2.
Likelihoods:

- –
CMB-lensing: Planck (Planck Collaboration XV, 2016),

- –
BAO: 6dF (Beutler et al., 2011), WiggleZ (Blake et al., 2011), SDSS (Padmanabhan et al., 2012; Anderson et al., 2012; Anderson et al., 2014): 1D and anisotropic,

- –
SN1a : JLA (Betoule et al., 2014).

- 3.
Statistics: best-fit, profile-likelihood, MCMC.

See the Appendix for more details.

Estimating cosmological parameters (with a full Boltzmann solver) is a CPU-demanding task and can hardly be performed on your laptop: one needs to rely on a computer cluster. But before sending a lot of jobs to your farm, you should define and test your parameter file, which describes all the cosmological and nuisance parameters (varied or fixed), the data, the Engine and the statistical method. All the tests in this section can be run interactively. We choose as a benchmark the determination of the \(\rm{\Lambda CDM}\) parameters from the Planck high+low-\(\ell\) likelihoods using the class engine.

We write in this part a parameter file named hlpTT_bflike_LCDM.par.

First comes the choice of the engine:

engine=class

Then comes the choice of the likelihoods. Here we will use the Planck Hillipop likelihood (for the high-\(\ell\) part) and bflike for the low-\(\ell\) one (both discussed in Planck Collaboration XIII (2015)).

############################################################################

#hiell

HiLLiPOP=HiLLiPOP/DX11dHM_superExt_CO_TT.lik

#lowl

clikfile=planck_data/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik

##############################################################################

Then we define the parameters: the cosmological ones (i.e. those entering the engine) must be stamped with the "cosm" keyword and follow the syntax of the Engine (here we follow the class terminology). Nuisance parameters, which depend on the likelihoods, should use the "nui" keyword. Each parameter, whether varied ("par") or not ("fix"), should be given an initial value (not necessarily very precise), an expected standard deviation and some boundaries for finite exploration (which can be viewed as the flat-prior borders for MCMC sampling, or simply a box within which the MLE is searched).

##############PARAMETERS################################################

#cosmological parameters

#par/fix name type init sigma min max

par omega_b cosm 0.02224 0.00027 0.017 0.027

par omega_cdm cosm 0.1192 0.0026 0.09 0.15

par 100*theta_s cosm 1.0418 0.6E-04 1.03 1.05

par tau_reio cosm 0.07 0.13E-01 0.01 0.20

par log(10^10A_s) cosm 3.07 0.025 2.7 3.5

par n_s cosm 0.96 0.0070 0.9 1.1

#nuisances for hilipop

par A_planck nui 1 0.001 0.9 1.1

par c0 nui 0. 0.001 -0.05 0.05

par c1 nui 0. 0.001 -0.05 0.05

fix c2 nui 0. 0.001 -0.05 0.05

par c3 nui 0. 0.001 -0.05 0.05

par c4 nui 0.004 0.001 -0.05 0.05

par c5 nui 0.004 0.001 -0.05 0.05

par Aps100x100 nui 2.5E-04 1E-05 0.0 0.1

par Aps100x143 nui 1.1E-04 7E-06 0.0 0.1

par Aps100x217 nui 9.9E-05 6E-06 0.0 0.1

par Aps143x143 nui 4.7E-05 2E-06 0.0 0.1

par Aps143x217 nui 3.1E-05 3E-06 0.0 0.1

par Aps217x217 nui 7.6E-05 6E-06 0.0 0.1

par Asz nui 1 0.1 0.0 10

par Acib nui 1. 0.1 0.0 10

par AdustTT nui 1 0.1 0.0 2

fix AdustPP nui 0.00 0.1 0.0 2

fix AdustTP nui 0.00 0.1 0.0 2

par Aksz nui 0.00 1.0 0.0 10

par Aszxcib nui 0.00 1.0 0.0 10

##########################################################################
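The `par/fix name type init sigma min max` layout above is simple enough to read back with a few lines of Python. The helper below is a hypothetical illustration for post-processing, not CAMEL's actual parser:

```python
def parse_params(text):
    """Read 'par/fix name type init sigma min max' lines into a dict."""
    params = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) == 7 and tokens[0] in ("par", "fix"):
            key, name, ptype = tokens[0], tokens[1], tokens[2]
            init, sigma, lo, hi = map(float, tokens[3:])
            params[name] = dict(varied=(key == "par"), type=ptype,
                                init=init, sigma=sigma, bounds=(lo, hi))
    return params

sample = """par omega_b cosm 0.02224 0.00027 0.017 0.027
fix c2 nui 0. 0.001 -0.05 0.05"""
p = parse_params(sample)
```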

Next you may add some external constraints (or "priors" in Bayesian vocabulary). The most common one is the 1D Gaussian, which is illustrated here: for instance, we add a Gaussian likelihood for "AdustTT" centered on 1 with \(\sigma=0.2\).

#priors######################################

gauss1 AdustTT 1 0.2

gauss1 Acib 1 0.2

gauss1 A_planck 1 0.0025

gauss1 c0 0 2e-3

gauss1 c1 0 2e-3

gauss1 c3 0 2e-3

gauss1 c4 0.0025 2e-3

gauss1 c5 0.0025 2e-3

###############################################
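In \(\chi^{2}\) terms, each `gauss1` line simply adds a quadratic penalty \(((\theta-\mu)/\sigma)^{2}\) to the total. A one-line sketch of this contribution (illustrative, not CAMEL code):

```python
def gauss1_penalty(value, mu, sigma):
    """chi2 contribution of a 1D gauss1 external constraint."""
    return ((value - mu) / sigma) ** 2

# e.g. the 'gauss1 AdustTT 1 0.2' line, evaluated at a fitted value of 0.98
penalty = gauss1_penalty(0.98, 1.0, 0.2)
```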

Then you may need to tune the details of the Boltzmann solver. For class you can do it with the keyword class, which passes the following argument as a string to the solver. For instance, to set the non-cold-dark-matter sector to a single massive neutrino of 0.06 eV (the minimal value in normal hierarchy allowed by oscillation experiments) one may use:

#class setup###############################

class N_ncdm 1

class m_ncdm 0.06

class N_eff 2.046

###########################################

Note that in class, \(N_{\rm eff}\) corresponds only to the pure radiation content today (which is why it is not the famous 3.046 value). One may also pass these parameters as previously with the "fix" keyword, but the class keyword better decouples the variables from their environment (and avoids repeating fixed numbers in the output files). There are some cases where passing strings to class is mandatory:

##################################################

class k_pivot 0.05

class lensing yes

class sBBN\ file bbn/sBBN.dat

#################################################

(Note the `\` in the last line, which indicates a trailing space.)

Finally we define the file that contains some high-precision settings, used when we wish to compute the MLE or a profile precisely with class. Indeed, as we will see later, getting a precise \(\chi^{2}\) minimum is very challenging with a method that cannot rely on analytical gradients. More precisely, Boltzmann solvers do not lead to a likelihood that is continuous in the parameter space, because they are the result of many numerical computations with limited accuracy (such as cutoffs in the integrals). class, besides being written in clean C, is interesting in this respect since it allows one to specify all the precision parameters in a single place. We provide a set of pre-defined precision files (which result from a trade-off between accuracy and CPU time and are presented in Planck Collaboration Int. XVI (2014)) and suggest using the following one:

#########################################

precisionFile=class_pre/hpjul2.pre

#########################################

when performing MLE or profile-likelihood studies. Default settings can be used when doing MCMC, because the algorithm smooths the details of the likelihood function. More in the next section.

The parameter file is ready and we call it hlpTT_bflike_LCDM.par.
To test it you can run
```
writeChi2 hlpTT_bflike_LCDM.par
```

which parses your file and just writes the \(\chi^{2}\) value corresponding to the given parameter values (no fit there).
If the program goes to the end, you most probably have some valid file.

While in practice using the `hpjul2.pre` file is generally sufficient, we show here how you can get an idea of the numerical noise in your analysis.
We recall that the difficulty of finding a really precise multi-dimensional \(\chi^{2}\) minimum comes from the fact that the function \(\dfrac{\partial\mathscr{L}}{\partial\Omega_{i}}\) is not continuous, because we use a full Boltzmann solver to compute the observable \(X\) in \(\mathscr{L}(X(\Omega))\).
We can get some idea of the level of numerical noise, by scanning linearly one variable (with all other ones fixed) and studying the smoothness of the \(\chi^{2}(\theta_{i})\) values.
This is the goal of the `ScanParam` program, which fixes all but one of the parameters from the file, varies the free one linearly, and records the \(\chi^{2}\) values in a text file. Let's see how this works, for instance, on the parameter \(\omega_{\mathrm{b}}\). Run:
```
ScanParam hlpTT_bflike_LCDM.par omega_b 0.020 0.025 50
```

where the last 3 values indicate how the variable is scanned (here from 0.020 to 0.025 with 50 equidistant points).
The upper plot of Fig. \ref{fig:scanomb} shows the \(\chi^{2}\) values, in blue when using the default class precision parameters (i.e. no particular precision file) and in red with the `hpjul2.pre` one. The black curve shows a smooth interpolation between the points, and the bottom plot the residual of each point w.r.t. it. We see that the numerical noise with the `hpjul2.pre` file (in red) is much lower than in the default case (in blue). Using such a precision file allowed us, in all the cases we studied, to determine an accurate \(\chi^{2}\) minimum.
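The smoothness check just described (interpolate the scan, inspect the residuals) can also be done offline. The sketch below quantifies the noise as the RMS residual around a smooth polynomial trend; the synthetic "scan" is illustrative, not ScanParam's actual output format:

```python
import numpy as np

def scan_noise(theta, chi2_vals, deg=4):
    """RMS residual of scanned chi2 values around a smooth polynomial trend,
    used as a proxy for the numerical noise level."""
    x = theta - theta.mean()   # center the abscissa for numerical conditioning
    trend = np.polyval(np.polyfit(x, chi2_vals, deg), x)
    return np.sqrt(np.mean((chi2_vals - trend) ** 2))

# toy scan: a parabolic chi2 plus small numerical noise
theta = np.linspace(0.020, 0.025, 50)
chi2_vals = (1e6 * (theta - 0.0224) ** 2
             + np.random.default_rng(0).normal(0.0, 0.01, theta.size))
noise = scan_noise(theta, chi2_vals)
```

Comparing this figure between precision settings gives the same information as the residual panel of the scan plot.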

It can be useful, before running jobs, to estimate the time taken by each iteration. By iteration we mean the complete processing of one model. For this you can run interactively a few iterations of the `Minimize` executable, adding some verbosity to your parameter file with:

verbose=true

This is done with
```
Minimize myparfile output
```

where ”output” is any name.

It will print, among much other information, the time used for each iteration (at the end of each line). The CPU time per iteration for hlpTT_bflike_LCDM.par typically varied between 5 s on 8 cores and 2 s on 24 (with the `hpjul2` precision file) and is only limited by the scalability of the Boltzmann solver.

The MLE enjoys many nice properties that can be reviewed e.g. in James (2007). One that is not always well known (and that also applies to the likelihood-ratio statistic) is the MLE invariance, which states that if \(\hat{\theta}\) is the MLE of \(\theta\), then the MLE of any function of \(\theta\), \(y=f(\theta)\), is \(\hat{y}=f(\hat{\theta})\).

For instance, in the \(\rm{\Lambda CDM}\) model there are 6 free cosmological parameters, often chosen as \(\Omega=\{\omega_{\mathrm{b}},\omega_{\mathrm{c}},\theta,n_{\rm s},\ln(10^{10}A_{\rm s}),\tau\}\) (see hlpTT_bflike_LCDM.par). If instead of the angular size of the sound horizon (\(\theta\)) one uses the Hubble constant (\(H_{0}\)), the minimization leads to exactly the same values for all the other parameters. Furthermore, if the minimization was performed on the \(\Omega\) set, one does not need to redo it with \(H_{0}\): from the invariance property, you just need to run the Boltzmann solver on the \(\hat{\Omega}\) solution to obtain the \(\hat{H}_{0}\) best-fit value.
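The invariance property can be checked numerically on a toy model. Here the reparametrization \(y=f(\theta)=1/\theta\) stands in for the \(\theta\to H_{0}\) change of variable (a hypothetical example, not a cosmological computation):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(2.0, 0.3, size=500)   # toy data with true parameter theta = 2

def chi2_theta(theta):
    return np.sum((data - theta) ** 2)

def chi2_y(y):
    # the same model reparametrized through theta = 1/y
    return chi2_theta(1.0 / y)

# fit in the theta parametrization
grid_t = np.linspace(1.5, 2.5, 4001)
theta_hat = grid_t[np.argmin([chi2_theta(t) for t in grid_t])]
# fit independently in the y parametrization
grid_y = np.linspace(1.0 / 2.5, 1.0 / 1.5, 4001)
y_hat = grid_y[np.argmin([chi2_y(y) for y in grid_y])]
# invariance: the MLE of y is f(theta_hat), i.e. 1/y_hat matches theta_hat
```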

The goal is to find the point of parameter space where the likelihood reaches its maximum value, or equivalently the \(\chi^{2}\) its minimum (which is preferred for numerical reasons). As discussed in Sect. \ref{sec:precision}, obtaining a precise multi-dimensional minimum (here over 23 parameters, but it can be more) is challenging since

- •
we don’t have analytical first order derivatives,

- •
Boltzmann engine computations bring in numerical noise.

Many software packages exist for multi-dimensional optimization.
We chose to work with the MINUIT package (http://seal.web.cern.ch/seal/work-packages/mathlibs/minuit/index.html), a renowned minimizer often used in High Energy Physics. Started in the 70's (James et al., 1975), it has been continually improved and rewritten in C++. This version, named MINUIT2, is embedded within CAMEL.
MINUIT is a large toolbox including many features (such as releasing or constraining parameters on the fly), and working with it requires some degree of experimentation. We tuned it for cosmology with the MIGRAD algorithm, a "variable metric method" (one changes the metric of the system, not the data; see e.g. Davidon et al. (1959)).
In most cases you won't need to change the default settings.
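MIGRAD's "variable metric" strategy is a quasi-Newton scheme: it builds up an approximation of the (inverse) Hessian from successive numerical gradient estimates. A generic illustration with SciPy's BFGS, a related variable-metric method (not MINUIT itself, and on a toy \(\chi^{2}\) surface):

```python
import numpy as np
from scipy.optimize import minimize

def chi2(p):
    """Toy correlated 2-parameter chi2 surface with minimum at (1.0, -0.5)."""
    x, y = p
    return (x - 1.0) ** 2 + 4.0 * (y + 0.5) ** 2 + 2.0 * (x - 1.0) * (y + 0.5)

# BFGS estimates gradients by finite differences here, as MIGRAD does,
# and updates its internal metric (inverse-Hessian approximation) at each step
res = minimize(chi2, x0=[0.0, 0.0], method="BFGS")
```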

To run the minimization as a batch job, one uses the `Minimize.sh` script (note the ".sh"). Scripts are located under the work/batch directory and we propose some implementations for cc-in2p3 (cc.in2p3.fr) and NERSC (nersc.gov); they should be simple to adapt to other sites.

```
Minimize.sh /path/to/hlpTT_bflike_LCDM.par
```

This

- 1.
checks your configuration,

- 2.
creates a directory named hlpTT_bflike_LCDM_min where you launched the script, which will contain the outputs,

- 3.
sends the (OpenMP) job.

You then wait for the job completion… In our case, the minimization converged in about 3000 iterations, which, given the wall times measured in Sect. \ref{sec:time}, typically corresponds to 1.5 h (on 24 cores) or 4 h (on 8). This time is only limited by the class computations, not by the minimization algorithm.

One then obtains the job log-file and the bestfit.dat and covmat.dat files. You can examine the log-file and not worry too much about class error messages (if they are sparse!). They are often due to the lack of convergence of some numerical algorithm for some "extreme" combination of parameters: CAMEL catches these errors and the minimizer moves away from those regions.
You should also not worry about a message such as:
```
WARNING: Minuit did not converge.
```

which is due, for class, to the numerical discontinuities discussed in Sect \ref{sec:precision}.
Then near the end you will find the interesting results.
```
# ext. || Name || type || Value || Error +/-
0 || omega_b || limited ||0.02221669365664 ||0.0001757676645765
1 || omega_cdm || limited ||0.1193778692005 ||0.001014938068844
2 || 100*theta_s|| limited ||1.041727722902 ||0.000349347668509
3 || tau_reio || limited ||0.07124770721867 ||0.002294613708248
4 || log(10^10A_s)|| limited ||3.068270305092 ||0.00899043822157
5 || n_s || limited ||0.9657674074055 ||0.006549911640786
6 || A_planck || limited ||1.000680632922 ||0.002480642264396
7 || c0 || limited ||0.001487875696301 ||0.001638396021085
8 || c1 || limited ||0.001440151762443 ||0.00163998060017
9 || c2 || const ||0 ||
10 || c3 || limited ||-0.0007967689027367 ||0.00147758139892
11 || c4 || limited ||0.001778114967395 ||0.001762250083468
12 || c5 || limited ||0.001921014220322 ||0.001759624112839
13 || Aps100x100|| limited ||0.000255535488113 ||3.31867430043e-05
14 || Aps100x143|| limited ||0.0001108863113654 ||2.529138206055e-05
15 || Aps100x217|| limited ||7.71374529739e-05 ||1.641760577019e-05
16 || Aps143x143|| limited ||4.429855974166e-05 ||2.030311938827e-05
17 || Aps143x217|| limited ||3.222589327638e-05 ||1.452202004259e-05
18 || Aps217x217|| limited ||7.645366758567e-05 ||1.374833402066e-05
19 || Asz || limited ||1.322514028996 ||0.5973930819106
20 || Acib || limited ||0.9397363536028 ||0.1319397001748
21 || AdustTT || limited ||0.9832158679645 ||0.09576457283827
22 || AdustPP || const ||0 ||
23 || AdustTP || const ||0 ||
24 || Aksz || limited ||1.634684530752 ||5.729577959257
25 || Aszxcib || limited ||8.295195641495e-08 ||8.899445674664
```

which gives the values of the best fit and some (crude) estimate of the "error bars" from the Hessian matrix (these should not be trusted too much; more in Sect. \ref{sec:hessian}).
Also of utmost importance is the value of \(\chi^{2}_{\textrm{min}}\) at this location, which appears as
```
minimum function Value: 20448.12723918
```

and at the very end of the file you obtain the breakdown of the individual \(\chi^{2}_{\textrm{min}}\) values and the total wall-time. Here we obtain:
```
===== lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik: chi2=10495
===== HiLLiPOP:DX11dHM_superExt_CO_TT.lal: chi2=9951.47
Chi2 from HiLLiPOP: 9951.47 ndof: 9556
===== CMB_all: chi2=20446.5
===== Gauss1 constraint on AdustTT(1+-0.2): chi2=0.00704268
===== Gauss1 constraint on Acib(1+-0.2): chi2=0.0907927
===== Gauss1 constraint on A_planck(1+-0.0025): chi2=0.0741218
===== Gauss1 constraint on c0(0+-0.002): chi2=0.553444
===== Gauss1 constraint on c1(0+-0.002): chi2=0.518509
===== Gauss1 constraint on c3(0+-0.002): chi2=0.15871
===== Gauss1 constraint on c4(0.0025+-0.002): chi2=0.13028
===== Gauss1 constraint on c5(0.0025+-0.002): chi2=0.0838061
TIMER TOTAL TIME=16268.6 s =271.144 min =4.51907h
===== Chi2Combiner: chi2=20448.1
```

For convenience we include the bestfit.awk script (analysis scripts are located in the work/tools directory), which parses the whole output file and prints out the interesting quantities.
Use it as:
```
awk -f tools/awk/bestfit.awk logfile
```

The files bestfit.dat and covmat.dat are also interesting excerpts of the log-file that can be easily read with any interactive analysis software.

It can be wise to test whether one could not get a slightly better (lower) \(\chi^{2}_{\textrm{min}}\) value through a different path (recall that the numerical Engine noise may perturb the gradient descent). To this end, we shoot different starting points and compare what we obtain. This is done by specifying a job "range" of the form "I-J". For example:
```
Minimize.sh path/to/hlpTT_bflike_LCDM.par 1-10
```

creates the hlpTT_bflike_LCDM_bf/ directory (that contains everything) and sends 10 jobs to the cluster through what is referred to as an ”array-job”.

Each job is independent, so it can run simultaneously with all the others, barely increasing the user wall-time. Job number 1 is the standard minimization described previously. The others perform a uniform random shuffle of each parameter within the [min,max] bounds of the parameter file. These shots are reproducible, in the sense that another "1-10" run would yield the same configurations. Use "11-20" if you want 10 more trials…

Now we have the outputs of jobs 1 to 10 in the same directory. Which is the best? The one with the lowest \(\chi^{2}_{\textrm{min}}\).
We provide a script to help find it. Go into hlpTT_bflike_LCDM_bf/
and run
```
awk -f tools/awk/zebest.awk *.o*
```

where "*.o*" is a pattern matching all the log-files (it may be different at your site). Here is an example of what you can obtain (given the random seeds, do not expect exactly the same):
```
hlpTT_bflike_LCDM_bf.o26551546.1 chi2=20448.127239 it=2734 t=5.950475s tot=4.519056h errors=2
hlpTT_bflike_LCDM_bf.o26551546.10 chi2=20448.205740 it=4706 t=6.512537s tot=8.513333h errors=13
hlpTT_bflike_LCDM_bf.o26551546.2 chi2=20448.276149 it=3191 t=5.881981s tot=5.213722h errors=15
hlpTT_bflike_LCDM_bf.o26551546.3 chi2=20448.148907 it=5165 t=6.357231s tot=9.120861h errors=20
hlpTT_bflike_LCDM_bf.o26551546.4 chi2=20448.199447 it=3300 t=6.380727s tot=5.849000h errors=56
hlpTT_bflike_LCDM_bf.o26551546.5 chi2=21113.587792 it=1721 t=5.421063s tot=2.591569h errors=56
hlpTT_bflike_LCDM_bf.o26551546.6 chi2=20448.158418 it=3819 t=5.074077s tot=5.382750h errors=57
hlpTT_bflike_LCDM_bf.o26551546.7 chi2=20448.667074 it=2867 t=6.624834s tot=5.275944h errors=86
hlpTT_bflike_LCDM_bf.o26551546.8 chi2=20448.154056 it=3971 t=5.468623s tot=6.032194h errors=86
hlpTT_bflike_LCDM_bf.o26551546.9 chi2=20448.225423 it=2925 t=3.550632s tot=2.884889h errors=87
--------------------------------------------------------
chi2best= 20448.12723918 for hlpTT_bflike_LCDM_bf.o26551546.1
```

For each line you find: its name, \(\chi^{2}_{\textrm{min}}\), the number of iterations, the time per iteration, the total time, and the number of (possible) class errors (warnings).
So in our case, the best result was indeed for the first job, i.e. the one analyzed in Sect. \ref{sec:mini}.
Some "accidents" may happen (a large \(\chi^{2}\), as for the 5th job) when some extreme random starting values did not allow a proper convergence.
But basically we are confident that the best fit initially found is indeed the best one.
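The restart-and-select logic above (shuffle the starting point within the parameter bounds, keep the lowest \(\chi^{2}\)) is easy to emulate. A generic sketch with a toy one-dimensional \(\chi^{2}\) and SciPy's Nelder-Mead, not the actual batch scripts:

```python
import numpy as np
from scipy.optimize import minimize

def chi2(p):
    """Toy 1D chi2 with two local minima near x = +2 and x = -2
    (the deeper one lies on the negative side)."""
    x = p[0]
    return (x**2 - 4.0) ** 2 + x

rng = np.random.default_rng(42)          # fixed seed: the shots are reproducible
results = []
for job in range(10):
    x0 = rng.uniform(-3.0, 3.0, size=1)  # random start within the [min, max] bounds
    res = minimize(chi2, x0, method="Nelder-Mead")
    results.append((res.fun, res.x[0]))

best_chi2, best_x = min(results)         # keep the restart with the lowest chi2
```

Just as in the text, some restarts settle in the shallower basin; taking the minimum over all of them recovers the global best fit.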

MINUIT allows one to retrieve the Hessian matrix corresponding to the local curvature of the parameters around the minimum.
It is produced when adding a name for the matrix to the arguments of the `Minimize` executable (which is the default in the `Minimize.sh` script).
You should be careful when using it since:

- •
it is not very precise due to the numerical noise,

- •
it corresponds to the second-order partial-derivative matrix and cannot be identified with an "error matrix" in the case of strongly non-linear parameters (James, 1980).

It should therefore only be considered as indicative: the right way to obtain real "errors" is through the use of profile-likelihoods or MCMC methods. As we will see in Sect. \ref{sec:MCMC}, it is however very convenient as the input proposal for the adaptive MCMC algorithm, and we will check a posteriori that it was quite precise in this case (Fig. \ref{fig:cors}).

We now address the question of interval estimation for a single parameter (or a pair, rarely more) using the method of profile-likelihoods. Technically, it consists in fixing the value of a given parameter (let's call it \(\theta_{i}\)) and performing a minimization w.r.t. all the others. For each scanned \(\theta_{i}\) value, the minimum \(\chi^{2}(\theta_{i})\) value is reported on a graph that is finally offset to zero. We will focus our discussion on 1D profile-likelihoods, i.e. scanning a single parameter, but 2D ones (scanning values over a grid) are also available in CAMEL.
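The procedure just described (fix \(\theta_{i}\), minimize over everything else, record \(\chi^{2}_{\textrm{min}}\), offset to zero) can be illustrated on a toy two-parameter \(\chi^{2}\), with SciPy's minimizer standing in for MIGRAD (an illustrative sketch, not CAMEL code):

```python
import numpy as np
from scipy.optimize import minimize

def chi2(theta, nu):
    """Toy chi2 with one parameter of interest (theta) and one correlated
    nuisance parameter (nu)."""
    return (theta - 1.0) ** 2 + (nu - 0.5 * theta) ** 2

def profile(theta_values):
    """Profile likelihood: minimize over nu at each fixed theta, offset to zero."""
    prof = np.array([minimize(lambda nu: chi2(t, nu[0]), x0=[0.0]).fun
                     for t in theta_values])
    return prof - prof.min()

thetas = np.linspace(0.0, 2.0, 41)
dchi2 = profile(thetas)
# the Delta-chi2 = 1 crossings of this curve delimit the 68% confidence interval
```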

The interest of reconstructing a profile-likelihood stems from the fact that by cutting it one can construct an \(\alpha\)-level confidence interval in the frequentist sense: if the experiment is repeated many times, the interval covers the true value a fraction \(\alpha\) of the times.
We first consider the case of a Gaussian variable.
Then \(\Delta\chi^{2}\), considered as a random variable of the data, is centered on the true value and follows a \(\chi^{2}_{1}\) distribution (or \(\chi^{2}_{2}\) for 2 parameters; see e.g. Severini (2000)). The \(\alpha\)-level interval is then obtained by keeping the \(\theta_{i}\) values for which \(\Delta\chi^{2}(\theta_{i})\leq\chi_{\alpha}\), with \(\chi_{\alpha}\) defined by \(P(\chi^{2}_{1}\leq\chi_{\alpha})=\alpha\).

We give in Table \ref{tab:cutprof} some classical values of \(\chi_{\alpha}\) for 1 and 2D profile-likelihoods.

dim | 0.68 | 0.95 | 0.99
---|---|---|---
1 par | 1 | 3.84 | 6.63
2 par | 2.29 | 5.99 | 9.21
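These cut values are simply quantiles of the \(\chi^{2}\) distribution with 1 or 2 degrees of freedom, and can be reproduced with SciPy (0.68 here meaning the 1σ level, 68.27%):

```python
from scipy.stats import chi2

# Delta-chi2 thresholds for alpha-level intervals with 1 and 2 profiled parameters
for ndof in (1, 2):
    cuts = [chi2.ppf(alpha, ndof) for alpha in (0.6827, 0.95, 0.99)]
    print(ndof, [round(c, 2) for c in cuts])
```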

Although we built our discussion on the Gaussian case, it actually holds for non-Gaussian cases too (i.e. non-parabolic \(\Delta\chi^{2}\) curves) and allows the reconstruction of asymmetric intervals (w.r.t. the MLE). As discussed in James (2007), in the general case one can always imagine a change of variable that would make the profile parabolic. The MLE invariance discussed in Sect. \ref{sec:MLE} applies more generally to likelihood ratios \(\dfrac{\mathscr{L}(\theta_{i})}{\mathscr{L}_{max}}\), or equivalently to \(\chi^{2}(\theta_{i})-\chi^{2}_{\textrm{min}}\). Thus the assertion made on the interval for the transformed variable (\(\Delta\chi^{2}=1\)) also applies to the original variable. Strictly speaking there can be a slight difference (because one transforms the data, not the true value), but this only happens for very low statistics (small data samples), which is anyway always difficult to treat and a situation where the MLE is not necessarily optimal.

For completeness, we note that there are some cases where one wants to incorporate a physical bound (for example, a mass being always positive). This is most naturally performed in the Bayesian framework through the use of priors. The "unified approach" of Feldman et al. (1998) can be applied here, as was demonstrated in Planck Collaboration Int. XVI (2014) and Henrot-Versillé (2015).

Before showing a concrete example, we would like to point out that, by construction, the best fit at the minimum of the \(\Delta\chi^{2}(\theta_{i})\) function is the best fit of the full set of parameters (the one obtained in Sect. \ref{sec:MLE}). This provides an interesting consistency test and ensures that, unlike in the MCMC approach, there is no difference between the quoted parameter value and the best-fit solution.

Building a profile-likelihood is then nothing more than running several minimizations, fixing one parameter each time.
Although there exists a CAMEL executable that loops over the variable and performs the minimizations sequentially, one can take advantage of batch farms by sending each minimization to a separate worker.
This can be performed with the `Profile.sh` script.
To use it, one must first write a scan-file which specifies the profiled variable and its values. For instance, in a file `omega_b.scan` we put the following (this is only illustrative; you rarely need so many points. Note also that since this file is sourced by the script, one can use some bash commands, such as loops, to generate these numbers):
```
var="omega_b"
val=(0.022000 0.022020 0.022041 0.022061 0.022082 0.022102 0.022122 0.022143 0.022163 0.022184
0.022204 0.022224 0.022245 0.022265 0.022286 0.022306 0.022327 0.022347 0.022367 0.022388 0.022408
0.022429 0.022449 0.022469 0.022490 0.022510 0.022531 0.022551 0.022571 0.022592 0.022612 0.022633
0.022653 0.022673 0.022694 0.022714 0.022735 0.022755 0.022776 0.022796 0.022816 0.022837 0.022857
0.022878 0.022898 0.022918 0.022939 0.022959 0.022980 0.023000)
```

Then run:
```
Profile.sh path/to/hlpTT_bflike_LCDM.par /path/to/omega_b.scan
```

When all the jobs have finished, we concatenate the best-fit files (we will see later how to do that) and plot the "chi2min" vs. "omega_b" variables, reading the corresponding columns of the concatenated file (the profiled variable is always the column before the "chi2min" one). We obtain the points shown in Fig. \ref{fig:prof_omegab1}.

Obviously there has been a "glitch" near 0.0227, and some "wiggles" indicate that the fit did not exactly reach its minimum in 3 or 4 places. As in the best-fit case (Sect \ref{sec:mini}), we cure this issue by shooting different starting points. This does not need to be redone for every point, only for those that are manifestly incorrect: this is the reason for writing the scan-file explicitly, so that you may re-edit it and keep only the values that need to be improved.

Here we re-run (for the sake of illustration) on all the points (i.e. keeping the same scan-file), shooting each time 6 different starting points for the minimization:
```
Profile.sh path/to/hlpTT_bflike_LCDM.par /path/to/omega_b.scan 1-6
```

"1-6" means that jobs 1 to 6 will be launched for each scanned value. Each one (except number 1) randomly shuffles the starting values of each parameter within its specified [min,max] range. The shuffling is reproducible, so if you need more starting points do not use "1-6" again but something like "7-10". This allows you to increase the statistics gradually until satisfactory continuity is reached.
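The per-job seeding logic can be illustrated with a small sketch (purely illustrative Python, not the actual `Profile.sh` mechanism; the function and parameter names are made up):

```python
import random

# Toy sketch (not CAMEL code) of the reproducible shuffling: seeding the
# generator with the job index makes job k always draw the same starting
# point, while new indices ("7-10") yield new, independent points.
def starting_point(job_index, param_ranges):
    if job_index == 1:
        return None  # job 1 keeps the parameter-file starting values
    rng = random.Random(job_index)  # reproducible per-job seed
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}

ranges = {"omega_b": (0.017, 0.027), "n_s": (0.9, 1.1)}
print(starting_point(2, ranges))
```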

The script creates the `hlpTT_bflike_LCDM_mprof_omega_b/` directory and, within it, one sub-directory per scanned value (each containing the outputs of the 6 jobs).
Let us look at the outputs of all the jobs by concatenating all the `best_fit` files.
All the points are shown on Fig. \ref{fig:mprof_omegab}.

Some fits did not converge well, but what we are interested in is, for each scanned value, the minimal \(\chi^{2}\) value.
These are, in each case, our "best best fit" solutions, shown in red on Fig. \ref{fig:mprof_omegab}.
Using them, the profile is now nicely continuous.
For convenience, and since it is a common operation, we provide a script (in `work/tools/python`) to extract these points. Simply run it on the list of sub-directories containing the scanned values. Here we used (in the directory `hlpTT_bflike_LCDM_mprof_omega_b`):
```
python tools/python/bestbestfit.py omega_b*
```

where `omega_b*` is the pattern of the output directories. This creates the `profile_omega_b.txt` file.

Then you interpolate smoothly between the profile points and determine the intersections with the \(\Delta\chi^{2}=1\) line to obtain the 68% CL interval. You may quote "error bars" by comparing these limits to the position of the minimum. Be careful however to determine the minimum and the limits on the interpolated curve, not on the data points.
You can code this yourself in your preferred environment or use the `draw_profile.py` script, which produced Fig. \ref{fig:bestprof_omegab} when running:
```
python tools/python/draw_profile.py profile_omega_b.txt
```

There are other ways to produce this plot (and others) through the `tools/python/camel.py` library (see the CAMEL web site for details).

One can also inspect the "co-profiles", i.e. the values taken by the other parameters along the scan, using the `coprofiles.py` script to produce these plots.
We ran:
```
python tools/python/coprofiles.py profile_omega_b.txt
```

and it produced the output shown on Fig. \ref{fig:coprof_omegab}.

The construction of a profile-likelihood may seem delicate because we went into many details. In practice you should pay attention to the following points:

- •
always use a precision file (with class); the default `hpjul2.pre` should be sufficient in most cases.

- •
choose your scanning points carefully. You only need to cover the \(\Delta\chi^{2}\lesssim 5\) region (which may require some iterations): it is useless (and difficult) to regularize points outside it.

- •
although we described the method with many points (50), you do not need as many to construct a reliable curve (10 to 20 are sufficient).

- •
you should first try with the parameter-file fixed starting point (i.e. range 1-1 in `Profile.sh`, which is the default);

- •
then, if you notice some continuity problems:

- –
if there are few, simply discard those points before performing a smooth interpolation among the remaining "good" ones;

- –
if there are many, try more starting points on the problematic values.

- •
once you are happy with your profile, always check the "co-profile" to see whether some parameters are blocked by the [min,max] bounds.

We apply this methodology to the remaining 5 cosmological parameters of the hlpTT_bflike_LCDM.par file and obtain the profile-likelihoods shown on Fig. \ref{fig:lcdm_prof}.

The Monte-Carlo Markov Chain method draws samples according to any distribution (for a nice introduction see e.g. Bardenet (2013)). When applied to a likelihood function, it may serve to reconstruct the distribution of the unknown parameters (the posteriors), a concept not accepted in a classical "frequentist" perspective. According to Bayes' theorem:

\begin{align} p(\theta|x)=\dfrac{p(x|\theta)\pi(\theta)}{p(x)}\propto\mathscr{L}(\theta|x)\pi(\theta)\\ \end{align}\(\mathscr{L}\) being the likelihood of the data and \(\pi\) the prior distribution that reflects the degree of belief in the parameter before considering the experiment. The latter is rarely discussed (i.e. taken equal to 1 in some reasonable range, although this is not necessarily the most non-informative prior (Jeffreys, 1983)), in which case the likelihood of the parameters equals their probability. Sampling from the likelihood then directly gives the parameters' posterior distribution.

The founding father of all MCMC algorithms is the Metropolis one (Metropolis et al., 1949; Metropolis et al., 1953). Its mathematical properties have been studied for decades and it is certainly the best characterized one (Roberts et al., 1998; Gilks et al., 1996). Although it is mathematically guaranteed to converge to the proper distribution at infinity, this is often too far in real case studies, in particular in cosmology where Boltzmann-solver computations are slow. In practice this means that the proposal distribution used to perform the random jumps, generally a central multi-variate Gaussian with some covariance matrix, must be known precisely. This matrix is often complicated because of important off-diagonal elements that can hardly be inferred a priori (see e.g. Fig. \ref{fig:cors}): not incorporating them does not allow a proper sampling of the likelihood on a reasonable timescale. The matrix is therefore reconstructed iteratively from the empirical covariance of the samples, but this is a very time-consuming process. Reproducing results when the proposal is given is simple but hides the lengthy previous steps. As soon as one wants to study new variables/likelihoods, the proposal matrix must be rebuilt.

We have therefore implemented in CAMEL an adaptive version of the Metropolis algorithm. In this approach the proposal covariance matrix is built on the fly: the adaptation level is gradually decreased so that one ends up with a fixed matrix and then follows the classical Metropolis algorithm. The initial matrix does not need to be known precisely and can be obtained in two ways:

- 1.
one may use the Hessian matrix obtained from minimization (Sect \ref{sec:hessian}).

- 2.
often one just wants to enlarge a set of parameters and already knows a precise matrix (from the empirical covariance) for a subset of them: then just add a crude estimate of the new parameters' errors on the diagonal and let the algorithm do the work.

The original algorithm (Haario et al., 2001) works in the following way:

- 1.
Choose a starting point \(X_{0}\), a starting covariance matrix \(\Sigma_{0}\) and a tuning parameter \(c\);

- 2.
generate a value \(Y\) from the proposal density \(N(X_{t-1},c\Sigma_{t-1})\);

- 3.
evaluate the test ratio \(\alpha(X_{t-1},Y)=\min\left[1,\frac{\pi(Y)}{\pi(X_{t-1})}\right]\);

- 4.
generate a value \(u\) uniformly distributed in \([0,1]\);

- 5.
if \(u\leq\alpha(X_{t-1},Y)\) set \(X_{t}=Y\), else set \(X_{t}=X_{t-1}\);

- 6.
update running mean and covariance:

\begin{equation} \mu_{t}=\mu_{t-1}+\frac{1}{t}(X_{t}-\mu_{t-1}),~{}~{}~{}\Sigma_{t}=\Sigma_{t-1}+\frac{1}{t}((X_{t}-\mu_{t})(X_{t}-\mu_{t})^{T}-\Sigma_{t-1})\\ \end{equation}

- 7.
increment \(t\).
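The steps above can be condensed into a toy numpy sketch (ours, not the CAMEL C++ implementation; the 2D Gaussian target and the shortened settings are illustrative):

```python
import numpy as np

# Toy sketch of the Adaptive Metropolis steps, with the t0 warm-up phase.
rng = np.random.default_rng(0)
icov = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))  # target: correlated Gaussian

def log_pi(x):
    return -0.5 * x @ icov @ x  # log of the target density, up to a constant

d, t0, n = 2, 2000, 20000
X = np.zeros(d)                 # step 1: starting point
Sigma = np.eye(d)               # step 1: crude initial proposal covariance
c = 0.01                        # small initial scale (CAMEL's default is 0.001)
mu = X.copy()
samples = np.empty((n, d))

for t in range(1, n + 1):
    Y = rng.multivariate_normal(X, c * Sigma)              # step 2: propose
    alpha = np.exp(min(0.0, log_pi(Y) - log_pi(X)))        # step 3: min[1, pi(Y)/pi(X)]
    if rng.uniform() <= alpha:                             # steps 4-5: accept or stay
        X = Y
    samples[t - 1] = X
    if t == t0:                 # at t0: empirical covariance of the chain so far
        Sigma = np.cov(samples[:t0], rowvar=False)
        mu = samples[:t0].mean(axis=0)
        c = 2.4 ** 2 / d        # optimal Gaussian scale (Gelman et al., 1996)
    elif t > t0:                # step 6: running mean/covariance update
        mu = mu + (X - mu) / t
        dx = (X - mu)[:, None]
        Sigma = Sigma + (dx @ dx.T - Sigma) / t

print("sampled correlation:", np.corrcoef(samples[n // 2:], rowvar=False)[0, 1])
```

The printed correlation should approach the target value of 0.8 as the chain length grows.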

In practice, this algorithm requires some refinements in order to be used in non-trivial conditions. If the adaptation is started from the very first steps, the correction made to the initial guess matrix \(\Sigma_{0}\) is too big and numerically unstable. We thus introduced a tunable parameter \(t_{0}\), which represents the number of steps to wait before starting the adaptation. The default, \(t_{0}=2000\) steps, gave us good results in all the cases we studied (from N=20 to 50 parameters). If you are confident in your initial matrix you may lower this number; in any case the algorithm depends only very weakly on this choice. In the \(t\leq t_{0}\) phase the chain starts exploring the parameter space, and it is not essential to have a very reliable first estimate of the covariance matrix; it is preferable to have a high acceptance rate and underestimate the optimal step than to risk remaining stuck at the starting point. The scale \(c\) in the proposal is thus chosen to be quite small (default 0.001).

Then at step \(t_{0}\) we calculate the sample variance of the chain. The estimate of the covariance between the parameter \(j\) and the parameter \(k\) is simply

\begin{equation} \Sigma_{jk}=\frac{1}{t_{0}-1}\sum_{i=1}^{t_{0}}(X_{ij}-\bar{X}_{j})(X_{ik}-\bar{X}_{k}).\\ \end{equation}This reconstructed matrix is a fair first guess of the covariance of the posterior. Hence, the scale \(c\) is now set to the optimal value (for Gaussians) of \(2.4^{2}/d\) where \(d\) is the number of parameters (Gelman et al., 1996; Dunkley et al., 2005) and we start running the matrix adaptation.

Since the likelihood is not necessarily Gaussian and the matrix is not perfect (otherwise we would not need to run!), we introduced another refinement concerning the scale parameter \(c\). At a time \(t_{s}\) we begin to adapt the scaling parameter according to

\begin{equation} \label{eq:scale_ad}c_{t+1}=c_{t}\cdot(1-\frac{1}{t})+\frac{1}{t}(\mathrm{a.r.}-0.25)\\ \end{equation}where "a.r." is the acceptance rate (i.e. the number of accepted moves over the total number of attempts), computed over the previous 100 samples. The purpose of this stage is to tune the scale parameter \(c\) to reach an acceptance rate of about 0.25, a classical "rule of thumb" for the Metropolis algorithm, which indeed gave the best results in our experience. The \(t_{s}\) default in CAMEL is 10000 samples. You may change this number (which counts steps after \(t_{0}\)), but note that the \(\frac{1}{t}\) factors in Eq. \ref{eq:scale_ad} make the scale factor converge rapidly to a constant, so that we end up with a classical Metropolis-Hastings algorithm.

Keeping memory of all its history, the process is no longer Markovian. So, does the adaptive algorithm have the correct ergodicity properties? It is indeed mathematically difficult to deal with non-basic MCMC methods, and many of the algorithms used by the scientific community are not guaranteed to have the right convergence properties. In the Adaptive-Metropolis case, however, convergence was proven (Andrieu et al., 2006) because the asymptotic dependency between the elements of the chain is weak enough to apply large-number theorems.

Let us see how to use the AM algorithm. One needs to specify some extra information in the parameter file. Most settings have sensible defaults, but you should at least define the number of samples ("length") and the path to the initial proposal covariance matrix ("proposal_cov"):

```
length=100000        # chain length
bunchSize=1000       # the chain is dumped every bunchSize steps
algo=ada             # default algorithm (or "metro" for standard Metropolis)
proposal_cov=/path/to/your/covmatrix.dat  # first-guess covariance matrix
t0=2000              # number of steps for the initial cov construction
ts=10000             # number of samples after t0 before starting the scale adaptation
scale=0.001          # initial scaling (should be low)
#seed=12345          # seed of the proposal random number generator: if not defined, drawn randomly
#seedU=78776         # seed for the uniform shots in the algorithm: if not defined, drawn randomly
```

Some important remarks:

- •
the program will dump the data into the specified text file every `bunchSize` iterations. This is interesting for running remotely (as in batches), flushing the results through the network without too intensive IOs. You can then investigate the results without waiting for the end of the run.

- •
there are 2 seeds for the random number generators. If you fix these numbers you will recover the same results, which is probably not what you want when running multiple chains (but is interesting for reproducibility). You can implement some logic to draw random values for these seeds, or simply not specify them: they will then be drawn randomly from the machine environment.

- •
the covariance matrix should be given as a text file. You can (should?) use the Hessian matrix from the minimization (see Sect.\ref{sec:hessian}). If not specified, a diagonal matrix will be built from the parameter-file "errors", but it is probably a bad idea to do so…

- •
because of the hopping nature of the algorithm, it is unnecessary (and CPU-consuming) to use a high-precision file. Don't put any `precisionFile` line in your parameter file: class default settings, which are sufficient, will then be used.

If you are using a full Boltzmann solver (as class), you may also want to add to the output some derived parameters, which are computed at each sampled step. Their list can be found in the `src/camel/Engine.hh::get` method. You add to your parameter file some lines as:

(note that you need to activate the "do_mPk" flag for sigma8, which turns on the matter power spectrum computation)

You may also add some redshift-dependent values as

The last line requests the exact computation of P(k) at z=0 and 0.57.

In order to use a script to submit jobs with AM, just put the parameters discussed in this section in some text file (for instance mcsetup.txt) and run:
```
runMCMC.sh /path/to/parfile mcsetup.txt
```

This will launch 4 openMP jobs (you can change this number by redefining the NCHAINS environment variable).
The chains will be written as plain ASCII files named `samples*.txt` in the created directory.

We discuss here the question of MCMC convergence. We illustrate it with the chains produced from the hlpTT_bflike_LCDM.par run, with the settings discussed in the previous part. Note that we directly used the Hessian obtained in Sect \ref{sec:mini} as the initial covariance matrix. We ran 4 chains of 150000 samples each, which produced 4 output files.

There exists no exact mathematical way to determine when (or if) a chain has reached "convergence", i.e. entered a regime where it correctly samples the likelihood (in all dimensions), but there are some tricks and we discuss two classical ones here.

The first thing you should look at is simply the evolution of each variable ("trace plots"). This is shown below for one chain (obviously all of them should be looked at).

In this exercise, for example, the "Aszxcib" variable began to be widely explored only late (after \(\simeq\)50000 samples). This impacted the evolution of the "Acib" and of all the "Aps" variables.

This kind of plot only tests the stationarity of the chains. The real challenge however is to reach ergodicity (which requires stationarity), meaning that the samples can be used to compute statistical expectation values (i.e. "time" integrals can replace "ensemble" averages). This is the idea behind the Gelman-Rubin (GR) test (Gelman et al., 1992), which is run on a few chains and compares their between-chain and within-chain variances. It produces \(R_{i}\), the "potential scale reduction factor", for each variable, and the usual prescription is to assume that the variables reached ergodicity when all \(R_{i}-1\leq 0.03\) (but you may feel more comfortable with 0.01).
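For a single variable, the GR statistic can be sketched as follows (standard Gelman-Rubin formula, not necessarily CAMEL's exact implementation; the chains are synthetic):

```python
import numpy as np

# Gelman-Rubin diagnostic for one variable: compare between-chain and
# within-chain variances of M chains of length N.
def gelman_rubin(chains):
    chains = np.asarray(chains)                    # shape (M, N)
    M, N = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = N * means.var(ddof=1)                      # between-chain variance
    var_hat = (N - 1) / N * W + B / N              # pooled variance estimate
    return np.sqrt(var_hat / W)                    # potential scale reduction R

rng = np.random.default_rng(1)
good = rng.normal(0.0, 1.0, size=(4, 5000))              # 4 well-mixed chains
bad = good + np.array([0.0, 0.5, 1.0, 1.5])[:, None]     # offset, non-converged chains
print(gelman_rubin(good), gelman_rubin(bad))
```

With the R-1 ≤ 0.03 prescription, the first set passes and the offset one clearly fails.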

The result in our example, as implemented in CAMEL in python or idl, is shown on Fig. \ref{fig:GR}.

The GR test confirms that "Aszxcib" began to be sampled correctly only after \(\gtrsim 80000\) steps. Once its sampling began (\(\gtrsim 50000\)) it indeed affected the "Aps" and "Acib" variables, which is reflected in some increase of their R values before they finally converge to low values.

While a number of other tests exist (we also implemented the Geweke one), GR generally produces satisfactory results (for simple posteriors).

Given the GR test, we decided to keep the samples beyond 100000 steps and concatenate them into a single "well-converged" chain.

1D posterior distributions are then obtained by histogramming (and possibly smoothing) each variable individually. They are shown on Fig. \ref{fig:post1}. For further discussion, we also superimpose the best-fit values with Gaussian errors taken from the diagonal of the Hessian matrix.

Let us now study the correlations among the parameters. One can compute the empirical covariance matrix between the estimated parameters

\begin{align} \hat{C}=\dfrac{1}{N-1}\sum_{i=1}^{N}\Delta_{i}^{T}\Delta_{i}\\ \end{align}where the \(i^{th}\) realization deviation vector is \(\Delta_{i}\equiv\vec{p}_{i}-\hat{\mu}\), \(\vec{p}\) being the vector of parameters and \(\hat{\mu}\) its statistical mean. We convert it to a correlation matrix and show it on the left of Fig. \ref{fig:cors} (we drop the ”H0” and ”sigma8” parameters which are derived parameters).
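The conversion from the empirical covariance \(\hat{C}\) to a correlation matrix is a one-liner (generic numpy sketch on a synthetic two-parameter chain, not the real Planck one):

```python
import numpy as np

# Empirical covariance of a chain (one column per parameter), then the
# correlation matrix: corr_jk = C_jk / (sigma_j * sigma_k).
rng = np.random.default_rng(2)
chain = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=20000)

C = np.cov(chain, rowvar=False)        # empirical covariance C-hat
sig = np.sqrt(np.diag(C))
corr = C / np.outer(sig, sig)          # correlation matrix
print(np.round(corr, 2))
```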

The full set of 2D contours ("triangle plot") can be obtained with the `camel.py` library or through the GetDist package (which can read CAMEL's chains) and is shown on Fig. \ref{fig:triangle}.

We can now check whether, in this working case, both statistical methodologies lead to the same intervals. We show on Fig \ref{fig:prof_vs_mc} the comparison between the 1D posteriors and the profile-likelihoods of the cosmological parameters as obtained in the previous sections. For the latter, for the sake of comparison, we represent \(e^{-\Delta\chi^{2}/2}\), and the \(\Delta\chi^{2}=1\) cut translates into an \(e^{-1/2}\simeq 0.6\) one.

We therefore see that for \(\rm{\Lambda CDM}\), using the Planck Hillipop+bflike likelihoods, both the Bayesian and frequentist approaches give very similar results on the cosmological parameters, a result also shown, more coarsely, with other Planck likelihoods (Planck Collaboration Int. XVI, 2014; Planck Collaboration XIII, 2015).

One should not conclude, however, that this is a general statement: it happens here because the data constrain all the parameters well; in particular the use of the low-\(\ell\) likelihood breaks the \((\tau,\ln(10^{10}A_{\rm s}))\) degeneracy from the high-\(\ell\) part. The posteriors are close to Gaussian so that their \(1\sigma\) intervals correspond neatly to the ones obtained with the profile-likelihoods cut at \(e^{-1/2}\). More generally, for a single parameter, the Bayesian credible interval will be a genuine "confidence level" if and only if the parameter is a "location" one, i.e. \(\mathscr{L}(x;\theta)=\mathscr{L}(x-\theta)\), or some transformation of the likelihood leads to it (Porter, 1996). In the multi-dimensional case, some differences may appear due to the Bayesian marginalization. This is called the "volume effect" and is discussed next.

In order to illustrate the difference between profile-likelihoods and posterior distributions with several parameters, we begin with a toy example. Suppose the likelihood has two parameters \((x_{1},x_{2})\) and that its shape is of the type shown on the upper plot of Fig. \ref{fig:toy}. Then the marginalized posterior for \(x_{1}\) reads

\begin{align} p(x_{1})\propto\int\mathscr{L}(x_{1},x_{2})dx_{2},\\ \end{align}while the profile-likelihood is

\begin{align} \mathscr{L}(x_{1})\propto\max_{x_{2}}\mathscr{L}(x_{1},x_{2}).\\ \end{align}One can then easily understand why the maximum of the profile-likelihood lies at the same position as the 2D maximum. This is not necessarily the case for the marginalized posterior, where part of the "volume" of the 2D likelihood may shadow the projection. Indeed these methods answer different questions: for the posterior, "what is the value of \(x_{1}\) given all the possible \(x_{2}\) values?"; for the profile-likelihood, "what is the value of \(x_{1}\) for the most likely \(x_{2}\) values?".
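A small numerical sketch of this volume effect (the two-component likelihood below is our own toy, in the spirit of Fig. \ref{fig:toy}):

```python
import numpy as np

# Toy 2D likelihood: a narrow peak at x1=0 plus a broad ridge at x1=1.
# The ridge carries most of the "volume", so it dominates the marginal
# over x2 but not the profile (max over x2).
x1 = np.linspace(-3.0, 4.0, 701)
x2 = np.linspace(-6.0, 6.0, 601)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

L = (np.exp(-0.5 * ((X1 / 0.2) ** 2 + (X2 / 0.2) ** 2))                    # narrow peak
     + 0.8 * np.exp(-0.5 * (((X1 - 1.0) / 0.5) ** 2 + (X2 / 3.0) ** 2)))  # broad ridge

profile = L.max(axis=1)      # profile-likelihood: max over x2 at each x1
marginal = L.sum(axis=1)     # marginalized posterior: integral over x2 (up to dx2)

print("profile peaks at x1 =", x1[np.argmax(profile)])
print("marginal peaks at x1 =", x1[np.argmax(marginal)])
```

The profile peaks at the position of the 2D maximum (near \(x_{1}=0\)), while the marginal peaks near the broad ridge (\(x_{1}\simeq 1\)).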

We now give an example where this situation happens. It was shown in Couchot et al. (2015) that when removing the low-\(\ell\) part of the Planck baseline likelihoods and keeping only the high-\(\ell\) one (named Plik), the profile-likelihood method leads to an optical reionization depth of:

\begin{align} \label{eq:profHFI}\hat{\tau}=0.17\pm 0.04~{}(68\%,\text{Planck-high-$\ell$/profile-likelihood}).\\ \end{align}The \(\tau\) posterior distribution, obtained by running the AM method, is shifted to lower values, as shown on Fig.\ref{fig:likvspost}.
The peak of the distribution (the "mode") is located around 0.15 (and even slightly less for the mean and median).
By integrating the tails, one can reconstruct the 68% central credible interval, which gives \([0.11,0.18]\); taking the mode as the best estimate, we obtain:

\begin{align} \label{eq:credible}\hat{\tau}=0.15^{+0.03}_{-0.04}~{}(68\%,\text{Planck-high-$\ell$/posterior}).\\ \end{align}
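The tail-integration step can be sketched on a toy skewed posterior (the gamma distribution and its numbers are illustrative, not the real \(\tau\) chain):

```python
import numpy as np

# 68% central credible interval from samples: cut 16% in each tail; the mode
# is estimated from a histogram of the (skewed) toy posterior.
rng = np.random.default_rng(3)
samples = rng.gamma(shape=9.0, scale=0.017, size=200000)   # skewed, mean ~0.15

lo, hi = np.percentile(samples, [16.0, 84.0])
hist, edges = np.histogram(samples, bins=200)
mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
print(f"mode = {mode:.3f}, 68% central interval = [{lo:.3f}, {hi:.3f}]")
```

For a skewed distribution the mode sits off-center in the interval, which is exactly what produces asymmetric quoted errors.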

The choice between the two methodologies can thus play a role.
For instance, from an LFI-based low-\(\ell\) analysis (Planck Collaboration XI, 2015), Planck reports a value (independent of the methodology used) of \(\hat{\tau}=0.067^{+0.023}_{-0.021}~(68\%,\text{Planck-LFI low-$\ell$})\).
It is in \(2.2\sigma\) tension with Eq.\ref{eq:profHFI} but only \(1.8\sigma\) with Eq.\ref{eq:credible}.
Without over-emphasizing the \(2\sigma\) boundary, using the Bayesian result may have hidden this tension, which is now increased by the new HFI-based result (Planck Collaboration Int. XLVII, 2016), \(\hat{\tau}=0.058\pm 0.012~(68\%,\text{Planck-HFI low-$\ell$})\), to respectively \(2.6\sigma\) (profile) and \(2.2\sigma\) (posterior).

We have seen how comparing different statistical approaches can be fruitful in cosmological analyses. A possible workflow is:

- 1.
first run an accurate minimization with MINUIT from a few starting points, and obtain the best fit and the Hessian matrix.

- 2.
use the Hessian as the starting matrix for the proposal of the Adaptive Metropolis MCMC algorithm.

- 3.
if some posterior modes differ significantly from the best-fit value, run single profile-likelihoods on those parameters.

- 4.
if some parameters are of particular importance (as \((w_{0},w_{a})\) in future surveys; 2D profile-likelihoods are also implemented), always run a profile-likelihood on them to verify the influence of priors and volume effects in the MCMC run.

CAMEL provides the tools to perform such rich studies using the very same input file. See the camel.in2p3.fr web site for more information. The software is collaborative and contributions are most welcome.

The following file shows the complete list of likelihoods with their associated nuisance parameters. The Engine used is class.

```
##############################################################
#Boltzmann Engine
engine=class

#class setup
class N_ncdm 1
class m_ncdm 0.06
class N_eff 2.046
class k_pivot 0.05
class lensing yes
class sBBN\ file bbn/sBBN.dat

precisionFile=class_pre/hpjul2.pre
###############################################################

###############################################################
#Cosmological model
par omega_b cosm 0.02224 0.00027 0.017 0.027
par omega_cdm cosm 0.1192 0.0026 0.09 0.15
par 100*theta_s cosm 1.0418 0.6E-04 1.03 1.05
par tau_reio cosm 0.07 0.13E-01 0.01 0.20
par log(10^10A_s) cosm 3.07 0.025 2.7 3.5
par n_s cosm 0.96 0.0070 0.9 1.1
###############################################################

###############################################################
#LIKELIHOODS

#-----------------------------
#CMB low-l (BFLIKE)
clikfile=planck_data/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik
#-----------------------------

#-----------------------------
#CMB high-l (HILLIPOP)
HiLLiPOP=HiLLiPOP/DX11dHM_superExt_CO_TT.lik
par A_planck nui 1 0.001 0.9 1.1
par c0 nui 0. 0.001 -0.05 0.05
par c1 nui 0. 0.001 -0.05 0.05
fix c2 nui 0. 0.001 -0.05 0.05
par c3 nui 0. 0.001 -0.05 0.05
par c4 nui 0.004 0.001 -0.05 0.05
par c5 nui 0.004 0.001 -0.05 0.05
par Aps100x100 nui 2.5E-04 1E-05 0.0 0.1
par Aps100x143 nui 1.1E-04 7E-06 0.0 0.1
par Aps100x217 nui 9.9E-05 6E-06 0.0 0.1
par Aps143x143 nui 4.7E-05 2E-06 0.0 0.1
par Aps143x217 nui 3.1E-05 3E-06 0.0 0.1
par Aps217x217 nui 7.6E-05 6E-06 0.0 0.1
par Asz nui 1 0.1 0.0 10
par Acib nui 1. 0.1 0.0 10
par AdustTT nui 1 0.1 0.0 2
fix AdustPP nui 0.00 0.1 0.0 2
fix AdustTP nui 0.00 0.1 0.0 2
par Aksz nui 0.00 1.0 0.0 10
par Aszxcib nui 0.00 1.0 0.0 10

#PRIORS
gauss1 AdustTT 1 0.2
gauss1 Acib 1 0.2
gauss1 A_planck 1 0.0025
gauss1 c0 0 2e-3
gauss1 c1 0 2e-3
gauss1 c3 0 2e-3
gauss1 c4 0.002 2e-3
gauss1 c5 0.002 2e-3
#-----------------------------

#-----------------------------
#CMB Very-high-l (SPT)
SPT_High=HighEll/SPT_high_2014.lik
par SPT_high_95_cal nui 0.9961 0.002 0.9 1.1
par SPT_high_150_cal nui 1.002 0.002 0.9 1.1
par SPT_high_220_cal nui 1.015 0.002 0.9 1.1
par SPT_high_Aps_95x95 nui 7.425 0.01 0.1 50
par SPT_high_Aps_95x150 nui 5.147 0.01 0.1 50
par SPT_high_Aps_95x220 nui 8.8 0.01 0.1 50
par SPT_high_Aps_150x150 nui 6.649 0.01 0.2 50
par SPT_high_Aps_150x220 nui 14.15 0.01 1.5 50
par SPT_high_Aps_220x220 nui 36.07 0.01 3 200
fix SPT_ADust nui 1

#PRIORS
gauss1 SPT_high_95_cal 1.01 .01
gauss1 SPT_high_150_cal 1.01 .01
gauss1 SPT_high_220_cal 1.01 .02
#-----------------------------

#-----------------------------
#CMB Very-high-l (SPT-low)
SPT_Low = HighEll/SPT_low.lik
par SPT_low_Aps nui 20.32 0.01 1. 60
par SPT_low_cal nui 1.00 0.01 0 2
#-----------------------------

#-----------------------------
#CMB Very-high-l (ACT)
ACT_equat=HighEll/ACT_equat.lik
par ACT_equat_148_cal nui 0.9991 0.002 0.9 1.1
par ACT_equat_220_cal nui 1.013 0.002 0.9 1.1
par ACT_equat_ADust nui 1.719 0.01 0.05 10
par ACT_equat_Aps_148x148 nui 7.159 0.01 0.1 50
par ACT_equat_Aps_148x220 nui 20 0.01 0.1 50
par ACT_equat_Aps_220x220 nui 60 0.01 10 150

ACT_south=HighEll/ACT_south.lik
par ACT_south_148_cal nui 1.007 0.002 0.9 1.1
par ACT_south_220_cal nui 1.032 0.002 0.9 1.1
par ACT_south_ADust nui 1.3 0.01 0.05 10
par ACT_south_Aps_148x148 nui 9 0.01 0.1 50
par ACT_south_Aps_148x220 nui 16.29 0.01 0.1 50
par ACT_south_
```