INTRODUCTION
The predictive accuracy of Earth System Models (ESMs) relies on a series of differential equations that are often sensitive to their initial conditions. Even a small error in estimates of these initial conditions can lead to large forecast uncertainties. In short-to-medium range forecast systems, an open-loop run of coupled land and weather models often diverges from the true states and shows low forecast skill as the error accumulates over time \citep{charney1951dynamic,kalnay20074}. To extend the forecast lead time, the science of data assimilation (DA) attempts to use the information content of observations for improved estimates of the ESMs' initial conditions, thus reducing their forecast uncertainties \citep{leith1993numerical,kalnay2003atmospheric}. DA methods often involve iterative cycles in which the {\it observations} are optimally integrated with the previous-time forecasts ({\it background states}) to obtain an {\it a posteriori} estimate of the initial conditions ({\it analysis state}) with reduced uncertainty in a Bayesian setting \citep{rabier2005overview,asch2016data}. In the literature, two major categories of DA methodologies exist, namely \blue{filtering} and variational methods \citep{law2012evaluating}. Advanced approaches such as hybrid DA schemes are also being developed, which aim to combine the two DA classes and take advantage of their unique benefits \citep{wang2008hybrid,lorenc2015comparison}.
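For reference, in the widely used three-dimensional variational (3D-Var) formulation revisited in Section 2, the analysis state is obtained by minimizing a quadratic cost of the following standard form, written here in generic notation that may differ from the notation adopted later in the paper:
\begin{equation*}
J(\mathbf{x}) = \frac{1}{2}\left(\mathbf{x}-\mathbf{x}_b\right)^{\mathrm{T}}\mathbf{B}^{-1}\left(\mathbf{x}-\mathbf{x}_b\right) + \frac{1}{2}\left(\mathbf{y}-\mathcal{H}(\mathbf{x})\right)^{\mathrm{T}}\mathbf{R}^{-1}\left(\mathbf{y}-\mathcal{H}(\mathbf{x})\right),
\end{equation*}
where $\mathbf{x}_b$ denotes the background state, $\mathbf{y}$ the observation vector, $\mathcal{H}(\cdot)$ the observation operator, and $\mathbf{B}$ and $\mathbf{R}$ the background and observation error covariance matrices, respectively.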
Although both classic DA approaches have been widely used in land and weather forecast systems, they often rest on a strict underlying assumption that the error is drawn from a zero-mean Gaussian distribution, which is not always realistic. Bias exists in land-atmosphere models mainly due to under-representation of the governing laws of physics and erroneous boundary conditions. Observation bias also exists, largely due to systematic errors in sensing systems and in the retrieval algorithms represented by the observation operator in DA systems \citep{dee2003detection,dee2005bias}. From a mathematical point of view, bias is simply the expected value of the error, which could be removed if the ground truth of the process were known; in practice, such ground truth is rarely available. The problem is often exacerbated by the difficulty of attributing the bias to the model, the observations, or both. \blue{At the same time, point-scale observations such as those from {\it in-situ} gauges and radiosondes are often considered to be closer to the ground truth; however, their assimilation into gridded model outputs is not straightforward due to the existing scale gaps.}
Bias correction strategies in DA systems mainly fall under two general categories: (i) dynamic bias-aware schemes and (ii) re-scaling bias correction schemes. \orange{Apart from these two general categories, machine learning techniques have also been developed that learn relationships between observations and ancillary variables for bias correction \citep{jin2019machine}. The dynamic bias-aware schemes make prior assumptions about the nature of the bias and attribute it either to the model or to the observations, which may not be realistic as both models and observations suffer from systematic errors. Early attempts to dynamically treat model biases are based on a two-step bias estimation and correction approach, applied prior to the analysis step \citep{dee1998data,radakovich2001results,chepurin2005forecast}. Variants of the bias-aware Kalman filter for colored noise \citep{drecourt2006bias} and a weak-constraint four-dimensional VDA (4D-Var) \citep{zupanski1997general} have also been proposed to account for non-zero-mean model errors. At the same time, there exists another class of dynamic observational bias correction techniques that rely on variants of the variational bias-correction (VarBC) method, which makes an {\it a priori} estimate of the bias and dynamically updates it using innovation information \citep{auligne2007adaptive,dee2009variational,zhu2014enhanced}. Apart from VarBC, a new approach has recently been proposed to treat observation biases through iterative updates of the observation operator \citep{hamilton2019correcting}. A body of research has also been devoted to simultaneously treating model and observation biases using a multi-stage hybrid filtering technique \citep{pauwels2013simultaneous}. However, the above schemes still lack the ability to leverage climatologically unbiased information from reference observations (e.g., {\it in situ} data) and have not yet been tested for effective bias correction in chaotic systems. More importantly, the developed schemes largely focus on retrieving an unbiased expected value of the forecast and remain limited to characterizing the second-order forecast uncertainty.}
The re-scaling techniques do not make any explicit assumptions about the relative accuracy of the model and the observing system \citep{reichle2004bias,crow2005relevance,reichle2007comparison,kumar2009role,reichle2010assimilation,liu2018contributions}. This family of methods often involves mapping the observations onto the model space by matching their Cumulative Distribution Functions (CDFs). While the CDF-matching technique is easier to implement than the dynamic approaches and prevents numerical instabilities in model simulations, it implicitly assumes that model forecasts are unbiased and partly ignores the information content of the observations. For example, when the observations are less biased than the model outputs, this approach fails to effectively remove the bias. Furthermore, it is a static scheme, and there exists no formal way to extend the CDF-matching scheme to dynamically account for changes in the bias \citep{kumar2012comparison} and its seasonality \citep{de2016soil}.
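To make the mechanics of this re-scaling concrete, a minimal numerical sketch of such CDF (quantile) matching is given below, assuming empirical CDFs estimated directly from samples of the observations and the model climatology; the \texttt{cdf\_match} helper is purely illustrative and is not part of any of the cited implementations.
\begin{verbatim}
import numpy as np

def cdf_match(obs, model_clim):
    """Map observation samples onto the model climatology by CDF matching."""
    obs = np.asarray(obs, dtype=float)
    model_clim = np.asarray(model_clim, dtype=float)
    # empirical non-exceedance probability F_obs(o) of each observation
    ranks = np.argsort(np.argsort(obs))
    p_obs = (ranks + 0.5) / obs.size
    # empirical quantile function (inverse CDF) of the model climatology
    model_sorted = np.sort(model_clim)
    p_model = (np.arange(model_clim.size) + 0.5) / model_clim.size
    # composed map F_model^{-1}(F_obs(o)) matches the two CDFs
    return np.interp(p_obs, p_model, model_sorted)
\end{verbatim}
The re-scaled observations inherit the climatological moments of the model, which is precisely why the technique cannot remove the bias when the model itself is the more biased source of information.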
Conceptually, CDF-matching techniques move probability masses from one distribution to another. To transform the static CDF-matching into a dynamic scheme, we aim to answer two key questions: Can we quantify the movement of probability masses as a cost through a convex metric? How can this cost be employed to assimilate relatively unbiased {\it in situ} data for dynamic bias correction in the VDA framework?
The Wasserstein metric (WM) \citep{villani2008optimal,santambrogio2015optimal}, also known as the Earth Mover's distance \citep{rubner2000earth}, stems from the theory of optimal mass transport (OMT) \citep{monge1781memoire,kantorovich1942translocation,villani2003topics}, which provides a rigorous ground for comparing probability measures \citep{brenier1991polar,gangbo1996geometry,benamou2015iterative,chen2017matrix,chen2018optimal,chen2018wasserstein}. Specifically, this distance metric quantifies the dissimilarity between two probability histograms in terms of the amount of ``work'' done during the displacement of probability masses between them. Thus, we hypothesize that inclusion of such a metric in the VDA cost function can reduce analysis biases. The rationale is that the work done during displacement of the probability masses is not only a function of the shape of the probability histograms but also of the difference between their central positions, \blue{as described in Section \ref{sec:OMT}}. The use of the Wasserstein metric in DA has been explored previously \citep{ning2014coping,feyeux2018optimal}. \citet{ning2014coping} introduced the concept of OMT in the classic VDA framework and demonstrated that bias in the background state results in an unrealistic bimodal distribution of the analysis state. However, that study was conducted only on linear systems for model bias correction, without accounting for any form of observation bias. \citet{feyeux2018optimal} proposed to fully replace the quadratic costs in the classic VDA by the Wasserstein metric. Even though the latter approach extends the classic VDA beyond a minimum mean squared error approximation, it does not provide any road map for bias correction, which is the central focus of this paper.
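For reference, in its common discrete form, the squared 2-Wasserstein distance between two probability histograms $\mathbf{p}_x$ and $\mathbf{p}_y$, supported on points $\{x_i\}$ and $\{y_j\}$, can be written as
\begin{equation*}
W_2^{2}\!\left(\mathbf{p}_x,\mathbf{p}_y\right) \;=\; \min_{\mathbf{U}\,\in\,\Pi(\mathbf{p}_x,\mathbf{p}_y)} \; \sum_{i,j} U_{ij}\,\big\| x_i - y_j \big\|_2^{2},
\qquad
\Pi(\mathbf{p}_x,\mathbf{p}_y)=\left\{\mathbf{U}\ge 0:\ \mathbf{U}\mathbf{1}=\mathbf{p}_x,\ \mathbf{U}^{\mathrm{T}}\mathbf{1}=\mathbf{p}_y\right\},
\end{equation*}
where the transport plan $\mathbf{U}$ moves a probability mass $U_{ij}$ from bin $x_i$ to bin $y_j$. The notation here is generic; the formal definitions used in this paper are given in Section \ref{sec:OMT}.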
This paper presents a new VDA approach that regularizes the classic VDA problem with the cost associated with the Wasserstein metric, hereafter referred to as the Wasserstein Metric VDA (WM-VDA). Unlike previous VDA techniques, WM-VDA treats unknown biases of different magnitudes and signs in both the model dynamics and the observations. To that end, WM-VDA needs to be informed by an {\it a priori} reference distribution or histogram (e.g., from {\it in situ} data) that encodes the space-time variability of the state variables of interest in the probability domain. This {\it a priori} histogram must be less biased, but could exhibit larger higher-order uncertainties, than the observations and model forecasts. \blue{More importantly, unlike classic DA methods, WM-VDA allows full recovery of the probability histogram of the analysis state, which can lead to forecast uncertainty quantification beyond second-order statistics}. The idea is tested on a first-order linear dynamical system as a test-bed and on the chaotic Lorenz-63 attractor \citep{lorenz1963deterministic}, which represents the nonlinear dynamics of convective circulation in a shallow fluid layer. The results demonstrate that the presented approach is capable of preserving the geometric shape of the distribution of the analysis state when both the background state and observations are systematically biased, and of extending the forecast skill by controlling the propagation of bias in the phase space of a highly chaotic system.
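Schematically, and only to fix ideas before the formal development in Section 3, such a regularization augments a quadratic VDA cost with a Wasserstein penalty that pulls the analysis histogram toward the reference histogram $\mathbf{p}_r$; the generic form below is an illustrative sketch rather than the exact formulation derived later:
\begin{equation*}
\mathbf{x}_a \;=\; \arg\min_{\mathbf{x}} \left\{ \left\|\mathbf{x}-\mathbf{x}_b\right\|_{\mathbf{B}^{-1}}^{2} + \left\|\mathbf{y}-\mathcal{H}(\mathbf{x})\right\|_{\mathbf{R}^{-1}}^{2} + \lambda\, W_2^{2}\!\left(\mathbf{p}_{\mathbf{x}},\,\mathbf{p}_r\right) \right\},
\end{equation*}
where $\lambda>0$ is a regularization parameter controlling the strength of the Wasserstein penalty and $\mathbf{p}_{\mathbf{x}}$ denotes a histogram representation associated with the analysis state.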
The paper is organized as follows: Section 2 discusses the concept of classic VDA with a focus on 3D-Var, and also provides a summary of the theory of OMT and the Wasserstein metric. The mathematical formulation of the proposed WM-VDA is explained in Section 3. Section 4 implements the WM-VDA on a first-order linear system and the nonlinear Lorenz-63 system; the results are interpreted and compared with those of the 3D-Var and the \orange{CDF-matching technique}. Summary and concluding remarks are presented in Section 5.