\documentclass{article}
\usepackage[affil-it]{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
\usepackage{url}
\usepackage{hyperref}
\hypersetup{colorlinks=false,pdfborder={0 0 0}}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\DeclareMathOperator*{\argmin}{\arg\!\min}
\begin{document}
\title{Manifold Warping: Manifold Alignment Over Time}
\author{CJ Carey}
\affil{Affiliation not available}
\date{\today}
\maketitle
\selectlanguage{english}
\begin{abstract}
Knowledge transfer is computationally challenging, due in part to the curse of dimensionality, compounded by source and target domains expressed using different features (e.g., documents written in different languages).
Recent work on manifold learning has shown that data collected in real-world settings often have high-dimensional representations, but lie on low-dimensional manifolds.
Furthermore, data sets collected from similar generating processes often present different high-dimensional views, even though their underlying manifolds are similar.
The ability to align these data sets and extract this common structure is critical for many transfer learning tasks.
In this paper, we present a novel framework for aligning two sequentially-ordered data sets, taking advantage of a shared low-dimensional manifold representation.
Our approach combines traditional manifold alignment and dynamic time warping algorithms using alternating projections.
We also show that the previously-proposed canonical time warping algorithm is a special case of our approach.
We provide a theoretical formulation as well as experimental results on synthetic and real-world data, comparing manifold warping to other alignment methods.%
\end{abstract}%
\section{Introduction}
The advent of large, often high-dimensional, digital data sets has made automated knowledge extraction a critical research focus in the field of machine learning.
Often, we find real-world sequential data sets that encode the same information with disparate surface feature representations, such as sensor network data, activity and object recognition corpora, and audio/video streams.
In these cases, an automated technique for discovering correlations between sets will allow easy transfer of knowledge from one domain to another, avoiding costly or infeasible re-learning.
In this paper, we present a framework that combines manifold alignment \cite{ham2003MA,wang2009generalMAframework} and dynamic time warping (DTW) \cite{sakoe1978dtw} for aligning two such sequential data sets.
We also show that the previously proposed method of canonical time warping (CTW) \cite{zhou2009ctw} is a special case of our approach.
Dynamic time warping has been used effectively for time-series alignment,
but it requires an inter-set distance function,
which usually implies that both input data sets must have the same dimensionality.
DTW may also fail under arbitrary affine transformations of one or both inputs.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/CTWillustration/CTWillustration}
\caption{{\label{fig:CTWillustration} An illustration of two sinusoidal curves before and after applying CTW.%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/SyntheticSineCurves/SyntheticSineCurves}
\caption{{\label{fig:NonlinearManifoldIllustration} Sinusoidal curves on a plane and on a Swiss roll.%
}}
\end{center}
\end{figure}
CTW aims to solve these two problems by alternating between canonical correlation analysis (CCA) \cite{anderson2003cca} and DTW until convergence, as illustrated in Figure \ref{fig:CTWillustration}.
In the case where inputs are of different dimensions, CTW first projects both data sets into a shared space using principal component analysis (PCA) \cite{jolliffe2002pca}.
The algorithm does not always converge to a global optimum, but CTW still improves the performance of the alignment when compared to applying DTW directly.
However, CTW fails when the two related data sets require nonlinear transformations to uncover the shared manifold space.
We illustrate such a case in Figure \ref{fig:NonlinearManifoldIllustration}, in which two sequential data sets are two $sin(x)$ curves; one lying on a plane and the other on a Swiss roll.
In this case, the CCA projection that CTW relies on will fail to unroll the second curve, making a good DTW alignment impossible.
We first provide a brief overview of manifold alignment and dynamic time warping before combining the two to formulate manifold warping.
\section{Manifold Alignment}
In manifold alignment \cite{wang2009generalMAframework}, we are given two data sets $X \in \mathbb{R}^{n_{X} \times d_{X} }$ and $Y \in \mathbb{R}^{n_{Y} \times d_{Y}}$ where $\mathbb{R}^{m \times n}$ denotes a $m$ by $n$ real matrix.
In $X$ and $Y$, each row is an \emph{instance}.
$W^{(X)} \in \mathbb{R}^{n_{X} \times n_{X} }$ and $W^{(Y)} \in \mathbb{R}^{n_{Y} \times n_{Y}}$ are similarity matrices that provide the similarities between instances in $X$ and $Y$, respectively.
These two matrices are usually constructed as adjacency matrices of nearest neighbor graphs, optionally applying a heat kernel function.
In addition, we also have a warping matrix $W^{(X,Y)} \in \mathbb{R}^{n_{X} \times n_{Y}}$ that specifies the correspondences between instances in X and Y.
Typically,
\begin{equation}
W^{(X,Y)}_{i,j}=\left\{ \begin{array}{ll}
1 & \mbox{if $X_{i}$ corresponds to $Y_{j}$} \\
0 & \mbox{otherwise}
\end{array}
\right.
\end{equation}
Suppose we have a mapping that maps $X,Y$ to $F^{(X)} \in \mathbb{R}^{n_{X}\times d},F^{(Y)}\in \mathbb{R}^{n_{Y}\times d}$ in a latent space with dimension $d\leq min(d_{x},d_{y})$.
Note that in terms of the underlying matrix representation, any row $i$ of $X$ is mapped to row $i$ of $F^{(X)}$, and a similar relation holds for $Y$ and $F^{(Y)}$.
We form the following loss function for the mapping as follows.
The first term indicates that corresponding points across data sets should remain close to each other in the embedding.
The last two terms specify that, within an input set, points close in the original space should remain close in the embedding.
The factor $\mu$ controls how much we want to preserve inter-set correspondences versus local geometry.
\begin{eqnarray}
L_1\left(F^{(X)},F^{(Y)}\right) =
\mu\sum_{i \in X, j \in Y} {||F^{(X)}_{i}-F^{(Y)}_{j}||^2 W^{(X,Y)}_{i,j}} \nonumber \\
+ (1-\mu) \sum_{i,j \in X} {||F^{(X)}_{i} - F^{(X)}_{j}||^2 W^{(X)}_{i,j} } \nonumber \\
+ (1-\mu) \sum_{i,j \in Y} {||F^{(Y)}_{i} - F^{(Y)}_{j}||^2 W^{(Y)}_{i,j} }
\end{eqnarray}
The notation $i \in X$ simply means $1 \leq i \leq n_{X}$ . We can combine $W^{(X)}$,$W^{(Y)}$, and $W^{(X,Y)}$ into a joint similarity matrix $W$:
\begin{equation}
W=\left[\begin{array}{cc}
(1-\mu) W^{\left(X\right)} & \mu W^{\left(X,Y\right)}\\
\mu W^{\left(Y,X\right)} & (1-\mu) W^{\left(Y\right)}
\end{array}\right]
\end{equation}
Then, we combine $F^{(X)},F^{(Y)}$ into $F$ where $F= \left[ \begin{array}{l} F^{(X)} \\ F^{(Y)} \end{array} \right].$
Let $F_{i,k}$ denote the element $(i,k)$ of $F$ and $F_{\cdot,k}$ denote the $k$th column of $F$.
Then the loss function can be rewritten:
\begin{eqnarray}
L_{1}(F)= & \displaystyle \sum_{i,j} ||{F_{i}-F_{j}||^{2} W_{i,j} } \nonumber \\
=& \displaystyle \sum_{k} \sum_{i,j} ||{F_{i,k}-F_{j,k}||^{2} W_{i,j} } \nonumber \\
=& 2 \displaystyle \sum_{k} F_{\cdot,k}^{T} L F_{\cdot,k} = 2 tr\left(F^{T}LF\right)
\end{eqnarray}
The optimization problem becomes:
\begin{equation}
\underset{F}{\arg\min} \left(L_{1}\right) = \underset{F}{\arg\min} \left(tr\left(F^{T}LF\right)\right)
\end{equation}
This matches the optimization problem of Laplacian Eigenmaps \cite{belkin2001laplacian_eigenmaps}, except that in this case the similarity matrix is a joint matrix produced from two similarity matrices.
As with Laplacian Eigenmaps, we add a constraint $F^{T}DF=I$ in order to remove an arbitrary scaling factor as well as to avoid a collapse to a subspace with dimension less than $d$.
For example, this constraint prevents the trivial mapping to a single point.
The solution $F=[f_{1}, f_{2}, ..., f_{d} ]$ is given by the $d$ nonzero smallest eigenvectors of the general eigenvalue problem: $Lf_i = \lambda D f_i$ for $i=1,...,d$.
We can also restrict the mapping to be linear by instead solving the optimization problem:
\begin{equation}
\label{eqn:linearmaeqn} \underset{\phi}{\arg\min} \left(tr\left(\phi^{T}V^{T}L\phi V\right)\right) \mbox { subject to $\phi^{T}V^{T}D\phi V=I$}
\end{equation}
\begin{equation}
V = \left( \begin{array}{cc} X & 0 \\ 0 & Y \end{array} \right)
\end{equation}
and $\phi = \left[ \begin{array}{l} \phi^{(X)} \\ \phi^{(Y)} \end{array} \right] $ is the joint projection, $L,D$ are the graph Laplacian and degree matrix of $V$ respectively.
The resultant linear embedding is then $X\phi^{(X)}$ and $Y\phi^{(Y)}$, instead of $F^{(X)}$ and $F^{(Y)}$.
The solution for $\phi = [\phi_{1},\phi_{2},...,\phi_{d}]$ is the $d$ nonzero smallest eigenvectors of the general eigenvalue problem $V^{T}LV\phi_{i}=\lambda V^{T}DV \phi_{i}$ for $i = 0,...,d$.
\section{Dynamic Time Warping}
We are given two sequential data sets $X= [ x_{1}^{T},\ldots,x_{n}^{T} ]^{T} \in \mathbb{R}^{n \times d} $, $Y=[ y_{1}^{T},\ldots,y_{m}^{T}]^{T} \in \mathbb{R}^{m \times d}$ in the same space with a distance function $dist:X \times Y \rightarrow \mathbb{R}$.
Let $P=\{p_1,...,p_s\}$ represent an alignment between $X$ and $Y$,
where each $p_k=(i,j)$ is a pair of indices such that $x_i$ corresponds with $y_j$.
Since the alignment is restricted to sequentially-ordered data, we impose the additional constraints:
\begin{eqnarray}
p_{1} & = & (1,1) \label{eq:Wcontraint1} \\
p_{s} & = & (n,m) \label{eq:Wcontraint2} \\
p_{k+1}-p_{k} & = & (1,0)\: or\:(0,1)\: or\:(1,1) \label{eq:Wcontraint3}
\end{eqnarray}
That is, an alignment must match the first and last instances and cannot skip any intermediate instance.
This also yields the property that no two sub-alignments cross each other.
Figure \ref{fig:dtw} is an example of a valid alignment.
We can also represent the alignment in matrix form $W$ where:
\begin{equation}
W_{i,j}= \left\{ \begin{array}{ll}
1 & \mbox{if $(i,j) \in P$} \\
0 & \mbox{otherwise} \end{array} \right.
\end{equation}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/dtw/dtw}
\caption{{A valid time-series alignment%
}}
\end{center}
\end{figure}
To ensure that $W$ represents an alignment which satisfies the constraints in Equations \ref{eq:Wcontraint1}, \ref{eq:Wcontraint2}, \ref{eq:Wcontraint3}, $W$ must be in the following form:
$W_{1,1}=1,W_{n,m}=1$, none of the columns or rows of $W$ is a $0$ vector, and there must not be any $0$ between any two $1$'s in a row or column of $W$.
We call a $W$ which satifies these conditions a \emph{DTW matrix}.
An optimal alignment is the one which minimizes the loss function with respect to the DTW matrix $W$:
\begin{eqnarray}
L_2(W) = \sum_{i,j}dist\left(x_i,y_j\right)W_{i,j}
\end{eqnarray}
A na\"{\i}ve search over the space of all valid alignments would take exponential time;
however, dynamic programming can produce an optimal alignment in $O(nm)$.
\section{Manifold Warping}
\subsection {One-step algorithm}
We now present a novel framework for aligning two sequentially-ordered data sets that share a common manifold representation.
In our approach, we use the warping matrix produced by DTW as a heuristic correspondence matrix for manifold alignment.
The proposed algorithm uses alternating projections \cite{von_neumann1950alt_projections}, picking new correspondences with DTW and reprojecting both inputs using manifold alignment until the loss function is minimized.
This presents an improvement over CTW in cases where nonlinear transformations are required to recover the underlying manifold structure of one or both input data sets.
We introduce the following loss function for manifold warping:
\begin{equation}
\begin{array}{l}
L_3 \left(F^{(X)},F^{(Y)},W^{(X,Y)}\right) = \\
\mu \displaystyle\sum_{i\in X,j\in Y}\Vert F_i^{(X)}-F_{j}^{(Y)}\Vert^2 W_{i,j}^{(X,Y)} \\
+ (1-\mu) \displaystyle\sum_{i,j\in X}\Vert F_i^{(X)}-F_{j}^{(X)}\Vert^2 W_{i,j}^{(X)} \\
+ (1-\mu) \displaystyle\sum_{i,j\in Y}\Vert F_i^{(Y)}-F_{j}^{(Y)}\Vert^2 W_{i,j}^{(Y)}
\end{array}
\end{equation}
The optimization becomes $\underset{F^{(X)},F^{(Y)},W^{(X,Y)} }{\argmin} (L_{3})$ subject to $F^{T}DF=I$ where $F= \left[ \begin{array}{l} F^{(X)} \\ F^{(Y)} \end{array} \right]$ and $W^{(X,Y)}$ is a DTW matrix.
Note that unlike manifold alignment, the correspondence matrix $W^{(X,Y)}$ is now an argument in the optimization problem.
The intuition behind this loss function is similar to that of manifold alignment:
the last two error terms ensure that the embedding preserves the local geometry of the inputs, and the first term promotes a high quality DTW alignment between two sequential data sets. Again, these goals are controlled by the parameter $\mu$.
We now propose an algorithm that minimizes $L_3$:
\begin{algorithm}
\caption{{One-Step Manifold Warping}}
\SetAlgoLined
\KwIn{X,Y: two time-series data sets \\
d: latent space dimension \\
k: number of nearest neighbors used \\
$\mu$: preserving correspondence vs local geometry factor}
\KwOut{$F^{(X)},F^{(Y)}$: the embeddings of X and Y in the latent space \\
$W^{(X,Y)}$: the result DTW matrix that provides the alignment of X and Y}
\Begin{
$W^{(X)}\leftarrow \mbox{KNNGraph}(X,k)$
$W^{(Y)}\leftarrow \mbox{KNNGraph}(Y,k)$ \\
Set $W^{(X,Y)}_{1,1}=W^{(X,Y)}_{n_{X},n_{Y}}=1$, and $0$ everywhere else
$t \leftarrow 0$ \\
\Repeat{convergence}{
$W = \left[ \begin{array}{cc} (1-\mu) W^{(X),t} & \mu W^{(X,Y),t} \\
\mu (W^{(X,Y),t})^{T} & (1-\mu) W^{(Y),t} \end{array} \right]$ \\
$F^{(X),t+1},F^{(Y),t+1}\leftarrow\mbox{MA}(F^{(X),t},F^{(Y),t},W^{t},d,\mu)$\\
$W^{(X,Y),t+1}\leftarrow\mbox{DTW}(F^{(X),t+1},F^{(Y),t+1})$ \\
$t \leftarrow t+1$
}
$F^{(X)} \leftarrow F^{(X),t}$; $F^{(Y)} \leftarrow F^{(Y),t}$; $W^{(X,Y)} \leftarrow W^{t}$
}
\end{algorithm}
In Algorithm 1, $\mbox{MA(X,Y,W,d,$\mu$)}$ is a function that returns the embedding of $X,Y$ in a $d$ dimensional space using manifold alignment with the joint similarity matrix $W$ and parameter $\mu$ described in the manifold alignment section.
The function $\mbox{DTW(X,Y)}$ returns a DTW matrix after aligning two sequences $X,Y$ using dynamic time warping.
The $\mbox{KNNGraph(X,k)}$ function returns the $k$-nearest neighbors graph of a data set $X$.
In some cases, it is useful to replace the $k$-nearest neighbor graph approach with an $\epsilon$-neighborhood graph \cite{belkin2001laplacian_eigenmaps}.
\begin{convergence}
Let $L_{3,t}$ be the loss function $L_3$ evaluated at $F^{(X),t},F^{(Y),t},W^{(X,Y),t}$ of Algorithm 1. The sequence $L_{3,t}$ converges to a mimimum as $t \rightarrow \infty$. Therefore, Algorithm 1 will terminate.
\end{convergence}
\begin{proof}
In every iteration $t$, two steps are performed: using manifold alignment to solve for new projections $F^{\left(X\right),t+1},F^{\left(Y\right),t+1}$,
and using DTW to change the correspondences to $W^{(X,Y),t+1}$.
Recall that the loss function $L_1$ is just $L_3$ with fixed $W^{(X,Y)}$.
In the first step, with fixed $W^{(X,Y),t}$, Algorithm 1 solves for new projections $F^{(X),t+1},F^{(Y),t+1}$ using manifold alignment.
In manifold alignment section, we showed that manifold alignment's mappings minimize the loss function $L_{3}$ when the correspondence matrix is fixed. Hence:
\begin{equation}
\begin{array}{l}
L_{3}(F^{(X),t+1},F^{(Y),t+1},W^{(X,Y),t}) \\
\leq L_{3}(F^{(X),t},F^{(Y),t},W^{(X,Y),t})
\end{array} \label{eqn:mainequality} \end{equation}
In the second step, the projections are fixed as $F^{(X),t+1},F^{(Y),t+1}$. Algorithm 1 changes the correspondence matrix from $W^{(X,Y),t}$ to $W^{(X,Y),t+1}$ which does not affect last two terms in $L_3$.
If we replace $dist(F^{(X)}_{i},F^{(Y)}_{j})$ by $\mu ||F^{(X),t+1}_{i}-F^{(Y),t+1}_{j}||^2$ in the loss function $L_{2}$ of DTW,
we recover the first term in $L_{3}$ of manifold warping.
Since $W^{(X,Y),t+1}$ is produced by DTW, it will minimize the first term of $L_3$. Therefore, we have:
\begin{equation}
\begin{array}{c}
\mu \displaystyle\sum_{i\in X,j\in Y} || F_{i}^{(X),t+1}-F_{j}^{(Y),t+1}||^{2}W_{i,j}^{(X,Y),t+1} \\
\leq \mu \displaystyle\sum_{i\in X,j\in Y} || F_{i}^{(X),t+1}-F_{j}^{(Y),t+1}||^{2}W_{i,j}^{(X,Y),t}
\end{array} \label{eqn:dtwinequality} \end{equation}
Changing the correspondence matrix does not affect the last two terms of $L_3$, so:
\begin{equation}
\begin{array}{l}
L_{3}(F^{(X),t+1},F^{(Y),t+1},W^{(X,Y),t+1}) \\
\leq L_{3}(F^{(X),t+1},F^{(Y),t+1},W^{(X,Y),t}) \\
\leq L_{3}(F^{(X),t},F^{(Y),t},W^{(X,Y),t}) \mbox{ from inequality \ref{eqn:mainequality} } \\
\Leftrightarrow L_{3,t+1} \leq L_{3,t}
\end{array} \end{equation}
Therefore, $L_{3,t}$ is a decreasing sequence. We also have $L_{3,t} \ge 0$, so it is convergent. Therefore, Algorithm 1 will eventually terminate.
\end{proof}
\subsection {Two-step algorithm}
We now propose an algorithm that exploits the observation that if the local geometries of the two data sets are roughly the same, their similarity matrices will also be very similar to each other. \cite{wang2008procrustes}
Thus, if we first perform a nonlinear projection on each input set independently, the embeddings are likely to be linearly alignable using either manifold warping or CTW.
\begin{algorithm}
\caption{{Two-Step Manifold Warping}}
\SetAlgoLined
\KwIn{X,Y: two time-series data sets \\ d: latent space dimension \\
k: number of nearest neighbors used \\
$\mu$: preserving correspondence/local geometry factor}
\KwOut{$F^{(X)},F^{(Y)}$: the embeddings of X and Y in the latent space \\
$W^{(X,Y)}$: the result DTW matrix that provides the alignment of X and Y}
\Begin{
$W^{(X)}\leftarrow \mbox{KNNGraph(X,k)}$ \\
$W^{(Y)}\leftarrow \mbox{KNNGraph(Y,k)}$ \\
$t \leftarrow 0$ \\
$F^{\left(X\right),t}\leftarrow\mbox{DimReduction}\left(F^{\left(X\right)},W^{(X)},d\right)$ \\
$F^{\left(Y\right),t}\leftarrow\mbox{DimReduction}\left(F^{\left(Y\right)},W^{(Y)},d\right)$ \\
\Repeat{convergence}{
$W = \left[ \begin{array}{cc} (1-\mu) W^{(X),t} & \mu W^{(X,Y),t} \\
\mu (W^{(X,Y),t})^{T} & (1-\mu) W^{(Y),t} \end{array} \right]$ \\
$\phi^{(Y),t+1},\phi^{(X),t+1}$ $\leftarrow \mbox{LMA}\left(F^{\left(X\right),t},F^{\left(Y\right),t},W^{(X,Y),t},d,\mu \right)$\\
$F^{(X),t+1}\leftarrow F^{(X),t}\phi^{(X),t+1}$\\
$F^{(Y),t+1}\leftarrow F^{(Y),t}\phi^{(Y),t+1}$\\
$W^{\left(X,Y\right),t+1}\leftarrow\mbox{DTW}\left(F^{\left(X\right),t+1},F^{\left(Y\right),t+1} \right) $ \\
$t \leftarrow t+1$
}
$F^{(X)} \leftarrow F^{(X),t}$; $F^{(Y)} \leftarrow F^{(Y),t}$; $W^{(X,Y)} \leftarrow W^{(X,Y),t}$
}
\end{algorithm}
In Algorithm 2, $\mbox{DimReduction}(X,W,d)$ is a dimensionality reduction function which maps $X$ with similarity matrix $W$ to a lower dimensional space $d$.
In this paper, we will use Laplacian Eigenmaps to be consistent with manifold alignment even though other methods such as LLE \cite{roweis2000lle}, Isomap \cite{J.B.Tenenbaum:2000:0833c}, etc. could be applied.
$\mbox{LMA(X,Y,W,d,$\mu$)}$ is a function that performs linear manifold alignment described above on $X$ and $Y$ with the joint similarity matrix $W$, the target dimension $d$ and returns the projection matrices $\phi^{(X)}$ and $\phi^{(Y)}$.
We can think of $\mbox{DimReduction}$ as a preprocessing step, then reformulate the loss function as:
\begin{equation}
\begin{array}{l}
L_{4} (\phi^{(X)},\phi^{(Y)},W^{(X,Y)}) \\= ((1-\mu)\displaystyle\sum_{i,j\in X} ||F_{i}^{(X)}\phi^{(X)}-F_{j}^{(X)}\phi^{(X)}||^{2}W_{i,j}^{(X)} \\
+(1-\mu)\displaystyle\sum_{i,j\in Y}||F_{i}^{(Y)}\phi^{(Y)}-F_{j}^{(Y)}\phi^{(Y)}||^{2}W_{i,j}^{(Y)}\\
+\mu \displaystyle\sum_{i\in X,j\in Y}||F_{i}^{(X)}\phi^{(X)}-F_{j}^{(Y)}\phi^{(Y)}||^{2}W_{i,j}^{(X,Y)}
) \end{array} \end{equation}
which is the same loss function as in linear manifold alignment except that $W^{(X,Y)}$ is now a variable. The two constraints are the constraint in Equation \ref{eqn:linearmaeqn} of linear manifold alignment, and $W^{(X,Y)}$ must be a $\mbox{DTW matrix}$.
\begin{convergence2}
Let $L_{4,t}$ be the loss function $L_4$ evaluated at $\prod_{i=1}^{t}\phi^{(X),i},\prod_{i=1}^{t}\phi^{(Y),i},W^{(X,Y),t}$ of Algorithm 2. The sequence $L_{4,t}$ converges to a mimimum as $t \rightarrow \infty$. Therefore, Algorithm 2 will terminate.
\end{convergence2}
\begin{proof}
The proof is similar to that of theorem 1.
At any iteration $t$, Algorithm 2 first fixes the correspondence matrix at $W^{(X,Y),t}$.
Now let $L_4'$ be like $L_4$ except that we replace $F_{i}^{(X)},F_{i}^{(Y)}$ by $F_{i}^{(X),t},F_{i}^{(Y),t}$ and Algorithm 2 minimizes $L_4'$ over $\phi^{(X),t+1},\phi^{(Y),t+1}$ using linear manifold alignment.
Thus,
\begin{equation}
\begin{array}{l} \label {eqn:linearinequality}
L_{4}'(\phi^{(X),t+1},\phi^{(Y),t+1},W^{(X,Y),t}) \\
\leq L_{4}'(I,I,W^{(X,Y),t}) \\
= L_{4}(\prod_{i=1}^{t}\phi^{(X),i},\prod_{i=1}^{t} \phi^{(Y),i},W^{(X,Y),t}) \\
= L_{4,t}
\end{array}
\end{equation}
since $F^{(X),t}=F^{(X)}\prod_{i=1}^{t}\phi^{(X),i}$ and $F^{(Y),t}=F^{(Y)}\prod_{i=1}^{t}\phi^{(X),i}$. We also have:
\begin{equation}
\begin{array}{l}
L_{4}'(\phi^{(X),t+1},\phi^{(Y),t+1},W^{(X,Y),t}) \\
= L_{4}(\prod_{i=1}^{t+1}\phi^{(X),i},\prod_{i=1}^{t+1} \phi^{(Y),i},W^{(X,Y),t}) \\
\leq L_{4,t}
\end{array}
\end{equation}
Algorithm 2 then performs DTW to change $W^{(X,Y),t}$ to $W^{(X,Y),t+1}$.
Using the same argument as in the proof of Theorem 1, we have:
\begin{equation}
\begin{array}{l}
L_{4}(\prod_{i=1}^{t+1}\phi^{(X),i},\prod_{i=1}^{t+1} \phi^{(Y),i},W^{(X,Y),t+1}) \\
\leq L_{4}(\prod_{i=1}^{t+1}\phi^{(X),i},\prod_{i=1}^{t+1} \phi^{(Y),i},W^{(X,Y),t}) \\
\leq L_{4,t}\\
\Leftrightarrow L_{4,t+1} \leq L_{4,t}.
\end{array}
\end{equation}
So, the convergence follows.
\end{proof}
Furthermore, when we set $\mu=1$, the loss function $L_{4}$ will become similar to that of CTW.
We can also substitute CTW in place of the loop in the algorithm.
\section{Experimental Results}
\subsection{Synthetic data sets}
We compare the performance of CTW and manifold warping by trying to align two $sin(x^{2})$ curves: one is on the flat plane, the another is projected onto the Swiss roll as illustrated in Figure \ref{fig:NonlinearManifoldIllustration}.
Some duplicate points are added along the curves to create many-to-one correspondences in the alignment.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/SyntheticEmbeddings/SyntheticEmbeddings}
\caption{{Embedding of two $sin(x^{2})$ curves illustrated in Figure \ref{fig:NonlinearManifoldIllustration} onto 2D.%
}}
\end{center}
\end{figure}
As shown in Figure \ref{fig:SytheticEmbeddings}, manifold warping produced similar embeddings for two curves based on their local geometry while CTW linearly collapsed the Swiss roll curve onto the plane.
As a result, the warping path (that is, the alignment path) produced by manifold warping stays closer to the true warping path than that produced by CTW.
The error is calculated by the area between the result path and the ground truth path as suggested in \cite{zhou2009ctw}.
We also normalize the error by dividing by the whole plot area, $n_{X}\times n_{Y}$.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/SyntheticWarpingPaths/SyntheticWarpingPaths}
\caption{{Result warping path of each algorithm's alignment and the ground truth warping path for the two sine curves illustrated in Figure \ref{fig:NonlinearManifoldIllustration}%
}}
\end{center}
\end{figure}
The warping paths and the calculated errors, shown in Figure \ref{fig:SyntheticWarpingPaths} and Table \ref{tab:AlignmentErrors}, show that manifold warping yields a smaller error than CTW.
\subsection{COIL-100 data set}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/DogAndCat/DogAndCat}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/2cups/2cups}
\caption{{Samples from pairs of COIL-100 image series%
}}
\end{center}
\end{figure}
We also test these algorithms on a real-world vision data set from the Columbia Object Image Library (COIL100) \cite{Coil100}.
The corpus consists of different series of images taken of different objects on a rotating platform.
Each series has $72$ images, each $128\times128$ pixels.
We try to align two series of images of two different objects, with differences in shape and brightness producing very different high-dimensional representations.
To demonstrate our algorithm's ability to work with data sets of different dimensionality, we compress one image series to a smaller resolution ($64\times64$ pixels).
Additionally, some duplicate images are added to each series, to ensure that the correct mapping is not trivially one-to-one.
In both experiments, manifold warping methods achieve alignments with a much smaller error than CTW.
The depiction in Figure \ref{fig:DogAndCatEmbeddings} provides an intuitive picture of the manifold warping algorithm.
In the first projection to two dimensions, both image series are mapped to circles.
The next several iterations rotate these circles to match the first and last points, then the points in between.
For the case of one-step Manifold Warping (where all mappings are nonlinear),
we pick a small $\mu$ to prioritize preserving local geometry of each series.
This avoids over-fitting the embedding to a potentially bad intermediate DTW correspondence.
We perform the experiment with two pairs of COIL image series, illustrated in Figure \ref{fig:COIL_series}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/DogAndCatEmbeddings/DogAndCatEmbeddings}
\caption{{2D embedding of dog/cat toy image series (Figure \ref{fig:COIL_series}).%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/DogAndCatWarpingPaths/DogAndCatWarpingPaths}
\caption{{Warping paths for dog/cat toy images%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/2CupEmbeddings/2CupEmbeddings}
\caption{{2D embedding of cup image series (Figure \ref{fig:COIL_series}).%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/2CupWarpingPaths/2CupWarpingPaths}
\caption{{Warping paths for cup images%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{table}
\begin{tabular}{lrrrr}
~ & Synthetic & Dog+Cat & Cups & Kitchen \\
\hline
1-step MW & 0.0768 & 0.0447 & 0.0464 & \textbf{0.0257} \\
2-step MW & 0.0817 & \textbf{0.0282} & \textbf{0.0125} & 0.0396 \\
2-step + CTW & \textbf{0.0652}& 0.0298 & 0.0143 & 0.0772 \\
CTW & 0.2784 & 0.2656 & 0.1668 & 0.0966 \\
\end{tabular}
\caption{{Alignment error across algorithms and data sets}}
\label{tab:AlignmentErrors}
\end{table}
\subsection{Kitchen data set}
Our last experiment uses the kitchen data set \cite{torre2008motiondata} from the CMU Quality of Life Grand Challenge, which records human subjects cooking a variety of dishes. Here, we attempt nonlinear alignments between the same subject and task, across different sensors.
Our experiment considers two separate views of the same moment in time, during which the subject prepares a brownie.
The two views are 9-dimensional inertial measurement unit (IMU) readings and 87-dimensional motion capture suit coordinates (MOCAP).
Aligning two views of the same task provides a straightforward evaluation metric, because the time stamps on each reading yield ground-truth correspondence information.
To make the problem computationally feasible, we subsampled the original data sets.
Each manifold warping method performs better than CTW, based on the results shown below in Figure \ref{fig:SandwichEmbeddings}, Figure \ref{fig:SandwichWarpingPaths}, and Table \ref{tab:AlignmentErrors}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/SandwichEmbeddings/SandwichEmbeddings}
\caption{{The embeddings of two sensors measurements series IMU and MOCAP for a kitchen task.%
}}
\end{center}
\end{figure}\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/SandwichWarpingPaths/SandwichWarpingPaths}
\caption{{Result warping path of each algorithm's alignment for the kitchen data set experiment%
}}
\end{center}
\end{figure}
\section{Discussion and Future Work}
Due to the lack of a linearity constraint, manifold warping consistently performs better than CTW when the inputs lie on manifolds that are not accessible via linear transformations. Even in the linear case, the alignment quality of manifold warping is at just as good as CTW.
Importantly, this improved alignment quality does not impose significant runtime overhead.
Both algorithms rely on the same DTW step, and tuned implementations of manifold alignment are comparable in runtime to the CCA step used in CTW. We found that while each manifold warping iteration is marginally slower than a similar CTW iteration, manifold warping tends to converge with fewer steps.
Speedups may be possible by using a relaxed variation of DTW for the first few iterations, and parallelizing the initial alignments of the two-step algorithm.
Both the one-step and two-step manifold warping algorithms are natural extensions of canonical time warping on manifolds, and the results presented in this paper indicate that this added information has the potential to significantly improve alignment quality.
Several variants of DTW and manifold alignment may prove beneficial within the manifold warping framework. Future work will explore multiscale and local alignment strategies, enabling broader applications for manifold warping.
\selectlanguage{english}
\FloatBarrier
\bibliographystyle{plain}
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}