2. Literature Review
2.1. Failure rate in a naval ship setting
Before a mission, each naval ship is equipped with a forecasted amount
of spare engines. An underestimated forecast risks mission failure, as
spare parts cannot be resupplied during the mission. An overestimated
forecast may reduce operating efficiency due to the load of unnecessary
spare parts. Moreover, from a system point of view, overestimation
wastes budget and may even lead to inventory shortages for other ships.
Defining the optimal set of spare parts is therefore crucial for
mission success (Zammori et al., 2020).
For accurate prediction, several special features of the Navy's system
should be noted. First, imbalances are observed in two aspects of the
data: age period and engine type. Only a short period of failure rate
data is available relative to the entire lifetime. In our case, for
example, the early ages have less data than the rest of the age period;
this can be problematic because the failure rate of young ships is
needed for operation. The distribution of ships across engine type
categories is also unbalanced: in our dataset of 98 ships, the five
engine type categories contain 5, 27, 43, 17, and 6 ships,
respectively. While a satisfactory model can be obtained for an engine
type with a large amount of data, models for the other types may suffer
from a lack of data.
Moreover, the similarity between ships and engines should also be
noted, as all engines undergo the same maintenance process; planned
maintenance is performed by the ROK Navy regardless of the engine type
(Yoo et al., 2019). Given these circumstances, where ships as well as
engines share certain qualities, a model with a layered parameter
structure is needed; it should be able to learn the specific structure
between and within each layer from the data.
2.2. Failure forecasting models
Several models exist that could capture the time series characteristics
of failure rates, such as ARIMA, exponential smoothing, and
seasonal-trend decomposition using Loess (Hyndman and Athanasopoulos,
2018). Among existing time series models, Prophet, which adopts a
Bayesian generalized additive model, shows high accuracy. Moreover, it
decomposes a time series into trend, seasonal, and other regressor
factors, which enhances both its applicability and interpretability
(Taylor & Letham, 2018).
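The decomposition Prophet performs can be summarized by its additive model (notation follows Taylor & Letham, 2018):

```latex
y(t) = g(t) + s(t) + h(t) + \epsilon_t
```

where g(t) is the trend, s(t) the seasonal component, h(t) the effect of holidays or other regressors, and \epsilon_t the error term.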
More specific models concentrating on the characteristics of failure
have been suggested. The bathtub curve is a typical shape of the
failure rate over time, and the Weibull or Poisson distribution is
often used to model failure rates. Wang and Yin (2019) performed
failure rate forecasting with a stochastic ARIMA model and the Weibull
distribution. The time series was decomposed into a trend, assumed to
be bathtub-shaped, and stochastic factors. Parameters of the Weibull
distribution were learned separately for the increasing, decreasing,
and flat periods of the bathtub. The stochastic element was obtained
using ARIMA, and the time series failure rate was calculated as the sum
of the trend and stochastic elements. Sherbrooke (2006) proposed
Pareto-optimal algorithms, named constructive algorithms, based on the
Poisson distribution; however, they had limits in determining the
parameters. Zammori et al. (2020) addressed the parameter estimation
problem of Sherbrooke's (2006) model by applying a time-series Weibull
distribution. Other attempts, such as Pareto-optimal and Monte-Carlo
methods (Sherbrooke, 2006) and ARMA and least-squares logarithm
approaches (Wang & Yin, 2019), have been made to add the effect of
stochastic factors to this distribution.
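The phase-wise Weibull fitting described above rests on the fact that the Weibull hazard can reproduce each phase of the bathtub with a different shape parameter. A minimal sketch (shape and scale values are illustrative, not taken from the cited studies):

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Illustrative shape values, one per bathtub phase:
#   shape < 1 -> decreasing hazard (infant mortality),
#   shape = 1 -> constant hazard (useful life),
#   shape > 1 -> increasing hazard (wear-out).
ts = [0.5, 1.0, 2.0]
early = [weibull_hazard(t, shape=0.5) for t in ts]  # decreasing phase
flat = [weibull_hazard(t, shape=1.0) for t in ts]   # constant phase
late = [weibull_hazard(t, shape=3.0) for t in ts]   # increasing phase
```

Stitching the three regimes together over consecutive age intervals yields the bathtub-shaped trend that Wang and Yin (2019) assume.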
Attempts have also been made to integrate time series models with
information about the system architecture. In a risk analysis of
deepwater drilling riser fracture (Chang et al., 2019), a Bayesian
network was used to predict the fracture failure rate. Bayesian
networks can also be used to analyze and prevent the causes of a ship's
potential accidents (Afenyo et al., 2017). Time series forecasting
based on Bayesian networks (Dikis & Lazakis, 2019) and the Analytic
Hierarchy Process (AHP) (Yoo et al., 2019) illustrates these
approaches. They are based on the assumption that equipment within the
same group, engines for example, follows similar failure patterns.
2.3. Hierarchical model
Hierarchical models have an edge in representing the features of the
Navy data introduced in Section 2.1, namely unbalanced categories and a
shared structure, through information pooling. Gelman et al. (2013)
explained that hierarchical models are highly predictive because of
pooling. When a hierarchical model is used, there is almost always an
improvement, though to a degree that depends on the heterogeneity of
the observed data (Gelman, 2006a). When updating the model parameters,
such as prior parameters, the relationship between the part of the data
being used and the whole population should always be considered. Pooled
effects between subclusters are partial, as they are implemented
through shared hyperparameters rather than parameters. In a Bayesian
hierarchy, the balance of fit can be learned by using hyperpriors.
By properly setting the hyperprior structure, we can find a reasonable
balance between over-fitting and under-fitting, as hyperpriors are
known to serve as a regularizing factor. Many examples of applying
hierarchical structures to cross-sectional data exist in diverse
domains, such as ecology, education, business, and epidemiology
(McElreath, 2020). The structure of cross-sectional data, where the
whole population is divided into multiple nested subcategories,
provides an excellent environment for a hierarchical model. Previous
literature comparing the educational effects of multiple schools has
shown that incorporating the nested structure of state, school, and
class in the model yields substantial improvements in both accuracy and
interpretability (Rubin, 1981).
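The partial pooling discussed above can be sketched with the standard precision-weighted compromise between a group mean and the grand mean; small groups are shrunk more strongly toward the population. All numbers below are hypothetical (the group sizes merely mimic the unbalanced engine-type counts of Section 2.1), and in a full Bayesian hierarchy the variances would come from hyperpriors rather than being fixed:

```python
sigma2 = 1.0       # assumed within-group variance
tau2 = 0.5         # assumed between-group variance
grand_mean = 10.0  # assumed population-level mean

def partial_pool(group_mean, n):
    """Precision-weighted compromise between a group's own mean
    and the grand mean (classic partial-pooling estimate)."""
    w_group = n / sigma2   # precision contributed by the group's data
    w_prior = 1.0 / tau2   # precision contributed by the population
    return (w_group * group_mean + w_prior * grand_mean) / (w_group + w_prior)

small = partial_pool(group_mean=14.0, n=5)   # few ships: strong shrinkage
large = partial_pool(group_mean=14.0, n=43)  # many ships: weak shrinkage
```

Both groups report the same raw mean, yet the estimate for the 5-ship group sits noticeably closer to the grand mean than that of the 43-ship group, which is exactly the behavior that helps the data-poor engine types.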
2.4. Model evaluation measures
Time series cross-validation and k-fold cross-validation, along with
the expanding-window forecast method, can be used to measure forecast
accuracy in time series (Hyndman and Athanasopoulos, 2018). Several
sets of training and test data are created in a walk-forward manner,
and forecast accuracy is computed by averaging over the test sets.
Various measures of forecast error exist, including the mean absolute
error, root mean squared error, and mean absolute percentage error.
When a large difference of scale exists in the data, a scaled error
measure is recommended; the mean absolute scaled error (MASE) is
recommended for comparing forecast accuracy across multiple time series
(Hyndman & Koehler, 2006).
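The walk-forward splitting and the MASE measure can both be sketched in a few lines; the series values are invented for illustration:

```python
def expanding_splits(series, min_train, horizon=1):
    """Yield walk-forward (train, test) splits with a growing training window."""
    for end in range(min_train, len(series) - horizon + 1):
        yield series[:end], series[end:end + horizon]

def mase(actual, forecast, train):
    """Mean absolute scaled error: the forecast MAE divided by the in-sample
    MAE of the one-step naive forecast (Hyndman & Koehler, 2006)."""
    naive_mae = sum(abs(train[i] - train[i - 1])
                    for i in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae

series = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]         # toy failure-rate series
splits = list(expanding_splits(series, min_train=4))  # two walk-forward splits
score = mase(actual=[14.0, 15.0], forecast=[13.0, 14.5], train=series[:4])
```

Because MASE is scaled by the naive forecast's error on the training data, values below 1 indicate a forecast better than the naive benchmark, and scores remain comparable across series with very different scales.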
Information criteria that measure model fit in Bayesian models include
the widely applicable information criterion (WAIC) and leave-one-out
cross-validation (LOO-CV); they are preferred to other criteria such as
the Akaike information criterion (AIC) and the deviance information
criterion (DIC) (Vehtari and Lampinen, 2002). For Bayesian models,
where parameter estimates are based on sampled results, it is essential
to check that the chains have converged before comparing models. For
this purpose, trace plots and numerical summaries such as the potential
scale reduction factor, Rhat (Stan Development Team, 2017b), are used;
an Rhat below 1.1 for every parameter is recommended.
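The Rhat diagnostic compares between-chain and within-chain variance for each parameter. A minimal sketch of the basic Gelman-Rubin formula (the toy draws are invented; production samplers such as Stan use a refined split-chain, rank-normalized variant):

```python
from statistics import mean, variance

def rhat(chains):
    """Basic potential scale reduction factor for one parameter,
    given several same-length chains of posterior draws."""
    n = len(chains[0])
    b = n * variance([mean(c) for c in chains])  # between-chain variance
    w = mean(variance(c) for c in chains)        # within-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled variance estimate
    return (var_hat / w) ** 0.5

# Two well-mixed chains exploring the same region: Rhat near 1.
good = rhat([[1.0, 2.0, 3.0, 2.0], [2.0, 1.0, 2.0, 3.0]])
# Two chains stuck in different regions: Rhat far above 1.1.
bad = rhat([[1.0, 1.1, 0.9, 1.0], [5.0, 5.1, 4.9, 5.0]])
```

When chains disagree, the between-chain term inflates the pooled variance estimate relative to the within-chain variance, pushing Rhat above the 1.1 threshold.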