A simple solution would be to train a replication timing model using only H3K4me3 and H3K27me3. Replication time is related to gene expression, and H3K4me3 induces expression while H3K27me3 represses it, but such a model would still be too simplistic.
Instead, using data available from the ENCODE project, a separate model was trained for each experiment, using as many of the available tags as possible to improve training.

Training model

  1. For each experiment, ENCODE data were obtained for cell lines or tissues that had at least all the tags needed.
  2. For each tag, data values were extracted over 100 kb regions whose replication time is variable, and the mean, standard deviation, skewness, and kurtosis of each region were saved.
  3. Once enough features had been extracted (ca. 20,000), the standard deviation, skewness, and kurtosis were Box-Cox transformed, as they are not normally distributed.
  4. The features were then ready to be modelled by linear regression.
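
The feature extraction and normalization steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used for the analysis. The function names, the use of population moments, the shift applied to make values positive before the Box-Cox transform, and the grid search over lambda are all assumptions made for the example.

```python
import math
import statistics


def window_features(values):
    """Return (mean, standard deviation, skewness, excess kurtosis) of one window."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments (divide by n); an assumption for this sketch.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    if std == 0.0:
        # Flat window: shape statistics are undefined, report zeros.
        return mean, 0.0, 0.0, 0.0
    return mean, std, m3 / std ** 3, m4 / std ** 4 - 3.0


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model (up to a constant)."""
    variance = statistics.pvariance(boxcox(values, lam))
    if variance == 0.0:
        return float("-inf")
    n = len(values)
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_boxcox(column):
    """Choose a shift and lambda for one feature column.

    Box-Cox needs positive input, but skewness and excess kurtosis can be
    negative, so the column is shifted to start at 1 when needed (assumption).
    Lambda is picked from a coarse grid between -2 and 2 (assumption).
    """
    lowest = min(column)
    shift = 1.0 - lowest if lowest <= 0 else 0.0
    shifted = [v + shift for v in column]
    grid = [i / 10 for i in range(-20, 21)]
    lam = max(grid, key=lambda candidate: boxcox_loglik(shifted, candidate))
    return shift, lam


# Hypothetical signal values for three windows of one tag.
windows = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 0.0, 1.0, 9.0], [2.0, 2.5, 2.0, 8.0]]
features = [window_features(w) for w in windows]
skewness_column = [f[2] for f in features]
shift, lam = fit_boxcox(skewness_column)
normalized_skewness = boxcox([v + shift for v in skewness_column], lam)
```

The shift and lambda fitted on the training features would need to be stored, so that the same transformation can be reused on any new features before prediction.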

Applying model

  1. Features from the experiments are extracted in the same fashion as in \ref{645025}.
  2. Applying the model trained in \ref{645025} to the features from \ref{972036} gives a predicted replication time for the mouse germ line.
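
As an illustration of the fit-then-apply pattern described above, the sketch below fits an ordinary least squares model on training features and then uses it to predict values for new features. It is a self-contained example in plain Python, not the implementation used here; in practice a numerical library would normally do the fitting. The function names and the toy numbers are hypothetical.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations.

    rows: one list of feature values per region; targets: replication times.
    Assumes the features are not collinear (otherwise a pivot becomes zero).
    """
    design = [[1.0, *row] for row in rows]
    p = len(design[0])
    # Augmented system [X^T X | X^T y].
    system = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(design, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        pivot_value = system[col][col]
        system[col] = [v / pivot_value for v in system[col]]
        for r in range(p):
            if r != col:
                factor = system[r][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    # First coefficient is the intercept, the rest are per-feature weights.
    return [row[-1] for row in system]


def predict(coefficients, rows):
    """Apply fitted coefficients to new feature rows."""
    intercept, *weights = coefficients
    return [intercept + sum(w * x for w, x in zip(weights, row)) for row in rows]


# Toy training data (hypothetical): two features per region.
train_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
train_targets = [1.0 + 2.0 * a - 3.0 * b for a, b in train_rows]
coefficients = fit_linear_regression(train_rows, train_targets)

# New features must be extracted and transformed exactly like the training ones.
new_rows = [[3.0, 2.0]]
predicted = predict(coefficients, new_rows)
```

The key design point is that the new features pass through the same extraction and normalization as the training features, with the transformation parameters fixed at training time, before the fitted coefficients are applied.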

Result

Linear regression: