A goal in developing sim1000G was to enable simulation of arbitrary pedigree structures with multi-marker data. This requires modelling and simulating every meiosis and recombination event in a pedigree. The package therefore includes a complete set of methods for generating realistic genotype data for families with simple or complex pedigrees.
To model recombination, we implement two models within sim1000G: a chi-squared model with interference and a simple no-interference model. These models are used to generate the inter-recombination distances for the region under simulation and to recombine the haplotypes accordingly. The model with interference was adapted from a previously described two-pathway model \citep{Housworth_2003}.
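As an illustration of how inter-recombination distances can drive haplotype recombination, the following Python sketch draws crossover positions from a gamma renewal process and copies alleles from alternating parental haplotypes. It is a simplified stand-in, not the sim1000G implementation: the function names, the 100 cM mean spacing, the burn-in length and the shape values are all assumptions made for the example, and the two-pathway mixture of \citep{Housworth_2003} is omitted. A shape of 1 gives exponential spacings (no interference), while larger shapes give more evenly spaced crossovers, which is the qualitative effect of interference.

```python
import random
from bisect import bisect_right


def simulate_crossovers(region_cm, shape=1.0, mean_cm=100.0,
                        burn_in_cm=1000.0, rng=None):
    """Return sorted crossover positions (in cM) inside [0, region_cm).

    Spacings are gamma distributed with the given shape and a mean of
    mean_cm (hypothetical defaults). shape=1 is the no-interference case.
    The process starts burn_in_cm before the region so that the first
    crossover in the region is not tied to the region boundary.
    """
    if rng is None:
        rng = random.Random()
    scale = mean_cm / shape          # gamma mean = shape * scale
    position = -burn_in_cm
    crossovers = []
    while True:
        position += rng.gammavariate(shape, scale)
        if position >= region_cm:
            break
        if position >= 0.0:
            crossovers.append(position)
    return crossovers


def recombine(hap_a, hap_b, marker_cm, crossovers, start_with_a=True):
    """Build a gamete by switching parental haplotype at each crossover.

    marker_cm holds the genetic position of each marker; crossovers must
    be sorted in increasing order.
    """
    gamete = []
    for i, cm in enumerate(marker_cm):
        n_switches = bisect_right(crossovers, cm)
        use_a = (n_switches % 2 == 0) == start_with_a
        gamete.append(hap_a[i] if use_a else hap_b[i])
    return gamete


if __name__ == "__main__":
    rng = random.Random(2024)
    markers = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # marker positions in cM
    xo = simulate_crossovers(region_cm=3.0, shape=4.0, rng=rng)
    print("crossovers:", xo)
    print("gamete:", recombine([0] * 6, [1] * 6, markers, xo))
```

In a region of a few centiMorgans most meioses produce no crossover, so the gamete is usually an intact copy of one parental haplotype; the interference setting matters mainly for longer regions.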
In addition, simulating family data requires a detailed genetic map. For the 1000 Genomes data, we provide genetic maps for all autosomes. Because of package size limitations, these maps could not be included in the package distribution; all genetic map files are instead available on the package's accompanying website.
Methods for familial studies commonly require estimates of identity by descent (IBD) probabilities between members of the same pedigree. Within its simulation model, sim1000G tracks all ancestral haplotypes and alleles through each recombination event. This enables the computation of the exact IBD state at each marker in the simulated region. We added a simple user interface that computes the exact IBD1 and IBD2 proportions for every pair of individuals with a single call to the function computePairIBD12.
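The idea of deriving exact IBD states from tracked ancestral haplotypes can be sketched as follows. This Python example is only an illustration under stated assumptions and does not reproduce the internals or the signature of computePairIBD12: the function name pair_ibd_proportions and the data layout (each individual stored as two lists of founder-haplotype labels, one label per marker) are hypothetical, and inbred individuals are not handled specially.

```python
def pair_ibd_proportions(ind_a, ind_b):
    """Return (IBD1 proportion, IBD2 proportion) across markers.

    Each individual is a pair of equal-length lists. Entry i of a list is
    the label of the founder haplotype from which the allele at marker i
    descends (hypothetical representation).
    """
    a1, a2 = ind_a
    b1, b2 = ind_b
    n_markers = len(a1)
    if n_markers == 0:
        raise ValueError("at least one marker is required")

    counts = [0, 0, 0]  # markers with 0, 1 or 2 alleles shared by descent
    for i in range(n_markers):
        # Try both ways of pairing the two haplotypes and keep the best.
        direct = (a1[i] == b1[i]) + (a2[i] == b2[i])
        crossed = (a1[i] == b2[i]) + (a2[i] == b1[i])
        counts[max(direct, crossed)] += 1
    return counts[1] / n_markers, counts[2] / n_markers


# Two siblings: the father carries founder haplotypes 1 and 2,
# the mother carries founder haplotypes 3 and 4.
sib_a = ([1, 1, 2, 2], [3, 3, 3, 3])
sib_b = ([1, 1, 1, 2], [4, 3, 4, 3])
ibd1, ibd2 = pair_ibd_proportions(sib_a, sib_b)
print(ibd1, ibd2)  # 0.25 0.5
```

Because the labels record descent rather than allele values, two alleles that are identical by state but come from different founders are correctly counted as not IBD.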
Computational efficiency