The total running time of sim1000G was evaluated on a common laptop computer with a 2 GHz processor and 4 GB of RAM. Only one CPU core was used for all timing reports. Figure 1 shows the running time of sim1000G for the generation of up to 8000 simulated individuals in a region spanning 1 Mb. The number of variants included in the simulation was varied from 1000 to 16000. Even when simulating thousands of individuals, sim1000G completes the whole process within 10 to 30 seconds in most cases.
Applications
Generating genotype data for a single population
Using sim1000G, we generated variant data for 300 simulated individuals in a region of chromosome 4 at 60995249-61569446 bp. The simulation was based on the 95 individuals from the 1000 Genomes CEU population, so the simulated individuals mimic the allele frequencies and LD patterns of individuals of European descent.
To assess the simulation, we compared the LD patterns of the original genotype data with those of the simulated data (figure 1). Figure 1 shows both LD patterns: the lower triangle of the matrix shows the LD pattern of the original genotypes of the 95 individuals from 1000 Genomes, and the upper triangle shows the same pattern for the 300 simulated individuals. Although there are some subtle differences in LD, sim1000G preserved both short-range and long-range LD in this region.
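The split-triangle comparison described above can be illustrated with a minimal sketch. It is written in Python with NumPy, not with the sim1000G R package, and it assumes that LD is measured as the squared Pearson correlation (r^2) between genotype columns coded 0/1/2; the text does not state which LD measure was used. The function names `ld_r2` and `combined_ld_matrix` and the toy genotype arrays are hypothetical and serve only as an illustration.

```python
import numpy as np


def ld_r2(genotypes):
    """Pairwise r^2 between variants.

    genotypes: array of shape (individuals, variants), coded 0/1/2.
    Assumption: LD is summarised as the squared Pearson correlation.
    Pairs involving a monomorphic variant are assigned r^2 = 0.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)              # centre each variant column
    cov = g.T @ g                       # unnormalised covariance matrix
    var = np.diag(cov)
    denom = np.outer(var, var)
    r2 = np.zeros_like(cov)
    # Divide only where both variants have non-zero variance
    np.divide(cov ** 2, denom, out=r2, where=denom > 0)
    return r2


def combined_ld_matrix(original, simulated):
    """Lower triangle: LD of the original data; upper triangle: LD of the simulated data."""
    ld_orig = ld_r2(original)
    ld_sim = ld_r2(simulated)
    if ld_orig.shape != ld_sim.shape:
        raise ValueError("both data sets must contain the same variants")
    combined = np.tril(ld_orig, k=-1) + np.triu(ld_sim, k=1)
    np.fill_diagonal(combined, 1.0)
    return combined


# Toy genotypes (rows: individuals, columns: variants), for illustration only
original = np.array([[0, 0, 2], [1, 1, 1], [2, 2, 0], [0, 0, 2]])
simulated = np.array([[0, 2, 0], [1, 0, 1], [2, 1, 2], [0, 1, 0]])
combined = combined_ld_matrix(original, simulated)
```

The two data sets may differ in the number of individuals (95 versus 300 in the text), but they must cover the same variants in the same order so that corresponding cells of the two triangles refer to the same variant pair.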