Genome size estimation based on NGS data
The HTQC package (Xi Yang et al., 2013) was used to filter out low-quality
bases and reads. Briefly, the NGS data were cleaned in three steps. First,
adapter sequences were removed from the reads; second, reads with more than
10% ambiguous (N) bases were discarded; and third, reads in which more than
50% of the bases had a quality score of 5 or lower were discarded. In total,
42.3 Gb (~86X) of clean data were retained for the k-mer-based analysis. To
check for obvious sample contamination, we also randomly selected 10,000 read
pairs and searched them with BLAST against the NCBI non-redundant nucleotide
(nt) database.
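The three cleaning criteria can be illustrated with a minimal Python sketch. This is not HTQC's implementation. It assumes that quality scores have already been decoded to integer Phred values, it uses a hypothetical adapter sequence (`ADAPTER`) with exact matching only, where real trimmers tolerate mismatches, and it assumes that both mates of a pair are kept or discarded together. `trim_adapter`, `passes_filters` and `clean_pair` are illustrative names.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base Phred qualities)

# Hypothetical adapter and minimum 3' overlap; substitute the library's real values.
ADAPTER = "AGATCGGAAGAGC"
MIN_ADAPTER_OVERLAP = 5


def trim_adapter(seq: str, qual: List[int], adapter: str = ADAPTER) -> Read:
    """Step 1: cut the read at an exact adapter match, or at a partial
    adapter prefix of at least MIN_ADAPTER_OVERLAP bases at the 3' end."""
    pos = seq.find(adapter)
    if pos == -1:
        # Try the longest partial adapter prefix hanging off the 3' end first.
        for k in range(min(len(adapter) - 1, len(seq)), MIN_ADAPTER_OVERLAP - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def passes_filters(seq: str, qual: List[int], max_n_frac: float = 0.10,
                   low_q: int = 5, max_low_q_frac: float = 0.50) -> bool:
    """Steps 2 and 3: reject reads with >10% N bases or with >50% of
    bases at quality <= 5."""
    if not seq:
        return False  # nothing left after adapter trimming
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    low_frac = sum(1 for q in qual if q <= low_q) / len(qual)
    return low_frac <= max_low_q_frac


def clean_pair(r1: Read, r2: Read) -> Optional[Tuple[Read, Read]]:
    """Apply all three steps to both mates; drop the pair if either fails."""
    t1 = trim_adapter(*r1)
    t2 = trim_adapter(*r2)
    if passes_filters(*t1) and passes_filters(*t2):
        return t1, t2
    return None


if __name__ == "__main__":
    mate1 = ("ACGTACGTAC" + ADAPTER, [30] * 23)  # full adapter at 3' end
    mate2 = ("ACGTNNACGT", [30] * 10)            # 20% N bases
    print(trim_adapter(*mate1)[0])  # ACGTACGTAC
    print(clean_pair(mate1, mate2))  # None
```

The thresholds are exposed as keyword arguments so the same sketch can be rerun with other cut-offs. Note that the comparisons mirror the wording above: a read is removed only when its N or low-quality fraction is strictly *more than* the limit.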
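The depth and size figures relate through simple arithmetic. Sequencing depth $C$ is the number of clean bases $B$ divided by the genome size $G$. Read in reverse, the 42.3 Gb at ~86X stated above imply a genome of roughly 0.49 Gb. This back-of-the-envelope value is not the k-mer estimate itself. The second line shows the estimator commonly used in k-mer-based analyses, given here as an assumption about the general approach rather than the exact procedure followed. $N_k$ is the total number of k-mers counted, $c_k$ is the k-mer depth at the main peak of the k-mer frequency distribution, $L$ is the read length, and $k$ is the k-mer length.

$$
C = \frac{B}{G} \quad\Longrightarrow\quad G \approx \frac{42.3\ \text{Gb}}{86} \approx 0.49\ \text{Gb}
$$

$$
G \approx \frac{N_k}{c_k}, \qquad c_k = C\,\frac{L - k + 1}{L}
$$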