This is a work-in-progress evaluation report for our experiments on the chr20 subset of the 12 ONT/shasta assemblies (+ GRCh38).
Stats from vg stats after chopping the nodes so that each is at most 32 bp long. For reference, chr20 is around 64.4 Mbp long in GRCh38.
The degree distribution looks like this:
Note: the degree is winsorized at 100 (nodes with degree > 100 are grouped and shown at x=100).
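To make the winsorizing explicit, here is a minimal sketch of the grouping described in the note. The function name and the toy degree values are hypothetical; in practice the degrees would come from the graph itself.

```python
from collections import Counter


def winsorized_degree_counts(degrees, cap=100):
    """Count nodes per degree, grouping every degree above `cap` at `cap`."""
    return Counter(min(d, cap) for d in degrees)


# Toy input (hypothetical): two nodes exceed the cap and end up at x=100.
counts = winsorized_degree_counts([1, 2, 2, 3, 150, 4000])
```

With this grouping, the bar at x=100 counts all nodes with degree of 100 or more, so it should be read as a tail bin rather than as a single degree value.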
Stats extracted from the distance index with vg view -B. As an estimate of the variation in each snarl, let’s look at the difference between the maximum and minimum lengths of the paths traversing it. The depth represents how deeply a snarl is nested within other snarls.
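The max-minus-min measure can be sketched as follows. This assumes the traversal lengths for each snarl have already been parsed into plain lists of integers; the snarl names and values below are made up for illustration and do not reflect the actual output format of the distance index.

```python
def snarl_length_range(traversal_lengths):
    """Difference between the longest and shortest path through one snarl."""
    return max(traversal_lengths) - min(traversal_lengths)


# Hypothetical snarls: a SNV-like site, a 51 bp indel, and a larger multi-allelic site.
snarls = {
    "snarl_a": [1, 1],
    "snarl_b": [1, 52],
    "snarl_c": [10, 12, 310],
}
ranges = {name: snarl_length_range(lengths) for name, lengths in snarls.items()}
```

A range of 0 means all traversals have the same length (for example a substitution), while larger values point to insertions or deletions of roughly that size.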
Using vg deconstruct on the pangenome, relative to paths starting with hg38_.
The graph below shows the cumulative proportion of mapped reads for different mapping quality thresholds, i.e. the proportion of mapped reads with MAPQ>=x.
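As a rough sketch of how such a curve can be computed, the snippet below takes a list of MAPQ values and returns, for each threshold x, the fraction of reads with MAPQ>=x. It assumes the list contains only mapped reads, so the denominator is the number of mapped reads; the function name and toy values are hypothetical.

```python
def mapq_cumulative_proportion(mapqs, thresholds):
    """For each threshold x, return the proportion of reads with MAPQ >= x."""
    if not mapqs:
        raise ValueError("no MAPQ values provided")
    total = len(mapqs)
    return {t: sum(q >= t for q in mapqs) / total for t in thresholds}


# Toy input (hypothetical): four mapped reads.
props = mapq_cumulative_proportion([0, 10, 60, 60], [0, 10, 60])
```

If the proportion should instead be relative to all reads, including unmapped ones, only the denominator `total` needs to change.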
No MAPQ curve for now because all reads tend to map with the maximum MAPQ of 255.
Resources used to construct the pangenome, make (vg) indexes and map short reads.