Stats from vg stats after chopping the graph into nodes of at most 32 bp. Of note, chr20 is around 64.4 Mbp long in GRCh38.
The degree distribution looks like this:
Note: the degree is winsorized at 100 (nodes with degree > 100 are grouped and shown at x = 100).
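As a minimal sketch of the winsorization described in the note, the snippet below caps each node degree at 100 before counting. The function name `winsorized_degree_counts` and the toy degree values are hypothetical and only illustrate the grouping; they are not taken from the pangenome.

```python
from collections import Counter


def winsorized_degree_counts(degrees, cap=100):
    """Count nodes per degree, grouping every degree above `cap` at `cap`."""
    return Counter(min(d, cap) for d in degrees)


# Toy degrees (made up for illustration): two nodes exceed the cap.
degrees = [1, 2, 2, 3, 150, 4000]
counts = winsorized_degree_counts(degrees)
```

With this input, the two nodes with degree above 100 both land in the bin at 100, which is what the rightmost bar of such a histogram would represent.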
Stats extracted from the distance index with vg view -B. As an estimate of the variation in each snarl, let's look at the difference between the maximum and minimum lengths of the paths traversing it. The depth represents how deeply a snarl is nested within other snarls.
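The variation estimate above can be sketched as a simple subtraction per snarl. The record layout here (`min_length`, `max_length`, `depth`) and the values are assumptions made for illustration; they do not reflect the actual output format of the distance index.

```python
# Hypothetical snarl records: field names and values are illustrative only.
snarls = [
    {"min_length": 1, "max_length": 1, "depth": 1},    # e.g. a SNV-like site
    {"min_length": 0, "max_length": 52, "depth": 1},   # e.g. an indel-like site
    {"min_length": 10, "max_length": 14, "depth": 2},  # nested inside another snarl
]


def length_range(snarl):
    """Difference between the longest and shortest traversal of a snarl."""
    return snarl["max_length"] - snarl["min_length"]


ranges = [length_range(s) for s in snarls]
```

A range of 0 means all traversals have the same length, while a large range suggests an insertion or deletion of roughly that size.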
Using vg deconstruct on the pangenome, relative to paths starting with hg38_.
The graph below shows the cumulative proportion of mapped reads for different mapping quality thresholds, i.e. the proportion of mapped reads with MAPQ >= x.
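A minimal sketch of how such a curve could be computed is shown below. It assumes that the denominator is the total number of input reads and that only mapped reads contribute a MAPQ value; the function name `prop_mapq_at_least` and the toy values are hypothetical.

```python
def prop_mapq_at_least(mapqs, total_reads, thresholds):
    """For each threshold x, return the fraction of reads mapped with MAPQ >= x.

    `mapqs` holds one MAPQ per mapped read; `total_reads` is the assumed
    denominator (all input reads, mapped or not).
    """
    return {x: sum(1 for q in mapqs if q >= x) / total_reads for x in thresholds}


# Toy example (made-up values): 5 mapped reads out of 5 input reads.
mapqs = [0, 10, 30, 60, 60]
curve = prop_mapq_at_least(mapqs, total_reads=5, thresholds=[0, 1, 30, 60])
```

The curve is non-increasing in x by construction, since raising the threshold can only exclude reads.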
No MAPQ curve for now because reads tend to map with the maximum MAPQ of 255.