Pangenome

Stats from vg stats after chopping the graph into nodes of at most 32 bp. Of note, chr20 is about 64.4 Mbp long in GRCh38.
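Chopping caps every node at 32 bp, so a node of length L becomes ceil(L / 32) nodes. The sketch below shows that arithmetic. It assumes pieces are cut left to right with any remainder in the last piece; the exact split vg uses may differ. One consequence: a single path spanning all ~64.4 Mbp of chr20 needs roughly 2.0 million nodes after chopping.

```python
def chop_lengths(node_len: int, max_len: int = 32) -> list[int]:
    """Split a node of node_len bp into consecutive pieces of at most max_len bp.

    Assumption: pieces are cut from the start, the remainder goes last.
    """
    if node_len <= 0:
        raise ValueError("node length must be positive")
    full, rest = divmod(node_len, max_len)
    return [max_len] * full + ([rest] if rest else [])


def chopped_node_count(node_lens, max_len: int = 32) -> int:
    """Total number of nodes after chopping (integer ceiling per node)."""
    return sum((n + max_len - 1) // max_len for n in node_lens)
```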

The degree distribution looks like this:

Note: the degree is winsorized at 100 (nodes with degree > 100 are grouped together and shown at x = 100).
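A minimal sketch of how such a winsorized histogram can be built. It assumes the graph is available as a hypothetical list of (node_a, node_b) edge pairs and that each edge adds 1 to the degree of both endpoints; vg's bidirected edges (which also record node sides) are simplified away here.

```python
from collections import Counter


def node_degrees(edges) -> Counter:
    """Degree per node from (node_a, node_b) pairs; a self-loop counts twice."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def winsorized_degree_hist(degrees, cap: int = 100) -> Counter:
    """Number of nodes per degree, with every degree above cap counted at x = cap."""
    return Counter(min(d, cap) for d in degrees)
```

Usage: `winsorized_degree_hist(node_degrees(edges).values())` gives the counts to plot.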

Variation

Snarls

Stats extracted from the distance index with vg view -B. As an estimate of the variation, let's look at the difference between the maximum and minimum lengths of the paths traversing the snarl. The depth represents how deeply a snarl is nested inside other snarls.
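To make the "max minus min path length" estimate concrete, here is a small sketch on a toy snarl. It is not how vg computes these values; it assumes the snarl is acyclic (with cycles the maximum is unbounded) and uses hypothetical inputs: a successor map, a node-length map, and the two boundary nodes. Only nodes strictly inside the snarl contribute length.

```python
from graphlib import TopologicalSorter


def snarl_length_range(succ, node_len, source, sink):
    """(min, max) summed length of internal nodes over all source->sink paths.

    succ maps node -> successor nodes; node_len maps every node -> length in bp.
    Raises graphlib.CycleError if the snarl is not acyclic.
    """
    preds = {n: set() for n in node_len}
    for u, vs in succ.items():
        for v in vs:
            preds[v].add(u)
    lo, hi = {source: 0}, {source: 0}
    # In topological order every predecessor is final before a node is expanded.
    for u in TopologicalSorter(preds).static_order():
        if u not in lo:
            continue  # not reachable from source
        for v in succ.get(u, ()):
            add = 0 if v == sink else node_len[v]
            lo[v] = min(lo.get(v, float("inf")), lo[u] + add)
            hi[v] = max(hi.get(v, -1), hi[u] + add)
    if sink not in lo:
        raise ValueError("sink is not reachable from source")
    return lo[sink], hi[sink]
```

For a SNP bubble both alleles have length 1, so the range is (1, 1) and the variation estimate is 0; a 10 bp deletion bubble gives (0, 10), i.e. 10.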

Relative to hg38

Variants obtained with vg deconstruct on the pangenome, relative to the paths whose names start with hg38_.
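vg deconstruct writes a VCF, so one simple way to summarize it is by allele size. The sketch below is a hypothetical summary, not the one used for these stats: it reads REF and ALT from each record and uses the common (but here assumed) 50 bp cutoff for structural variants. Equal-length substitutions longer than 1 bp are labeled MNP.

```python
from collections import Counter


def classify_record(vcf_line: str, sv_min: int = 50) -> str:
    """Classify one VCF data line by the largest REF/ALT length difference."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "SNV"
    size = max(abs(len(a) - len(ref)) for a in alts)
    if size >= sv_min:
        return "SV"
    return "indel" if size > 0 else "MNP"


def summarize_vcf(lines, sv_min: int = 50) -> Counter:
    """Count variant classes, skipping header and empty lines."""
    return Counter(
        classify_record(line, sv_min)
        for line in lines
        if line.strip() and not line.startswith("#")
    )
```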

Short-read Mapping stats

  • Reads from HG002.
  • ~50x depth.
  • Chr20 subset: reads mapping to chr20 on the GIAB SV graph.
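As a rough sanity check on the depth figures, mean depth is total read bases divided by the region length. The sketch assumes that simple definition (ignoring unmapped reads, duplicates and N regions): ~50x over the ~64.4 Mbp of chr20 corresponds to about 3.2 Gbp of read sequence, and ~10x to about 0.64 Gbp.

```python
def mean_depth(read_lengths, region_len_bp: int) -> float:
    """Mean depth = total read bases / region length."""
    return sum(read_lengths) / region_len_bp


def bases_for_depth(depth: int, region_len_bp: int) -> int:
    """Read bases needed to reach a given mean depth."""
    return depth * region_len_bp
```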

The graph below shows the cumulative proportion of mapped reads for different mapping quality thresholds, i.e. the proportion of mapped reads with MAPQ >= x.
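A minimal sketch of that curve, assuming the MAPQ of each mapped read has already been extracted (unmapped reads excluded). Sorting once and using binary search gives the count of reads with MAPQ >= x for every threshold.

```python
from bisect import bisect_left


def mapq_cumulative(mapqs, thresholds) -> dict:
    """Proportion of mapped reads with MAPQ >= x, for each threshold x."""
    qs = sorted(mapqs)
    n = len(qs)
    if n == 0:
        raise ValueError("no mapped reads")
    # bisect_left returns how many values are < x; the rest are >= x.
    return {x: (n - bisect_left(qs, x)) / n for x in thresholds}
```

For MAPQs [0, 0, 10, 60, 60], the proportion is 1.0 at x = 0, 0.6 at x = 1, 0.4 at x = 60 and 0.0 at x = 61.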

Long-read Mapping stats

  • CCS reads from HG002.
  • ~10x depth.
  • Chr20 subset: reads mapping to chr20 on hg38.

No MAPQ curve for now: almost all reads map with the maximum MAPQ of 255, so the curve would be uninformative.