# Long Read Giraffe Paper Data

This archive contains software source code and data used in the manuscript *Rapid, accurate long- and short-read mapping to large pangenome graphs with vg Giraffe*.

## Layout

The archive is organized as follows:

* `code`
  * `long-read-giraffe-experiments`: the pipeline used to run mapping, calling, and genotyping experiments, https://github.com/vgteam/long-read-giraffe-experiments
    * `v1.0.0.tar.gz`
  * `pga_workflow`: the pipeline used to run pangenome-guided assembly, https://github.com/shlokanegi/pga_workflow/tree/lrg2025-paper
    * `pga_workflow.tar.gz`
  * `vg`: vg source code; regular releases are available at https://github.com/vgteam/vg/
    * `vg-ceb8ad76e537d66433f3078848489026709e7557.tar.gz`
    * `vg-v1.68.0.tar.gz`
* `deep_variant_models`
  * `2025-03-26`
    * `model.example_info.json`
    * `checkpoint`
    * `checkpoint-140800-0.98024-1.data-00000-of-00001`
    * `README.txt`
    * `checkpoint-140800-0.98024-1.index`
    * `example_info.json`
  * `2025-03-26noinfo`
    * `checkpoint`
    * `checkpoint-140800-0.98024-1.data-00000-of-00001`
    * `checkpoint-140800-0.98024-1.index`
    * `example_info.json`
* `graphs`
  * `HPRC-clipped`: The v2 HPRC graph built with Minigraph-Cactus, used for haplotype sampling
    * `hprc-v2.0-mc-chm13-eval.ec1M.fragment.hapl`: The haplotype index for haplotype sampling
    * `hprc-v2.0-mc-chm13-eval.ec1M.gbz`
  * `HPRC-d46`: The v2 HPRC graph built with Minigraph-Cactus with frequency filtering
    * `hprc-v2.0-mc-chm13-eval.d46.gbz`
    * `hprc-v2.0-mc-chm13-eval.d46.gfa`
    * `hprc-v2.0-mc-chm13-eval.d46.dist`: The distance index
    * `hprc-v2.0-mc-chm13-eval.d46.k29.w11.withzip.min`: The minimizer index for mapping short reads
    * `hprc-v2.0-mc-chm13-eval.d46.k29.w11.zipcodes`: The zip codes associated with the short-read minimizer index
    * `hprc-v2.0-mc-chm13-eval.d46.k31.w50.W.withzip.min`: The minimizer index for mapping long reads
    * `hprc-v2.0-mc-chm13-eval.d46.k31.w50.W.zipcodes`: The zip codes associated with the long-read minimizer index
  * `HPRC-minigraph`: The v2 HPRC graph built with Minigraph
    * `hprc-v2.0-minigraph-chm13-eval.gfa`
* `sim_reads`: Simulated read sets used for evaluating mapping correctness 
  * `HG002-sim-element-1m.gam`
  * `HG002-sim-hifi-1m.gam`
  * `HG002-sim-illumina-1m.gam`
  * `HG002-sim-r10-1m.gam`
* `real_reads`
  * `HG002`
    * `HG002.GAT-LI-C044.fastq.gz`: Element reads
    * `HG002.novaseq.pcr-free.40x.fq.gz`: Illumina reads
    * `HG002Revio_hg002v1.0.1_hifi_revio_pbmay24.pri.unshuffled.fastq.gz`: HiFi reads
    * `r10y2025.HG002_PAW70337.fastq.gz`: R10 reads
    * `README.txt`
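
After downloading or unpacking the archive, it can be useful to confirm that the files listed above are where the layout says they are. The following is a minimal sketch, not part of the archive itself. It assumes each nested bullet corresponds to a directory level under the archive root, and it checks only a hand-picked subset of the listed files. The names `EXPECTED_FILES` and `missing_files` are hypothetical and chosen for illustration.

```python
from pathlib import Path

# Relative paths copied from the layout above.
# Assumption: nested bullets map directly to nested directories.
# This is a subset for illustration, not the full listing.
EXPECTED_FILES = [
    "code/long-read-giraffe-experiments/v1.0.0.tar.gz",
    "code/pga_workflow/pga_workflow.tar.gz",
    "code/vg/vg-v1.68.0.tar.gz",
    "graphs/HPRC-d46/hprc-v2.0-mc-chm13-eval.d46.gbz",
    "graphs/HPRC-d46/hprc-v2.0-mc-chm13-eval.d46.dist",
    "sim_reads/HG002-sim-hifi-1m.gam",
    "real_reads/HG002/README.txt",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_files("/path/to/archive")` (with the path replaced by wherever the archive was unpacked) returns an empty list when every listed file is present, and otherwise returns the relative paths that could not be found.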
