| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Precision | METRIC.Recall | METRIC.F1_Score |
|---|---|---|---|---|---|---|
| INDEL | 502129 | 2372 | 1540 | 0.997066 | 0.995298 | 0.996181 |
| SNP | 3315258 | 12238 | 4818 | 0.998550 | 0.996322 | 0.997435 |
| Type | TRUTH.TP | TRUTH.FN | QUERY.FP | METRIC.Precision | METRIC.Recall | METRIC.F1_Score |
|---|---|---|---|---|---|---|
| INDEL | 501470 | 3031 | 1380 | 0.997367 | 0.993992 | 0.995677 |
| SNP | 3306945 | 20551 | 6042 | 0.998177 | 0.993824 | 0.995996 |
Same metrics recomputed from the variants marked as TP/FP/FN in the annotated VCF that hap.py produces:
| type | FN | FP | TP | precision | recall | F1 |
|---|---|---|---|---|---|---|
| INDEL | 2372 | 1540 | 502129 | 0.9969 | 0.9953 | 0.9961 |
| SNP | 12238 | 4818 | 3315258 | 0.9985 | 0.9963 | 0.9974 |
| type | FN | FP | TP | precision | recall | F1 |
|---|---|---|---|---|---|---|
| INDEL | 3031 | 1380 | 501470 | 0.9973 | 0.9940 | 0.9956 |
| SNP | 20551 | 6042 | 3306945 | 0.9982 | 0.9938 | 0.9960 |
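These look almost exactly like the metrics computed by hap.py. Recall matches; precision differs slightly because hap.py computes it from QUERY.TP, whereas the TP column here is the truth-side count. Good: it means I can filter these annotated variants to get a quick estimate of the performance in different regions.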
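A minimal Python sketch of that quick recomputation, assuming hap.py's annotated VCF layout: a `TRUTH` and a `QUERY` sample column, with the benchmarking decision in the `BD` FORMAT field and the variant type in `BVT`. TP/FN are taken from the truth side and FP from the query side, and precision is computed from the truth-side TP, as in the tables above. The helper names are hypothetical.

```python
from collections import Counter

def count_decisions(vcf_lines):
    """Count (variant type, decision) pairs from hap.py annotated VCF lines.

    TP and FN are read from the TRUTH sample, FP from the QUERY sample,
    using the BD (decision) and BVT (variant type) FORMAT fields.
    """
    counts = Counter()
    samples = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # expected to be ["TRUTH", "QUERY"]
            continue
        keys = fields[8].split(":")
        for name, value in zip(samples, fields[9:]):
            sample = dict(zip(keys, value.split(":")))
            decision, vtype = sample.get("BD"), sample.get("BVT")
            if (name == "TRUTH" and decision in ("TP", "FN")) or (
                name == "QUERY" and decision == "FP"
            ):
                counts[(vtype, decision)] += 1
    return counts

def metrics(tp, fp, fn):
    """Precision, recall and F1 from truth-side TP, FP and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# INDEL counts from the table
print([round(x, 4) for x in metrics(502129, 1540, 2372)])  # prints [0.9969, 0.9953, 0.9961]
```

For the real output, the lines could come from `gzip.open(path, "rt")`, since hap.py writes a bgzipped VCF.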
From the UCSC genomicSuperDups track:

- sd: all segmental duplications
- sd99: segmental duplications with fracMatch > 0.99

The other region sets are from https://github.com/genome-in-a-bottle/genome-stratifications:
| region | n | Mbp |
|---|---|---|
| sd | 69894 | 910.757532 |
| sd99 | 2491 | 131.811504 |
| MHC | 1 | 4.970558 |
| AllTandemRepeatsandHomopolymers_slop5 | 4689843 | 254.038446 |
| AllTandemRepeats_gt100bp_slop5 | 213368 | 120.148167 |
| L1H_gt500 | 1021 | 3.289135 |
| alldifficultregions | 4810858 | 643.469280 |
| alllowmapandsegdupregions | 529793 | 306.144596 |
| lowmappabilityall | 673815 | 249.550584 |
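The `n` and `Mbp` columns can be derived from BED-like intervals. A minimal sketch, assuming the genomicSuperDups rows have been loaded as dicts keyed by the track's column names (`chrom`, `chromStart`, `chromEnd`, `fracMatch`); the helper names are hypothetical. Whether the table above merged overlapping segmental duplications before summing is not stated, so merging is shown as an optional step.

```python
def filter_segdups(rows, min_frac_match=None):
    """Return (chrom, start, end) for genomicSuperDups rows, optionally keeping
    only rows with fracMatch above min_frac_match (0.99 for sd99)."""
    return [
        (r["chrom"], r["chromStart"], r["chromEnd"])
        for r in rows
        if min_frac_match is None or r["fracMatch"] > min_frac_match
    ]

def merge_intervals(intervals):
    """Merge overlapping or book-ended 0-based half-open intervals per chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)  # extend the current interval
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

def summarize(intervals):
    """Number of intervals and their total length in Mbp."""
    return len(intervals), sum(end - start for _, start, end in intervals) / 1e6
```

Without merging, overlapping intervals are counted twice in the total length, which can inflate `Mbp` for a track like genomicSuperDups where duplications overlap.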
Graphs are sorted to highlight the regions with the largest difference in performance between the two methods.
Note: this is based on the annotated variants from a single hap.py run on the confident regions. To get more accurate performance estimates (and, e.g., ROC curves) for a particular region set, we should rerun hap.py restricted to those regions. This is just a quicker approach for exploration.
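The filtering approach can be sketched as follows: index each region set, keep the annotated variants falling inside it, compute recall per method, and sort region sets by the difference between methods. This is a minimal sketch with hypothetical helper names; it assumes each region set has already been merged into non-overlapping BED intervals (e.g. with `bedtools merge`) and that variants are `(chrom, VCF POS, decision)` tuples. Only recall is shown; precision would use the FP decisions the same way.

```python
import math
from bisect import bisect_right
from collections import defaultdict

def build_index(intervals):
    """Index non-overlapping, 0-based half-open (BED-style) intervals per chromosome."""
    index = defaultdict(lambda: ([], []))
    for chrom, start, end in sorted(intervals):
        starts, ends = index[chrom]
        starts.append(start)
        ends.append(end)
    return index

def in_regions(index, chrom, pos0):
    """True if the 0-based position pos0 falls inside one of the indexed intervals."""
    starts, ends = index.get(chrom, ([], []))
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]

def region_recall(variants, index):
    """Recall within the regions; variants are (chrom, 1-based VCF POS, decision)."""
    tp = fn = 0
    for chrom, pos, decision in variants:
        if not in_regions(index, chrom, pos - 1):  # VCF POS is 1-based, BED is 0-based
            continue
        if decision == "TP":
            tp += 1
        elif decision == "FN":
            fn += 1
    return tp / (tp + fn) if tp + fn else math.nan

def rank_regions(region_sets, variants_a, variants_b):
    """Sort region sets by the absolute recall difference between two methods."""
    rows = []
    for name, intervals in region_sets.items():
        index = build_index(intervals)
        rec_a = region_recall(variants_a, index)
        rec_b = region_recall(variants_b, index)
        if not (math.isnan(rec_a) or math.isnan(rec_b)):  # skip sets with no variants
            rows.append((name, rec_a, rec_b, rec_a - rec_b))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)
```

Binary search over interval starts keeps each lookup logarithmic in the number of intervals, which matters for sets like AllTandemRepeatsandHomopolymers_slop5 with millions of intervals.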
```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 13.82 5.00 453.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 2.000 5.797 7.000 96.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 4.00 8.00 21.34 25.00 332.00
```
MHC regions with FPs/FNs in the giraffe run
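Looking for regions where most variants are FN in the giraffe-HPRC run but not in the bwa-mem run (~10 kbp bins; prop.fn = n.fn / n.tot in the giraffe-HPRC run, and prop.fn.bwamem is the same proportion in the bwa-mem run).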
| coord | prop.fn | n.fn | n.tot | prop.fn.bwamem |
|---|---|---|---|---|
| chr1:161757353-161767351 | 1.0000000 | 15 | 15 | 0.0000000 |
| chr1:161767352-161777351 | 1.0000000 | 23 | 23 | 0.0000000 |
| chr6:32645857-32655855 | 1.0000000 | 109 | 109 | 0.0180180 |
| chr7:38353860-38363858 | 1.0000000 | 11 | 11 | 0.0000000 |
| chr9:64426127-64436125 | 1.0000000 | 12 | 12 | 0.0833333 |
| chr9:64466125-64476123 | 1.0000000 | 12 | 12 | 0.0666667 |
| chr15:84392425-84402422 | 1.0000000 | 67 | 67 | 0.0000000 |
| chr16:22189304-22199302 | 1.0000000 | 12 | 12 | 0.0000000 |
| chr16:28498701-28508699 | 1.0000000 | 12 | 12 | 0.0000000 |
| chr16:22119311-22129309 | 0.9285714 | 13 | 14 | 0.0000000 |
| chr17:36210775-36220774 | 0.9285714 | 13 | 14 | 0.0000000 |
| chr3:195911092-195921091 | 0.9230769 | 12 | 13 | 0.0000000 |
| chr16:22229300-22239299 | 0.9230769 | 12 | 13 | 0.0000000 |
| chr12:9448483-9458481 | 0.9090909 | 20 | 22 | 0.0000000 |
| chr16:22159307-22169305 | 0.9090909 | 10 | 11 | 0.0000000 |
| chr9:67046008-67056006 | 0.9047619 | 19 | 21 | 0.0476190 |
| chr16:28518699-28528697 | 0.9047619 | 19 | 21 | 0.0000000 |
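A minimal sketch of this binned search, assuming variants are `(chrom, VCF POS, decision)` tuples and that `n.tot` counts every annotated variant in the bin. The bins here are aligned at multiples of `bin_size`, which is an assumption: the coordinates in the tables are not aligned that way, so the original binning differed. The thresholds `min_fn` and `max_prop_bwamem` are hypothetical.

```python
from collections import defaultdict

def bin_fn_counts(variants, bin_size=10_000):
    """Map (chrom, bin_start) -> (n_fn, n_tot) for (chrom, pos, decision) variants."""
    bins = defaultdict(lambda: [0, 0])
    for chrom, pos, decision in variants:
        key = (chrom, (pos // bin_size) * bin_size)
        bins[key][1] += 1
        if decision == "FN":
            bins[key][0] += 1
    return {key: tuple(v) for key, v in bins.items()}

def giraffe_specific_fn(giraffe, bwamem, bin_size=10_000, min_fn=10, max_prop_bwamem=0.1):
    """Bins where giraffe has many FNs but bwa-mem has few, sorted by prop.fn then n.fn."""
    g = bin_fn_counts(giraffe, bin_size)
    b = bin_fn_counts(bwamem, bin_size)
    rows = []
    for key, (n_fn, n_tot) in g.items():
        prop_fn = n_fn / n_tot
        b_fn, b_tot = b.get(key, (0, 0))
        prop_fn_bwamem = b_fn / b_tot if b_tot else 0.0
        if n_fn >= min_fn and prop_fn_bwamem <= max_prop_bwamem:
            rows.append((key, prop_fn, n_fn, n_tot, prop_fn_bwamem))
    return sorted(rows, key=lambda row: (-row[1], -row[2]))
```

Rerunning with a larger `bin_size` (e.g. `100_000`) gives the coarser version of the same analysis.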
Same analysis with larger (~100 kbp) bins:
| coord | prop.fn | n.fn | n.tot | prop.fn.bwamem |
|---|---|---|---|---|
| chr9:64427487-64527452 | 1.0000000 | 97 | 97 | 0.1825397 |
| chr9:64827353-64927318 | 1.0000000 | 45 | 45 | 0.1886792 |
| chr16:21826922-21926900 | 1.0000000 | 31 | 31 | 0.3589744 |
| chr15:84334575-84434550 | 0.9758065 | 121 | 124 | 0.1653543 |
| chr9:66326851-66426816 | 0.9545455 | 42 | 44 | 0.1666667 |
| chr16:21327025-21427004 | 0.9000000 | 36 | 40 | 0.2000000 |
| chr9:67326516-67426481 | 0.8923077 | 58 | 65 | 0.4492754 |
| chr9:67126583-67226548 | 0.8391608 | 120 | 143 | 0.2402597 |
| chr16:21626963-21726942 | 0.8367347 | 41 | 49 | 0.0000000 |
| chr8:12290888-12390866 | 0.8317757 | 89 | 107 | 0.1538462 |
| chr16:22126860-22226839 | 0.8317757 | 89 | 107 | 0.0000000 |
| chr16:21726943-21826921 | 0.8125000 | 13 | 16 | 0.0909091 |
| chr15:22449292-22549268 | 0.8076923 | 21 | 26 | 0.2750000 |