# H9 UCSC Genome Browser Hub

This directory contains a complete UCSC Genome Browser track hub (an assembly hub, since it ships its own genome sequences) with gene annotations for the H9_HAP1 and H9_HAP2 assemblies.

## Contents

### Hub Configuration Files
- `hub.txt` - Main hub configuration
- `genomes.txt` - Genome assembly definitions
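
As a rough illustration of what these two files usually hold in an assembly hub, the sketch below writes a minimal pair into a directory of your choice. The hub name, labels, e-mail address, `organism`, and `defaultPos` values are placeholders invented for this example, not the settings used by this hub; only the directory and file names follow the layout described in this README.

```bash
# Write a minimal hub.txt and genomes.txt into the directory given as $1.
# All labels, the e-mail address, organism, and defaultPos are placeholders.
write_hub_skeleton() {
    local out_dir="$1" hap
    mkdir -p "$out_dir"

    cat > "$out_dir/hub.txt" <<'EOF'
hub H9_example_hub
shortLabel H9 annotations
longLabel Gene annotations for H9_HAP1 and H9_HAP2
genomesFile genomes.txt
email someone@example.org
EOF

    # One stanza per assembly, separated by a blank line.
    : > "$out_dir/genomes.txt"
    for hap in H9_HAP1 H9_HAP2; do
        cat >> "$out_dir/genomes.txt" <<EOF
genome ${hap}
trackDb ${hap}/trackDb.txt
twoBitPath ${hap}/${hap}.2bit
description ${hap}
organism H9
defaultPos chr1:1-100000

EOF
    done
}
```

Calling `write_hub_skeleton /tmp/hub_sketch` produces both files in a scratch directory, so the real configuration is never overwritten.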

### Genome Directories

Each genome directory (`H9_HAP1/` and `H9_HAP2/`) contains:

#### Genome Sequence
- `H9_HAP1.2bit` / `H9_HAP2.2bit` - Genome sequence in 2bit format
- `H9_HAP1.chrom.sizes` / `H9_HAP2.chrom.sizes` - Chromosome sizes
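
A `chrom.sizes` file is expected to have two tab-separated columns, sequence name and length. The helper below is a small sketch for sanity-checking one; the function name is made up for this example.

```bash
# Summarize a chrom.sizes file: prints "<sequence count> <total bases>".
# Exits non-zero if a row does not have exactly two columns or a numeric length.
chrom_sizes_summary() {
    awk -F'\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ {
            printf "bad row %d: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
            exit
        }
        { n++; total += $2 }
        END {
            if (bad) exit 1
            printf "%d %.0f\n", n, total
        }
    ' "$1"
}
```

For example, `chrom_sizes_summary H9_HAP1/H9_HAP1.chrom.sizes` should report the number of sequences and the assembly length for haplotype 1.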

#### Track Configuration
- `trackDb.txt` - Track definitions and display settings

#### Gene Annotation Tracks (bigBed format)

1. **Consensus Track** (`*_consensus.bb`)
   - Final consensus gene set from CAT pipeline
   - Integrates evidence from all sources
   - Priority: Highest (shown by default in "pack" mode)
   - Color: Green

2. **Liftoff Track** (`*_liftoff.bb`)
   - Liftoff gene projections from reference
   - High-quality alignments
   - Color: Blue

3. **TransMap Track** (`*_transMap.bb`)
   - TransMap transcript alignments (filtered)
   - Genome-to-genome alignments via HAL
   - Color: Red

4. **Augustus TM Track** (`*_augTM.bb`)
   - Augustus predictions with TransMap hints
   - Gene predictions guided by alignment hints
   - Color: Brown

5. **Augustus TMR Track** (`*_augTMR.bb`)
   - Augustus predictions with TransMap + RNA-seq hints
   - Incorporates transcriptomic evidence
   - Default: Hidden
   - Color: Purple

6. **Augustus PB Track** (`*_augPB.bb`)
   - Augustus predictions with PacBio/Iso-Seq hints
   - Long-read evidence for novel genes/isoforms
   - Default: Hidden
   - Color: Teal

#### Additional Files
- `*_consensus_info.txt` - Detailed metadata for consensus genes
  - Gene sources, scores, classification, support metrics

## Statistics

### H9_HAP1 Tracks
- Consensus: 519,075 transcripts
- Liftoff: 385,717 transcripts
- TransMap: 310,858 transcripts
- Augustus TM: 37,141 transcripts
- Augustus TMR: 34,471 transcripts
- Augustus PB: 334,662 transcripts

### H9_HAP2 Tracks
- Consensus: 519,230 transcripts
- Liftoff: 386,368 transcripts
- TransMap: 310,849 transcripts
- Augustus TM: 37,281 transcripts
- Augustus TMR: 34,453 transcripts
- Augustus PB: 316,608 transcripts
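
Counts like these can be re-checked against the bigBed files. The sketch below assumes the UCSC `bigBedInfo` utility is installed and that it prints a line of the form `itemCount: 519,075`; the function only parses text on standard input, so it can be tried without the tool. The file name in the usage comment is a guess based on the `*_consensus.bb` pattern.

```bash
# Extract the item count from bigBedInfo-style output read on stdin,
# stripping thousands separators so the result is a plain integer.
item_count() {
    awk -F': ' '$1 == "itemCount" { gsub(/,/, "", $2); print $2 }'
}

# Hypothetical usage (file name assumed):
#   bigBedInfo H9_HAP1/H9_HAP1_consensus.bb | item_count
```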

## Using This Hub

### Option 1: Load from a Web-Accessible URL

1. Copy the entire `ucsc_hub` directory to a web-accessible location
2. Note the URL to `hub.txt` (e.g., `https://your-server.com/path/to/hub.txt`)
3. Go to [UCSC Genome Browser](https://genome.ucsc.edu/)
4. Navigate to: **My Data → Track Hubs** and open the **Connected Hubs** tab (labeled **My Hubs** in older versions of the browser)
5. Enter the URL to your `hub.txt` file
6. Click "Add Hub"
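
The same hub can also be opened through a direct link, which is handy for sharing. The sketch below builds one using the `hubUrl` and `genome` parameters of `hgTracks`; the function name is invented for this example, and the hub URL is the placeholder from step 2.

```bash
# Print a Genome Browser link that attaches the hub at $1 and opens assembly $2.
hub_link() {
    local hub_url="$1" genome="$2"
    printf 'https://genome.ucsc.edu/cgi-bin/hgTracks?genome=%s&hubUrl=%s\n' \
        "$genome" "$hub_url"
}

# Example with the placeholder URL:
#   hub_link https://your-server.com/path/to/hub.txt H9_HAP1
```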

### Option 2: Load Locally (for testing)

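To test locally, serve the hub directory over HTTP. This only works with a Genome Browser instance that can reach your machine (for example, a local Genome Browser in a Box installation); the public site at genome.ucsc.edu cannot fetch files from `localhost`. Python's built-in server also does not answer byte-range requests, which the browser uses to read bigBed and 2bit files, so data tracks may fail to load with it: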

```bash
# Start a simple HTTP server in the hub directory
cd /private/groups/cgl/pnhebbar/h9_project/ucsc_hub
python3 -m http.server 8000
```

Then use `http://localhost:8000/hub.txt` as the hub URL.
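
A quick way to confirm that a server answers HTTP byte-range requests, which the Genome Browser relies on for binary files, is to ask for the first few bytes and look at the status code. This is a minimal sketch assuming `curl` is available; both function names are made up for this example.

```bash
# Succeed only when the HTTP status code is 206 (Partial Content).
is_partial_content() {
    [ "$1" = "206" ]
}

# Request the first 100 bytes of a URL and report whether the range was honored.
check_range_support() {
    local url="$1" code
    code=$(curl -s -o /dev/null -r 0-99 -w '%{http_code}' "$url") || return 2
    if is_partial_content "$code"; then
        echo "range requests supported (HTTP $code)"
    else
        echo "no range support (HTTP $code)"
        return 1
    fi
}
```

Point `check_range_support` at one of the `.bb` or `.2bit` URLs rather than `hub.txt`, since the binary files are the ones read in pieces.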

### Option 3: Copy to Web Server

```bash
# Example: Copy to a web-accessible directory
rsync -avz /private/groups/cgl/pnhebbar/h9_project/ucsc_hub/ \
    user@webserver:/var/www/html/h9_hub/
```

## Track Display Options

### Visibility Modes
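- **hide** - Track is not displayed
- **dense** - All items collapsed onto a single line
- **pack** - Each item drawn separately and labeled, with several items sharing a line where space allows
- **squish** - Like pack, but at reduced height and without labels
- **full** - Each item labeled and drawn on its own line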

### Default Visibility
- Consensus: **pack** (expanded view)
- Liftoff: **dense** (compact)
- TransMap: **dense** (compact)
- Augustus TM: **dense** (compact)
- Augustus TMR: **hide** (off by default)
- Augustus PB: **hide** (off by default)
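
To show how these defaults could be expressed in `trackDb.txt`, the sketch below prints one stanza per track. The track names, labels, `type bigBed 12`, and the `<haplotype>_<suffix>.bb` file names are assumptions based on the patterns listed in this README; the actual `trackDb.txt` files may differ.

```bash
# Print trackDb stanzas for one haplotype ($1, e.g. H9_HAP1).
# Each input row is suffix:label:default visibility.
print_trackdb() {
    local hap="$1" suffix label vis
    while IFS=: read -r suffix label vis; do
        printf 'track %s\nbigDataUrl %s_%s.bb\nshortLabel %s\nlongLabel %s %s\ntype bigBed 12\nvisibility %s\n\n' \
            "$suffix" "$hap" "$suffix" "$label" "$hap" "$label" "$vis"
    done <<'EOF'
consensus:Consensus:pack
liftoff:Liftoff:dense
transMap:TransMap:dense
augTM:Augustus TM:dense
augTMR:Augustus TMR:hide
augPB:Augustus PB:hide
EOF
}
```

Running `print_trackdb H9_HAP1` writes the stanzas to standard output, so the result can be compared with the real file before anything is changed.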

## Customizing Tracks

To modify track appearance, edit the `trackDb.txt` files in each genome directory.

Example modifications (comments go on their own lines, since text after a setting is read as part of its value):
```
track consensusGenes
# Change to full display
visibility full
# Adjust color (RGB)
color 0,150,0
# Track ordering
priority 1
```
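
If the same change has to be applied to both haplotypes, it can be scripted. The sketch below rewrites the `visibility` line of a single track and prints the result to standard output instead of editing in place; the function name is invented for this example, and it assumes each stanza begins with a `track` line.

```bash
# Print trackDb file $1 with the visibility of track $2 set to $3.
set_visibility() {
    awk -v target="$2" -v vis="$3" '
        # A "track" line opens a new stanza; remember whether it is the target.
        $1 == "track" { in_target = ($2 == target) }
        in_target && $1 == "visibility" { print "visibility " vis; next }
        { print }
    ' "$1"
}

# Hypothetical usage:
#   set_visibility H9_HAP1/trackDb.txt consensusGenes full > trackDb.new
```

Writing to a new file first makes it easy to diff the result against the original before replacing it.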

## File Formats

- **bigBed (.bb)** - Binary indexed BED format for efficient viewing
- **2bit (.2bit)** - Compressed genome sequence format
- **BED12** - Standard BED format with 12 fields (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts)
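
In BED12, exons are stored as blocks: `blockStarts` are offsets relative to the start coordinate and `blockSizes` are their lengths. The sketch below expands each BED12 record into one line per exon; the function name is made up for this example, and input is expected as tab-separated text (for instance from `bigBedToBed`).

```bash
# Read BED12 on stdin and print one line per block: chrom, start, end, name.
bed12_exons() {
    awk -F'\t' -v OFS='\t' '{
        split($11, sizes, ",")
        split($12, starts, ",")
        # blockCount ($10) bounds the loop, so trailing commas are harmless.
        for (i = 1; i <= $10; i++) {
            block_start = $2 + starts[i]
            print $1, block_start, block_start + sizes[i], $4
        }
    }'
}
```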

## Support & Citation

These annotations were generated using the Comparative Annotation Toolkit (CAT):
- GitHub: https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit
- Paper: Fiddes et al. (2018) "Comparative Annotation Toolkit (CAT) - simultaneous clade and personal genome annotation", Genome Research 28(7):1029-1038

## Notes

### Known Issues
1. Some genes may have generic IDs (e.g., `H9_HAP2_G0060434`) instead of gene names if:
   - They are novel genes not in reference
   - The gene name lookup failed during consensus building
   - They are from augPB predictions without reference match

2. Gene models within a track may overlap one another due to:
   - Structural variations between haplotypes
   - Copy number variations (CNVs)
   - Overlapping gene families (e.g., NBPF, NOTCH2NL)
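
To gauge how many identifiers fall under the generic-ID case in item 1, they can be counted with a pattern match. The regular expression below is inferred from the single example `H9_HAP2_G0060434` and is an assumption about the ID scheme; the function name is also made up.

```bash
# Count IDs on stdin (one per line) that look like generic gene IDs.
# grep exits non-zero when nothing matches, so the status is ignored here.
count_generic_ids() {
    grep -Ec '^H9_HAP[12]_G[0-9]+$' || true
}
```

Feed it one ID per line, for example a column cut from `*_consensus_info.txt` once you have confirmed which column holds the gene IDs.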

### Filtering Applied
- **TransMap**: Filtered for alignment quality and paralogy resolution
- **Augustus**: Minimum intron/exon support requirements applied
- **Consensus**: Integrated best evidence from all sources with deduplication

## Contact

For questions about this hub or the annotations, quote the following build details:
- Generated: November 2025
- Pipeline: CAT (Comparative Annotation Toolkit)
- Location: /private/groups/cgl/pnhebbar/h9_project/ucsc_hub

