# Panprimate Gene Annotations Browser Hub

This directory contains a UCSC Genome Browser Track Hub for the panprimate project with consensus gene annotations and evidence tracks for 10 primate species.

## Hub Overview

- **Hub Name**: panprimate_hub
- **Format**: bigBed (bigGenePred format)
- **Total bigBed files**: 50 (5 tracks × 10 genomes)
- **Created**: November 10, 2025
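
For orientation, the top-level `hub.txt` for a hub like this can be sketched as below. This is a minimal Python sketch, not the output of `create_panprimate_hub.py`: the hub name, e-mail address, and file names come from this README, while the two label strings are assumptions made for illustration.

```python
def render_hub_txt(hub_name, short_label, long_label, email,
                   genomes_file="genomes.txt",
                   description_url="panprimate_description.html"):
    """Return hub.txt content as one 'setting value' pair per line."""
    fields = [
        ("hub", hub_name),
        ("shortLabel", short_label),
        ("longLabel", long_label),
        ("genomesFile", genomes_file),
        ("email", email),
        ("descriptionUrl", description_url),
    ]
    return "\n".join(f"{key} {value}" for key, value in fields) + "\n"


hub_txt = render_hub_txt(
    hub_name="panprimate_hub",
    short_label="Panprimate annotations",  # assumed label
    long_label="Panprimate consensus gene annotations and evidence tracks",  # assumed label
    email="pnhebbar@ucsc.edu",
)
print(hub_txt)
```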

## Species Included

1. **Eulemur fulvus** (PPG00246)
2. **Eulemur macaco** (PPG00128)
3. **Eulemur rubriventer** (PPG00129)
4. **Leontopithecus rosalia** (PPG00786)
5. **Lagothrix lagotricha** (PPG00525)
6. **Pithecia pithecia** (PPG00239)
7. **Colobus angolensis** (PPG00099)
8. **Mandrillus leucophaeus** (PPG00232)
9. **Papio anubis** (PPG00036)
10. **Macaca nemestrina** (PPG00030)
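
The species list maps onto `genomes.txt` roughly as sketched below. This is a hedged Python sketch: it assumes each genome directory is named after its PPG identifier (the layout later in this README only says `[GENOME_NAME]`), and it omits settings such as `defaultPos`, which a working assembly hub typically also carries but which depend on scaffold names not given here.

```python
SPECIES = {
    "PPG00246": "Eulemur fulvus",
    "PPG00128": "Eulemur macaco",
    "PPG00129": "Eulemur rubriventer",
    "PPG00786": "Leontopithecus rosalia",
    "PPG00525": "Lagothrix lagotricha",
    "PPG00239": "Pithecia pithecia",
    "PPG00099": "Colobus angolensis",
    "PPG00232": "Mandrillus leucophaeus",
    "PPG00036": "Papio anubis",
    "PPG00030": "Macaca nemestrina",
}


def render_genomes_txt(species):
    """Return genomes.txt content with one stanza per genome."""
    stanzas = []
    for genome, sci_name in species.items():
        stanzas.append("\n".join([
            f"genome {genome}",                    # assumed: directory name == PPG ID
            f"trackDb {genome}/trackDb.txt",
            f"twoBitPath {genome}/{genome}.2bit",
            f"organism {sci_name}",
            f"scientificName {sci_name}",
            f"description {sci_name} ({genome})",
        ]))
    # Stanzas are separated by a blank line.
    return "\n\n".join(stanzas) + "\n"


genomes_txt = render_genomes_txt(SPECIES)
print(genomes_txt)
```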

## Track Types

Each genome has 5 tracks representing different sources of evidence:

### 1. Consensus Genes (Primary Track)
- **File**: `consensusGenes.bb`
- **Description**: Final consensus gene set combining all evidence sources
- **Color**: Blue (0,60,120)
- **Visibility**: Pack (default view)
- **Priority**: 1 (top track)

### 2. TransMap
- **File**: `transMap.bb`
- **Description**: Homology-based gene mappings from reference genome (hs1/CHM13)
- **Color**: Brown (120,60,0)
- **Visibility**: Dense
- **Priority**: 2

### 3. Augustus TM
- **File**: `augustusTM.bb`
- **Description**: Augustus gene predictions guided by TransMap hints
- **Color**: Green (0,120,60)
- **Visibility**: Dense
- **Priority**: 3

### 4. Augustus PB
- **File**: `augustusPB.bb`
- **Description**: Augustus gene predictions guided by PacBio/IsoSeq hints
- **Color**: Purple (120,0,60)
- **Visibility**: Dense
- **Priority**: 4

### 5. Liftoff
- **File**: `liftoff.bb`
- **Description**: Reference-based annotations lifted from CHM13
- **Color**: Yellow-green (60,120,0)
- **Visibility**: Dense
- **Priority**: 5
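
The settings listed above correspond to `trackDb.txt` stanzas along the following lines. This is a minimal Python sketch, assuming the track names equal the file stems and the labels equal the headings above; the `trackDb.txt` actually written by `create_panprimate_hub.py` may use different names or extra settings.

```python
TRACKS = [
    # (track name, file, label, color, visibility, priority)
    ("consensusGenes", "consensusGenes.bb", "Consensus Genes", "0,60,120", "pack", 1),
    ("transMap", "transMap.bb", "TransMap", "120,60,0", "dense", 2),
    ("augustusTM", "augustusTM.bb", "Augustus TM", "0,120,60", "dense", 3),
    ("augustusPB", "augustusPB.bb", "Augustus PB", "120,0,60", "dense", 4),
    ("liftoff", "liftoff.bb", "Liftoff", "60,120,0", "dense", 5),
]


def render_trackdb(tracks):
    """Return trackDb.txt content with one bigGenePred stanza per track."""
    stanzas = []
    for name, path, label, color, visibility, priority in tracks:
        stanzas.append("\n".join([
            f"track {name}",            # assumed: track name == file stem
            f"bigDataUrl {path}",       # relative to the trackDb.txt location
            f"shortLabel {label}",
            f"longLabel {label}",
            "type bigGenePred",
            f"color {color}",
            f"visibility {visibility}",
            f"priority {priority}",
        ]))
    return "\n\n".join(stanzas) + "\n"


trackdb_txt = render_trackdb(TRACKS)
print(trackdb_txt)
```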

## Directory Structure

```
browser_hub/
├── hub.txt                          # Main hub configuration
├── genomes.txt                      # Genome definitions
├── panprimate_description.html      # Hub description page
├── README.md                        # This file
└── [GENOME_NAME]/
    ├── [GENOME_NAME].2bit          # Genome sequence
    ├── trackDb.txt                 # Track definitions for this genome
    ├── consensusGenes.bb           # Consensus gene annotations
    ├── transMap.bb                 # TransMap evidence
    ├── augustusTM.bb               # Augustus with TransMap hints
    ├── augustusPB.bb               # Augustus with PacBio hints
    └── liftoff.bb                  # Liftoff annotations
```
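
Before uploading, it can be useful to confirm that the layout above is complete. The sketch below is one way to do that in Python; the function names are illustrative, and it assumes each genome directory is named after its genome identifier (for example `PPG00246`). The demo runs against a temporary directory rather than the real hub.

```python
import tempfile
from pathlib import Path

HUB_FILES = ["hub.txt", "genomes.txt", "panprimate_description.html"]
TRACK_FILES = ["consensusGenes.bb", "transMap.bb", "augustusTM.bb",
               "augustusPB.bb", "liftoff.bb"]


def expected_files(genomes):
    """List the relative paths the hub layout calls for."""
    paths = [Path(name) for name in HUB_FILES]
    for genome in genomes:
        genome_dir = Path(genome)  # assumed: directory name == genome ID
        paths.append(genome_dir / f"{genome}.2bit")
        paths.append(genome_dir / "trackDb.txt")
        paths.extend(genome_dir / name for name in TRACK_FILES)
    return paths


def missing_files(hub_dir, genomes):
    """Return the expected paths (POSIX-style strings) absent from hub_dir."""
    hub_dir = Path(hub_dir)
    return [path.as_posix() for path in expected_files(genomes)
            if not (hub_dir / path).is_file()]


# Demo on a throwaway directory that contains only hub.txt.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "hub.txt").touch()
    demo_missing = missing_files(tmp, ["PPG00246"])

print("\n".join(demo_missing))
```

For the real hub, `missing_files(hub_dir, genomes)` would be called with the `browser_hub` path and the ten genome identifiers; an empty list means every expected file is present.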

## Usage Instructions

### Option 1: Load from Public Web Server

1. Upload the entire `browser_hub` directory to a web-accessible server that supports HTTP byte-range requests (the browser reads bigBed and 2bit files in pieces)
2. In the UCSC Genome Browser, open "My Data" > "Track Hubs" (https://genome.ucsc.edu/cgi-bin/hgHubConnect)
3. Enter the URL of your `hub.txt` file
4. Click "Add Hub"
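
As an alternative to the form, a hub can usually be attached through a link that passes the `hub.txt` location in the `hubUrl` parameter. The sketch below builds such a link in Python. The host `example.org` is a placeholder, and passing the genome identifier through `genome` is an assumption about how this assembly hub would be addressed.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def hub_link(hub_txt_url, genome=None, browser=UCSC_HGTRACKS):
    """Build a browser link that attaches the hub found at hub_txt_url."""
    params = {"hubUrl": hub_txt_url}
    if genome is not None:
        params["genome"] = genome  # assumed: genome name as listed in genomes.txt
    return f"{browser}?{urlencode(params)}"


# example.org stands in for the server that hosts browser_hub/.
link = hub_link("https://example.org/browser_hub/hub.txt", genome="PPG00246")
print(link)
```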

### Option 2: Load Locally (for testing)

```bash
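# Start a simple web server in the hub directory
cd /private/groups/cgl/pnhebbar/CAT2_smk/panprimate_output/browser_hub
python3 -m http.server 8000

# Then add the hub URL in a Genome Browser instance running on this machine
# (e.g. a local mirror). The public browser at genome.ucsc.edu fetches hub
# files from its own servers and cannot reach localhost.
# http://localhost:8000/hub.txt
# Note: python3's built-in server may not honor byte-range requests, which
# the browser relies on for bigBed and 2bit files.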
```
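
The Genome Browser reads bigBed and 2bit files through HTTP byte-range requests, so it helps to confirm that a server honors them before debugging anything else. The Python sketch below asks for a single byte and checks for a `206 Partial Content` reply. The function name and the injectable `opener` argument are illustrative choices, not part of any hub tooling.

```python
import urllib.request


def supports_byte_ranges(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a one-byte Range request is answered with HTTP 206.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    check can be exercised without a live server.
    """
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with opener(request, timeout=timeout) as response:
        # A server that ignores the Range header replies 200 with the full body.
        return response.status == 206
```

A call such as `supports_byte_ranges("http://localhost:8000/hub.txt")` returning `False` suggests the server is sending whole files and the tracks may fail to load.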

### Option 3: Use with Custom Genome Browser

If you have your own Genome Browser instance, you can load the hub directly:

```bash
# Set the hub directory path in your browser configuration
HUB_DIR=/private/groups/cgl/pnhebbar/CAT2_smk/panprimate_output/browser_hub
```

## File Formats

### bigBed Format (bigGenePred)
All annotation tracks use the bigGenePred format (bed12+8), which includes:
- Standard BED12 fields (chromosome, coordinates, blocks, etc.)
- Extended fields: alternative names, CDS status, exon frames, gene type

### AutoSql Definition
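Each bigBed file is accompanied by an `.as` (autoSql) file defining the schema. In the standard bigGenePred layout the fields are:
- `chrom`, `chromStart`, `chromEnd`: Genomic coordinates (0-based start, exclusive end)
- `name`: Transcript ID
- `score`: Quality score (0-1000)
- `strand`: + or -
- `thickStart`, `thickEnd`: Coding region
- `reserved`: Item color (RGB)
- `blockCount`, `blockSizes`, `chromStarts`: Exon count, exon sizes, and exon starts relative to `chromStart`
- `name2`: Alternative/human-readable name
- `cdsStartStat`, `cdsEndStat`: Completeness status of the CDS start and end
- `exonFrames`: Reading frame of each exon (-1 for exons without coding sequence)
- `type`, `geneName`, `geneName2`, `geneType`: Transcript type, gene identifiers, and gene type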
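
To make the schema concrete, the sketch below parses one tab-separated bigGenePred record (as produced by, for example, `bigBedToBed`) into named fields and derives exon coordinates. It assumes the standard 20-column bigGenePred layout; the `.as` files shipped with the hub are authoritative if they differ. The example record is made up for illustration.

```python
BIG_GENE_PRED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
    "type", "geneName", "geneName2", "geneType",
]
INT_FIELDS = ("chromStart", "chromEnd", "score", "thickStart", "thickEnd",
              "blockCount")
LIST_FIELDS = ("blockSizes", "chromStarts", "exonFrames")


def parse_big_gene_pred(line):
    """Parse one tab-separated bigGenePred line into a dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BIG_GENE_PRED_FIELDS):
        raise ValueError(f"expected 20 columns, got {len(values)}")
    record = dict(zip(BIG_GENE_PRED_FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    for key in LIST_FIELDS:
        # Lists are comma-separated and may end with a trailing comma.
        record[key] = [int(x) for x in record[key].rstrip(",").split(",")]
    return record


def exons(record):
    """Return (start, end) genomic coordinates for each exon."""
    base = record["chromStart"]
    return [(base + start, base + start + size)
            for start, size in zip(record["chromStarts"], record["blockSizes"])]


# Made-up record: a two-exon transcript on a hypothetical scaffold.
EXAMPLE = "\t".join([
    "scaffold_1", "1000", "2000", "T1", "0", "+", "1100", "1898",
    "0,60,120", "2", "300,400,", "0,600,", "GENE1", "cmpl", "cmpl",
    "0,2,", "protein_coding", "G1", "GENE1", "protein_coding",
])

record = parse_big_gene_pred(EXAMPLE)
print(record["name"], record["name2"], exons(record))
```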

## Annotation Methods

The consensus gene set was generated using the Comparative Annotation Toolkit (CAT) v2 with:

1. **Homology mapping**: TransMap for projecting reference annotations
2. **Gene prediction**: Augustus guided by TransMap and PacBio IsoSeq hints
3. **Reference liftover**: Liftoff for direct annotation transfer
4. **RNA-seq validation**: PacBio IsoSeq data for transcript evidence
5. **Consensus building**: Integration of all evidence sources

## Statistics

| Genome | Consensus Transcripts (approx.) | File Size (approx.) |
|--------|---------------------------------|---------------------|
| PPG00246 (E. fulvus) | 274,490 | 12 MB |
| PPG00128 (E. macaco) | similar to PPG00246 | 12 MB |
| PPG00129 (E. rubriventer) | similar to PPG00246 | 12 MB |
| PPG00786 (L. rosalia) | similar to PPG00246 | 12 MB |
| PPG00525 (L. lagotricha) | similar to PPG00246 | 12 MB |
| PPG00239 (P. pithecia) | similar to PPG00246 | 12 MB |
| PPG00099 (C. angolensis) | similar to PPG00246 | 12 MB |
| PPG00232 (M. leucophaeus) | similar to PPG00246 | 12 MB |
| PPG00036 (P. anubis) | similar to PPG00246 | 12 MB |
| PPG00030 (M. nemestrina) | similar to PPG00246 | 12 MB |
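
Exact counts for the remaining genomes could be collected with the UCSC `bigBedInfo` utility, which reports the number of items in a bigBed file. The Python sketch below assumes `bigBedInfo` is on `PATH` and that its output contains a line of the form `itemCount: 274,490`; both the helper names and that output format are assumptions to verify against the installed tool.

```python
import re
import subprocess


def parse_item_count(big_bed_info_text):
    """Extract the item count from bigBedInfo output text."""
    match = re.search(r"^itemCount:\s*([\d,]+)", big_bed_info_text,
                      flags=re.MULTILINE)
    if match is None:
        raise ValueError("no itemCount line found")
    # The count is assumed to be printed with thousands separators.
    return int(match.group(1).replace(",", ""))


def count_items(bb_path):
    """Run bigBedInfo on bb_path and return the number of items."""
    result = subprocess.run(["bigBedInfo", str(bb_path)],
                            capture_output=True, text=True, check=True)
    return parse_item_count(result.stdout)
```

Calling `count_items("PPG00128/consensusGenes.bb")` from the hub directory would then give the figure for that genome's consensus track.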

## Regenerating the Hub

To regenerate or update the hub:

```bash
cd /private/groups/cgl/pnhebbar/CAT2_smk

# Set up environment
source /private/groups/cgl/cactus/venv-cactus-latest/bin/activate
source ~/miniconda3/etc/profile.d/conda.sh
source ~/.bashrc
conda activate cat
export PATH=/private/groups/cgl/pnhebbar/CAT2_smk/standalones:$PATH

# Run the hub creation script
python3 create_panprimate_hub.py
```

Or use the wrapper script:

```bash
cd /private/groups/cgl/pnhebbar/CAT2_smk
./run_create_hub.sh
```

## References

- **Comparative Annotation Toolkit (CAT)**: https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit
- **UCSC Track Hub Documentation**: https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html
- **bigBed Format**: https://genome.ucsc.edu/goldenPath/help/bigBed.html
- **bigGenePred Format**: https://genome.ucsc.edu/goldenPath/help/bigGenePred.html

## Contact

For questions or issues with this hub:
- Email: pnhebbar@ucsc.edu
- Project: Panprimate Genome Annotations

## Citation

If you use these annotations, please cite:
- The Comparative Annotation Toolkit (CAT)
- The panprimate genome project
- Individual genome assemblies and data sources

## License

These annotations are provided for research use. Please consult with the project team for specific usage terms.

---

**Last Updated**: November 10, 2025
**Version**: 1.0
**Pipeline**: CAT2_smk

