Alternative polyadenylation#
Description#
This mini-protocol turns aligned RNA-seq reads into an analysis-ready alternative-polyadenylation (APA) phenotype matrix for apaQTL analysis. It chains two pipeline modules through their pipeline/ symlinks. First, APA calling builds a 3’UTR reference, converts transcriptome BAM files to per-base coverage, and runs DaPars2 to quantify a percentage-of-distal-polyA-site-usage (PDUI) matrix per chromosome. Second, post-APA imputation and QC fills missing PDUI values, applies quantile normalization, and optionally renames sample columns so that downstream covariate, association, and fine-mapping steps receive a complete matrix. Follow the steps in order; each is a single command on the toy data.
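For orientation, the quantity stored in the PDUI matrix can be written as a ratio. This is the usual definition from the DaPars method, not something restated from the pipeline notebooks; the symbols $w_L$ and $w_S$ are notation chosen here for the estimated abundances of the long (distal poly-A site) and short (proximal poly-A site) 3’UTR isoforms of a transcript in one sample:

```latex
\mathrm{PDUI} = \frac{w_L}{w_L + w_S}, \qquad 0 \le \mathrm{PDUI} \le 1
```

Under this definition, values near 1 indicate that most transcripts use the distal site (3’UTR lengthening), and values near 0 indicate predominantly proximal-site usage (3’UTR shortening).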
Input#
| File | Description |
|---|---|
| Transcriptome BAM files | Per-sample RNA-seq alignments to the transcriptome, collected under |
| Reference GTF ( | Gene annotation used to derive the 3’UTR reference regions (e.g. |
| Sample match table | Optional tab-delimited table mapping internal IDs to final sample names, used by the rename step. |
Steps#
i. Generate the 3’UTR reference regions from a GTF annotation:#
Timing: Runtime varies by dataset size and compute resources. For the toy chr22 MWE dataset, most steps complete in under 10 minutes on a standard HPC node.
sos run pipeline/apa_calling.ipynb UTR_reference \
--cwd output/apa \
--hg-gtf output/apa/chr22.gtf
ii. Convert the transcriptome BAM files into per-base coverage (wig) and read-depth (flagstat) files:#
sos run pipeline/apa_calling.ipynb bam2tools \
--cwd output/apa \
--bam-dir output/rnaseq/bam
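Before launching the conversion, it can help to confirm that the BAM directory actually contains alignments. The sketch below is a hypothetical helper, not part of the pipeline; it assumes the BAM files sit directly inside the directory and use the `.bam` extension.

```shell
# Hypothetical pre-flight helper (not part of apa_calling.ipynb).
# Prints the number of *.bam files found directly inside a directory.
# Assumption: BAMs are not nested in subdirectories and end in ".bam".
count_bams() {
    local dir="$1"
    # Report a clear error if the directory does not exist.
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # -maxdepth 1 restricts the search to the directory itself;
    # index files such as sample.bam.bai do not match '*.bam'.
    find "$dir" -maxdepth 1 -type f -name '*.bam' | wc -l | tr -d ' '
}
```

For the command above, `count_bams output/rnaseq/bam` should print a number greater than zero; a count of 0 or a "missing directory" message suggests the `--bam-dir` path needs checking.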
iii. Compile the DaPars2 sample configuration and mapping files:#
sos run pipeline/apa_calling.ipynb APAconfig \
--cwd output/apa \
--bfile output/apa/wig \
--annotation output/apa/chr22_3UTR.bed
iv. Use DaPars2 to quantify APA events (PDUI matrix per chromosome):#
sos run pipeline/apa_calling.ipynb APAmain \
--cwd output/apa \
--chrlist chr22 \
--dapars-path code/molecular_phenotypes/calling/apa/Dapars2_Multi_Sample.py
v. Impute missing values and run quality control on the PDUI matrix:#
sos run pipeline/apa_impute.ipynb APAimpute \
--cwd output/apa \
--chrlist chr22
vi. Optionally rename the sample columns of the imputed PDUI matrix using a match table:#
sos run pipeline/apa_impute.ipynb APArename \
--cwd output/apa \
--chrlist chr22 \
--match input/covariate/protocol_example.apa_matchtable.txt
Output#
| File | Description |
|---|---|
| | 3’UTR reference regions extracted from the annotation, used by DaPars2. |
| Coverage ( | Per-base read coverage and read-depth summaries derived from the transcriptome BAMs. |
| PDUI matrix (per chromosome) | DaPars2 quantification of distal poly-A site usage, samples in columns. |
| Imputed PDUI matrix | The PDUI matrix after missing-value imputation and quantile normalization. This is the APA phenotype table used for apaQTL analysis. |
Anticipated Results#
The pipeline writes its output files to the output/ subdirectory named after the workflow step (here, the --cwd directory output/apa). To verify success, check that the files listed in the Output section exist and are non-empty.
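The success check described under Anticipated Results can be sketched as a small shell function. This is a hypothetical helper, not part of the pipeline: it treats every regular file under the given directory as an expected output, which is an assumption, since some workflows leave intermediate or log files that are legitimately empty.

```shell
# Hypothetical helper (not part of the pipeline).
# Succeeds only if the directory holds at least one regular file
# and none of its files are empty; empty files are listed on stderr.
check_outputs() {
    local dir="$1"
    [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
    local total empty
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # -size 0 matches only zero-byte files.
    empty=$(find "$dir" -type f -size 0 | wc -l | tr -d ' ')
    if [ "$total" -eq 0 ]; then
        echo "no files found in $dir" >&2
        return 1
    fi
    if [ "$empty" -gt 0 ]; then
        echo "empty files found:" >&2
        find "$dir" -type f -size 0 >&2
        return 1
    fi
    echo "ok: $total files, none empty"
}
```

Running `check_outputs output/apa` after the final step gives a quick pass/fail signal. If the directory also contains files that are expected to be empty, point the function at a narrower subdirectory instead.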