Alternative splicing from RNA-seq data#

Miniprotocol Timing#

Timing: < 2 hours

Overview#

Before quantifying splicing, run the following modules to prepare the data:

  1. molecular_phenotypes/calling/RNA_calling.ipynb (step i): Generate data quality summary with fastqc

  2. molecular_phenotypes/calling/RNA_calling.ipynb (step ii): Trim adaptors

  3. molecular_phenotypes/calling/RNA_calling.ipynb (step iii): Align RNA-seq reads with STAR, using the WASP option specifically for splicing data

This miniprotocol uses the following modules to quantify splicing, normalize it, and format it for QTL analysis:

  1. molecular_phenotypes/calling/splicing_calling.ipynb (step i): Quantify splicing with leafcutter or psichomics

  2. molecular_phenotypes/QC/splicing_normalization.ipynb (step ii): Quality control and normalization of splicing data

  3. data_preprocessing/phenotype/gene_annotation.ipynb (step iii): Process splicing data for use in TensorQTL
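
Before starting, it can help to confirm that the three notebooks used in the steps are present. The sketch below assumes they sit in a `pipeline/` directory, as the commands in this miniprotocol suggest; `check_notebooks` is a hypothetical helper, not part of the protocol.

```shell
# Hypothetical helper: report any of the three notebooks missing from a directory.
# Assumption: the notebooks live directly under the directory given (default: pipeline).
check_notebooks() {
    dir=${1:-pipeline}
    status=0
    for nb in splicing_calling splicing_normalization gene_annotation; do
        if [ ! -f "$dir/$nb.ipynb" ]; then
            echo "missing notebook: $dir/$nb.ipynb" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use:
#   check_notebooks pipeline && command -v sos >/dev/null || echo "setup incomplete"
```

The function returns a non-zero status if any notebook is missing, so it can gate the rest of a wrapper script.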

Steps#

i. Splicing Quantification with Leafcutter (intron usage ratio) or Psichomics (percent spliced in events)#

sos run pipeline/splicing_calling.ipynb leafcutter \
    --cwd output/leaf_cutter/ \
    --samples output/rnaseq/xqtl_protocol_data_bam_list 

sos run pipeline/splicing_calling.ipynb psichomics \
    --cwd output/psichomics/ \
    --samples output/rnaseq/xqtl_protocol_data_bam_list \
    --splicing_annotation hg38_suppa.rds 
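
Both commands above read the same BAM list, so a missing alignment file would only surface once the workflow is running. The following is a minimal pre-flight sketch, assuming the list contains whitespace-separated fields, that BAM paths end in `.bam` and contain no spaces, and that relative paths resolve from the current directory. The list format is an assumption and `check_bam_list` is a hypothetical helper.

```shell
# Hypothetical helper: fail if any BAM named in a sample list is missing or empty.
# Assumption: every field ending in ".bam" is a path to an alignment file.
check_bam_list() {
    list=$1
    if [ ! -r "$list" ]; then
        echo "cannot read sample list: $list" >&2
        return 2
    fi
    # Extract every field ending in .bam, then keep the paths that fail the size test.
    missing=$(
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\.bam$/) print $i }' "$list" |
        while IFS= read -r bam; do
            [ -s "$bam" ] || printf '%s\n' "$bam"
        done
    )
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing" | sed 's/^/missing or empty BAM: /' >&2
        return 1
    fi
    return 0
}

# Example use:
#   check_bam_list output/rnaseq/xqtl_protocol_data_bam_list
```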

ii. Splicing QC and Normalization#

sos run pipeline/splicing_normalization.ipynb leafcutter_norm \
    --cwd output/leaf_cutter/ \
    --ratios output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz
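
As a rough illustration of what an intron usage ratio is, the sketch below converts per-sample `numerator/denominator` entries (reads supporting an intron over reads in its cluster) into decimal ratios. It assumes a space-delimited table with a header line and entries in the `numerator/denominator` form conventionally written by leafcutter; it is not the normalization that the pipeline performs, and `counts_to_ratios` is a hypothetical helper.

```shell
# Hypothetical helper: turn "num/denom" entries into decimal intron usage ratios.
# Assumptions: whitespace-delimited input, first line is a header, first field is the intron ID.
counts_to_ratios() {
    awk '
    NR == 1 { print; next }                 # pass the header through unchanged
    {
        out = $1                            # keep the intron ID as is
        for (i = 2; i <= NF; i++) {
            n = split($i, part, "/")
            if (n == 2 && (part[2] + 0) > 0)
                out = out " " sprintf("%.3f", part[1] / part[2])
            else
                out = out " NA"             # empty cluster or malformed entry
        }
        print out
    }'
}

# Example use:
#   printf 'chrom s1 s2\nchr1:100:200:clu_1 3/10 0/0\n' | counts_to_ratios
#   prints:
#   chrom s1 s2
#   chr1:100:200:clu_1 0.300 NA
```

Entries with a zero denominator are reported as `NA` rather than 0, since a cluster with no reads carries no information about intron usage.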

sos run pipeline/splicing_normalization.ipynb psichomics_norm \
    --cwd psichomics_output \
    --ratios psichomics_output/psi_raw_data.tsv 

iii. Post Processing for TensorQTL#

sos run pipeline/gene_annotation.ipynb annotate_leafcutter_isoforms \
    --cwd output/leaf_cutter/ \
    --intron_count output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind_numers.counts.gz \
    --phenoFile output/leaf_cutter/xqtl_protocol_data_bam_list_intron_usage_perind.counts.gz_raw_data.qqnorm.txt \
    --annotation-gtf reference_data/Homo_sapiens.GRCh38.103.chr.reformatted.collapse_only.gene.gtf \
    --sample_participant_lookup reference_data/sample_participant_lookup.rnaseq

sos run pipeline/gene_annotation.ipynb annotate_psichomics_isoforms \
    --cwd psichomics_output \
    --phenoFile psichomics_output/psichomics_raw_data_bedded.qqnorm.txt \
    --annotation-gtf reference_data/Homo_sapiens.GRCh38.103.chr.reformated.ERCC.gene.gtf 
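
TensorQTL reads phenotypes from a tab-delimited BED file whose first three columns are `#chr`, `start`, and `end`, followed by a phenotype ID column and one column per sample. The sketch below checks that layout on an annotated output file. The file path is not specified in this miniprotocol, so the path in the usage comment is a placeholder, and `check_pheno_bed` is a hypothetical helper.

```shell
# Hypothetical helper: basic layout check for a TensorQTL-style phenotype BED file.
# Accepts plain or gzip/bgzip-compressed input; diagnostics are printed to stdout.
check_pheno_bed() {
    bed=$1
    case "$bed" in
        *.gz) gzip -dc "$bed" ;;
        *)    cat "$bed" ;;
    esac | awk -F '\t' '
    BEGIN { bad = 0 }
    NR == 1 {
        ncol = NF
        # Expect #chr, start, end, an ID column, and at least one sample column.
        if (tolower($1) != "#chr" || tolower($2) != "start" || tolower($3) != "end" || NF < 5) {
            print "unexpected header: " $0
            bad = 1
        }
        next
    }
    NF != ncol {
        print "line " NR ": expected " ncol " fields, found " NF
        bad = 1
    }
    END {
        if (NR == 0) { print "empty file"; bad = 1 }
        exit bad
    }'
}

# Example use (placeholder path):
#   check_pheno_bed path/to/annotated_phenotypes.bed.gz
```

This only checks column layout; it does not verify that the file is coordinate-sorted or indexed.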

Anticipated Results#

The final output contains quality-controlled, normalized splicing phenotypes from leafcutter (intron usage ratios) and psichomics (percent spliced in), annotated for use in TensorQTL.