Genotype PLINK File Quality Control

Contents

  • Description
  • Methods
  • Default Parameters: QC
  • Input Files
  • Output Files
  • Minimal Working Example
    • Step 1. Basic QC (rare and common variants)
    • Step 2. Sample Match with Phenotype
    • Step 3. Kinship QC
    • Step 4. Prepare Unrelated Individuals for PCA
    • Step 5. Extract Pruned Variants for PCA
  • Command Interface
  • Estimate kinship in the sample
  • Genotype and sample QC
  • Extract genotype based on overlap with phenotype
  • Anticipated Results

This workflow implements preliminary QC steps for PLINK input files. It supports both PLINK1 binary format (BED/BIM/FAM) and PLINK2 format (PGEN/PVAR/PSAM). Input in VCF format must be converted to PLINK format before QC.

Description#

This notebook includes workflows to:

  • Compute the kinship matrix in the sample and identify related individuals

  • Perform genotype and sample QC by MAF, missing data and HWE

  • Perform LD pruning for follow-up PCA on genotypes, as needed

A potential limitation is that the workflow requires all samples and chromosomes to be merged into a single file in order to perform both sample- and variant-level QC. In our experience, however, the pipeline runs on a single merged PLINK file of 200K exomes with 15 million variants.

Methods#

Depending on the context of your problem, the workflow can be executed in two ways:

  1. Run the qc command to perform genotype data QC and LD pruning, generating a subset of variants in preparation for analyses such as PCA.

  2. Run king first, on either the original variants or a subset of common variants, to identify unrelated individuals. The king pipeline splits samples into related and unrelated individuals. Then run qc on the unrelated individuals only, and finally extract the same set of QC-ed variants for the related individuals.

Default Parameters: QC#

  • Kinship coefficient for related individuals: 0.0625

  • MAF and MAC default: 0

    • The default above retains both common and rare variants

    • Recommended MAF for PCA: 0.01 (restrict to common variants)

    • Recommended MAC for single-variant analysis: 5

  • Variant level missingness threshold: 0.1

  • Sample level missingness threshold: 0.1

  • LD pruning via PLINK for PCA analysis:

    • window 50

    • shift 10

    • r2 0.1

  • HWE default: 1E-15, which is very lenient
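To make the MAF, MAC and missingness thresholds above concrete, here is a minimal sketch of how they could be applied to a single variant. It is a toy illustration, not the pipeline's implementation (the pipeline delegates filtering to PLINK). The function names and the 0/1/2/None genotype coding are assumptions made for the example.

```python
# Toy illustration of the variant-level thresholds (hypothetical helper names).
# Genotypes are alternate-allele counts 0/1/2; None marks a missing call.

def variant_stats(genotypes):
    """Return (missing_rate, minor_allele_count, minor_allele_frequency)."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return missing_rate, 0, 0.0
    alt_count = sum(called)
    total_alleles = 2 * len(called)
    mac = min(alt_count, total_alleles - alt_count)
    return missing_rate, mac, mac / total_alleles


def passes_variant_qc(genotypes, maf_filter=0.0, mac_filter=0.0, geno_filter=0.1):
    """Apply the thresholds; a MAF or MAC filter of 0 means 'do not apply'."""
    missing_rate, mac, maf = variant_stats(genotypes)
    if missing_rate > geno_filter:
        return False
    if maf_filter > 0 and maf < maf_filter:
        return False
    if mac_filter > 0 and mac < mac_filter:
        return False
    return True


# 20 samples: 17 homozygous reference, 2 heterozygous, 1 missing call.
variant = [0] * 17 + [1, 1, None]
print(variant_stats(variant))
print(passes_variant_qc(variant))                 # defaults: kept
print(passes_variant_qc(variant, mac_filter=5))   # MAC of 2 is below 5: dropped
```

With the defaults (MAF and MAC of 0) this rare variant is retained, which is why a MAC of 5 is suggested for single-variant analysis.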

Input Files#

File: Description

  • output/genotype_formatting/plink/protocol_example.genotype.merged.{bed,bim,fam}: Merged genome-wide PLINK bundle from genotype_formatting (Step 1 input)

  • input/rnaseq/protocol_example.rnaseq.bed.gz: Toy RNA-seq expression phenotype (60 samples SAMPLE_001..060) for sample matching

Genotype input may be PLINK1 (bed/bim/fam) or PLINK2 (pgen/pvar/psam). For VCF input, first convert with the genotype_formatting pipeline.

Output Files#

File: Description

  • output/gwas_qc/plink/protocol_example.genotype.merged.plink_qc.{bed,bim,fam}: Basic-QC genotype (Step 1)

  • output/gwas_qc/genotype/protocol_example.rnaseq.bed.sample_genotypes.txt: Genotype-sample list overlapping the phenotype (Step 2)

  • output/gwas_qc/kinship/*.king.unrelated.{bed,bim,fam} and *.king.related.{bed,bim,fam}: KING-split unrelated/related sets (Step 3)

  • output/gwas_qc/genotype/*.king.unrelated.plink_qc.prune.in: LD-pruned variant list for PCA (Step 4)

  • output/gwas_qc/cache/*.for_pca.{bed,bim,fam}: Pruned genotype set prepared for PCA (Step 5)
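The long file names passed between steps follow from the output f-strings in the workflow code shown in the Command Interface section. The sketch below reproduces that naming logic in plain Python so the chained names can be predicted. The helper names are hypothetical, and `sos_bn` is only an approximation of SoS's `:bn` formatter (base name with the last extension removed).

```python
import os


def sos_bn(path):
    """Approximate SoS ':bn': base name with the last extension removed."""
    return os.path.splitext(os.path.basename(path))[0]


def qc_output(cwd, geno_file, name="", keep_variants=False, ext=".bed"):
    """Output of qc_no_prune / qc_1, mirroring the workflow's output f-string."""
    suffix = f".{name}" if name else ""
    extracted = ".extracted" if keep_variants else ""
    return f"{cwd}/{sos_bn(geno_file)}{suffix}.plink_qc{extracted}{ext}"


def king_unrelated_output(cwd, geno_file, name="", ext=".bed"):
    """Unrelated set written by king_3 (the .kin0 and .related_id steps collapsed)."""
    suffix = f".{name}" if name else ""
    return f"{cwd}/{sos_bn(geno_file)}{suffix}.unrelated{ext}"


def prune_list_output(cwd, qc_bed):
    """Variant list written by qc_2 (LD pruning)."""
    return f"{cwd}/{sos_bn(qc_bed)}.prune.in"


merged = "output/genotype_formatting/plink/protocol_example.genotype.merged.bed"
step1 = qc_output("output/gwas_qc/plink", merged)
step3 = king_unrelated_output("output/gwas_qc/kinship", step1, name="protocol_example.king")
step4 = prune_list_output("output/gwas_qc/genotype", qc_output("output/gwas_qc/genotype", step3))
for path in (step1, step3, step4):
    print(path)
```

The three printed paths correspond to the Step 1 output, the Step 4 input and the Step 5 `--keep-variants` argument in the Minimal Working Example.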

Minimal Working Example#

A minimal working example dataset and the Singularity container bioinfo.sif can be downloaded from Synapse.

The chr1_chr6 dataset was merged from chr1 and chr6 data using the merge_plink command from the genotype formatting pipeline.

Step 1. Basic QC (rare and common variants)#

Apply variant- and sample-level filters (missingness, HWE, MAC). Timing: <1 min on toy data.

sos run pipeline/GWAS_QC.ipynb qc_no_prune \
    --cwd output/gwas_qc/plink \
    --genoFile output/genotype_formatting/plink/protocol_example.genotype.merged.bed \
    --geno-filter 0.1 \
    --mind-filter 0.1 \
    --hwe-filter 1e-08 \
    --mac-filter 0

Step 2. Sample Match with Phenotype#

Find samples shared between genotype and phenotype, writing the overlapping sample lists. Timing: <1 min.

sos run pipeline/GWAS_QC.ipynb genotype_phenotype_sample_overlap \
    --cwd output/gwas_qc/genotype \
    --genoFile output/gwas_qc/plink/protocol_example.genotype.merged.plink_qc.fam \
    --phenoFile input/rnaseq/protocol_example.rnaseq.bed.gz

Step 3. Kinship QC#

Estimate kinship with KING and split samples into related and unrelated sets. In this toy dataset SAMPLE_059 and SAMPLE_060 are a parent-offspring pair, so KING does detect related individuals and produces both *.related.bed and *.unrelated.bed.

Timing: <2 min on toy data.

sos run pipeline/GWAS_QC.ipynb king \
    --cwd output/gwas_qc/kinship \
    --genoFile output/gwas_qc/plink/protocol_example.genotype.merged.plink_qc.bed \
    --name protocol_example.king \
    --keep-samples output/gwas_qc/genotype/protocol_example.rnaseq.bed.sample_genotypes.txt

Step 4. Prepare Unrelated Individuals for PCA#

Because related individuals were detected (Step 3), run qc on the KING *.unrelated.bed to produce the LD-pruned, unrelated genotype set used downstream for PCA.

Timing: <1 min on toy data.

sos run pipeline/GWAS_QC.ipynb qc \
    --cwd output/gwas_qc/genotype \
    --genoFile output/gwas_qc/kinship/protocol_example.genotype.merged.plink_qc.protocol_example.king.unrelated.bed \
    --mac-filter 5

Alternative (no related individuals): if KING reports "No related individuals detected" and produces no *.unrelated.bed, run qc on the Step 1 QC genotype with --keep-samples instead. In this toy dataset related individuals are present, so the Step 4 command above applies; the command below is shown for reference.

Timing: <1 min

sos run pipeline/GWAS_QC.ipynb qc \
    --cwd output/gwas_qc/genotype \
    --genoFile output/gwas_qc/plink/protocol_example.genotype.merged.plink_qc.bed \
    --keep-samples output/gwas_qc/genotype/protocol_example.rnaseq.bed.sample_genotypes.txt \
    --mac-filter 5
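The choice between the two Step 4 variants can be automated by checking whether KING produced an unrelated genotype file. The following is a sketch under that assumption; the helper names are hypothetical, and the command it assembles simply mirrors the two sos invocations shown above.

```python
from pathlib import Path


def choose_qc_input(unrelated_bed, qc_bed):
    """Return (genoFile, needs_keep_samples) for Step 4.

    If KING wrote a non-empty *.unrelated.bed, use it as is; otherwise fall
    back to the Step 1 QC genotype, which then needs --keep-samples.
    """
    unrelated = Path(unrelated_bed)
    if unrelated.is_file() and unrelated.stat().st_size > 0:
        return str(unrelated), False
    return str(qc_bed), True


def step4_command(unrelated_bed, qc_bed, keep_samples):
    """Assemble the Step 4 'sos run' command as an argument list."""
    geno_file, needs_keep = choose_qc_input(unrelated_bed, qc_bed)
    cmd = ["sos", "run", "pipeline/GWAS_QC.ipynb", "qc",
           "--cwd", "output/gwas_qc/genotype",
           "--genoFile", geno_file,
           "--mac-filter", "5"]
    if needs_keep:
        cmd += ["--keep-samples", keep_samples]
    return cmd


print(" ".join(step4_command(
    "output/gwas_qc/kinship/protocol_example.genotype.merged.plink_qc.protocol_example.king.unrelated.bed",
    "output/gwas_qc/plink/protocol_example.genotype.merged.plink_qc.bed",
    "output/gwas_qc/genotype/protocol_example.rnaseq.bed.sample_genotypes.txt",
)))
```

The returned list can be passed to `subprocess.run` if the workflow is driven from Python.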

Step 5. Extract Pruned Variants for PCA#

Extract the LD-pruned variants (from Step 4) out of the full genotype set, applying only sample-level missingness, in preparation for PCA.

sos run pipeline/GWAS_QC.ipynb qc_no_prune \
    --cwd output/gwas_qc/cache \
    --genoFile output/genotype_formatting/plink/protocol_example.genotype.merged.bed \
    --geno-filter 0 \
    --mind-filter 0.1 \
    --maf-filter 0 \
    --keep-variants output/gwas_qc/genotype/protocol_example.genotype.merged.plink_qc.protocol_example.king.unrelated.plink_qc.prune.in \
    --name for_pca

Command Interface#

sos run GWAS_QC.ipynb -h
[global]
parameter: modular_script_dir = path('code/script')  # override with --modular-script-dir
# the output directory for generated files
parameter: cwd = path("output")
# A string to identify your analysis run
parameter: name = ""
# PLINK binary files (either BED/BIM/FAM or PGEN/PVAR/PSAM format)
parameter: genoFile = paths
# The path to the file that contains the list of samples to remove (format FID, IID)
parameter: remove_samples = path('.')
# The path to the file that contains the list of samples to keep (format FID, IID)
parameter: keep_samples = path('.')
# The path to the file that contains the list of variants to keep
parameter: keep_variants = path('.')
# The path to the file that contains the list of variants to exclude
parameter: exclude_variants = path('.')
# Kinship coefficient threshold for related individuals
# (e.g first degree above 0.25, second degree above 0.125, third degree above 0.0625)
parameter: kinship = 0.0625
# For cluster jobs, number commands to run per job
parameter: job_size = 1
# Wall clock time expected
parameter: walltime = "5h"
# Memory expected
parameter: mem = "16G"
# Number of threads
parameter: numThreads = 20
# Software container option
parameter: container = ""
parameter: entrypoint= ""
# use this function to edit memory string for PLINK input
from sos.utils import expand_size
# os.path.exists is used by determine_plink_format below
import os
cwd = path(f"{cwd:a}")

# Determine if the file is in PLINK1 (BED/BIM/FAM) or PLINK2 (PGEN/PVAR/PSAM) format
def determine_plink_format(file_path):
    """
    Determine the PLINK file format based on file extensions and companion files.
    
    Args:
        file_path (str or Path): Path to the input file
    
    Returns:
        str: 'plink1' or 'plink2'
    """
    # Convert to string if it's a Path object
    file_path = str(file_path)
    
    # Check direct file extensions
    if file_path.endswith('.bed'):
        return 'plink1'
    elif file_path.endswith('.pgen'):
        return 'plink2'
    
    # If the file doesn't have a standard extension, try to infer format
    try:
        # Remove the file extension if present
        base_path = file_path.rsplit('.', 1)[0] if '.' in file_path else file_path
        
        # Check for PLINK1 companion files
        plink1_companion_files = [
            f"{base_path}.bim",
            f"{base_path}.fam"
        ]
        
        # Check for PLINK2 companion files
        plink2_companion_files = [
            f"{base_path}.pvar",
            f"{base_path}.psam"
        ]
        
        # Check PLINK1 format
        if all(os.path.exists(f) for f in plink1_companion_files):
            return 'plink1'
        
        # Check PLINK2 format
        if all(os.path.exists(f) for f in plink2_companion_files):
            return 'plink2'
    
    except Exception as e:
        print(f"Error determining PLINK format: {e}")
    
    # Default to PLINK1 if can't determine
    return 'plink1'


# Get the appropriate PLINK command based on the input file format
def get_plink_command_prefix(file_path):
    format_type = determine_plink_format(file_path)
    if format_type == 'plink1':
        return "--bfile"
    else:  # plink2
        return "--pfile"
        
# Generate the appropriate file extension based on the requested format
def get_output_extension(output_format, is_prune=False):
    if output_format == 'plink1':
        return '.bed' if not is_prune else '.prune.bed'
    else:  # plink2
        return '.pgen' if not is_prune else '.prune.pgen'
        
# Choose the make-bed or make-pgen command based on desired output format
def get_make_command(output_format):
    if output_format == 'plink1':
        return '--make-bed'
    else:  # plink2
        return '--make-pgen'

def get_other_args_flags(other_args):
    import re
    if not other_args:
        return ""
    args = [other_args] if isinstance(other_args, str) else list(other_args)
    flags = []
    for arg in args:
        arg = str(arg)
        if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9_.:-]*", arg):
            raise ValueError(f"Cannot safely pass PLINK other_args value: {arg}")
        flags.append(f"--other-arg {arg}")
    return " ".join(flags)

Estimate kinship in the sample#

The output is a list of related individuals, as well as the kinship matrix.
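The selection of individuals to set aside happens in GWAS_QC.R (step king_2), which is not reproduced on this page. As an illustration of the general idea only, the sketch below uses a simple greedy heuristic: among pairs whose kinship exceeds the threshold, repeatedly remove the individual with the most related partners until no related pair remains. The function name, the input format and the kinship values are assumptions for the example, and the actual script may use a different strategy.

```python
from collections import defaultdict


def select_related_to_remove(pairs, kinship=0.0625):
    """Greedy sketch: pick individuals to remove so no related pair remains.

    pairs: iterable of (id1, id2, kinship_coefficient).
    Returns the IDs removed, in removal order.
    """
    neighbours = defaultdict(set)
    for id1, id2, coef in pairs:
        if coef > kinship:
            neighbours[id1].add(id2)
            neighbours[id2].add(id1)

    removed = []
    while True:
        # Individuals that still have at least one related partner.
        candidates = {k: v for k, v in neighbours.items() if v}
        if not candidates:
            break
        # Most-connected individual first; ties broken alphabetically.
        worst = max(sorted(candidates), key=lambda k: len(candidates[k]))
        removed.append(worst)
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
    return removed


toy_pairs = [
    ("SAMPLE_059", "SAMPLE_060", 0.25),   # illustrative parent-offspring value
    ("SAMPLE_001", "SAMPLE_002", 0.01),   # below threshold, ignored
]
print(select_related_to_remove(toy_pairs))
```

Removing the most-connected individual first tends to keep more samples than removing both members of every related pair.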

# Inference of relationships in the sample to identify closely related individuals
[king_1]
# PLINK binary file
parameter: kin_maf = 0.01
input: genoFile
output: f'{cwd}/{_input:bn}{("."+name) if name else ""}.kin0'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
plink_command = get_plink_command_prefix(genoFile)
bash: expand= "${ }", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout', container = container, entrypoint = entrypoint
    bash ${modular_script_dir}/data_preprocessing/genotype/GWAS_QC.sh king \
        --cwd "${cwd}" \
        --genoFile "${genoFile}" \
        --plink-command "${plink_command}" \
        --out-prefix "${_output:n}" \
        --keep-samples "${keep_samples}" \
        --remove-samples "${remove_samples}" \
        --name "${name}" \
        --kinship ${kinship} \
        --kin-maf ${kin_maf} \
        --numThreads ${numThreads}
# Select a list of unrelated individuals, attempting to maximize the number of unrelated individuals retained
[king_2: shared = "related_id" ]
related_id = [x.strip() for x in open(_input).readlines() if not x.startswith("#")]
output: f'{_input:n}.related_id'
with open(_output, 'a'):
    pass
done_if(len(related_id) == 0, msg = f"No related individuals detected from {_input}.")
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand= "${ }", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout', container = container, entrypoint = entrypoint
    Rscript ${modular_script_dir}/data_preprocessing/genotype/GWAS_QC.R \
        --step king_2 \
        --input "${_input}" \
        --output "${_output}" \
        --kinship ${kinship}
# Split genotype data into related and unrelated samples, if related individuals are detected
[king_3]
depends: sos_variable("related_id")
input: output_from(2), genoFile
plink_command = get_plink_command_prefix(_input[1])
output_format = determine_plink_format(_input[1])
make_command = get_make_command(output_format)
output_ext = get_output_extension(output_format)
output: unrelated_bed = f'{cwd}/{_input[0]:bn}.unrelated{output_ext}',
        related_bed = f'{cwd}/{_input[0]:bn}.related{output_ext}'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'
bash: expand= "${ }", stderr = f'{_output[0]:n}.stderr', stdout = f'{_output[0]:n}.stdout', container = container, entrypoint=entrypoint
    plink2 \
      ${plink_command} ${_input[1]:n} \
      --remove ${_input[0]} \
      ${('--keep %s' % keep_samples) if keep_samples.is_file() else ""} \
      ${make_command} \
      --out ${_output[0]:n} \
      --threads ${numThreads} \
      --memory ${int(expand_size(mem) * 0.9)/1e6} --new-id-max-allele-len 1000 --set-all-var-ids chr@:#_\$r_\$a 

    if [ ${len(related_id)} -ne 0 ] ; then
    plink2 \
      ${plink_command} ${_input[1]:n} \
      --keep ${_input[0]} \
      ${make_command} \
      --out ${_output[1]:n} \
      --threads ${numThreads} \
      --memory ${int(expand_size(mem) * 0.9)/1e6} --new-id-max-allele-len 1000 --set-all-var-ids chr@:#_\$r_\$a 
    else
       touch ${_output[1]}
    fi

Genotype and sample QC#

QC the genetic data based on MAF, sample- and variant-level missingness, and Hardy-Weinberg Equilibrium (HWE).

In this step you may also provide a list of samples to keep, for example in the case when you would like to subset a sample based on their ancestries to perform independent analyses on each of these groups.

The default parameters are set to reflect some suggestions in Table 1 of this paper.
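To give a feel for how lenient an HWE threshold of 1e-15 is, the sketch below computes an HWE p-value from genotype counts with a 1-degree-of-freedom chi-square test. This is a simplification chosen for brevity: PLINK's --hwe filter uses an exact test, so the numbers will not match PLINK's. The function name is hypothetical.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                    # monomorphic: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chisq / 2))


hwe_filter = 1e-15
for counts in [(25, 50, 25), (30, 40, 30), (50, 0, 50)]:
    pval = hwe_chisq_pvalue(*counts)
    print(counts, pval, "removed" if pval < hwe_filter else "kept")
```

Only the extreme case with no heterozygotes falls below 1e-15; a moderate deviation such as (30, 40, 30) is kept.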

# Filter SNPs and select individuals 
[qc_no_prune, qc_1 (basic QC filters)]
# minimum MAF filter to use. 0 means do not apply this filter.
parameter: maf_filter = 0.0
# maximum MAF filter to use. 0 means do not apply this filter.
parameter: maf_max_filter = 0.0
# minimum MAC filter to use. 0 means do not apply this filter.
parameter: mac_filter = 0.0
# maximum MAC filter to use. 0 means do not apply this filter.
parameter: mac_max_filter = 0.0 
# Maximum missingness per-variant
parameter: geno_filter = 0.1
# Maximum missingness per-sample
parameter: mind_filter = 0.1
# HWE filter -- a very lenient one
parameter: hwe_filter = 1e-15
# Other PLINK arguments e.g snps_only, write-samples, etc
parameter: other_args = []
# Only output SNP and sample list, rather than the PLINK binary format of subset data
parameter: meta_only = False
# Remove duplicate variants
parameter: rm_dups = False
# Add option to process dosage
parameter: treat_dosage_missing = False

fail_if(not (keep_samples.is_file() or keep_samples == path('.')), msg = f'Cannot find ``{keep_samples}``')
fail_if(not (keep_variants.is_file() or keep_variants == path('.')), msg = f'Cannot find ``{keep_variants}``')
fail_if(not (remove_samples.is_file() or remove_samples == path('.')), msg = f'Cannot find ``{remove_samples}``')

input: genoFile, group_by=1
plink_command = get_plink_command_prefix(_input)
output_format = determine_plink_format(_input)
make_command = get_make_command(output_format) if not meta_only else "--write-snplist --write-samples"
output_ext = get_output_extension(output_format) if not meta_only else ".snplist"
other_args_flags = get_other_args_flags(other_args)
output: f'{cwd}/{_input:bn}{("." + name) if name else ""}.plink_qc{".extracted" if keep_variants.is_file() else ""}{output_ext}'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand= "${ }", stderr = f'{_output:n}.stderr', stdout = f'{_output:n}.stdout', container = container, entrypoint = entrypoint
    bash ${modular_script_dir}/data_preprocessing/genotype/GWAS_QC.sh qc_no_prune \
        --cwd "${cwd}" \
        --genoFile "${_input}" \
        --plink-command "${plink_command}" \
        --output-format "${output_format}" \
        --make-command "${make_command}" \
        --out-prefix "${_output:n}" \
        --name "${name}" \
        --mac-filter ${mac_filter} \
        --maf-filter ${maf_filter} \
        --maf-max-filter ${maf_max_filter} \
        --mac-max-filter ${mac_max_filter} \
        --geno-filter ${geno_filter} \
        --mind-filter ${mind_filter} \
        --hwe-filter ${hwe_filter} \
        ${"--keep-samples " + str(keep_samples) if keep_samples.is_file() else ""} \
        ${"--remove-samples " + str(remove_samples) if remove_samples.is_file() else ""} \
        ${"--exclude-variants " + str(exclude_variants) if exclude_variants.is_file() else ""} \
        ${"--keep-variants " + str(keep_variants) if keep_variants.is_file() else ""} \
        ${"--meta-only" if meta_only else ""} \
        ${"--rm-dups" if rm_dups else ""} \
        ${"--treat-dosage-missing" if treat_dosage_missing else ""} \
        ${other_args_flags} \
        --numThreads ${numThreads}
# LD pruning and removal of related individuals (both individuals of a pair)
# PLINK2 has multi-threaded calculation for LD pruning
[qc_2 (LD pruning)]
# Window size
parameter: window = 50
# Shift window every 10 snps
parameter: shift = 10
parameter: r2 = 0.1
parameter: mac_filter = 0.0
parameter: other_args = []
# Use PLINK --bad-ld flag (skip LD pruning diagnostics for regions with extreme LD)
parameter: bad_ld = False
stop_if(r2==0)
plink_command = get_plink_command_prefix(_input)
output_format = determine_plink_format(_input)
make_command = get_make_command(output_format)
output_ext = get_output_extension(output_format, is_prune=True)
other_args_flags = get_other_args_flags(other_args)
output: bed=f'{cwd}/{_input:bn}{output_ext}', prune=f'{cwd}/{_input:bn}.prune.in'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'
bash: expand= "${ }", stderr = f'{_output[0]:n}.stderr', stdout = f'{_output[0]:n}.stdout', container = container, entrypoint = entrypoint
    bash ${modular_script_dir}/data_preprocessing/genotype/GWAS_QC.sh qc \
        --cwd "${cwd}" \
        --genoFile "${_input}" \
        --plink-command "${plink_command}" \
        --output-format "${output_format}" \
        --make-command "${make_command}" \
        --prune-prefix "${_output['prune']:nn}" \
        --out-prefix "${_output['bed']:n}" \
        --mac-filter ${mac_filter} \
        --window ${window} \
        --shift ${shift} \
        --r2 ${r2} \
        ${"--bad-ld" if bad_ld else ""} \
        ${other_args_flags} \
        --numThreads ${numThreads}
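The window, shift and r2 parameters control a sliding-window pruning procedure that PLINK performs internally. The sketch below is a simplified pure-Python version for intuition only: within each window it drops the later variant of any pair whose squared correlation exceeds r2. PLINK's actual rule for choosing which variant of a pair to drop differs, and the function names here are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                      # monomorphic variant: treat as uncorrelated
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, window=50, shift=10, r2=0.1):
    """Return indices of variants kept after simplified sliding-window pruning.

    genotypes: list of per-variant genotype lists (0/1/2), in genomic order.
    """
    keep = [True] * len(genotypes)
    start = 0
    while start < len(genotypes):
        end = min(start + window, len(genotypes))
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2:
                    keep[j] = False     # simplification: always drop the later variant
        start += shift                  # slide the window by `shift` variants
    return [i for i, kept in enumerate(keep) if kept]


toy = [
    [0, 1, 2, 0, 1, 2],   # variant 0
    [0, 1, 2, 0, 1, 2],   # variant 1: identical to variant 0 (r2 = 1)
    [0, 2, 2, 2, 0, 0],   # variant 2: uncorrelated with variant 0 (r2 = 0)
]
print(ld_prune(toy))      # [0, 2]
```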

Extract genotype based on overlap with phenotype#

This is an auxiliary step that matches genotype and phenotype samples based on the data and an optional look-up table. The look-up table should contain two columns: sample_id and genotype_id. If the look-up table is not provided, or its file is not found, the sample names are assumed to have already been matched.
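The matching logic described above can be sketched as follows. This is an illustrative reimplementation, not the code in GWAS_QC.sh: the function name is hypothetical, and the look-up table is represented as a plain dictionary from sample_id to genotype_id.

```python
def overlap_samples(genotype_ids, phenotype_ids, lookup=None):
    """Return (sample_id, genotype_id) pairs present in both data sets.

    lookup: optional dict mapping sample_id -> genotype_id. When it is absent,
    phenotype sample names are assumed to equal the genotype IDs.
    """
    genotype_set = set(genotype_ids)
    matched = []
    for sample_id in phenotype_ids:
        genotype_id = lookup.get(sample_id) if lookup else sample_id
        if genotype_id in genotype_set:
            matched.append((sample_id, genotype_id))
    return matched


# Without a look-up table: names are matched directly.
print(overlap_samples(["SAMPLE_001", "SAMPLE_002"], ["SAMPLE_002", "SAMPLE_003"]))

# With a look-up table: phenotype sample IDs are translated first.
lookup = {"S1": "G1", "S2": "G5", "S4": "G3"}
print(overlap_samples(["G1", "G2", "G3"], ["S1", "S2", "S4"], lookup))
```

Samples missing from the look-up table, or mapped to a genotype ID that is not in the genotype data, are dropped from the overlap.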

Anticipated Results#

The pipeline produces output files in the output/ subdirectory named after the workflow step. Verify success by checking that output files exist and are non-empty. See the Output Files section above for the expected file names and formats.

# This workflow extracts overlapping samples for genotype data with phenotype data, and output the filtered sample genotype list as well as sample phenotype list
[genotype_phenotype_sample_overlap]
# A genotype fam file
parameter: genoFile = path
# A phenotype file, can be bed.gz or tsv
parameter: phenoFile = path
# If this file is provided, a genotype/phenotype sample name match will be performed
# It must contain two column names: genotype_id, sample_id
parameter: sample_participant_lookup = path(".")
depends: executable('tabix'), executable('bgzip')
input: genoFile, phenoFile
output: f'{cwd:a}/{path(_input[1]):bn}.sample_overlap.txt', f'{cwd:a}/{path(_input[1]):bn}.sample_genotypes.txt'
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand= "${ }", stderr = f'{_output[0]:n}.stderr', stdout = f'{_output[0]:n}.stdout', container = container, entrypoint = entrypoint
    bash ${modular_script_dir}/data_preprocessing/genotype/GWAS_QC.sh genotype_phenotype_sample_overlap \
        --cwd "${cwd}" \
        --genoFile "${genoFile}" \
        --phenoFile "${phenoFile}" \
        --name "${name}" \
        --numThreads ${numThreads}

By The NIH/NIA Alzheimer's Disease Sequencing Project Functional Genomics xQTL Consortium

© Copyright 2021+, FunGen xQTL Analysis Working Group.