{ "cells": [ { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "# MASH analysis pipeline with data-driven prior matrices\n", "\n", "In this notebook, we utilize the MASH prior, referred to as the [mixture_prior](https://github.com/statfungen/xqtl-protocol/blob/6c637645ce16aee2aa7dc86bbc334fb6bb66b9d9/code/multivariate/MASH/mixture_prior.ipynb#L4), from a previous step. Our objective is to conduct a multivariate analysis under the MASH model. After fitting the model, we subsequently compute the posteriors for our variables of interest." ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Methods" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "### Multivariate adaptive shrinkage (MASH) analysis of eQTL data\n", "\n", "\n", "Since we published Urbut 2019, we have improved implementation of MASH algorithm and made a new R package, [`mashr`](https://github.com/stephenslab/mashr). Major improvements compared to Urbut 2019 are:\n", "\n", "1. Faster computation of likelihood and posterior quantities via matrix algebra tricks and a C++ implementation.\n", "2. Faster computation of MASH mixture via convex optimization.\n", "3. New ways to estimate prior in place of the `SFA` approach, see `mixture_prior.ipynb` for details.\n", "4. Improve estimate of residual variance $\\hat{V}$.\n", "\n", "At this point, the input data have already been converted from the original association summary statistics to a format convenient for analysis in MASH." ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## MWE Data\n", "\n", "Avaiable on [synapse.org](https://www.synapse.org/#!Synapse:syn52624471)" ] }, { "cell_type": "markdown", "metadata": { "kernel": "Bash" }, "source": [ "## Multivariate analysis with MASH model\n", "\n", "Using MWE with [prior previously generated](https://github.com/statfungen/xqtl-protocol/blob/main/code/multivariate/MASH/mixture_prior.ipynb)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "kernel": "Bash" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO: Running \u001b[32mmash_1\u001b[0m: Fit MASH mixture model (time estimate: <15min for 70K by 49 matrix)\n", "INFO: \u001b[32mmash_1\u001b[0m (index=0) is \u001b[32mignored\u001b[0m due to saved signature\n", "INFO: \u001b[32mmash_1\u001b[0m output: \u001b[32m/home/gw/Documents/xQTL/output/mashr/protocol_example.EZ.mash_model.rds\u001b[0m\n", "INFO: Running \u001b[32mmash_2\u001b[0m: Compute posterior for the \"strong\" set of data as in Urbut et al 2017. This is optional because most of the time we want to apply the MASH model learned on much larger data-set.\n", "INFO: \u001b[32mmash_2\u001b[0m is \u001b[32mcompleted\u001b[0m.\n", "INFO: \u001b[32mmash_2\u001b[0m output: \u001b[32m/home/gw/Documents/xQTL/output/mashr/protocol_example.EZ.posterior.rds\u001b[0m\n", "INFO: Workflow mash (ID=wdb1a3918af0fd3c3) is executed successfully with 1 completed step and 1 ignored step.\n" ] } ], "source": [ "sos run pipeline/mash_fit.ipynb mash \\\n", " --output-prefix protocol_example \\\n", " --data output/protocol_example.mashr_input.rds \\\n", " --vhat-data output/mashr/protocol_example.ed_mixture.EZ.V_simple.rds \\\n", " --prior-data output/mashr/protocol_example.ed_mixture.EZ.prior.rds \\\n", " --compute-posterior \\\n", " --cwd output/mashr \\\n", " --container oras://ghcr.io/statfungen/stephenslab_apptainer:latest" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "kernel": "R" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A matrix: 17 × 8 of type dbl
ALLAstEndExcInhMicOPCOli
1.000725e-061.206902e-010.380245012.341416e-050.0212380262.438671e-022.822754e-014.950394e-02
6.195229e-016.725207e-010.650460086.835928e-010.5822886196.287649e-016.903822e-014.929722e-01
5.502451e-086.952073e-050.428491533.848081e-090.0013491392.974207e-012.267577e-012.193946e-01
1.141256e-023.673827e-010.299842838.920542e-030.0625202811.557580e-011.438823e-013.652727e-01
4.803276e-011.793390e-010.019161761.752551e-010.3846611763.138391e-014.888775e-013.793039e-01
3.240321e-011.006760e-040.284235162.692014e-100.4646108468.359305e-048.654122e-211.358764e-09
4.744668e-013.330669e-160.357285212.052281e-050.4371276483.887724e-021.843843e-051.374751e-06
3.166308e-030.000000e+000.345738212.111164e-040.4244658241.686171e-029.992007e-162.119553e-07
4.826046e-043.121070e-020.317639163.181940e-010.0001077906.340894e-192.832298e-023.138958e-02
2.426264e-013.950080e-010.161929051.023271e-010.2842975243.167406e-041.449052e-024.348748e-01
3.999852e-098.353939e-020.473998692.161030e-060.0179151401.450814e-012.240465e-014.253355e-01
8.830382e-065.623546e-030.230466667.952619e-080.4659642749.821045e-026.871004e-022.039190e-01
4.616982e-052.434419e-020.320921682.148510e-030.0013188344.224729e-041.671158e-012.667831e-01
1.929503e-012.808606e-010.001778752.362496e-020.3272447703.820119e-013.156853e-011.851656e-01
2.744579e-025.054301e-020.159135651.581546e-040.0863971363.507312e-019.995466e-021.706981e-01
2.349326e-012.796353e-010.244664782.804030e-010.2515160323.344815e-011.402243e-023.982418e-01
1.711193e-096.569349e-030.034575804.631637e-010.4221356453.970892e-012.515601e-015.054854e-14
\n" ], "text/latex": [ "A matrix: 17 × 8 of type dbl\n", "\\begin{tabular}{llllllll}\n", " ALL & Ast & End & Exc & Inh & Mic & OPC & Oli\\\\\n", "\\hline\n", "\t 1.000725e-06 & 1.206902e-01 & 0.38024501 & 2.341416e-05 & 0.021238026 & 2.438671e-02 & 2.822754e-01 & 4.950394e-02\\\\\n", "\t 6.195229e-01 & 6.725207e-01 & 0.65046008 & 6.835928e-01 & 0.582288619 & 6.287649e-01 & 6.903822e-01 & 4.929722e-01\\\\\n", "\t 5.502451e-08 & 6.952073e-05 & 0.42849153 & 3.848081e-09 & 0.001349139 & 2.974207e-01 & 2.267577e-01 & 2.193946e-01\\\\\n", "\t 1.141256e-02 & 3.673827e-01 & 0.29984283 & 8.920542e-03 & 0.062520281 & 1.557580e-01 & 1.438823e-01 & 3.652727e-01\\\\\n", "\t 4.803276e-01 & 1.793390e-01 & 0.01916176 & 1.752551e-01 & 0.384661176 & 3.138391e-01 & 4.888775e-01 & 3.793039e-01\\\\\n", "\t 3.240321e-01 & 1.006760e-04 & 0.28423516 & 2.692014e-10 & 0.464610846 & 8.359305e-04 & 8.654122e-21 & 1.358764e-09\\\\\n", "\t 4.744668e-01 & 3.330669e-16 & 0.35728521 & 2.052281e-05 & 0.437127648 & 3.887724e-02 & 1.843843e-05 & 1.374751e-06\\\\\n", "\t 3.166308e-03 & 0.000000e+00 & 0.34573821 & 2.111164e-04 & 0.424465824 & 1.686171e-02 & 9.992007e-16 & 2.119553e-07\\\\\n", "\t 4.826046e-04 & 3.121070e-02 & 0.31763916 & 3.181940e-01 & 0.000107790 & 6.340894e-19 & 2.832298e-02 & 3.138958e-02\\\\\n", "\t 2.426264e-01 & 3.950080e-01 & 0.16192905 & 1.023271e-01 & 0.284297524 & 3.167406e-04 & 1.449052e-02 & 4.348748e-01\\\\\n", "\t 3.999852e-09 & 8.353939e-02 & 0.47399869 & 2.161030e-06 & 0.017915140 & 1.450814e-01 & 2.240465e-01 & 4.253355e-01\\\\\n", "\t 8.830382e-06 & 5.623546e-03 & 0.23046666 & 7.952619e-08 & 0.465964274 & 9.821045e-02 & 6.871004e-02 & 2.039190e-01\\\\\n", "\t 4.616982e-05 & 2.434419e-02 & 0.32092168 & 2.148510e-03 & 0.001318834 & 4.224729e-04 & 1.671158e-01 & 2.667831e-01\\\\\n", "\t 1.929503e-01 & 2.808606e-01 & 0.00177875 & 2.362496e-02 & 0.327244770 & 3.820119e-01 & 3.156853e-01 & 1.851656e-01\\\\\n", "\t 2.744579e-02 & 5.054301e-02 & 0.15913565 & 1.581546e-04 & 0.086397136 & 3.507312e-01 & 9.995466e-02 & 1.706981e-01\\\\\n", "\t 2.349326e-01 & 2.796353e-01 & 0.24466478 & 2.804030e-01 & 0.251516032 & 3.344815e-01 & 1.402243e-02 & 3.982418e-01\\\\\n", "\t 1.711193e-09 & 6.569349e-03 & 0.03457580 & 4.631637e-01 & 0.422135645 & 3.970892e-01 & 2.515601e-01 & 5.054854e-14\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 17 × 8 of type dbl\n", "\n", "| ALL | Ast | End | Exc | Inh | Mic | OPC | Oli |\n", "|---|---|---|---|---|---|---|---|\n", "| 1.000725e-06 | 1.206902e-01 | 0.38024501 | 2.341416e-05 | 0.021238026 | 2.438671e-02 | 2.822754e-01 | 4.950394e-02 |\n", "| 6.195229e-01 | 6.725207e-01 | 0.65046008 | 6.835928e-01 | 0.582288619 | 6.287649e-01 | 6.903822e-01 | 4.929722e-01 |\n", "| 5.502451e-08 | 6.952073e-05 | 0.42849153 | 3.848081e-09 | 0.001349139 | 2.974207e-01 | 2.267577e-01 | 2.193946e-01 |\n", "| 1.141256e-02 | 3.673827e-01 | 0.29984283 | 8.920542e-03 | 0.062520281 | 1.557580e-01 | 1.438823e-01 | 3.652727e-01 |\n", "| 4.803276e-01 | 1.793390e-01 | 0.01916176 | 1.752551e-01 | 0.384661176 | 3.138391e-01 | 4.888775e-01 | 3.793039e-01 |\n", "| 3.240321e-01 | 1.006760e-04 | 0.28423516 | 2.692014e-10 | 0.464610846 | 8.359305e-04 | 8.654122e-21 | 1.358764e-09 |\n", "| 4.744668e-01 | 3.330669e-16 | 0.35728521 | 2.052281e-05 | 0.437127648 | 3.887724e-02 | 1.843843e-05 | 1.374751e-06 |\n", "| 3.166308e-03 | 0.000000e+00 | 0.34573821 | 2.111164e-04 | 0.424465824 | 1.686171e-02 | 9.992007e-16 | 2.119553e-07 |\n", "| 4.826046e-04 | 3.121070e-02 | 0.31763916 | 3.181940e-01 | 0.000107790 | 6.340894e-19 | 2.832298e-02 | 3.138958e-02 |\n", "| 2.426264e-01 | 3.950080e-01 | 0.16192905 | 1.023271e-01 | 0.284297524 | 3.167406e-04 | 1.449052e-02 | 4.348748e-01 |\n", "| 3.999852e-09 | 8.353939e-02 | 0.47399869 | 2.161030e-06 | 0.017915140 | 1.450814e-01 | 2.240465e-01 | 4.253355e-01 |\n", "| 8.830382e-06 | 5.623546e-03 | 0.23046666 | 7.952619e-08 | 0.465964274 | 9.821045e-02 | 6.871004e-02 | 2.039190e-01 |\n", "| 4.616982e-05 | 2.434419e-02 | 0.32092168 | 2.148510e-03 | 0.001318834 | 4.224729e-04 | 1.671158e-01 | 2.667831e-01 |\n", "| 1.929503e-01 | 2.808606e-01 | 0.00177875 | 2.362496e-02 | 0.327244770 | 3.820119e-01 | 3.156853e-01 | 1.851656e-01 |\n", "| 2.744579e-02 | 5.054301e-02 | 0.15913565 | 1.581546e-04 | 0.086397136 | 3.507312e-01 | 9.995466e-02 | 1.706981e-01 |\n", "| 2.349326e-01 | 2.796353e-01 | 0.24466478 | 2.804030e-01 | 0.251516032 | 3.344815e-01 | 1.402243e-02 | 3.982418e-01 |\n", "| 1.711193e-09 | 6.569349e-03 | 0.03457580 | 4.631637e-01 | 0.422135645 | 3.970892e-01 | 2.515601e-01 | 5.054854e-14 |\n", "\n" ], "text/plain": [ " ALL Ast End Exc Inh \n", " [1,] 1.000725e-06 1.206902e-01 0.38024501 2.341416e-05 0.021238026\n", " [2,] 6.195229e-01 6.725207e-01 0.65046008 6.835928e-01 0.582288619\n", " [3,] 5.502451e-08 6.952073e-05 0.42849153 3.848081e-09 0.001349139\n", " [4,] 1.141256e-02 3.673827e-01 0.29984283 8.920542e-03 0.062520281\n", " [5,] 4.803276e-01 1.793390e-01 0.01916176 1.752551e-01 0.384661176\n", " [6,] 3.240321e-01 1.006760e-04 0.28423516 2.692014e-10 0.464610846\n", " [7,] 4.744668e-01 3.330669e-16 0.35728521 2.052281e-05 0.437127648\n", " [8,] 3.166308e-03 0.000000e+00 0.34573821 2.111164e-04 0.424465824\n", " [9,] 4.826046e-04 3.121070e-02 0.31763916 3.181940e-01 0.000107790\n", "[10,] 2.426264e-01 3.950080e-01 0.16192905 1.023271e-01 0.284297524\n", "[11,] 3.999852e-09 8.353939e-02 0.47399869 2.161030e-06 0.017915140\n", "[12,] 8.830382e-06 5.623546e-03 0.23046666 7.952619e-08 0.465964274\n", "[13,] 4.616982e-05 2.434419e-02 0.32092168 2.148510e-03 0.001318834\n", "[14,] 1.929503e-01 2.808606e-01 0.00177875 2.362496e-02 0.327244770\n", "[15,] 2.744579e-02 5.054301e-02 0.15913565 1.581546e-04 0.086397136\n", "[16,] 2.349326e-01 2.796353e-01 0.24466478 2.804030e-01 0.251516032\n", "[17,] 1.711193e-09 6.569349e-03 0.03457580 4.631637e-01 0.422135645\n", " Mic OPC Oli \n", " [1,] 2.438671e-02 2.822754e-01 4.950394e-02\n", " [2,] 6.287649e-01 6.903822e-01 4.929722e-01\n", " [3,] 2.974207e-01 2.267577e-01 2.193946e-01\n", " [4,] 1.557580e-01 1.438823e-01 3.652727e-01\n", " [5,] 3.138391e-01 4.888775e-01 3.793039e-01\n", " [6,] 8.359305e-04 8.654122e-21 1.358764e-09\n", " [7,] 3.887724e-02 1.843843e-05 1.374751e-06\n", " [8,] 1.686171e-02 9.992007e-16 2.119553e-07\n", " [9,] 6.340894e-19 2.832298e-02 3.138958e-02\n", "[10,] 3.167406e-04 1.449052e-02 4.348748e-01\n", "[11,] 1.450814e-01 2.240465e-01 4.253355e-01\n", "[12,] 9.821045e-02 6.871004e-02 2.039190e-01\n", "[13,] 4.224729e-04 1.671158e-01 2.667831e-01\n", "[14,] 3.820119e-01 3.156853e-01 1.851656e-01\n", "[15,] 3.507312e-01 9.995466e-02 1.706981e-01\n", "[16,] 3.344815e-01 1.402243e-02 3.982418e-01\n", "[17,] 3.970892e-01 2.515601e-01 5.054854e-14" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "readRDS('output/mashr/protocol_example.EZ.posterior.rds')$lfsr" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Global parameter settings" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[global]\n", "parameter: cwd = path('./mashr_workflow_output')\n", "# Input summary statistics data\n", "parameter: data = path\n", "parameter: prior_data = path\n", "parameter: vhat_data = path\n", "# Prefix of output files. If not specified, it will derive it from data.\n", "# If it is specified, for example, `--output-prefix AnalysisResults`\n", "# It will save output files as `{cwd}/AnalysisResults*`.\n", "parameter: output_prefix = ''\n", "parameter: output_suffix = 'all'\n", "# Exchangable effect (EE) or exchangable z-scores (EZ)\n", "parameter: effect_model = 'EE'\n", "parameter: container = \"\"\n", "# For cluster jobs, number commands to run per job\n", "parameter: job_size = 1\n", "# Wall clock time expected\n", "parameter: walltime = \"5h\"\n", "# Memory expected\n", "parameter: mem = \"16G\"\n", "# Number of threads\n", "parameter: numThreads = 1\n", "import re\n", "parameter: entrypoint= ('micromamba run -a \"\" -n' + ' ' + re.sub(r'(_apptainer:latest|_docker:latest|\\.sif)$', '', container.split('/')[-1])) if container else \"\"\n", "data = data.absolute()\n", "cwd = cwd.absolute()\n", "prior_data = prior_data.absolute()\n", "vhat_data = vhat_data.absolute()\n", "if len(output_prefix) == 0:\n", " output_prefix = f\"{data:bn}\"\n", "\n", "mash_model = file_target(f\"{cwd:a}/{output_prefix}.{effect_model}.mash_model.rds\")\n", "\n", "def sort_uniq(seq):\n", " seen = set()\n", " return [x for x in seq if not (x in seen or seen.add(x))]" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## `mashr` mixture model fitting\n", "\n", "Main reference are our `mashr` vignettes [this for mashr eQTL outline](https://stephenslab.github.io/mashr/articles/eQTL_outline.html) and [this for using FLASH prior](https://github.com/stephenslab/mashr/blob/master/vignettes/flash_mash.Rmd). \n", "\n", "The outcome of this workflow should be found under `./mashr_workflow_output` folder (can be configured). File names have pattern `*.mash_model_*.rds`. They can be used to computer posterior for input list of gene-SNP pairs (see next section)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "# Fit MASH mixture model (time estimate: <15min for 70K by 49 matrix)\n", "[mash_1]\n", "parameter: output_level = 4\n", "input: data, vhat_data, prior_data\n", "output: mash_model\n", "\n", "task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, tags = f'{step_name}_{_output:bn}' \n", "R: expand = \"${ }\", workdir = cwd, stderr = f\"{_output:n}.stderr\", stdout = f\"{_output:n}.stdout\", container = container, entrypoint=entrypoint\n", " library(mashr)\n", " dat = readRDS(${_input[0]:r})\n", " vhat = readRDS(${_input[1]:r})\n", " U = readRDS(${_input[2]:r})$U\n", " mash_data = mash_set_data(dat$random.b, Shat=dat$random.s, alpha=${1 if effect_model == 'EZ' else 0}, V=vhat, zero_Bhat_Shat_reset = 1E3)\n", " saveRDS(list(mash_model = mash(mash_data, Ulist = U, outputlevel = ${output_level}), vhat_file=${_input[1]:r}, prior_file=${_input[2]:r}), ${_output:r})" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Optional posterior computations\n", "\n", "Additionally provide posterior for the \"strong\" set in MASH input data." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "# Compute posterior for the \"strong\" set of data as in Urbut et al 2017.\n", "# This is optional because most of the time we want to apply the \n", "# MASH model learned on much larger data-set.\n", "[mash_2]\n", "# default to True; use --no-compute-posterior to disable this\n", "parameter: compute_posterior = True\n", "# input Vhat file for the batch of posterior data\n", "skip_if(not compute_posterior)\n", "\n", "input: data, vhat_data, mash_model\n", "output: f\"{cwd:a}/{output_prefix}.{effect_model}.posterior.rds\"\n", "\n", "task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, tags = f'{step_name}_{_output:bn}' \n", "R: expand = \"${ }\", workdir = cwd, stderr = f\"{_output:n}.stderr\", stdout = f\"{_output:n}.stdout\", container = container, entrypoint=entrypoint\n", " library(mashr)\n", " dat = readRDS(${_input[0]:r})\n", " vhat = readRDS(${_input[1]:r})\n", " mash_data = mash_set_data(dat$strong.b, Shat=dat$strong.s, alpha=${1 if effect_model == 'EZ' else 0}, V=vhat, zero_Bhat_Shat_reset = 1E3)\n", " mash_model = readRDS(${_input[2]:ar})$mash_model\n", " saveRDS(mash_compute_posterior_matrices(mash_model, mash_data), ${_output:r})" ] } ], "metadata": { "kernelspec": { "display_name": "SoS", "language": "sos", "name": "sos" }, "language_info": { "codemirror_mode": "sos", "file_extension": ".sos", "mimetype": "text/x-sos", "name": "sos", "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", "pygments_lexer": "sos" }, "sos": { "kernels": [ [ "Bash", "bash", "Bash", "#E6EEFF", "shell" ], [ "R", "ir", "R", "#DCDCDA", "" ], [ "SoS", "sos", "", "", "sos" ] ], "panel": { "displayed": true, "height": 0, "style": "side" }, "version": "0.24.0" } }, "nbformat": 4, "nbformat_minor": 4 }