{ "cells": [ { "cell_type": "markdown", "id": "1451f16f-f4d6-413a-b994-63116ea54f47", "metadata": { "kernel": "SoS" }, "source": [ "# Fineboost (single trait ColocBoost)" ] }, { "cell_type": "markdown", "id": "5e9f261a-c03d-4e70-95c0-4844c59a577d", "metadata": { "kernel": "SoS" }, "source": [ "## Goal\n", "\n", "This notebook takes the input and run colocboost (single trait, Fineboost), the result can be summarized to get power and FDR.\n", "\n", "## Input \n", "\n", "Individual level data X and Y or summary statistics z and LD from other notebooks. file names of these data should be put in the parameter: simufile.\n", "\n", "## Output\n", "\n", "Fineboost / fine-mapping original result, along with some summarized elements.\n", "\n", "Example output:" ] }, { "cell_type": "code", "execution_count": 4, "id": "c1396790-7ff3-4535-98a4-e20ac6ebae50", "metadata": { "kernel": "R", "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 2654
  2. 8038
  3. 10675
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 2654\n", "\\item 8038\n", "\\item 10675\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 2654\n", "2. 8038\n", "3. 10675\n", "\n", "\n" ], "text/plain": [ "[1] 2654 8038 10675" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "[1] \"=====Prediction=====\"\n" ] }, { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "[1] \"=====Truth=====\"\n" ] }, { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "3" ], "text/latex": [ "3" ], "text/markdown": [ "3" ], "text/plain": [ "[1] 3" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "result = readRDS(\"/home/hs3393/cloud_colocalization/simulation_result/fineboost/sample_41_h2g_0.05_ncausal_3_ntrait_fineboost.rds\")\n", "\n", "result$var\n", "\n", "print(\"=====Prediction=====\")\n", "# colocalization result - CoS\n", "result$true_cs_num\n", "\n", "# colocalization result - colocalizing trait\n", "result$total_cs_num\n", "\n", "print(\"=====Truth=====\")\n", "# true colocalizing CoS\n", "result$cover_var_num\n", "\n", "# true colocalizing trait\n", "result$total_var_num" ] }, { "cell_type": "markdown", "id": "07a7b8c2-d50e-4b7d-a5af-35695f6e4101", "metadata": { "kernel": "SoS" }, "source": [ "## Fineboost Running code" ] }, { "cell_type": "code", "execution_count": null, "id": "4cc954fd-305a-4b98-a1e1-46382fbfea03", "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[fineboost_Jan]\n", "parameter: simufile = paths\n", "parameter: cwd = path(\"output\")\n", "parameter: job_size = 15\n", "parameter: walltime = \"80h\"\n", "parameter: mem = \"60G\"\n", "parameter: numThreads = 3\n", "parameter: trait = 1\n", "parameter: container = \"\"\n", "input: simufile, group_by = 1\n", "output: f'{cwd:a}/{_input[0]:bn}_ntrait_{step_name}.rds'\n", "task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'\n", "R: expand = '${ }', stdout = f\"{_output:n}.stdout\", stderr = f\"{_output:n}.stderr\", container = container \n", " for(file in list.files(\"/home/xc2270/COLOCBoost/code_COLOCBoost/colocboost_updating/\", full.names = T)) {source(file)}\n", " rds = readRDS(${_input:ar})\n", "\n", " X = rds$X\n", " Y = as.matrix(rds$Y[, 1, drop = FALSE])\n", " variant = rds$variant[[1]]\n", "\n", "\n", " start_time <- Sys.time()\n", " colocboost_result = colocboost(\n", " X = X, \n", " Y = Y, output_level=2)\n", " end_time <- Sys.time()\n", "\n", " # record true variant, analysed trait number and corresponding file name\n", " colocboost_result$var = variant\n", " colocboost_result$file = \"${_input[0]:a}\"\n", " # show which snp appear in at least two traits as the coloc_variants\n", " all_var = unlist(variant)\n", " \n", " cs_num = length(colocboost_result$ucos_details$ucos$ucos_index)\n", " \n", " cover_var_num = 0\n", " true_cs_num = 0\n", " if(length(colocboost_result$ucos_details$ucos$ucos_index) == 0){\n", " cover_var_num = 0 \n", " true_cs_num = 0\n", " total_cs_num = 0\n", " }else{\n", " \n", " # count how many var are covered\n", " \n", " # count how many var are covered\n", " for(i in 1:length(all_var)){\n", " cs_vars = unlist(colocboost_result$ucos_details$ucos$ucos_index)\n", " if(all_var[i] %in% cs_vars){\n", " cover_var_num = cover_var_num + 1 \n", " }\n", "\n", " }\n", "\n", " # cover how many cs are true\n", " for(i in 1:cs_num){\n", " if( length( intersect( colocboost_result$ucos_details$ucos$ucos_index[[i]] , all_var )) > 0 ){\n", " true_cs_num = true_cs_num + 1 \n", " }\n", " }\n", " }\n", " \n", " \n", " colocboost_report = colocboost_result\n", " colocboost_report$cover_var_num = cover_var_num \n", " colocboost_report$total_var_num = length(all_var)\n", " colocboost_report$true_cs_num = true_cs_num \n", " colocboost_report$total_cs_num = cs_num\n", " colocboost_report$time = end_time - start_time\n", " saveRDS(colocboost_report, ${_output:r})" ] }, { "cell_type": "markdown", "id": "d23b46f1-ccac-4aa4-9cbd-fac557f2a03f", "metadata": { "kernel": "SoS" }, "source": [ "## Fineboost summary code" ] }, { "cell_type": "code", "execution_count": null, "id": "55f3c958-74e2-4d3b-b8ff-932c5869863c", "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[fineboost_summary]\n", "parameter: folder = path\n", "parameter: cwd = path(\"output\")\n", "parameter: job_size = 1\n", "parameter: walltime = \"50h\"\n", "parameter: mem = \"60G\"\n", "parameter: numThreads = 1\n", "parameter: container = \"\"\n", "input: folder , group_by = 1\n", "output: f'{cwd:a}/{_input[0]:b}_summary.rds'\n", "task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'\n", "R: expand = '${ }', stdout = f\"{_output:n}.stdout\", stderr = f\"{_output:n}.stderr\", container = container\n", " library(\"tidyverse\")\n", " source(\"/home/hs3393/cloud_colocalization/simulation_code/fineboost_summary.r\")\n", " filenames <- list.files(${folder:r}, pattern=\"*.rds$\", full.names=TRUE, recursive = T)\n", " read_and_extract_rds <- function(file_path) {\n", " rds_file <- readRDS(file_path) # Read the RDS file\n", " if(length(rds_file$ucos_details$ucos$ucos_index) == 0){\n", " rds_file$ucos_details$ucos$ucos_index = NULL\n", " }\n", " data = fineboost_summary(unlist(rds_file$var), \n", " rds_file$ucos_details$ucos$ucos_index)\n", " table <- tibble(match_variant_number = data$match_variant,\n", " total_variant_number = data$total_variant_number,\n", " true_trait_number = data$true_trait_number,\n", " total_set_number = data$total_set_number,\n", " file = rds_file$file,\n", " out_file = file_path)\n", " return(table)\n", " }\n", "\n", " combined_table <- map_dfr(filenames, read_and_extract_rds)\n", " saveRDS(combined_table, ${_output:r})\n", "\n", " \n", " " ] }, { "cell_type": "markdown", "id": "b36035fe-b321-40c0-89f6-54ea2944de86", "metadata": { "kernel": "SoS" }, "source": [ "### Fineboost batch file" ] }, { "cell_type": "code", "execution_count": null, "id": "305bf2ba-3f10-46ed-9354-9b239c6d75be", "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "work_dir=\"/home/hs3393/cb_simulation/simulation_result/fineboost\"\n", "job=\"fineboost\"\n", "mkdir -p ${work_dir}\n", "\n", "mkdir -p ${work_dir}/code\n", "cd ${work_dir}/code\n", "\n", "# Create the base_script file and write the bash code into it\n", "cat << 'EOF' > base_script\n", "#!/bin/bash -l\n", "# NOTE the -l flag!\n", "#\n", "#SBATCH -t 68:00:00\n", "#SBATCH --mem=25000\n", "#SBATCH -J JOB\n", "#SBATCH -o PWD/JOB.%j.out\n", "#SBATCH -e PWD/JOB.%j.err\n", "\n", "source ~/mamba_activate.sh\n", "module load Singularity\n", "\n", "cd /home/hs3393/cloud_colocalization/simulation_data/simulation_signal/causal_CAUSAL\n", "sos run /home/hs3393/cb_simulation/simulation_code/Fineboost.ipynb fineboost \\\n", " --simufile $(find -type f -name '*ncausal_CAUSAL*.rds') \\\n", " --mem 25G \\\n", " --cwd PWD/\n", "EOF\n", "\n", "\n", "for causal in 1 2 3 4 5; do\n", " output_script=\"causal_${causal}.sh\"\n", " cat base_script | sed \"s|PWD|${work_dir}|g\" | sed \"s|CAUSAL|${causal}|g\" | sed \"s|JOB|${job}|g\" > ${output_script}\n", " sbatch ${output_script}\n", "done\n" ] }, { "cell_type": "markdown", "id": "f87dce8e-0d5e-4fb5-b973-5cbd9426d0e0", "metadata": { "kernel": "SoS" }, "source": [ "### Fineboost summary batch file" ] }, { "cell_type": "code", "execution_count": null, "id": "eb43fdd4-83e0-408b-8c9f-e794055f996d", "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "#!/bin/bash -l\n", "# NOTE the -l flag!\n", "#\n", "#SBATCH -t 48:00:00\n", "#SBATCH --mem=40000\n", "#SBATCH -J sum\n", "#SBATCH -o /home/hs3393/cloud_colocalization/simulation_result/fineboost/log/summary.%j.out\n", "#SBATCH -e /home/hs3393/cloud_colocalization/simulation_result/fineboost/log/summary.%j.err\n", "\n", "source ~/mamba_activate.sh\n", "module load Singularity\n", "\n", "sos run /home/hs3393/cloud_colocalization/simulation_code/Fineboost.ipynb fineboost_summary \\\n", " --folder /home/hs3393/cb_tune/simulation_result/fineboost/ \\\n", " --cwd /home/hs3393/cb_tune/simulation_result/fineboost//summary" ] }, { "cell_type": "markdown", "id": "02e4c35e-a416-4f86-ae13-4d6466c78f9a", "metadata": { "kernel": "SoS" }, "source": [ "## SuSiE running code" ] }, { "cell_type": "code", "execution_count": null, "id": "f7c3a1c1-1866-4ffd-bd4c-9626f20ca51e", "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[susie]\n", "parameter: simufile = paths\n", "parameter: cwd = path(\"output\")\n", "parameter: job_size = 10\n", "parameter: walltime = \"20h\"\n", "parameter: mem = \"50G\"\n", "parameter: numThreads = 3\n", "parameter: container = \"\"\n", "input: simufile, group_by = 1\n", "output: f'{cwd:a}/{_input[0]:bn}_susie.rds'\n", "task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output[0]:bn}'\n", "R: expand = '${ }', stdout = f\"{_output:n}.stdout\", stderr = f\"{_output:n}.stderr\", container = container \n", " library(\"susieR\")\n", " # read 50 traits a s group, if we are going to analyse 3 traits-colocalization, we\n", " # select the first 3 traits X-Y pari in this group\n", " file_list = c(${_input:ar,})\n", " rds_file = readRDS(file_list[1])\n", "\n", " rds = readRDS(${_input:ar})\n", "\n", "\n", " X = rds$X\n", " Y = as.matrix(rds$Y[, 1, drop = FALSE])\n", " variant = rds$variant[[1]]\n", " \n", " \n", " susie_result = susie(X, Y)\n", " # record true variant, analysed trait number and corresponding file name\n", " susie_result$var = variant\n", " susie_result$file = \"${_input[0]:a}\"\n", " # show which snp appear in at least two traits as the coloc_variants\n", " all_var = unlist(variant)\n", " \n", " cs_num = length(susie_result$sets$cs)\n", " \n", " cover_var_num = 0\n", " true_cs_num = 0\n", " if(length(susie_result$sets$cs) == 0){\n", " cover_var_num = 0 \n", " true_cs_num = 0\n", " total_cs_num = 0\n", " }else{\n", " \n", " # count how many var are covered\n", " for(i in 1:length(all_var)){\n", " cs_vars = unlist(susie_result$sets$cs)\n", " if(all_var[i] %in% cs_vars){\n", " cover_var_num = cover_var_num + 1 \n", " }\n", "\n", " }\n", "\n", " # cover how many cs are true\n", " for(i in 1:cs_num){\n", " if( length( intersect( susie_result$sets$cs[[i]] , all_var )) > 0 ){\n", " true_cs_num = true_cs_num + 1 \n", " }\n", " }\n", " }\n", " \n", " \n", " susie_result$cover_var_num = cover_var_num \n", " susie_result$total_var_num = length(all_var)\n", " susie_result$true_cs_num = true_cs_num \n", " susie_result$total_cs_num = cs_num\n", " # show which snp appear in at least two traits as the coloc_variants\n", " saveRDS(susie_result, ${_output:r})" ] }, { "cell_type": "markdown", "id": "48174f67-1522-4bfb-a21a-4564ec345766", "metadata": { "kernel": "SoS" }, "source": [ "### Run SuSiE batch file" ] }, { "cell_type": "code", "execution_count": null, "id": "306b1e5d-2317-4239-8f73-72b15b97952c", "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "work_dir=\"/home/hs3393//cloud_colocalization/simulation_result/susie\"\n", "job=\"susie\"\n", "mkdir -p ${work_dir}\n", "\n", "mkdir -p ${work_dir}/code\n", "cd ${work_dir}/code\n", "\n", "# Create the base_script file and write the bash code into it\n", "cat << 'EOF' > base_script\n", "#!/bin/bash -l\n", "# NOTE the -l flag!\n", "#\n", "#SBATCH -t 68:00:00\n", "#SBATCH --mem=25000\n", "#SBATCH -J JOB\n", "#SBATCH -o PWD/JOB.%j.out\n", "#SBATCH -e PWD/JOB.%j.err\n", "\n", "source ~/mamba_activate.sh\n", "module load Singularity\n", "\n", "cd /home/hs3393/cloud_colocalization/simulation_data/simulation_signal/causal_CAUSAL\n", "sos run /home/hs3393//cloud_colocalization/simulation_code/Fineboost.ipynb susie \\\n", " --simufile $(find -type f -name '*ncausal_CAUSAL*.rds') \\\n", " --mem 25G \\\n", " --cwd PWD/\n", "EOF\n", "\n", "\n", "for causal in 1 2 3 4 5; do\n", " output_script=\"causal_${causal}.sh\"\n", " cat base_script | sed \"s|PWD|${work_dir}|g\" | sed \"s|CAUSAL|${causal}|g\" | sed \"s|JOB|${job}|g\" > ${output_script}\n", " sbatch ${output_script}\n", "done\n" ] }, { "cell_type": "markdown", "id": "11f35189-3b31-4e26-a6ac-fe99fd64128e", "metadata": { "kernel": "SoS" }, "source": [ "### SuSiE summary batch file" ] }, { "cell_type": "code", "execution_count": null, "id": "1da41e6e-a297-45c3-b9fd-03fe0ead492a", "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "#!/bin/bash -l\n", "# NOTE the -l flag!\n", "#\n", "#SBATCH -t 48:00:00\n", "#SBATCH --mem=40000\n", "#SBATCH -J susie\n", "#SBATCH -o /home/hs3393/cloud_colocalization/simulation_result/susie/log/summary.%j.out\n", "#SBATCH -e /home/hs3393/cloud_colocalization/simulation_result/susie/log/summary.%j.err\n", "\n", "source ~/mamba_activate.sh\n", "module load Singularity\n", "\n", "sos run /home/hs3393/cloud_colocalization/simulation_code/Fineboost.ipynb fineboost_summary \\\n", " --folder /home/hs3393/cb_simulation/simulation_result/fineboost \\\n", " --cwd /home/hs3393/cb_simulation/simulation_result/fineboost/summary" ] } ], "metadata": { "kernelspec": { "display_name": "SoS", "language": "sos", "name": "sos" }, "language_info": { "codemirror_mode": "sos", "file_extension": ".sos", "mimetype": "text/x-sos", "name": "sos", "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", "pygments_lexer": "sos" }, "sos": { "kernels": [ [ "R", "ir", "R", "#DCDCDA", "r" ], [ "SoS", "sos", "", "", "sos" ] ], "version": "0.24.3" } }, "nbformat": 4, "nbformat_minor": 5 }