{ "cells": [ { "cell_type": "markdown", "id": "9144495b-2433-4bb9-9b6f-6e282ea07891", "metadata": {}, "source": [ "# 3 LINCS SCIPLEX GENE MATCHING\n", "\n", "**Requires**\n", "* `'lincs_full_smiles.h5ad'`\n", "* `'sciplex_raw_chunk_{i}.h5ad'` with $i \\in \\{0,1,2,3,4\\}$\n", "\n", "**Output**\n", "* `'sciplex3_matched_genes_lincs.h5ad'`\n", "* `lincs`: `'sciplex3_lincs_genes.h5ad'`\n", "* `sciplex`: `'lincs_full_smiles_sciplex_genes.h5ad'`\n", "\n", "\n", "\n", "## Description \n", "\n", "The goal of this notebook is to match genes between the LINCS and SciPlex datasets, producing three new datasets:\n", "\n", "### Created datasets\n", "\n", "- **`sciplex3_matched_genes_lincs.h5ad`**: Contains **SciPlex observations**. **Genes are limited to the union** of the genes shared between LINCS and SciPlex and the highly variable genes in SciPlex.\n", "\n", "\n", "- **`sciplex3_lincs_genes.h5ad`**: Contains **SciPlex data**, but filtered to include **only the genes that are shared with the LINCS dataset** (strict intersection, 977 genes).\n", "\n", "- **`lincs_full_smiles_sciplex_genes.h5ad`**: Contains **LINCS data**, but filtered to include **only the genes that are shared with the SciPlex dataset**.\n", "\n", "\n", "\n", "To create these datasets, we need to match the genes between the two datasets, which is done as follows:\n", "\n", "### Gene Matching\n", "\n", "1. **Gene ID Assignment**: SciPlex Ensembl gene IDs are obtained by stripping the version suffix from the identifier in `var.id`. LINCS gene symbols are mapped to Ensembl gene IDs using a predefined mapping (`symbols_dict.json`) or, if that file is unavailable, **sfaira**.\n", "\n", "2. **Identifying Shared Genes**: We then compute the intersection of the gene IDs (`gene_id`) in LINCS and SciPlex. Both datasets are then filtered to retain only these shared genes.\n", "\n", "3. 
**Reindexing**: The LINCS dataset is reindexed to match the order of genes in the SciPlex dataset.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "9a33a003-9ca0-4994-955c-305852e4d354", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/requests/__init__.py:104: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (5.2.0)/charset_normalizer (2.0.12) doesn't match a supported version!\n", " RequestsDependencyWarning)\n", "2023-08-19 10:31:31.638164: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2023-08-19 10:31:34.020338: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", "2023-08-19 10:31:34.020465: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", "2023-08-19 10:31:34.020477: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.21.6 scipy==1.7.3 pandas==1.3.5 scikit-learn==1.0.2 statsmodels==0.13.2 pynndescent==0.5.6\n" ] } ], "source": [ "import os\n", "import sys\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import sfaira\n", "import warnings\n", "os.getcwd()\n", "\n", "from chemCPA.paths import DATA_DIR, PROJECT_DIR\n", "\n", "pd.set_option('display.max_columns', 100)\n", "\n", "root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))\n", "sys.path.append(root_dir)\n", "import logging\n", "\n", "logging.basicConfig(level=logging.INFO)\n", "from notebook_utils import suppress_output\n", "\n", "import scanpy as sc\n", "with suppress_output():\n", " sc.set_figure_params(dpi=80, frameon=False)\n", " sc.logging.print_header()\n", " warnings.filterwarnings('ignore')\n", "\n", "# logging.info is visible when running as python script \n", "if not any('ipykernel' in arg for arg in sys.argv):\n", " logging.basicConfig(\n", " level=logging.INFO,\n", " format='%(asctime)s - %(levelname)s - %(message)s',\n", " datefmt='%Y-%m-%d %H:%M:%S'\n", " )" ] }, { "cell_type": "code", "execution_count": 2, "id": "d3c097de-7254-43e5-89e3-d3095e45f270", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The autoreload extension is already loaded. 
To reload it, use:\n", " %reload_ext autoreload\n" ] } ], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "id": "0c6073db-2c0a-413c-b4b5-17dcef7e064c", "metadata": { "tags": [] }, "source": [ "## Load data" ] }, { "cell_type": "markdown", "id": "3bcf6d3b-64c3-48cd-987b-f984c0e76ddd", "metadata": {}, "source": [ "Load lincs" ] }, { "cell_type": "code", "execution_count": 3, "id": "9a72ac06-5233-4e7a-ba63-79786d6d2c31", "metadata": { "tags": [] }, "outputs": [], "source": [ "adata_lincs = sc.read(DATA_DIR/'lincs_full_smiles.h5ad' )" ] }, { "cell_type": "markdown", "id": "05efc7d8-b2ac-4fb5-bfd5-0938e5b80b1a", "metadata": {}, "source": [ "Load sciplex " ] }, { "cell_type": "code", "execution_count": 4, "id": "b365aa7a-9957-4359-977d-dafe400df570", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. 
Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.\n", " [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],\n" ] } ], "source": [ "from tqdm import tqdm\n", "from chemCPA.paths import DATA_DIR, PROJECT_DIR\n", "from raw_data.datasets import sciplex\n", "\n", "# Load and concatenate chunks\n", "adatas_sciplex = []\n", "logging.info(\"Starting to load in sciplex data\")\n", "\n", "# Get paths to all sciplex chunks\n", "chunk_paths = sciplex()\n", "\n", "# Load chunks with progress bar\n", "for chunk_path in tqdm(chunk_paths, desc=\"Loading sciplex chunks\"):\n", " tqdm.write(f\"Loading {os.path.basename(chunk_path)}\")\n", " adatas_sciplex.append(sc.read(chunk_path))\n", " \n", "adata_sciplex = adatas_sciplex[0].concatenate(adatas_sciplex[1:])\n", "logging.info(\"Sciplex data loaded\")" ] }, { "cell_type": "markdown", "id": "0f5c24ac-3b55-40f2-abee-22286b4c6d16", "metadata": {}, "source": [ "Add gene_id to sciplex" ] }, { "cell_type": "code", "execution_count": 5, "id": "b72d957d-18eb-4994-875b-0bdd9db254c9", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.var['gene_id'] = adata_sciplex.var.id.str.split('.').str[0]\n", "adata_sciplex.var['gene_id'].head()" ] }, { "cell_type": "markdown", "id": "f0caf549-00e6-4d18-bb19-6c344c9f62c9", "metadata": { "tags": [] }, "source": [ "### Get gene ids from symbols via sfaira" ] }, { "cell_type": "markdown", "id": "065cb35d-ce63-4152-9fcb-ca939295bc29", "metadata": {}, "source": [ "Load genome container with sfaira" ] }, { "cell_type": "code", "execution_count": 6, "id": "11109df9-7724-4658-8865-7bb19ec98e3f", "metadata": {}, "outputs": [], "source": [ "try: \n", " # load json file with symbol to id mapping\n", " import json\n", " with open(DATA_DIR/ 'symbols_dict.json') as json_file:\n", " symbols_dict = json.load(json_file)\n", "except: \n", " logging.info(\"No symbols_dict.json found, falling back to sfaira\")\n", " genome_container = 
sfaira.versions.genomes.GenomeContainer(organism=\"homo_sapiens\", release=\"82\")\n", " symbols_dict = genome_container.symbol_to_id_dict\n", " # Extend symbols dict with unknown symbol\n", " symbols_dict.update({'PLSCR3':'ENSG00000187838'})" ] }, { "cell_type": "markdown", "id": "d071d09e-dc42-4872-8cf5-88ad20f35fc2", "metadata": {}, "source": [ "Identify genes that are shared between lincs and sciplex (trapnell)" ] }, { "cell_type": "code", "execution_count": 7, "id": "c9aba778-ade2-4bb1-b0e9-2e8bc7a95602", "metadata": { "tags": [] }, "outputs": [], "source": [ "# For lincs\n", "adata_lincs.var['gene_id'] = adata_lincs.var_names.map(symbols_dict)\n", "adata_lincs.var['in_sciplex'] = adata_lincs.var.gene_id.isin(adata_sciplex.var.gene_id)" ] }, { "cell_type": "code", "execution_count": 8, "id": "d79a3562-60ad-4795-823d-bdaed56e3fb5", "metadata": {}, "outputs": [], "source": [ "# For sciplex (trapnell)\n", "adata_sciplex.var['in_lincs'] = adata_sciplex.var.gene_id.isin(adata_lincs.var.gene_id)" ] }, { "cell_type": "markdown", "id": "7f4c3d35-0070-40d8-a87a-dce02883cf6e", "metadata": { "tags": [] }, "source": [ "## Preprocess sciplex dataset" ] }, { "cell_type": "markdown", "id": "ce8ecbc7-7d41-4e38-8817-f8c8d01ad29f", "metadata": {}, "source": [ "See `sciplex3.ipynb`" ] }, { "cell_type": "markdown", "id": "45442825-0f56-42bd-88f9-48fd0468010c", "metadata": {}, "source": [ "The original CPA implementation required subsetting the data due to scaling limitations. \n", "In this version, we expect to be able to handle the full sciplex dataset." 
 ] }, { "cell_type": "code", "execution_count": 9, "id": "68ff6c9b-6e46-402c-96aa-a2da57af9c79", "metadata": {}, "outputs": [], "source": [ "SUBSET = False\n", "\n", "if SUBSET: \n", " sc.pp.subsample(adata_sciplex, fraction=0.5, random_state=42)" ] }, { "cell_type": "code", "execution_count": 10, "id": "363f0cf5-340c-4589-bc95-de8a1e22fbbe", "metadata": {}, "outputs": [], "source": [ "sc.pp.normalize_per_cell(adata_sciplex)" ] }, { "cell_type": "code", "execution_count": 11, "id": "fb7a8ae8-e5db-4c8e-bf0d-965f7c8e4dbe", "metadata": {}, "outputs": [], "source": [ "sc.pp.log1p(adata_sciplex)" ] }, { "cell_type": "code", "execution_count": 12, "id": "aecbc3c6-3882-442c-8b84-ce163f704b84", "metadata": {}, "outputs": [], "source": [ "sc.pp.highly_variable_genes(adata_sciplex, n_top_genes=1032, subset=False)" ] }, { "cell_type": "markdown", "id": "dc91a1a0-2011-4834-afb6-278206d15e71", "metadata": { "tags": [] }, "source": [ "### Combine HVG with lincs genes\n", "\n", "Take the union of the genes that are considered highly variable and those that are shared with lincs" ] }, { "cell_type": "code", "execution_count": 13, "id": "761a5c25-2947-4f66-ab8c-c33a1b713444", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2000" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "((adata_sciplex.var.in_lincs) | (adata_sciplex.var.highly_variable)).sum()" ] }, { "cell_type": "markdown", "id": "db01d26e-e0f7-44a0-a8b8-380223049f81", "metadata": {}, "source": [ "Subset to that union of genes" ] }, { "cell_type": "code", "execution_count": 14, "id": "6e6328cd-cd03-4e72-85dc-b72eed2632f7", "metadata": {}, "outputs": [], "source": [ "adata_sciplex = adata_sciplex[:, (adata_sciplex.var.in_lincs) | (adata_sciplex.var.highly_variable)].copy()" ] }, { "cell_type": "markdown", "id": "b25d10e9-6e1e-4a13-a512-3580fc1295c8", "metadata": { "tags": [] }, "source": [ "### Create additional metadata" ] }, { "cell_type": "markdown", "id": 
"985349fe-37bf-4efc-9765-d612d8d440c8", "metadata": {}, "source": [ "Normalise dose values by the maximum dose; vehicle controls are assigned a `dose_val` of 1.0" ] }, { "cell_type": "code", "execution_count": 15, "id": "62b9e529-ca45-4f04-b2d6-75dd246aa36c", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.obs['dose_val'] = adata_sciplex.obs.dose.astype(float) / np.max(adata_sciplex.obs.dose.astype(float))\n", "adata_sciplex.obs.loc[adata_sciplex.obs['product_name'].str.contains('Vehicle'), 'dose_val'] = 1.0" ] }, { "cell_type": "code", "execution_count": 16, "id": "ed4ea831-0650-4b6b-bc09-7fbaf5f004b7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.001 153013\n", "0.010 147670\n", "0.100 141828\n", "1.000 139266\n", "Name: dose_val, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs['dose_val'].value_counts()" ] }, { "cell_type": "markdown", "id": "f4908bb2-4fd0-40d5-a6d7-24e68bcf9bb0", "metadata": {}, "source": [ "Shorten `product_name` to its first word and rename vehicle entries to `control`" ] }, { "cell_type": "code", "execution_count": 17, "id": "71393716-2328-41d8-a077-ef8fc435bf61", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.obs['product_name'] = [x.split(' ')[0] for x in adata_sciplex.obs['product_name']]\n", "adata_sciplex.obs.loc[adata_sciplex.obs['product_name'].str.contains('Vehicle'), 'product_name'] = 'control'" ] }, { "cell_type": "markdown", "id": "6bcd577f-500f-409e-bdc9-d1acac3dc583", "metadata": {}, "source": [ "Create a copy of `product_name` with column name `condition`" ] }, { "cell_type": "code", "execution_count": 18, "id": "b1cbcd93-43cb-4740-8c84-2331ccb4b066", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.obs['condition'] = adata_sciplex.obs.product_name.copy()" ] }, { "cell_type": "markdown", "id": "0d148e98-c819-4455-92b4-ccc5cf2a46de", "metadata": {}, "source": [ "Add combinations of drug (`condition`), dose (`dose_val`), and cell_type (`cell_type`)" ] }, { "cell_type": "code", "execution_count": 19, "id": 
"05249c77-612d-4236-af95-12cf2d2aefdf", "metadata": {}, "outputs": [], "source": [ "# convert the condition column to categorical and rename (+)-JQ1 to JQ1\n", "adata_sciplex.obs[\"condition\"] = adata_sciplex.obs[\"condition\"].astype('category').cat.rename_categories({\"(+)-JQ1\": \"JQ1\"})\n", "adata_sciplex.obs['drug_dose_name'] = adata_sciplex.obs.condition.astype(str) + '_' + adata_sciplex.obs.dose_val.astype(str)\n", "adata_sciplex.obs['cov_drug_dose_name'] = adata_sciplex.obs.cell_type.astype(str) + '_' + adata_sciplex.obs.drug_dose_name.astype(str)\n", "adata_sciplex.obs['cov_drug'] = adata_sciplex.obs.cell_type.astype(str) + '_' + adata_sciplex.obs.condition.astype(str)" ] }, { "cell_type": "markdown", "id": "58850330-62b6-4d2c-a533-1cf238663805", "metadata": {}, "source": [ "Add a `control` column with value `1` where only the vehicle was used" ] }, { "cell_type": "code", "execution_count": 20, "id": "5b7be27d-e6b8-42d6-b21b-400ddd5b3641", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.obs['control'] = [1 if x == 'control_1.0' else 0 for x in adata_sciplex.obs.drug_dose_name.values]" ] }, { "cell_type": "markdown", "id": "c06409b8-d30e-45a1-82aa-784fa5c2f1b0", "metadata": { "tags": [] }, "source": [ "## Compute DE genes" ] }, { "cell_type": "code", "execution_count": 21, "id": "7753ba03-c908-4011-8365-2d871574cd56", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A549\n", "WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, 
initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:394: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'names'] = self.var_names[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:396: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'scores'] = scores[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:399: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. 
Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals'] = pvals[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:409: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals_adj'] = pvals_adj[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:421: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. 
To get a de-fragmented frame, use `newframe = frame.copy()`\n", " foldchanges[global_indices]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "MCF7\n", "WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:394: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'names'] = self.var_names[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:396: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. 
Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'scores'] = scores[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:399: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals'] = pvals[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:409: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals_adj'] = pvals_adj[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:421: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. 
To get a de-fragmented frame, use `newframe = frame.copy()`\n", " foldchanges[global_indices]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "K562\n", "WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:394: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'names'] = self.var_names[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:396: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. 
Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'scores'] = scores[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:399: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals'] = pvals[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:409: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals_adj'] = pvals_adj[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:421: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. 
To get a de-fragmented frame, use `newframe = frame.copy()`\n", " foldchanges[global_indices]\n" ] } ], "source": [ "from chemCPA.helper import rank_genes_groups_by_cov\n", "\n", "rank_genes_groups_by_cov(adata_sciplex, groupby='cov_drug', covariate='cell_type', control_group='control', key_added='all_DEGs')" ] }, { "cell_type": "code", "execution_count": 22, "id": "4d9f098d-a04b-4407-a3a2-5041cfb480ec", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A549\n", "WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.\n", " df[key] = c\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:394: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. 
To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'names'] = self.var_names[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:396: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'scores'] = scores[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:399: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals'] = pvals[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:409: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n", " self.stats[group_name, 'pvals_adj'] = pvals_adj[global_indices]\n", "/nfs/staff-hdd/hetzell/miniconda3/envs/chemical_CPA/lib/python3.7/site-packages/scanpy/tools/_rank_genes_groups.py:421: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. 
To get a de-fragmented frame, use `newframe = frame.copy()`\n", " foldchanges[global_indices]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "MCF7\n", "WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "K562\n", "WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'\n" ] } ], "source": [ "adata_subset = adata_sciplex[:, adata_sciplex.var.in_lincs].copy()\n", "rank_genes_groups_by_cov(adata_subset, groupby='cov_drug', covariate='cell_type', control_group='control', key_added='lincs_DEGs')\n", "adata_sciplex.uns['lincs_DEGs'] = adata_subset.uns['lincs_DEGs']" ] }, { "cell_type": "markdown", "id": "428882cd-4e02-4af0-b281-51210aafbf79", "metadata": {}, "source": [ "### Map all unique `cov_drug_dose_name` to the computed DEGs, independent of the dose value\n", "\n", "Create a mapping from each name with a dose suffix to the same name without it." ] }, { "cell_type": "code", "execution_count": 23, "id": "238338ab-4950-4c29-94c5-2d2c4b5738ad", "metadata": {}, "outputs": [], "source": [ "cov_drug_dose_unique = adata_sciplex.obs.cov_drug_dose_name.unique()" ] }, { "cell_type": "code", "execution_count": 24, "id": "08aea617-1179-43e1-8391-d38eeee3b748", "metadata": {}, "outputs": [], "source": [ "# Drop the trailing '_<dose>' token from each name\n", "remove_dose = lambda s: '_'.join(s.split('_')[:-1])\n", "cov_drug = pd.Series(cov_drug_dose_unique).apply(remove_dose)\n", "dose_no_dose_dict = dict(zip(cov_drug_dose_unique, cov_drug))" ] }, { "cell_type": "markdown", "id": "c78a6e80-5012-442f-b7bf-a6c581da92dd", "metadata": {}, "source": [ "### Compute new dicts for DEGs" ] }, { "cell_type": "code", "execution_count": 25, "id": "b6594da7-64c0-4212-8cca-58d247b2cc5f", "metadata": {}, "outputs": [], "source": [ "uns_keys = ['all_DEGs', 'lincs_DEGs']" ] }, { "cell_type": "code", "execution_count": 26, "id": "d5c73b35-31d3-4814-a5c0-658b23f1d0a1", "metadata": {}, "outputs": [], "source": [ "for uns_key in uns_keys:\n", "    new_DEGs_dict = {}\n", "\n", "    df_DEGs = pd.Series(adata_sciplex.uns[uns_key])\n", "\n", "    # Re-key the DEGs by the dose-specific name; controls have no DEGs\n", "    for key, value in dose_no_dose_dict.items():\n", "        if 'control' in key:\n", "            continue\n", "        new_DEGs_dict[key] = df_DEGs.loc[value]\n", "    adata_sciplex.uns[uns_key] = new_DEGs_dict" ] }, { "cell_type": "code", 
"execution_count": 27, "id": "f713118a-514c-4cdb-b887-7118508ee37c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 581777 × 2000\n", " obs: 'cell_type', 'dose', 'dose_character', 'dose_pattern', 'g1s_score', 'g2m_score', 'pathway', 'pathway_level_1', 'pathway_level_2', 'product_dose', 'product_name', 'proliferation_index', 'replicate', 'size_factor', 'target', 'vehicle', 'batch', 'n_counts', 'dose_val', 'condition', 'drug_dose_name', 'cov_drug_dose_name', 'cov_drug', 'control'\n", " var: 'id', 'num_cells_expressed-0-0', 'num_cells_expressed-1-0', 'num_cells_expressed-1', 'gene_id', 'in_lincs', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'\n", " uns: 'log1p', 'hvg', 'all_DEGs', 'lincs_DEGs'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex" ] }, { "cell_type": "markdown", "id": "d23ef784-5747-46de-a9bd-d5d869ff8042", "metadata": { "tags": [] }, "source": [ "## Create sciplex splits\n", "\n", "These splits are not the final configuration for the experiments we want, but they are sufficient for now." ] }, { "cell_type": "markdown", "id": "6acf9d6e-d1af-4544-8021-8b9f4185d938", "metadata": { "tags": [] }, "source": [ "### OOD in Pathways" ] }, { "cell_type": "code", "execution_count": 28, "id": "4b70dba8-132a-41fa-8958-78812063b738", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "DNA damage & DNA repair 6640\n", "Epigenetic regulation 6093\n", "Tyrosine kinase signaling 5846\n", "Protein folding & Protein degradation 3863\n", "Neuronal signaling 3635\n", "Antioxidant 3616\n", "HIF signaling 3501\n", "Metabolic regulation 3470\n", "Focal adhesion signaling 3450\n", "Nuclear receptor signaling 3420\n", "JAK/STAT signaling 3155\n", "Apoptotic regulation 3141\n", "TGF/BMP signaling 2794\n", "PKC signaling 2778\n", "Cell cycle regulation 2237\n", "Other 0\n", "Vehicle 0\n", "Name: pathway_level_1, dtype: int64" ] }, "execution_count": 28, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs['split_ho_pathway'] = 'train'  # reset\n", "\n", "ho_drugs = [\n", "    # selection of drugs from various pathways\n", "    \"Azacitidine\",\n", "    \"Carmofur\",\n", "    \"Pracinostat\",\n", "    \"Cediranib\",\n", "    \"Luminespib\",\n", "    \"Crizotinib\",\n", "    \"SNS-314\",\n", "    \"Obatoclax\",\n", "    \"Momelotinib\",\n", "    \"AG-14361\",\n", "    \"Entacapone\",\n", "    \"Fulvestrant\",\n", "    \"Mesna\",\n", "    \"Zileuton\",\n", "    \"Enzastaurin\",\n", "    \"IOX2\",\n", "    \"Alvespimycin\",\n", "    \"XAV-939\",\n", "    \"Fasudil\",\n", "]\n", "\n", "ho_drug_pathway = adata_sciplex.obs['condition'].isin(ho_drugs)\n", "adata_sciplex.obs.loc[ho_drug_pathway, 'pathway_level_1'].value_counts()" ] }, { "cell_type": "code", "execution_count": 29, "id": "65e41d95-3d6a-400b-b3d6-142161773d4d", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "57639" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ho_drug_pathway.sum()" ] }, { "cell_type": "code", "execution_count": 30, "id": "3ce7605f-8fef-4c7d-9b62-be3879bd2991", "metadata": {}, "outputs": [], "source": [ "# Hold out the highest dose (dose_val == 1.0) of the selected drugs as OOD\n", "adata_sciplex.obs.loc[ho_drug_pathway & (adata_sciplex.obs['dose_val'] == 1.0), 'split_ho_pathway'] = 'ood'\n", "\n", "# Randomly assign 15% of the remaining (non-OOD) cells to the test split\n", "test_idx = sc.pp.subsample(adata_sciplex[adata_sciplex.obs['split_ho_pathway'] != 'ood'], .15, copy=True).obs.index\n", "adata_sciplex.obs.loc[test_idx, 'split_ho_pathway'] = 'test'" ] }, { "cell_type": "code", "execution_count": 31, "id": "89cf167d-67bc-4603-b9f8-a73dd9980280", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "condition AG-14361 Alvespimycin Azacitidine \\\n", "pathway_level_1 \n", "Antioxidant 0 0 0 \n", "Apoptotic regulation 0 0 0 \n", "Cell cycle regulation 0 0 0 \n", "DNA damage & DNA repair 3401 0 0 \n", "Epigenetic regulation 0 0 3151 \n", "Focal adhesion signaling 0 0 0 \n", "HIF signaling 0 0 0 \n", "JAK/STAT signaling 0 0 0 \n", "Metabolic regulation 0 0 0 \n", "Neuronal signaling 0 0 0 \n", "Nuclear receptor signaling 0 0 0 \n", "PKC signaling 0 0 0 \n", "Protein folding & Protein degradation 0 1858 0 \n", "TGF/BMP signaling 0 0 0 \n", "Tyrosine kinase signaling 0 0 0 \n", "\n", "condition Carmofur Cediranib Crizotinib \\\n", "pathway_level_1 \n", "Antioxidant 0 0 0 \n", "Apoptotic regulation 0 0 0 \n", "Cell cycle regulation 0 0 0 \n", "DNA damage & DNA repair 3239 0 0 \n", "Epigenetic regulation 0 0 0 \n", "Focal adhesion signaling 0 0 0 \n", "HIF signaling 0 0 0 \n", "JAK/STAT signaling 0 0 0 \n", "Metabolic regulation 0 0 0 \n", "Neuronal signaling 0 0 0 \n", "Nuclear receptor signaling 0 0 0 \n", "PKC signaling 0 0 0 \n", "Protein folding & Protein degradation 0 0 0 \n", "TGF/BMP signaling 0 0 0 \n", "Tyrosine kinase signaling 0 3060 2786 \n", "\n", "condition Entacapone Enzastaurin Fasudil \\\n", "pathway_level_1 \n", "Antioxidant 0 0 0 \n", "Apoptotic regulation 0 0 0 \n", "Cell cycle regulation 0 0 0 \n", "DNA damage & DNA repair 0 0 0 \n", "Epigenetic regulation 0 0 0 \n", "Focal adhesion signaling 0 0 3450 \n", "HIF signaling 0 0 0 \n", "JAK/STAT signaling 0 0 0 \n", "Metabolic regulation 0 0 0 \n", "Neuronal signaling 3635 0 0 \n", "Nuclear receptor signaling 0 0 0 \n", "PKC signaling 0 2778 0 \n", "Protein folding & Protein degradation 0 0 0 \n", "TGF/BMP signaling 0 0 0 \n", "Tyrosine kinase signaling 0 0 0 \n", "\n", "condition Fulvestrant IOX2 Luminespib Mesna \\\n", "pathway_level_1 \n", "Antioxidant 0 0 0 3616 \n", "Apoptotic regulation 0 0 0 0 \n", "Cell cycle regulation 0 0 0 0 \n", "DNA damage & DNA repair 0 0 0 0 \n", 
"Epigenetic regulation 0 0 0 0 \n", "Focal adhesion signaling 0 0 0 0 \n", "HIF signaling 0 3501 0 0 \n", "JAK/STAT signaling 0 0 0 0 \n", "Metabolic regulation 0 0 0 0 \n", "Neuronal signaling 0 0 0 0 \n", "Nuclear receptor signaling 3420 0 0 0 \n", "PKC signaling 0 0 0 0 \n", "Protein folding & Protein degradation 0 0 2005 0 \n", "TGF/BMP signaling 0 0 0 0 \n", "Tyrosine kinase signaling 0 0 0 0 \n", "\n", "condition Momelotinib Obatoclax Pracinostat \\\n", "pathway_level_1 \n", "Antioxidant 0 0 0 \n", "Apoptotic regulation 0 3141 0 \n", "Cell cycle regulation 0 0 0 \n", "DNA damage & DNA repair 0 0 0 \n", "Epigenetic regulation 0 0 2942 \n", "Focal adhesion signaling 0 0 0 \n", "HIF signaling 0 0 0 \n", "JAK/STAT signaling 3155 0 0 \n", "Metabolic regulation 0 0 0 \n", "Neuronal signaling 0 0 0 \n", "Nuclear receptor signaling 0 0 0 \n", "PKC signaling 0 0 0 \n", "Protein folding & Protein degradation 0 0 0 \n", "TGF/BMP signaling 0 0 0 \n", "Tyrosine kinase signaling 0 0 0 \n", "\n", "condition SNS-314 XAV-939 Zileuton \n", "pathway_level_1 \n", "Antioxidant 0 0 0 \n", "Apoptotic regulation 0 0 0 \n", "Cell cycle regulation 2237 0 0 \n", "DNA damage & DNA repair 0 0 0 \n", "Epigenetic regulation 0 0 0 \n", "Focal adhesion signaling 0 0 0 \n", "HIF signaling 0 0 0 \n", "JAK/STAT signaling 0 0 0 \n", "Metabolic regulation 0 0 3470 \n", "Neuronal signaling 0 0 0 \n", "Nuclear receptor signaling 0 0 0 \n", "PKC signaling 0 0 0 \n", "Protein folding & Protein degradation 0 0 0 \n", "TGF/BMP signaling 0 2794 0 \n", "Tyrosine kinase signaling 0 0 0 " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(adata_sciplex.obs.pathway_level_1, adata_sciplex.obs['condition'][adata_sciplex.obs.condition.isin(ho_drugs)])" ] }, { "cell_type": "code", "execution_count": 32, "id": "3325a1e0-dd95-4e53-a773-99df4b463767", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "train 483951\n", "test 85403\n", "ood 12423\n", 
"Name: split_ho_pathway, dtype: int64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs['split_ho_pathway'].value_counts()" ] }, { "cell_type": "code", "execution_count": 33, "id": "dfecfaa3-55c2-4d7d-872b-0e0208eac6a6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Fasudil 966\n", "IOX2 913\n", "Mesna 884\n", "Entacapone 868\n", "Fulvestrant 836\n", "Zileuton 822\n", "Carmofur 767\n", "AG-14361 759\n", "Azacitidine 736\n", "Enzastaurin 694\n", "Pracinostat 658\n", "SNS-314 547\n", "Cediranib 528\n", "Momelotinib 487\n", "XAV-939 479\n", "Crizotinib 464\n", "Luminespib 405\n", "Obatoclax 404\n", "Alvespimycin 206\n", "Name: condition, dtype: int64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex[adata_sciplex.obs.split_ho_pathway == 'ood'].obs.condition.value_counts()" ] }, { "cell_type": "code", "execution_count": 34, "id": "a591e3d0-c1dd-4723-879f-76b37b16b962", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "control 1964\n", "ENMD-2076 914\n", "RG108 604\n", "GSK-LSD1 596\n", "Altretamine 573\n", " ... 
\n", "Luminespib 236\n", "Patupilone 228\n", "Flavopiridol 207\n", "Epothilone 181\n", "YM155 112\n", "Name: condition, Length: 188, dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex[adata_sciplex.obs.split_ho_pathway == 'test'].obs.condition.value_counts()" ] }, { "cell_type": "markdown", "id": "ff1a2cd4-2a68-4f67-8e13-46fe8fe06c42", "metadata": { "tags": [] }, "source": [ "### OOD drugs in epigenetic regulation, tyrosine kinase signaling, cell cycle regulation" ] }, { "cell_type": "code", "execution_count": 35, "id": "244d46ca-9ff8-4e4c-b225-c1a26c84b8da", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Epigenetic regulation 147875\n", "Tyrosine kinase signaling 85503\n", "JAK/STAT signaling 70922\n", "DNA damage & DNA repair 60042\n", "Cell cycle regulation 53952\n", "Other 19980\n", "Nuclear receptor signaling 19940\n", "Protein folding & Protein degradation 19191\n", "Metabolic regulation 17989\n", "Neuronal signaling 14071\n", "Antioxidant 13414\n", "Apoptotic regulation 13141\n", "Vehicle 13004\n", "HIF signaling 9279\n", "PKC signaling 8804\n", "TGF/BMP signaling 8774\n", "Focal adhesion signaling 5896\n", "Name: pathway_level_1, dtype: int64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs['pathway_level_1'].value_counts()" ] }, { "cell_type": "markdown", "id": "448e9822-947a-49ed-b80b-1485c60a218b", "metadata": { "tags": [] }, "source": [ "___\n", "\n", "#### Tyrosine kinase signaling" ] }, { "cell_type": "code", "execution_count": 36, "id": "aac8694c-1c90-40b5-870e-ff1e41fb8527", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PD98059 3763\n", "AG-490 3533\n", "Motesanib 3363\n", "TGX-221 3358\n", "Ki8751 3347\n", " ... 
\n", "Fedratinib 0\n", "Filgotinib 0\n", "Flavopiridol 0\n", "Fluorouracil 0\n", "control 0\n", "Name: condition, Length: 188, dtype: int64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs.loc[adata_sciplex.obs.pathway_level_1.isin([\"Tyrosine kinase signaling\"]),'condition'].value_counts()" ] }, { "cell_type": "code", "execution_count": 37, "id": "cbce2aca-0b65-456c-bf26-f01a982b2e99", "metadata": {}, "outputs": [], "source": [ "tyrosine_drugs = adata_sciplex.obs.loc[adata_sciplex.obs.pathway_level_1.isin([\"Tyrosine kinase signaling\"]),'condition'].unique()" ] }, { "cell_type": "code", "execution_count": 38, "id": "a7f03a94-0ee5-4e84-9367-020a0b20988e", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.obs['split_tyrosine_ood'] = 'train' \n", "\n", "test_idx = sc.pp.subsample(adata_sciplex[adata_sciplex.obs.pathway_level_1.isin([\"Tyrosine kinase signaling\"])], .20, copy=True).obs.index\n", "adata_sciplex.obs.loc[test_idx, 'split_tyrosine_ood'] = 'test'\n", "\n", "adata_sciplex.obs.loc[adata_sciplex.obs.condition.isin([\"Cediranib\", \"Crizotinib\", \"Motesanib\", \"BMS-754807\", \"Nintedanib\"]), 'split_tyrosine_ood'] = 'ood' " ] }, { "cell_type": "code", "execution_count": 39, "id": "a6386e14-7463-4f08-8ea0-d6991b9e3af1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "train 552761\n", "ood 14880\n", "test 14136\n", "Name: split_tyrosine_ood, dtype: int64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs.split_tyrosine_ood.value_counts()" ] }, { "cell_type": "code", "execution_count": 40, "id": "8cc683c9-5fdb-47a1-b057-e357b93442a9", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "condition AC480 AG-490 BMS-536924 BMS-754807 Bosutinib \\\n", "split_tyrosine_ood \n", "ood 0 0 0 2676 0 \n", "test 645 728 582 0 491 \n", "train 2597 2805 2318 0 1945 \n", "\n", "condition Cediranib Crizotinib Dasatinib Glesatinib?(MGCD265) \\\n", "split_tyrosine_ood \n", "ood 3060 2786 0 0 \n", "test 0 0 491 656 \n", "train 0 0 2047 2527 \n", "\n", "condition KW-2449 Ki8751 Lapatinib Linifanib Motesanib \\\n", "split_tyrosine_ood \n", "ood 0 0 0 0 3363 \n", "test 580 641 603 678 0 \n", "train 2452 2706 2435 2487 0 \n", "\n", "condition Nilotinib Nintedanib PD173074 PD98059 Pelitinib \\\n", "split_tyrosine_ood \n", "ood 0 2995 0 0 0 \n", "test 639 0 702 723 620 \n", "train 2448 0 2588 3040 2306 \n", "\n", "condition Regorafenib Rigosertib SL-327 Sorafenib TAK-901 \\\n", "split_tyrosine_ood \n", "ood 0 0 0 0 0 \n", "test 502 377 678 658 419 \n", "train 2182 1562 2521 2413 1649 \n", "\n", "condition TGX-221 Temsirolimus Tie2 Trametinib Vandetanib \n", "split_tyrosine_ood \n", "ood 0 0 0 0 0 \n", "test 620 453 647 443 560 \n", "train 2738 1780 2616 2031 2294 " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(adata_sciplex.obs.split_tyrosine_ood, adata_sciplex.obs['condition'][adata_sciplex.obs.condition.isin(tyrosine_drugs)])" ] }, { "cell_type": "code", "execution_count": 41, "id": "2fa637e1-a444-4235-8c5d-0c8b5acc1b9a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "dose_val 0.001 0.010 0.100 1.000\n", "split_tyrosine_ood \n", "ood 4226 4118 3822 2714\n", "test 3928 3930 3590 2688\n", "train 144859 139622 134416 133864" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(adata_sciplex.obs.split_tyrosine_ood, adata_sciplex.obs.dose_val)" ] }, { "cell_type": "markdown", "id": "c16410d8-57f6-4958-8aec-db63f5acbfd2", "metadata": { "tags": [] }, "source": [ "____\n", "\n", "#### Epigenetic regulation" ] }, { "cell_type": "code", "execution_count": 42, "id": "226d2855-8739-4bf4-bab0-eaac30ffe7b7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RG108 3715\n", "Tubastatin 3710\n", "GSK-LSD1 3688\n", "SRT2104 3687\n", "Tacedinaline 3664\n", " ... \n", "Fulvestrant 0\n", "G007-LK 0\n", "GSK1070916 0\n", "Gandotinib 0\n", "control 0\n", "Name: condition, Length: 188, dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs.loc[adata_sciplex.obs.pathway_level_1.isin([\"Epigenetic regulation\"]),'condition'].value_counts()" ] }, { "cell_type": "code", "execution_count": 43, "id": "bf8532c2-e843-4d6e-87bb-a12dd3333d27", "metadata": {}, "outputs": [], "source": [ "epigenetic_drugs = adata_sciplex.obs.loc[adata_sciplex.obs.pathway_level_1.isin([\"Epigenetic regulation\"]),'condition'].unique()" ] }, { "cell_type": "code", "execution_count": 44, "id": "a3548623-3991-49fe-add3-aed28f6a3ee5", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.obs['split_epigenetic_ood'] = 'train' \n", "\n", "test_idx = sc.pp.subsample(adata_sciplex[adata_sciplex.obs.pathway_level_1.isin([\"Epigenetic regulation\"])], .20, copy=True).obs.index\n", "adata_sciplex.obs.loc[test_idx, 'split_epigenetic_ood'] = 'test'\n", "\n", "adata_sciplex.obs.loc[adata_sciplex.obs.condition.isin([\"Azacitidine\", \"Pracinostat\", \"Trichostatin\", \"Quisinostat\", \"Tazemetostat\"]), 'split_epigenetic_ood'] = 'ood' " ] }, 
{ "cell_type": "code", "execution_count": 45, "id": "fed7945c-8e2b-44a8-860e-38fea4fac1b4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "train 540070\n", "test 26538\n", "ood 15169\n", "Name: split_epigenetic_ood, dtype: int64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs.split_epigenetic_ood.value_counts()" ] }, { "cell_type": "code", "execution_count": 46, "id": "0474d068-1bb1-4a9a-b8f3-f6661054f30b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "condition JQ1 A-366 AR-42 Abexinostat Anacardic Azacitidine \\\n", "split_epigenetic_ood \n", "ood 0 0 0 0 0 3151 \n", "test 625 645 623 582 728 0 \n", "train 2412 2751 2278 2331 2876 0 \n", "\n", "condition BRD4770 Belinostat CUDC-101 CUDC-907 Dacinostat \\\n", "split_epigenetic_ood \n", "ood 0 0 0 0 0 \n", "test 743 581 661 519 518 \n", "train 2886 2444 2548 1898 1998 \n", "\n", "condition Decitabine Divalproex Droxinostat EED226 Entinostat \\\n", "split_epigenetic_ood \n", "ood 0 0 0 0 0 \n", "test 491 647 652 645 716 \n", "train 1866 2581 2545 2624 2669 \n", "\n", "condition GSK GSK-LSD1 Givinostat ITSA-1 M344 MC1568 \\\n", "split_epigenetic_ood \n", "ood 0 0 0 0 0 0 \n", "test 690 686 631 544 611 655 \n", "train 2911 3002 2474 2282 2543 2761 \n", "\n", "condition Mocetinostat PCI-34051 PFI-1 Panobinostat \\\n", "split_epigenetic_ood \n", "ood 0 0 0 0 \n", "test 385 591 618 517 \n", "train 1593 2350 2589 2056 \n", "\n", "condition Pracinostat Quisinostat RG108 Resminostat \\\n", "split_epigenetic_ood \n", "ood 2942 2354 0 0 \n", "test 0 0 701 649 \n", "train 0 0 3014 2670 \n", "\n", "condition Resveratrol SRT1720 SRT2104 SRT3025 Selisistat \\\n", "split_epigenetic_ood \n", "ood 0 0 0 0 0 \n", "test 655 583 779 605 690 \n", "train 2317 2487 2908 2405 2684 \n", "\n", "condition Sirtinol Sodium TMP195 Tacedinaline Tazemetostat \\\n", "split_epigenetic_ood \n", "ood 0 0 0 0 3639 \n", "test 669 710 511 747 0 \n", "train 2872 2787 2067 2917 0 \n", "\n", "condition Trichostatin Tubastatin Tucidinostat UNC0379 \\\n", "split_epigenetic_ood \n", "ood 3083 0 0 0 \n", "test 0 718 453 686 \n", "train 0 2992 1800 2595 \n", "\n", "condition UNC0631 UNC1999 Valproic \n", "split_epigenetic_ood \n", "ood 0 0 0 \n", "test 664 686 728 \n", "train 2890 2683 2812 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(adata_sciplex.obs.split_epigenetic_ood, 
adata_sciplex.obs['condition'][adata_sciplex.obs.condition.isin(epigenetic_drugs)])" ] }, { "cell_type": "code", "execution_count": 47, "id": "7fbc8c54-82e7-40bb-b61e-c7c0e30c6717", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th>dose_val</th><th>0.001</th><th>0.010</th><th>0.100</th><th>1.000</th></tr>\n",
"    <tr><th>split_tyrosine_ood</th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>ood</th><td>4226</td><td>4118</td><td>3822</td><td>2714</td></tr>\n",
"    <tr><th>test</th><td>3928</td><td>3930</td><td>3590</td><td>2688</td></tr>\n",
"    <tr><th>train</th><td>144859</td><td>139622</td><td>134416</td><td>133864</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ "dose_val 0.001 0.010 0.100 1.000\n", "split_tyrosine_ood \n", "ood 4226 4118 3822 2714\n", "test 3928 3930 3590 2688\n", "train 144859 139622 134416 133864" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(adata_sciplex.obs.split_tyrosine_ood, adata_sciplex.obs.dose_val)" ] }, { "cell_type": "markdown", "id": "3f5dc02f-c298-46a3-b79d-39b414f87d0f", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "__________\n", "\n", "#### Cell cycle regulation" ] }, { "cell_type": "code", "execution_count": 48, "id": "b97eade0-5807-41b7-b657-809cb7f9b930", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ENMD-2076 5757\n", "BMS-265246 3274\n", "Roscovitine 3254\n", "Aurora 3036\n", "MK-5108 3006\n", " ... \n", "Fedratinib 0\n", "Filgotinib 0\n", "Fluorouracil 0\n", "Fulvestrant 0\n", "control 0\n", "Name: condition, Length: 188, dtype: int64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs.loc[adata_sciplex.obs.pathway_level_1.isin([\"Cell cycle regulation\"]),'condition'].value_counts()" ] }, { "cell_type": "code", "execution_count": 49, "id": "04af6f98-298d-4630-9d53-1cd4892ed0d8", "metadata": {}, "outputs": [], "source": [ "cell_cycle_drugs = adata_sciplex.obs.loc[adata_sciplex.obs.pathway_level_1.isin([\"Cell cycle regulation\"]),'condition'].unique()" ] }, { "cell_type": "code", "execution_count": 50, "id": "2c47f50d-2178-4122-90b3-bb4202cc9f36", "metadata": {}, "outputs": [], "source": [ "adata_sciplex.obs['split_cellcycle_ood'] = 'train' \n", "\n", "test_idx = sc.pp.subsample(adata_sciplex[adata_sciplex.obs.pathway_level_1.isin([\"Cell cycle regulation\"])], .20, copy=True).obs.index\n", "adata_sciplex.obs.loc[test_idx, 'split_cellcycle_ood'] = 'test'\n", "\n", "adata_sciplex.obs.loc[adata_sciplex.obs.condition.isin([\"SNS-314\", \"Flavopiridol\", \"Roscovitine\"]), 'split_cellcycle_ood'] = 'ood' 
" ] }, { "cell_type": "code", "execution_count": 51, "id": "4845abcb-fbe7-4206-9d6b-19d9a937bd6b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "train 565503\n", "test 9376\n", "ood 6898\n", "Name: split_cellcycle_ood, dtype: int64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.obs.split_cellcycle_ood.value_counts()" ] }, { "cell_type": "code", "execution_count": 52, "id": "ac569309-1243-4af2-8b06-3b3416a6e4d7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th>condition</th><th>AMG-900</th><th>Alisertib</th><th>Aurora</th><th>BMS-265246</th><th>Barasertib</th><th>CYC116</th><th>Danusertib</th><th>ENMD-2076</th><th>Epothilone</th><th>Flavopiridol</th><th>GSK1070916</th><th>Hesperadin</th><th>JNJ-7706621</th><th>MK-5108</th><th>MLN8054</th><th>PHA-680632</th><th>Patupilone</th><th>Roscovitine</th><th>SNS-314</th><th>Tozasertib</th><th>ZM</th></tr>\n",
"    <tr><th>split_cellcycle_ood</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>ood</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1407</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>3254</td><td>2237</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>test</th><td>545</td><td>428</td><td>616</td><td>679</td><td>463</td><td>570</td><td>469</td><td>1140</td><td>230</td><td>0</td><td>512</td><td>356</td><td>590</td><td>590</td><td>478</td><td>450</td><td>290</td><td>0</td><td>0</td><td>424</td><td>546</td></tr>\n",
"    <tr><th>train</th><td>2165</td><td>1673</td><td>2420</td><td>2595</td><td>1958</td><td>2381</td><td>1927</td><td>4617</td><td>991</td><td>0</td><td>1990</td><td>1593</td><td>2398</td><td>2416</td><td>1866</td><td>1731</td><td>1191</td><td>0</td><td>0</td><td>1596</td><td>2170</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ "condition AMG-900 Alisertib Aurora BMS-265246 Barasertib \\\n", "split_cellcycle_ood \n", "ood 0 0 0 0 0 \n", "test 545 428 616 679 463 \n", "train 2165 1673 2420 2595 1958 \n", "\n", "condition CYC116 Danusertib ENMD-2076 Epothilone Flavopiridol \\\n", "split_cellcycle_ood \n", "ood 0 0 0 0 1407 \n", "test 570 469 1140 230 0 \n", "train 2381 1927 4617 991 0 \n", "\n", "condition GSK1070916 Hesperadin JNJ-7706621 MK-5108 MLN8054 \\\n", "split_cellcycle_ood \n", "ood 0 0 0 0 0 \n", "test 512 356 590 590 478 \n", "train 1990 1593 2398 2416 1866 \n", "\n", "condition PHA-680632 Patupilone Roscovitine SNS-314 Tozasertib \\\n", "split_cellcycle_ood \n", "ood 0 0 3254 2237 0 \n", "test 450 290 0 0 424 \n", "train 1731 1191 0 0 1596 \n", "\n", "condition ZM \n", "split_cellcycle_ood \n", "ood 0 \n", "test 546 \n", "train 2170 " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(adata_sciplex.obs.split_cellcycle_ood, adata_sciplex.obs['condition'][adata_sciplex.obs.condition.isin(cell_cycle_drugs)])" ] }, { "cell_type": "code", "execution_count": 53, "id": "b93f29bf-a79f-40aa-87dd-4bd736cc8fa7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th>dose_val</th><th>0.001</th><th>0.010</th><th>0.100</th><th>1.000</th></tr>\n",
"    <tr><th>split_cellcycle_ood</th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>ood</th><td>2165</td><td>1774</td><td>1457</td><td>1502</td></tr>\n",
"    <tr><th>test</th><td>2673</td><td>2429</td><td>2329</td><td>1945</td></tr>\n",
"    <tr><th>train</th><td>148175</td><td>143467</td><td>138042</td><td>135819</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ "dose_val 0.001 0.010 0.100 1.000\n", "split_cellcycle_ood \n", "ood 2165 1774 1457 1502\n", "test 2673 2429 2329 1945\n", "train 148175 143467 138042 135819" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(adata_sciplex.obs.split_cellcycle_ood, adata_sciplex.obs.dose_val)" ] }, { "cell_type": "code", "execution_count": 54, "id": "41ba1c76-85a8-4398-a637-5354ac5cfb18", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['split_ho_pathway',\n", " 'split_tyrosine_ood',\n", " 'split_epigenetic_ood',\n", " 'split_cellcycle_ood']" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[c for c in adata_sciplex.obs.columns if 'split' in c]" ] }, { "cell_type": "markdown", "id": "697e1caf-86a1-46e5-af03-76213446dfe2", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "### Further splits\n", "\n", "**We omit these splits because we design our own. The code is kept below, commented out, for reference.**\n", "\n", "A split that sees all data:" ] }, { "cell_type": "code", "execution_count": 55, "id": "97654e4f-4801-42df-94e4-b3abc596dff2", "metadata": {}, "outputs": [], "source": [ "# adata.obs['split_all'] = 'train'\n", "# test_idx = sc.pp.subsample(adata, .10, copy=True).obs.index\n", "# adata.obs.loc[test_idx, 'split_all'] = 'test'" ] }, { "cell_type": "code", "execution_count": 56, "id": "07984033-dc32-4cad-91c1-650e3a2926e3", "metadata": {}, "outputs": [], "source": [ "# adata.obs['ct_dose'] = adata.obs.cell_type.astype('str') + '_' + adata.obs.dose_val.astype('str')" ] }, { "cell_type": "markdown", "id": "ef98d16d-aa0e-4b39-9675-c7b0132510c9", "metadata": {}, "source": [ "Round-robin splits: each combination of dose and cell line is held out in turn."
] }, { "cell_type": "code", "execution_count": 57, "id": "4492d665-b265-4c8c-9251-6d7598551116", "metadata": {}, "outputs": [], "source": [ "# i = 0\n", "# split_dict = {}" ] }, { "cell_type": "code", "execution_count": 58, "id": "84f59288-4925-4d9c-92e3-f2f0611910da", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# # single ct holdout\n", "# for ct in adata.obs.cell_type.unique():\n", "# for dose in adata.obs.dose_val.unique():\n", "# i += 1\n", "# split_name = f'split{i}'\n", "# split_dict[split_name] = f'{ct}_{dose}'\n", " \n", "# adata.obs[split_name] = 'train'\n", "# adata.obs.loc[adata.obs.ct_dose == f'{ct}_{dose}', split_name] = 'ood'\n", " \n", "# test_idx = sc.pp.subsample(adata[adata.obs[split_name] != 'ood'], .16, copy=True).obs.index\n", "# adata.obs.loc[test_idx, split_name] = 'test'\n", " \n", "# display(adata.obs[split_name].value_counts())" ] }, { "cell_type": "code", "execution_count": 59, "id": "23d5c2bd-ebcc-4da8-a0b8-c04fab040c44", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# # double ct holdout\n", "# for cts in [('A549', 'MCF7'), ('A549', 'K562'), ('MCF7', 'K562')]:\n", "# for dose in adata.obs.dose_val.unique():\n", "# i += 1\n", "# split_name = f'split{i}'\n", "# split_dict[split_name] = f'{cts[0]}+{cts[1]}_{dose}'\n", " \n", "# adata.obs[split_name] = 'train'\n", "# adata.obs.loc[adata.obs.ct_dose == f'{cts[0]}_{dose}', split_name] = 'ood'\n", "# adata.obs.loc[adata.obs.ct_dose == f'{cts[1]}_{dose}', split_name] = 'ood'\n", " \n", "# test_idx = sc.pp.subsample(adata[adata.obs[split_name] != 'ood'], .16, copy=True).obs.index\n", "# adata.obs.loc[test_idx, split_name] = 'test'\n", " \n", "# display(adata.obs[split_name].value_counts())" ] }, { "cell_type": "code", "execution_count": 60, "id": "e722a203-eeba-4e85-a542-33d8783afec7", "metadata": {}, "outputs": [], "source": [ "# # triple ct holdout\n", "# for dose in adata.obs.dose_val.unique():\n", "# i += 1\n", "# split_name = f'split{i}'\n", "\n", "# 
split_dict[split_name] = f'all_{dose}'\n", "# adata.obs[split_name] = 'train'\n", "# adata.obs.loc[adata.obs.dose_val == dose, split_name] = 'ood'\n", "\n", "# test_idx = sc.pp.subsample(adata[adata.obs[split_name] != 'ood'], .16, copy=True).obs.index\n", "# adata.obs.loc[test_idx, split_name] = 'test'\n", "\n", "# display(adata.obs[split_name].value_counts())" ] }, { "cell_type": "code", "execution_count": 61, "id": "34f21c22-1979-484a-92bb-9f6fc8b71fd2", "metadata": {}, "outputs": [], "source": [ "# adata.uns['all_DEGs']" ] }, { "cell_type": "markdown", "id": "615fa85a-417d-4c37-8530-f0129132fc4f", "metadata": { "tags": [] }, "source": [ "## Save adata" ] }, { "cell_type": "markdown", "id": "319f177a-549f-4424-a56e-51af6535c48e", "metadata": {}, "source": [ "Reorder the SciPlex genes to follow the LINCS gene order: genes shared with LINCS come first, followed by the remaining SciPlex genes." ] }, { "cell_type": "code", "execution_count": 62, "id": "1353837b-9cf8-46f0-9529-1340ba033f2f", "metadata": {}, "outputs": [], "source": [ "sciplex_ids = pd.Index(adata_sciplex.var.gene_id)\n", "\n", "# Positions in SciPlex of the LINCS genes that are also in SciPlex, in LINCS order\n", "lincs_idx = [sciplex_ids.get_loc(_id) for _id in adata_lincs.var.gene_id[adata_lincs.var.in_sciplex]]" ] }, { "cell_type": "code", "execution_count": 63, "id": "92990709-882a-4ed6-b216-df893b4dcea2", "metadata": {}, "outputs": [], "source": [ "# Positions of the SciPlex genes that are not in LINCS, appended after the shared genes\n", "non_lincs_idx = [sciplex_ids.get_loc(_id) for _id in adata_sciplex.var.gene_id if not adata_lincs.var.gene_id.isin([_id]).any()]\n", "\n", "lincs_idx.extend(non_lincs_idx)" ] }, { "cell_type": "code", "execution_count": 64, "id": "edda556e-1ae9-47ad-906f-bf12d94dccef", "metadata": {}, "outputs": [], "source": [ "adata_sciplex = adata_sciplex[:, lincs_idx].copy()" ] }, { "cell_type": "code", "execution_count": 65, "id": "eea089c3-f96a-420c-af98-576d3a34bd1c", "metadata": { "tags": [] }, "outputs": [], "source": [ "fname = PROJECT_DIR/'datasets'/'sciplex3_matched_genes_lincs.h5ad'\n", "\n", "sc.write(fname, adata_sciplex)" ] }, { "cell_type": "markdown", "id": "44fb969d-2a45-41a3-b138-eda1ea8e7238", "metadata": {}, "source": [ "Check that it 
worked" ] }, { "cell_type": "code", "execution_count": 66, "id": "582e2283-6100-4de2-995b-b7befddd0a92", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 581777 × 2000\n", " obs: 'cell_type', 'dose', 'dose_character', 'dose_pattern', 'g1s_score', 'g2m_score', 'pathway', 'pathway_level_1', 'pathway_level_2', 'product_dose', 'product_name', 'proliferation_index', 'replicate', 'size_factor', 'target', 'vehicle', 'batch', 'n_counts', 'dose_val', 'condition', 'drug_dose_name', 'cov_drug_dose_name', 'cov_drug', 'control', 'split_ho_pathway', 'split_tyrosine_ood', 'split_epigenetic_ood', 'split_cellcycle_ood'\n", " var: 'id', 'num_cells_expressed-0-0', 'num_cells_expressed-1-0', 'num_cells_expressed-1', 'gene_id', 'in_lincs', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'\n", " uns: 'all_DEGs', 'hvg', 'lincs_DEGs', 'log1p'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sc.read(fname)" ] }, { "cell_type": "markdown", "id": "bdd00494-7d01-41c9-9a05-cf2921e68393", "metadata": {}, "source": [ "## Subset to shared genes only" ] }, { "cell_type": "markdown", "id": "425a4946-12a1-42c3-ab11-41673606be1e", "metadata": {}, "source": [ "Keep only the genes present in both datasets." ] }, { "cell_type": "code", "execution_count": 67, "id": "a0005d9c-42e1-4ec1-b7f5-fa9d96037d5d", "metadata": { "tags": [] }, "outputs": [], "source": [ "adata_lincs = adata_lincs[:, adata_lincs.var.in_sciplex].copy()" ] }, { "cell_type": "code", "execution_count": 68, "id": "1b0831d3-c6c2-4dbd-b137-8ca27e4e0e52", "metadata": {}, "outputs": [], "source": [ "adata_sciplex = adata_sciplex[:, adata_sciplex.var.in_lincs].copy()" ] }, { "cell_type": "code", "execution_count": 69, "id": "e0fcb5f4-46a1-4f4b-ab41-7cefdc26abfe", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['DDR1', 'PAX8', 'RPS5', 'ABCF1', 'SPAG7', 'RHOA', 'RNPS1', 'SMNDC1',\n", " 'ATP6V0B', 'RPS6',\n", " ...\n", " 'P4HTM', 
'SLC27A3', 'TBXA2R', 'RTN2', 'TSTA3', 'PPARD', 'GNA11',\n", " 'WDTC1', 'PLSCR3', 'NPEPL1'],\n", " dtype='object', length=977)" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_lincs.var_names" ] }, { "cell_type": "code", "execution_count": 70, "id": "8825ee35-daab-4625-93f9-764ced4ef32f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['DDR1', 'PAX8', 'RPS5', 'ABCF1', 'SPAG7', 'RHOA', 'RNPS1', 'SMNDC1',\n", " 'ATP6V0B', 'RPS6',\n", " ...\n", " 'P4HTM', 'SLC27A3', 'TBXA2R', 'RTN2', 'TSTA3', 'PPARD', 'GNA11',\n", " 'WDTC1', 'PLSCR3', 'NPEPL1'],\n", " dtype='object', name='index', length=977)" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_sciplex.var_names" ] }, { "cell_type": "markdown", "id": "c059ac22-e464-40a9-8021-cbb4d8a10aba", "metadata": {}, "source": [ "## Save adata objects with shared genes only\n", "\n", "After the reordering above, the 977 shared genes appear in the same order in both objects." ] }, { "cell_type": "code", "execution_count": 71, "id": "4ea8321a-694c-4522-a931-38d51990b5a0", "metadata": {}, "outputs": [], "source": [ "fname = PROJECT_DIR/'datasets'/'sciplex3_lincs_genes.h5ad'\n", "\n", "sc.write(fname, adata_sciplex)" ] }, { "cell_type": "markdown", "id": "36596257-fb0c-479c-8868-996a25affeae", "metadata": {}, "source": [ "____" ] }, { "cell_type": "code", "execution_count": 72, "id": "881fb5d3-1c04-4ecd-8ea3-aeda1e3baf57", "metadata": {}, "outputs": [], "source": [ "fname_lincs = PROJECT_DIR/'datasets'/'lincs_full_smiles_sciplex_genes.h5ad'\n", "\n", "sc.write(fname_lincs, adata_lincs)" ] }, { "cell_type": "code", "execution_count": null, "id": "89dca192", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "interpreter": { "hash": "ad25c9354f8cefdf5a943c25e67813a21d2807e3af4d6d0915e47390a83b57ce" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 
3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 5 }