Usage#

%load_ext autoreload
%autoreload 2
import pertpy
from deres import DEResult
import scanpy as sc
import numpy as np
import pandas as pd

Build example dataset#

For the sake of this example, we just go with the pbmc3k dataset and assign a random sample identifier to each cell to mimick a multi-sample dataset.

adata = sc.datasets.pbmc3k_processed()
adata.obs["sample"] = pd.Categorical([f"sample_{x}" for x in np.random.randint(0, 10, adata.shape[0])])
# raw.X contains log-normalized values which we are going to use here
adata = sc.AnnData(adata.raw.X, obs=adata.obs, var=adata.raw.var)

Run differential expression testing#

We first contruct a pseudobulk-object and then use the statsmodels implementation in pertpy to compare gene expression between cell-types. For the sake of the example, we compare CD8+ T cells with CD4+ T cells and NK cells.

pb = sc.get.aggregate(adata, ["sample", "louvain"], "mean")
mod = pertpy.tl.Statsmodels(pb, design="~ louvain", layer="mean")
mod.fit()
res = mod.test_contrasts(
    {
        "CD8_vs_CD4": mod.cond(louvain="CD8 T cells") - mod.cond(louvain="CD4 T cells"),
        "CD8_vs_NK": mod.cond(louvain="CD8 T cells") - mod.cond(louvain="NK cells"),
    }
)
res["p_value"] = res["p_value"].astype(float)  # don't ask me why this is an object column
res
variable p_value t_value sd log_fc adj_p_value contrast
3694 GZMK 2.703991e-38 2.689104e+01 0.026700 7.179881e-01 2.703991e-38 CD8_vs_CD4
11083 CCL5 2.549464e-33 2.235275e+01 0.080934 1.809106e+00 2.549464e-33 CD8_vs_CD4
9292 GZMH 5.377138e-31 2.044751e+01 0.029001 5.930062e-01 5.377138e-31 CD8_vs_CD4
13001 NKG7 2.996402e-30 1.986191e+01 0.088122 1.750264e+00 2.996402e-30 CD8_vs_CD4
3695 GZMA 2.014626e-26 1.703723e+01 0.048890 8.329481e-01 2.014626e-26 CD8_vs_CD4
... ... ... ... ... ... ... ...
3098 PIGZ 1.000000e+00 -9.221825e-17 0.004703 -4.336809e-19 1.000000e+00 CD8_vs_NK
2663 HESX1 1.000000e+00 9.988825e-17 0.004342 4.336809e-19 1.000000e+00 CD8_vs_NK
7952 CCND1 1.000000e+00 0.000000e+00 0.004221 0.000000e+00 1.000000e+00 CD8_vs_NK
1505 TP53I3 1.000000e+00 0.000000e+00 0.012417 0.000000e+00 1.000000e+00 CD8_vs_NK
9584 RP11-164H13.1 1.000000e+00 -6.781390e-17 0.012790 -8.673617e-19 1.000000e+00 CD8_vs_NK

27428 rows × 7 columns

Construct DEResult class#

DEResult is class that holds a dataframe with DE results and associated metadata (e.g. which columns to use). The DEResult instance then provides summary and plotting functions to investigate the differential expression results.

de_res = DEResult(
    res, pb, p_col="p_value", effect_size_col="log_fc", var_col="variable", contrast_col="contrast", layer="mean"
)
de_res.summary()
total up down contrast
p < 0.1 1004 388 616 CD8_vs_CD4
p < 0.05 720 254 466 CD8_vs_CD4
p < 0.01 435 126 309 CD8_vs_CD4
p < 0.001 275 72 203 CD8_vs_CD4
p < 0.0001 153 45 108 CD8_vs_CD4
p < 0.1 1833 894 939 CD8_vs_NK
p < 0.05 1168 581 587 CD8_vs_NK
p < 0.01 632 317 315 CD8_vs_NK
p < 0.001 321 163 158 CD8_vs_NK
p < 0.0001 186 82 104 CD8_vs_NK

We can use .p_adjust to apply FDR correction of the p-values. It adds another column with the adjusted p-values to the results data frame and updates the pointer of the p-value column to use to use the FDR-adjusted p-values.

As a result, we can see in the summary DF, that now a lot fewer genes are considered significant at the different thresholds.

de_res.p_adjust()
de_res.summary()
total up down contrast
p < 0.1 323 85 238 CD8_vs_CD4
p < 0.05 280 72 208 CD8_vs_CD4
p < 0.01 158 48 110 CD8_vs_CD4
p < 0.001 94 35 59 CD8_vs_CD4
p < 0.0001 68 29 39 CD8_vs_CD4
p < 0.1 421 207 214 CD8_vs_NK
p < 0.05 331 166 165 CD8_vs_NK
p < 0.01 196 87 109 CD8_vs_NK
p < 0.001 124 51 73 CD8_vs_NK
p < 0.0001 85 29 56 CD8_vs_NK

Use plotting functions#

de_res.plot_volcano("CD8_vs_NK", log2fc_thresh=0.5)
../_images/972dd5a38b66f8b032839c521a73be1de67f504c580b7391a0ea8b5276122c24.png
de_res.plot_paired("louvain", "sample", "CD8_vs_NK", groups=["CD8 T cells", "NK cells"], n_top_vars=3)
Index(['CD8 T cells', 'NK cells'], dtype='object')
../_images/45938f85865776b225e837d1aa4b4c3c959dc44a9a4827eeb8c655b6127ab9a7.png
de_res.plot_fold_change("CD8_vs_NK")
../_images/b9a7bdf42ee4ac324bdbc013d0d557cacb3a53c6e3bd2abd233e8539843f9592.png
de_res.plot_multicomparison_fc()
../_images/df566236564306be75b5f723b28ca5f68a70dcf752e42d4c9ff4a1fe40b9aca8.png