Usage#

%load_ext autoreload
%autoreload 2

import pertpy
from deres import DEResult
import scanpy as sc
import numpy as np
import pandas as pd

Build example dataset#

For the sake of this example, we just go with the pbmc3k dataset and assign a random sample identifier to each cell to mimick a multi-sample dataset.

adata = sc.datasets.pbmc3k_processed()
adata.obs["sample"] = pd.Categorical([f"sample_{x}" for x in np.random.randint(0, 10, adata.shape[0])])
# raw.X contains log-normalized values which we are going to use here
adata = sc.AnnData(adata.raw.X, obs=adata.obs, var=adata.raw.var)

Run differential expression testing#

We first contruct a pseudobulk-object and then use the statsmodels implementation in pertpy to compare gene expression between cell-types. For the sake of the example, we compare CD8+ T cells with CD4+ T cells and NK cells.

pb = sc.get.aggregate(adata, ["sample", "louvain"], "mean")

mod = pertpy.tl.Statsmodels(pb, design="~ louvain", layer="mean")
mod.fit()

res = mod.test_contrasts(
    {
        "CD8_vs_CD4": mod.cond(louvain="CD8 T cells") - mod.cond(louvain="CD4 T cells"),
        "CD8_vs_NK": mod.cond(louvain="CD8 T cells") - mod.cond(louvain="NK cells"),
    }
)
res["p_value"] = res["p_value"].astype(float)  # don't ask me why this is an object column

res

	variable	p_value	t_value	sd	log_fc	adj_p_value	contrast
3694	GZMK	2.703991e-38	2.689104e+01	0.026700	7.179881e-01	2.703991e-38	CD8_vs_CD4
11083	CCL5	2.549464e-33	2.235275e+01	0.080934	1.809106e+00	2.549464e-33	CD8_vs_CD4
9292	GZMH	5.377138e-31	2.044751e+01	0.029001	5.930062e-01	5.377138e-31	CD8_vs_CD4
13001	NKG7	2.996402e-30	1.986191e+01	0.088122	1.750264e+00	2.996402e-30	CD8_vs_CD4
3695	GZMA	2.014626e-26	1.703723e+01	0.048890	8.329481e-01	2.014626e-26	CD8_vs_CD4
...	...	...	...	...	...	...	...
3098	PIGZ	1.000000e+00	-9.221825e-17	0.004703	-4.336809e-19	1.000000e+00	CD8_vs_NK
2663	HESX1	1.000000e+00	9.988825e-17	0.004342	4.336809e-19	1.000000e+00	CD8_vs_NK
7952	CCND1	1.000000e+00	0.000000e+00	0.004221	0.000000e+00	1.000000e+00	CD8_vs_NK
1505	TP53I3	1.000000e+00	0.000000e+00	0.012417	0.000000e+00	1.000000e+00	CD8_vs_NK
9584	RP11-164H13.1	1.000000e+00	-6.781390e-17	0.012790	-8.673617e-19	1.000000e+00	CD8_vs_NK

27428 rows × 7 columns

Construct DEResult class#

DEResult is class that holds a dataframe with DE results and associated metadata (e.g. which columns to use). The DEResult instance then provides summary and plotting functions to investigate the differential expression results.

de_res = DEResult(
    res, pb, p_col="p_value", effect_size_col="log_fc", var_col="variable", contrast_col="contrast", layer="mean"
)

de_res.summary()

	total	up	down	contrast
p < 0.1	1004	388	616	CD8_vs_CD4
p < 0.05	720	254	466	CD8_vs_CD4
p < 0.01	435	126	309	CD8_vs_CD4
p < 0.001	275	72	203	CD8_vs_CD4
p < 0.0001	153	45	108	CD8_vs_CD4
p < 0.1	1833	894	939	CD8_vs_NK
p < 0.05	1168	581	587	CD8_vs_NK
p < 0.01	632	317	315	CD8_vs_NK
p < 0.001	321	163	158	CD8_vs_NK
p < 0.0001	186	82	104	CD8_vs_NK

We can use .p_adjust to apply FDR correction of the p-values. It adds another column with the adjusted p-values to the results data frame and updates the pointer of the p-value column to use to use the FDR-adjusted p-values.

As a result, we can see in the summary DF, that now a lot fewer genes are considered significant at the different thresholds.

de_res.p_adjust()

de_res.summary()

	total	up	down	contrast
p < 0.1	323	85	238	CD8_vs_CD4
p < 0.05	280	72	208	CD8_vs_CD4
p < 0.01	158	48	110	CD8_vs_CD4
p < 0.001	94	35	59	CD8_vs_CD4
p < 0.0001	68	29	39	CD8_vs_CD4
p < 0.1	421	207	214	CD8_vs_NK
p < 0.05	331	166	165	CD8_vs_NK
p < 0.01	196	87	109	CD8_vs_NK
p < 0.001	124	51	73	CD8_vs_NK
p < 0.0001	85	29	56	CD8_vs_NK

Use plotting functions#

de_res.plot_volcano("CD8_vs_NK", log2fc_thresh=0.5)

../_images/972dd5a38b66f8b032839c521a73be1de67f504c580b7391a0ea8b5276122c24.png

de_res.plot_paired("louvain", "sample", "CD8_vs_NK", groups=["CD8 T cells", "NK cells"], n_top_vars=3)

Index(['CD8 T cells', 'NK cells'], dtype='object')

../_images/45938f85865776b225e837d1aa4b4c3c959dc44a9a4827eeb8c655b6127ab9a7.png

de_res.plot_fold_change("CD8_vs_NK")

../_images/b9a7bdf42ee4ac324bdbc013d0d557cacb3a53c6e3bd2abd233e8539843f9592.png

de_res.plot_multicomparison_fc()

../_images/df566236564306be75b5f723b28ca5f68a70dcf752e42d4c9ff4a1fe40b9aca8.png

Usage

Contents

Usage#

Build example dataset#

Run differential expression testing#

Construct DEResult class#

Use plotting functions#