deres.DEResult#

class deres.DEResult(res, adata=None, *, layer=None, p_col='p_value', effect_size_col='log_fc', contrast_col=None, var_col=None)#

Container to hold a differential expression result and associated metadata.

Parameters:

res (DataFrame) – The data frame with the statistical result. Typically contains a column with p-values and a column with some sort of effect size (e.g. fold change). The data frame may contain any additional columns.
adata (Optional[AnnData] (default: None)) – Associated AnnData object that holds the expression values used to obtain the statistical results. This is optional and only required for some plot types.
layer (Optional[str] (default: None)) – Layer of adata to use (if any). If None, use X.
p_col (str (default: 'p_value')) – Column in res containing the p-value
effect_size_col (str (default: 'log_fc')) – Column in res containing the effect size (e.g. log fold change)
contrast_col (Optional[str] (default: None)) – Column in res containing the contrast name. Only applicable if results from multiple comparisons are stored in the data frame. If it contains only the results from a single comparison, just leave this as None.
var_col (Optional[str] (default: None)) – Column in res containing the variable name (e.g. gene symbol). If None, use the index.

Attributes table#

contrasts

Get a list of all contrasts available in the results data frame.

Methods table#

`get_df`([contrast])	Get a copy of the results dataframe for a given contrast
`p_adjust`([method, adj_col_name])	Multiple testing correction for p-values
`plot_fold_change`([contrast, var_names, ...])	Plot a metric from the results as a bar chart.
`plot_multicomparison_fc`(*[, n_top_vars, ...])	Plot a matrix of log2 fold changes from the results.
`plot_paired`(groupby, pairedby[, contrast, ...])	Creates a pairwise expression plot from a Pandas DataFrame or Anndata.
`plot_volcano`([contrast, pval_thresh, ...])	Create a volcano plot from a pandas DataFrame or AnnData.
`summary`(*[, cutoffs])	Obtain a summary data frame of differential expression results

Attributes#

DEResult.contrasts#

Get a list of all contrasts available in the results data frame.

Methods#

DEResult.get_df(contrast=None)#

Get a copy of the results dataframe for a given contrast

If contrast is None, return the entire dataframe without filtering.

Return type: DataFrame
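
The filtering behaviour described above can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the helper `filter_by_contrast`, the column name `contrast`, and the example rows are all hypothetical.

```python
from copy import deepcopy


def filter_by_contrast(rows, contrast=None, contrast_col="contrast"):
    """Return a copy of the result rows, optionally restricted to one contrast.

    Hypothetical helper: rows are plain dicts standing in for data frame rows.
    """
    if contrast is None:
        # No contrast requested: hand back everything, but as a copy
        return deepcopy(rows)
    # Keep only rows of the requested contrast, copying each row
    return [dict(r) for r in rows if r[contrast_col] == contrast]


# Made-up results from two comparisons stored in one table
rows = [
    {"gene": "A", "contrast": "treated_vs_ctrl", "log_fc": 1.2, "p_value": 0.01},
    {"gene": "A", "contrast": "late_vs_early", "log_fc": -0.3, "p_value": 0.40},
    {"gene": "B", "contrast": "treated_vs_ctrl", "log_fc": -2.0, "p_value": 0.002},
]
subset = filter_by_contrast(rows, "treated_vs_ctrl")
everything = filter_by_contrast(rows)
```

Because a copy is returned, editing `subset` leaves the original rows untouched.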

DEResult.p_adjust(method='fdr', adj_col_name='adj_p_value')#

Multiple testing correction for p-values

Adds a new column to the results dataframe and updates the pointer p_col.

Parameters:

method (Literal['fdr'] (default: 'fdr')) – Method to use for multiple testing correction. Currently only fdr is implemented.
adj_col_name (default: 'adj_p_value') – Column name used for the adjusted p-values.

Return type:

None
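
For orientation, here is a minimal sketch of a false discovery rate correction. It assumes that 'fdr' refers to the Benjamini-Hochberg procedure, which this page does not state explicitly. The function `benjamini_hochberg` is a hypothetical stand-alone helper, not part of this class.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.20]
adj_p_values = benjamini_hochberg(p_values)
```

In the class itself, the adjusted values are written to the column named by adj_col_name, and p_col is updated to point at that column.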

DEResult.plot_fold_change(contrast=None, *, var_names=None, n_top_vars=15, y_label='Log2 fold change', figsize=(10, 5), return_fig=False, **barplot_kwargs)#

Plot a metric from the results as a bar chart.

Parameters:

var_names (Optional[Sequence[str]] (default: None)) – Variables to plot. If None, the top n_top_vars variables based on the log2 fold change are plotted.
n_top_vars (int (default: 15)) – Number of top variables to plot. The top and bottom n_top_vars variables are plotted, respectively.
y_label (str (default: 'Log2 fold change')) – Label for the y-axis.
figsize (tuple[int, int] (default: (10, 5))) – Size of the figure.
return_fig (bool (default: False)) – If True, return the figure. Default: False.
**barplot_kwargs – Additional arguments for seaborn.barplot.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_fold_change(res_df)
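
The n_top_vars selection ("top and bottom") can be illustrated in plain Python. This is a sketch under assumptions: the helper `top_and_bottom` and the fold change values are hypothetical, and the plotting method's actual ordering or tie handling may differ.

```python
def top_and_bottom(log_fc_by_var, n_top_vars=15):
    """Pick the n_top_vars largest and n_top_vars smallest fold changes."""
    # Sort variables from most positive to most negative fold change
    ranked = sorted(log_fc_by_var.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= 2 * n_top_vars:
        # Too few variables to split: keep all of them
        return ranked
    return ranked[:n_top_vars] + ranked[-n_top_vars:]


# Made-up log2 fold changes
log_fc = {"G1": 2.5, "G2": -3.1, "G3": 0.2, "G4": 1.1, "G5": -0.4, "G6": -1.8}
selected = top_and_bottom(log_fc, n_top_vars=2)
```

With n_top_vars=2, the selection holds the two most up-regulated variables followed by the two most down-regulated ones.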

DEResult.plot_multicomparison_fc(*, n_top_vars=15, marker_size=100, figsize=(10, 2), x_label='Contrast', y_label='Gene', return_fig=False, **heatmap_kwargs)#

Plot a matrix of log2 fold changes from the results.

Parameters:

n_top_vars (default: 15) – Number of top variables to plot per group. Default: 15.
marker_size (int (default: 100)) – Size of the biggest marker for significant variables. Default: 100.
figsize (tuple[int, int] (default: (10, 2))) – Size of the figure. Default: (10, 2).
x_label (str (default: 'Contrast')) – Label for the x-axis. Default: “Contrast”.
y_label (str (default: 'Gene')) – Label for the y-axis. Default: “Gene”.
return_fig (bool (default: False)) – If True, return the figure, otherwise None. Default: False.
**heatmap_kwargs – Additional arguments for seaborn.heatmap.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> res_df = edgr.compare_groups(pdata, column="Efficacy", baseline="SD", groups_to_compare=["PR", "PD"])
>>> edgr.plot_multicomparison_fc(res_df)
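
A matrix of fold changes (variables by contrasts) can be built from a long-format results table roughly as follows. This is a plain-Python sketch, not the plotting code itself: the helper `fold_change_matrix`, the column names, and the contrast labels are hypothetical.

```python
def fold_change_matrix(rows, var_col="gene", contrast_col="contrast",
                       effect_size_col="log_fc"):
    """Pivot long-format result rows into a variables x contrasts matrix."""
    contrasts = sorted({r[contrast_col] for r in rows})
    variables = sorted({r[var_col] for r in rows})
    # Map each (variable, contrast) pair to its effect size
    lookup = {(r[var_col], r[contrast_col]): r[effect_size_col] for r in rows}
    # Missing combinations become None
    matrix = [[lookup.get((v, c)) for c in contrasts] for v in variables]
    return variables, contrasts, matrix


# Made-up results from two comparisons
rows = [
    {"gene": "A", "contrast": "PD_vs_SD", "log_fc": 1.0},
    {"gene": "A", "contrast": "PR_vs_SD", "log_fc": -0.5},
    {"gene": "B", "contrast": "PD_vs_SD", "log_fc": 0.3},
]
variables, contrasts, matrix = fold_change_matrix(rows)
```

Each row of `matrix` corresponds to one variable and each column to one contrast, which matches the x-axis ("Contrast") and y-axis ("Gene") defaults listed above.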

DEResult.plot_paired(groupby, pairedby, contrast=None, *, groups=None, var_names=None, n_top_vars=15, n_cols=4, panel_size=(5, 5), show_legend=True, size=10, y_label='expression', pvalue_template=<function DEResult.<lambda>>, boxplot_properties=None, palette=None, return_fig=False)#

Creates a pairwise expression plot from a pandas DataFrame or AnnData.

Visualizes a panel of paired scatterplots per variable.

Parameters:

groupby (str) – .obs column containing the grouping. Must contain exactly two different values.
pairedby (str) – .obs column containing the pairing (e.g. “patient_id”). If None, an independent t-test is performed.
contrast (Optional[str] (default: None)) – If multiple contrasts are stored in the results data frame, you need to specify one contrast here.
groups (Optional[Sequence[str]] (default: None)) – If the AnnData object contains more than two unique values in groupby, you need to specify the two categories you’d like to show in the plot.
var_names (Optional[Sequence[str]] (default: None)) – Variables to plot.
n_top_vars (int (default: 15)) – Number of top variables to plot. Default: 15.
layer – Layer to use for plotting.
n_cols (int (default: 4)) – Number of columns in the plot. Default: 4.
panel_size (tuple[int, int] (default: (5, 5))) – Size of each panel. Default: (5, 5).
show_legend (bool (default: True)) – Whether to show the legend. Default: True.
size (int (default: 10)) – Size of the points. Default: 10.
y_label (str (default: 'expression')) – Label for the y-axis. Default: “expression”.
pvalue_template (default: <function DEResult.<lambda>>) – Template for the p-value string displayed in the title of each panel.
boxplot_properties (default: None) – Additional properties for the boxplot, passed to seaborn.boxplot.
palette (default: None) – Color palette for the line- and stripplot.
return_fig (bool (default: False)) – If True, return the figure. Default: False.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_paired(pdata, results_df=res_df, n_top_vars=8, groupby="Treatment", pairedby="Efficacy")

DEResult.plot_volcano(contrast=None, *, pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, s_curve=False, colors=None, color_dict=None, shape_dict=None, size_col=None, fontsize=10, top_right_frame=False, figsize=(5, 5), legend_pos=(1.6, 1), point_sizes=(15, 150), shapes=None, shape_order=None, x_label=None, y_label=None, return_fig=False, **kwargs)#

Create a volcano plot from a pandas DataFrame or AnnData.

Parameters:

pval_thresh (float (default: 0.05)) – Threshold p value for significance, by default 0.05
log2fc_thresh (float (default: 0.75)) – Threshold for log2 fold change significance, by default 0.75
to_label (int | list[str] (default: 5)) – Number of top genes or list of genes to label, by default 5
s_curve (bool | None (default: False)) – Whether to use a reciprocal threshold for up and down gene determination, by default False
colors (Optional[list[str]] (default: None)) – Colors for [non-DE, up, down] genes. Defaults to [‘gray’, ‘#D62728’, ‘#1F77B4’].
varm_key – Key in AnnData.varm slot to use for plotting if an AnnData object was passed.
color_dict (Optional[dict[str, list[str]]] (default: None)) – Dictionary for coloring dots by categories.
shape_dict (Optional[dict[str, list[str]]] (default: None)) – Dictionary for shaping dots by categories.
size_col (Optional[str] (default: None)) – Column name to size points by.
fontsize (int (default: 10)) – Size of gene labels, by default 10
top_right_frame (bool (default: False)) – Whether to show the top and right frame of the plot, by default False
figsize (tuple[int, int] (default: (5, 5))) – Size of the figure, by default (5, 5)
legend_pos (tuple[float, float] (default: (1.6, 1))) – Position of the legend as determined by matplotlib, by default (1.6, 1)
point_sizes (tuple[int, int] (default: (15, 150))) – Lower and upper bounds of point sizes, by default (15, 150)
shapes (Optional[list[str]] (default: None)) – List of matplotlib marker ids.
shape_order (Optional[list[str]] (default: None)) – Order of categories for shapes.
x_label (Optional[str] (default: None)) – Label for the x-axis.
y_label (Optional[str] (default: None)) – Label for the y-axis.
return_fig (bool (default: False)) – Whether to return the figure, by default False
**kwargs – Additional arguments for seaborn.scatterplot.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_volcano(res_df, log2fc_thresh=0)
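
How pval_thresh and log2fc_thresh combine to label points can be sketched as below. This is an assumption-laden illustration: the helper `classify` is hypothetical, the strict inequalities are a guess, and the s_curve option is not modelled.

```python
def classify(log_fc, p_value, pval_thresh=0.05, log2fc_thresh=0.75):
    """Label one variable as 'up', 'down' or 'non-DE' (hypothetical helper)."""
    significant = p_value < pval_thresh
    if significant and log_fc > log2fc_thresh:
        return "up"
    if significant and log_fc < -log2fc_thresh:
        return "down"
    return "non-DE"


# Made-up (log2 fold change, p-value) pairs
results = {"G1": (1.5, 0.001), "G2": (-2.0, 0.01), "G3": (0.3, 0.0001), "G4": (3.0, 0.2)}
labels = {gene: classify(fc, p) for gene, (fc, p) in results.items()}
```

The three labels correspond to the [non-DE, up, down] order of the colors parameter.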

DEResult.summary(*, cutoffs=(0.1, 0.05, 0.01, 0.001, 0.0001))#

Obtain a summary data frame of differential expression results

Return type: DataFrame
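
This page does not describe the layout of the summary frame. One plausible reading of cutoffs, shown here purely as an assumption, is a count of variables whose p-value falls below each threshold. The helper `count_below_cutoffs` is hypothetical and may not match what the method actually returns.

```python
def count_below_cutoffs(p_values, cutoffs=(0.1, 0.05, 0.01, 0.001, 0.0001)):
    """Count how many p-values fall strictly below each cutoff."""
    return {cutoff: sum(p < cutoff for p in p_values) for cutoff in cutoffs}


# Made-up p-values
p_values = [0.2, 0.07, 0.03, 0.004, 0.0005, 0.00002]
counts = count_below_cutoffs(p_values)
```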
