deres.DEResult

class deres.DEResult(res, adata=None, *, layer=None, p_col='p_value', effect_size_col='log_fc', contrast_col=None, var_col=None)

Container to hold a differential expression result and associated metadata.

Parameters:
  • res (DataFrame) – The data frame with the statistical result. Typically contains a column with p-values and a column with some sort of effect size (e.g. fold change). The data frame may contain any additional columns.

  • adata (Optional[AnnData] (default: None)) – Associated AnnData object that holds the expression values used to obtain the statistical results. This is optional and only required for some plot types.

  • layer (Optional[str] (default: None)) – Layer of the AnnData object to use (if any). If None, use X.

  • p_col (str (default: 'p_value')) – Column in res containing the p-value.

  • effect_size_col (str (default: 'log_fc')) – Column in res containing the effect size (e.g. log fold change).

  • contrast_col (Optional[str] (default: None)) – Column in res containing the contrast name. Only applicable if results from multiple comparisons are stored in the data frame. If it contains only the results from a single comparison, just leave this as None.

  • var_col (Optional[str] (default: None)) – Column in res containing the variable name (e.g. gene symbol). If None, use the index.

Attributes table

contrasts

Get a list of all contrasts available in the results data frame.

Methods table

get_df([contrast])

Get a copy of the results dataframe for a given contrast

p_adjust([method, adj_col_name])

Multiple testing correction for p-values

plot_fold_change([contrast, var_names, ...])

Plot a metric from the results as a bar chart.

plot_multicomparison_fc(*[, n_top_vars, ...])

Plot a matrix of log2 fold changes from the results.

plot_paired(groupby, pairedby[, contrast, ...])

Creates a pairwise expression plot from a pandas DataFrame or AnnData.

plot_volcano([contrast, pval_thresh, ...])

Create a volcano plot from a pandas DataFrame or AnnData.

summary(*[, cutoffs])

Obtain a summary data frame of differential expression results

Attributes

DEResult.contrasts

Get a list of all contrasts available in the results data frame.

Methods

DEResult.get_df(contrast=None)

Get a copy of the results dataframe for a given contrast

If contrast is None, return the entire dataframe without filtering.

Return type:

DataFrame
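The filtering behavior described above can be sketched without the library. This is a dependency-free illustration, not the deres implementation: a list of dicts stands in for the results data frame, and the helper `filter_by_contrast`, the `contrast` and `gene` column names, and the sample rows are all hypothetical.

```python
import copy


def filter_by_contrast(rows, contrast=None, contrast_col="contrast"):
    """Return an independent copy of the rows, optionally limited to one contrast."""
    if contrast is None:
        # No contrast given: hand back everything, still as a copy.
        return copy.deepcopy(rows)
    return [copy.deepcopy(r) for r in rows if r[contrast_col] == contrast]


# Hypothetical long-format results: one row per (contrast, variable).
results = [
    {"contrast": "A_vs_B", "gene": "G1", "log_fc": 1.2, "p_value": 0.01},
    {"contrast": "A_vs_B", "gene": "G2", "log_fc": -0.4, "p_value": 0.30},
    {"contrast": "C_vs_B", "gene": "G1", "log_fc": 0.1, "p_value": 0.80},
]

subset = filter_by_contrast(results, "A_vs_B")
subset[0]["log_fc"] = 0.0  # edits to the copy do not touch the stored results
```

Because a copy is returned, the caller can modify the result freely without altering the stored rows.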

DEResult.p_adjust(method='fdr', adj_col_name='adj_p_value')

Multiple testing correction for p-values

Adds a new column to the results dataframe and updates the pointer p_col.

Parameters:
  • method (Literal['fdr'] (default: 'fdr')) – Method to use for multiple testing correction. Currently only fdr is implemented.

  • adj_col_name (default: 'adj_p_value') – Column name used for the adjusted p-values.

Return type:

None
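The documentation names the method only as `fdr`. Assuming this refers to the Benjamini-Hochberg procedure (an assumption, not confirmed by the text above), the adjustment can be sketched in plain Python. The function name `benjamini_hochberg` and the sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone: each one is capped by the values ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.005]
adj_p_values = benjamini_hochberg(p_values)
```

In the class itself, the adjusted values would be stored in the column named by adj_col_name, and p_col would then refer to that column.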

DEResult.plot_fold_change(contrast=None, *, var_names=None, n_top_vars=15, y_label='Log2 fold change', figsize=(10, 5), return_fig=False, **barplot_kwargs)

Plot a metric from the results as a bar chart.

Parameters:
  • var_names (Optional[Sequence[str]] (default: None)) – Variables to plot. If None, the top n_top_vars variables based on the log2 fold change are plotted.

  • n_top_vars (int (default: 15)) – Number of top variables to plot. The top and bottom n_top_vars variables are plotted, respectively.

  • y_label (str (default: 'Log2 fold change')) – Label for the y-axis.

  • figsize (tuple[int, int] (default: (10, 5))) – Size of the figure.

  • return_fig (bool (default: False)) – If True, return the figure. Default: False.

  • **barplot_kwargs – Additional arguments for seaborn.barplot.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.
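One reading of the n_top_vars description is that the n most positive and the n most negative log2 fold changes are selected. The sketch below illustrates that reading in plain Python. It is an assumption about the selection rule, not the library's code, and `select_top_and_bottom` and the sample values are hypothetical.

```python
def select_top_and_bottom(log_fcs, n_top_vars=15):
    """Pick the n most positive and the n most negative log2 fold changes.

    log_fcs maps a variable name to its log2 fold change.
    """
    # Rank variables from the largest to the smallest fold change.
    ranked = sorted(log_fcs, key=log_fcs.get, reverse=True)
    if len(ranked) <= 2 * n_top_vars:
        # Too few variables to split: keep all of them, in ranked order.
        return ranked
    return ranked[:n_top_vars] + ranked[-n_top_vars:]


log_fcs = {"G1": 2.5, "G2": -3.0, "G3": 0.1, "G4": 1.1, "G5": -0.2, "G6": -1.7}
selected = select_top_and_bottom(log_fcs, n_top_vars=2)
```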

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_fold_change(res_df)

DEResult.plot_multicomparison_fc(*, n_top_vars=15, marker_size=100, figsize=(10, 2), x_label='Contrast', y_label='Gene', return_fig=False, **heatmap_kwargs)

Plot a matrix of log2 fold changes from the results.

Parameters:
  • n_top_vars (default: 15) – Number of top variables to plot per group. Default: 15.

  • marker_size (int (default: 100)) – Size of the biggest marker for significant variables. Default: 100.

  • figsize (tuple[int, int] (default: (10, 2))) – Size of the figure. Default: (10, 2).

  • x_label (str (default: 'Contrast')) – Label for the x-axis. Default: “Contrast”.

  • y_label (str (default: 'Gene')) – Label for the y-axis. Default: “Gene”.

  • return_fig (bool (default: False)) – If True, return the figure, otherwise None. Default: False.

  • **heatmap_kwargs – Additional arguments for seaborn.heatmap.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> res_df = edgr.compare_groups(pdata, column="Efficacy", baseline="SD", groups_to_compare=["PR", "PD"])
>>> edgr.plot_multicomparison_fc(res_df)

DEResult.plot_paired(groupby, pairedby, contrast=None, *, groups=None, var_names=None, n_top_vars=15, n_cols=4, panel_size=(5, 5), show_legend=True, size=10, y_label='expression', pvalue_template=<function DEResult.<lambda>>, boxplot_properties=None, palette=None, return_fig=False)

Creates a pairwise expression plot from a pandas DataFrame or AnnData.

Visualizes a panel of paired scatterplots per variable.

Parameters:
  • groupby (str) – .obs column containing the grouping. Must contain exactly two different values.

  • pairedby (str) – .obs column containing the pairing (e.g. “patient_id”). If None, an independent t-test is performed.

  • contrast (Optional[str] (default: None)) – If multiple contrasts are stored in the results data frame, you need to specify one contrast here.

  • groups (Optional[Sequence[str]] (default: None)) – If the AnnData object contains more than two unique values in groupby, you need to specify the two categories you’d like to show in the plot.

  • var_names (Optional[Sequence[str]] (default: None)) – Variables to plot.

  • n_top_vars (int (default: 15)) – Number of top variables to plot. Default: 15.

  • layer – Layer to use for plotting.

  • n_cols (int (default: 4)) – Number of columns in the plot. Default: 4.

  • panel_size (tuple[int, int] (default: (5, 5))) – Size of each panel. Default: (5, 5).

  • show_legend (bool (default: True)) – Whether to show the legend. Default: True.

  • size (int (default: 10)) – Size of the points. Default: 10.

  • y_label (str (default: 'expression')) – Label for the y-axis. Default: “expression”.

  • pvalue_template (default: <function DEResult.<lambda>>) – Template for the p-value string displayed in the title of each panel.

  • boxplot_properties (default: None) – Additional properties for the boxplot, passed to seaborn.boxplot.

  • palette (default: None) – Color palette for the line- and stripplot.

  • return_fig (bool (default: False)) – If True, return the figure. Default: False.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_paired(pdata, results_df=res_df, n_top_vars=8, groupby="Treatment", pairedby="Efficacy")

DEResult.plot_volcano(contrast=None, *, pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, s_curve=False, colors=None, color_dict=None, shape_dict=None, size_col=None, fontsize=10, top_right_frame=False, figsize=(5, 5), legend_pos=(1.6, 1), point_sizes=(15, 150), shapes=None, shape_order=None, x_label=None, y_label=None, return_fig=False, **kwargs)

Create a volcano plot from a pandas DataFrame or AnnData.

Parameters:
  • pval_thresh (float (default: 0.05)) – Threshold p value for significance, by default 0.05

  • log2fc_thresh (float (default: 0.75)) – Threshold for log2 fold change significance, by default 0.75

  • to_label (int | list[str] (default: 5)) – Number of top genes or list of genes to label, by default 5

  • s_curve (bool | None (default: False)) – Whether to use a reciprocal threshold for up and down gene determination, by default False

  • colors (Optional[list[str]] (default: None)) – Colors for [non-DE, up, down] genes. Defaults to ['gray', '#D62728', '#1F77B4'].

  • varm_key – Key in AnnData.varm slot to use for plotting if an AnnData object was passed.

  • color_dict (Optional[dict[str, list[str]]] (default: None)) – Dictionary for coloring dots by categories.

  • shape_dict (Optional[dict[str, list[str]]] (default: None)) – Dictionary for shaping dots by categories.

  • size_col (Optional[str] (default: None)) – Column name to size points by.

  • fontsize (int (default: 10)) – Size of gene labels, by default 10

  • top_right_frame (bool (default: False)) – Whether to show the top and right frame of the plot, by default False

  • figsize (tuple[int, int] (default: (5, 5))) – Size of the figure, by default (5, 5)

  • legend_pos (tuple[float, float] (default: (1.6, 1))) – Position of the legend as determined by matplotlib, by default (1.6, 1)

  • point_sizes (tuple[int, int] (default: (15, 150))) – Lower and upper bounds of point sizes, by default (15, 150)

  • shapes (Optional[list[str]] (default: None)) – List of matplotlib marker ids.

  • shape_order (Optional[list[str]] (default: None)) – Order of categories for shapes.

  • x_label (Optional[str] (default: None)) – Label for the x-axis.

  • y_label (Optional[str] (default: None)) – Label for the y-axis.

  • return_fig (bool (default: False)) – Whether to return the figure, by default False

  • **kwargs – Additional arguments for seaborn.scatterplot.

Return type:

Figure | None

Returns:

If return_fig is True, returns the figure, otherwise None.
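The roles of pval_thresh and log2fc_thresh can be illustrated with a small classification sketch. It assumes the plain rectangular rule (s_curve=False) with strict inequalities, and the conventional volcano axes of log2 fold change against -log10(p). Both are assumptions rather than documented behavior, and `classify`, `volcano_coordinates` and the sample rows are hypothetical.

```python
import math


def classify(log_fc, p_value, pval_thresh=0.05, log2fc_thresh=0.75):
    """Label a variable as 'up', 'down' or 'non-DE' using both thresholds."""
    significant = p_value < pval_thresh
    if significant and log_fc > log2fc_thresh:
        return "up"
    if significant and log_fc < -log2fc_thresh:
        return "down"
    return "non-DE"


def volcano_coordinates(log_fc, p_value):
    """x is the log2 fold change, y is -log10 of the p-value."""
    return log_fc, -math.log10(p_value)


# Hypothetical (variable, log2 fold change, p-value) rows.
rows = [("G1", 2.0, 0.001), ("G2", -1.5, 0.01), ("G3", 0.2, 0.0001), ("G4", 3.0, 0.2)]
labels = {gene: classify(log_fc, p) for gene, log_fc, p in rows}
```

A variable must pass both thresholds to be colored as up or down: G3 has a very small p-value but a negligible effect size, and G4 has a large effect size but is not significant.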

Examples

>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_volcano(res_df, log2fc_thresh=0)

DEResult.summary(*, cutoffs=(0.1, 0.05, 0.01, 0.001, 0.0001))

Obtain a summary data frame of differential expression results

Return type:

DataFrame
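The contents of the summary are not specified here. One plausible reading, shown purely as an assumption, is a count of variables whose p-value falls below each cutoff. The helper `count_below_cutoffs` and the sample p-values are hypothetical.

```python
def count_below_cutoffs(p_values, cutoffs=(0.1, 0.05, 0.01, 0.001, 0.0001)):
    """Count how many p-values are strictly below each cutoff."""
    return {cutoff: sum(p < cutoff for p in p_values) for cutoff in cutoffs}


p_values = [0.2, 0.04, 0.009, 0.00005, 0.0005]
counts = count_below_cutoffs(p_values)
```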