deres.DEResult
- class deres.DEResult(res, adata=None, *, layer=None, p_col='p_value', effect_size_col='log_fc', contrast_col=None, var_col=None)
Container to hold a differential expression result and associated metadata.
- Parameters:
res (DataFrame) – The data frame with the statistical result. Typically contains a column with p-values and a column with some sort of effect size (e.g. fold change). The data frame may contain any additional columns.
adata (Optional[AnnData] (default: None)) – Associated AnnData object that holds the expression values that were used to obtain the statistical results. This is optional and only required for some plot types.
layer (Optional[str] (default: None)) – Layer of AnnData to use (if any). If None, use X.
p_col (str (default: 'p_value')) – Column in res containing the p-value.
effect_size_col (str (default: 'log_fc')) – Column in res containing the effect size (e.g. log fold change).
contrast_col (Optional[str] (default: None)) – Column in res containing the contrast name. Only applicable if results from multiple comparisons are stored in the data frame. If it contains only the results from a single comparison, leave this as None.
var_col (Optional[str] (default: None)) – Column in res containing the variable name (e.g. gene symbol). If None, use the index.
Attributes table
contrasts | Get a list of all contrasts available in the results data frame
Methods table
get_df | Get a copy of the results dataframe for a given contrast
p_adjust | Multiple testing correction for p-values
plot_fold_change | Plot a metric from the results as a bar chart.
plot_multicomparison_fc | Plot a matrix of log2 fold changes from the results.
plot_paired | Creates a pairwise expression plot from a pandas DataFrame or AnnData.
plot_volcano | Create a volcano plot from a pandas DataFrame or AnnData.
Obtain a summary data frame of differential expression results
Attributes
- DEResult.contrasts
Get a list of all contrasts available in the results data frame.
Methods
- DEResult.get_df(contrast=None)
Get a copy of the results data frame for a given contrast.
If contrast is None, return the entire data frame without filtering.
- Return type:
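The filtering behaviour of get_df can be illustrated without the library. This is a minimal sketch, assuming the results are held as a list of row dictionaries with the contrast name under a hypothetical `contrast` key; `get_rows` is an illustrative helper, not part of DEResult.

```python
import copy

# Hypothetical stand-in for a results table: one dict per (variable, contrast) row.
rows = [
    {"variable": "GENE_A", "contrast": "treated_vs_control", "log_fc": 1.2, "p_value": 0.001},
    {"variable": "GENE_B", "contrast": "treated_vs_control", "log_fc": -0.4, "p_value": 0.20},
    {"variable": "GENE_A", "contrast": "late_vs_early", "log_fc": 0.1, "p_value": 0.75},
]


def get_rows(rows, contrast=None, contrast_col="contrast"):
    """Return copies of the rows for one contrast, or of all rows if contrast is None."""
    if contrast is None:
        selected = rows
    else:
        selected = [row for row in rows if row[contrast_col] == contrast]
    # Deep-copy so that edits to the returned rows never touch the stored results.
    return copy.deepcopy(selected)


subset = get_rows(rows, contrast="treated_vs_control")
everything = get_rows(rows)
```

Returning a copy means downstream code can add or rescale columns without changing the stored results.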
- DEResult.p_adjust(method='fdr', adj_col_name='adj_p_value')
Multiple testing correction for p-values.
Adds a new column to the results data frame and updates p_col to point to it.
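To make the effect of p_adjust concrete, here is a minimal sketch of a Benjamini-Hochberg correction in plain Python. It assumes that `method='fdr'` refers to the Benjamini-Hochberg procedure, which this page does not state. The helper `benjamini_hochberg` and the row dictionaries are illustrative, not the library's implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical results table using the default column names from the signature.
rows = [
    {"variable": "GENE_A", "p_value": 0.01},
    {"variable": "GENE_B", "p_value": 0.04},
    {"variable": "GENE_C", "p_value": 0.03},
    {"variable": "GENE_D", "p_value": 0.005},
]
p_col = "p_value"
adj_col_name = "adj_p_value"

# Add the adjusted values as a new column.
for row, adj in zip(rows, benjamini_hochberg([r[p_col] for r in rows])):
    row[adj_col_name] = adj

# Point p_col at the adjusted column, mirroring the behaviour described above.
p_col = adj_col_name
```

Because p_col is redirected, later steps that read p_col would use the adjusted values rather than the raw ones.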
- DEResult.plot_fold_change(contrast=None, *, var_names=None, n_top_vars=15, y_label='Log2 fold change', figsize=(10, 5), return_fig=False, **barplot_kwargs)
Plot a metric from the results as a bar chart.
- Parameters:
var_names (Optional[Sequence[str]] (default: None)) – Variables to plot. If None, the top n_top_vars variables based on the log2 fold change are plotted.
n_top_vars (int (default: 15)) – Number of top variables to plot. The top and bottom n_top_vars variables are plotted, respectively.
y_label (str (default: 'Log2 fold change')) – Label for the y-axis.
figsize (tuple[int, int] (default: (10, 5))) – Size of the figure.
return_fig (bool (default: False)) – If True, return the figure.
**barplot_kwargs – Additional arguments for seaborn.barplot.
- Return type:
- Returns:
Figure or None. If return_fig is True, returns the figure, otherwise None.
Examples
>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_fold_change(res_df)
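The selection implied by n_top_vars can be sketched in plain Python. The sketch assumes that "top and bottom" means the n most positive and n most negative log fold changes. `select_extreme_vars` is a hypothetical helper and the gene names are made up.

```python
def select_extreme_vars(log_fc, n_top_vars=15):
    """Return the names with the n most negative and n most positive fold changes."""
    if n_top_vars <= 0:
        return []
    # Variable names sorted in ascending order of fold change.
    ranked = sorted(log_fc, key=log_fc.get)
    # With few variables the two tails would overlap, so return everything once.
    if len(ranked) <= 2 * n_top_vars:
        return ranked
    return ranked[:n_top_vars] + ranked[-n_top_vars:]


# Hypothetical log2 fold changes keyed by variable name.
log_fc = {
    "GENE_A": 2.1,
    "GENE_B": -1.7,
    "GENE_C": 0.05,
    "GENE_D": 0.9,
    "GENE_E": -0.3,
    "GENE_F": -2.4,
}
selected = select_extreme_vars(log_fc, n_top_vars=2)
```

The result is ordered from the most negative to the most positive fold change, which is a convenient order for a bar chart.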
- DEResult.plot_multicomparison_fc(*, n_top_vars=15, marker_size=100, figsize=(10, 2), x_label='Contrast', y_label='Gene', return_fig=False, **heatmap_kwargs)
Plot a matrix of log2 fold changes from the results.
- Parameters:
n_top_vars (default: 15) – Number of top variables to plot per group.
marker_size (int (default: 100)) – Size of the biggest marker for significant variables.
figsize (tuple[int, int] (default: (10, 2))) – Size of the figure.
x_label (str (default: 'Contrast')) – Label for the x-axis.
y_label (str (default: 'Gene')) – Label for the y-axis.
return_fig (bool (default: False)) – If True, return the figure, otherwise None.
**heatmap_kwargs – Additional arguments for seaborn.heatmap.
- Return type:
- Returns:
If return_fig is True, returns the figure, otherwise None.
Examples
>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> res_df = edgr.compare_groups(pdata, column="Efficacy", baseline="SD", groups_to_compare=["PR", "PD"])
>>> edgr.plot_multicomparison_fc(res_df)
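A matrix of fold changes across contrasts amounts to pivoting the long results table into variables by contrasts. The sketch below shows that reshaping in plain Python. The column names, contrast names and the `pivot_log_fc` helper are assumptions for illustration, not the library's code.

```python
# Hypothetical long-format results: one row per (variable, contrast).
rows = [
    {"variable": "GENE_A", "contrast": "PR_vs_SD", "log_fc": 1.5},
    {"variable": "GENE_A", "contrast": "PD_vs_SD", "log_fc": -0.2},
    {"variable": "GENE_B", "contrast": "PR_vs_SD", "log_fc": -0.8},
]


def pivot_log_fc(rows, var_col="variable", contrast_col="contrast", effect_size_col="log_fc"):
    """Reshape long-format rows into {variable: {contrast: log fold change}}."""
    contrasts = []  # contrast names in order of first appearance
    matrix = {}
    for row in rows:
        if row[contrast_col] not in contrasts:
            contrasts.append(row[contrast_col])
        matrix.setdefault(row[var_col], {})[row[contrast_col]] = row[effect_size_col]
    # Mark combinations that are absent from the results with None.
    for values in matrix.values():
        for contrast in contrasts:
            values.setdefault(contrast, None)
    return contrasts, matrix


contrasts, matrix = pivot_log_fc(rows)
```

Each outer key corresponds to one row of the plotted matrix (y-axis, "Gene") and each contrast to one column (x-axis, "Contrast").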
- DEResult.plot_paired(groupby, pairedby, contrast=None, *, groups=None, var_names=None, n_top_vars=15, n_cols=4, panel_size=(5, 5), show_legend=True, size=10, y_label='expression', pvalue_template=<function DEResult.<lambda>>, boxplot_properties=None, palette=None, return_fig=False)
Creates a pairwise expression plot from a pandas DataFrame or AnnData.
Visualizes a panel of paired scatterplots per variable.
- Parameters:
groupby (str) – .obs column containing the grouping. Must contain exactly two different values.
pairedby (str) – .obs column containing the pairing (e.g. “patient_id”). If None, an independent t-test is performed.
contrast (Optional[str] (default: None)) – If multiple contrasts are stored in the results data frame, you need to specify one contrast here.
groups (Optional[Sequence[str]] (default: None)) – If the AnnData object contains more than two unique values in groupby, you need to specify the two categories you’d like to show in the plot.
var_names (Optional[Sequence[str]] (default: None)) – Variables to plot.
n_top_vars (int (default: 15)) – Number of top variables to plot.
layer – Layer to use for plotting.
n_cols (int (default: 4)) – Number of columns in the plot.
panel_size (tuple[int, int] (default: (5, 5))) – Size of each panel.
show_legend (bool (default: True)) – Whether to show the legend.
size (int (default: 10)) – Size of the points.
y_label (str (default: 'expression')) – Label for the y-axis.
pvalue_template (default: <function DEResult.<lambda>>) – Template for the p-value string displayed in the title of each panel.
boxplot_properties (default: None) – Additional properties for the boxplot, passed to seaborn.boxplot.
palette (default: None) – Color palette for the line- and stripplot.
return_fig (bool (default: False)) – If True, return the figure.
- Return type:
- Returns:
Figure or None. If return_fig is True, returns the figure, otherwise None.
Examples
>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_paired(pdata, results_df=res_df, n_top_vars=8, groupby="Treatment", pairedby="Efficacy")
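The roles of groupby and pairedby are easier to see on a toy table. The sketch below pairs observations by a pairing column and computes a paired t statistic on the differences. The column names, the values and both helpers are hypothetical. The choice of a paired t statistic is also an assumption, since this page does not say which test the plot reports.

```python
import math
import statistics

# Hypothetical per-sample table for one variable: two conditions per patient.
obs = [
    {"patient_id": "P1", "condition": "before", "expr": 2.0},
    {"patient_id": "P1", "condition": "after", "expr": 3.0},
    {"patient_id": "P2", "condition": "before", "expr": 1.0},
    {"patient_id": "P2", "condition": "after", "expr": 3.0},
    {"patient_id": "P3", "condition": "before", "expr": 4.0},
    {"patient_id": "P3", "condition": "after", "expr": 7.0},
]


def paired_differences(obs, groupby, pairedby, groups, value_key="expr"):
    """Return per-pair differences (second group minus first group)."""
    first, second = groups
    by_pair = {}
    for row in obs:
        by_pair.setdefault(row[pairedby], {})[row[groupby]] = row[value_key]
    # Keep only pairs that were observed in both groups.
    return {
        pair: values[second] - values[first]
        for pair, values in by_pair.items()
        if first in values and second in values
    }


def paired_t_statistic(diffs):
    """Paired t statistic: mean difference divided by its standard error."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


diffs = paired_differences(obs, "condition", "patient_id", ("before", "after"))
t_stat = paired_t_statistic(list(diffs.values()))
```

Each entry of `diffs` corresponds to one line connecting a pair of points in a paired plot.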
- DEResult.plot_volcano(contrast=None, *, pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, s_curve=False, colors=None, color_dict=None, shape_dict=None, size_col=None, fontsize=10, top_right_frame=False, figsize=(5, 5), legend_pos=(1.6, 1), point_sizes=(15, 150), shapes=None, shape_order=None, x_label=None, y_label=None, return_fig=False, **kwargs)
Create a volcano plot from a pandas DataFrame or AnnData.
- Parameters:
pval_thresh (float (default: 0.05)) – Threshold p-value for significance.
log2fc_thresh (float (default: 0.75)) – Threshold for log2 fold change significance.
to_label (int | list[str] (default: 5)) – Number of top genes or list of genes to label.
s_curve (bool | None (default: False)) – Whether to use a reciprocal threshold for up and down gene determination.
colors (Optional[list[str]] (default: None)) – Colors for [non-DE, up, down] genes. Defaults to [‘gray’, ‘#D62728’, ‘#1F77B4’].
varm_key – Key in AnnData.varm slot to use for plotting if an AnnData object was passed.
color_dict (Optional[dict[str, list[str]]] (default: None)) – Dictionary for coloring dots by categories.
shape_dict (Optional[dict[str, list[str]]] (default: None)) – Dictionary for shaping dots by categories.
size_col (Optional[str] (default: None)) – Column name to size points by.
fontsize (int (default: 10)) – Size of gene labels.
top_right_frame (bool (default: False)) – Whether to show the top and right frame of the plot.
figsize (tuple[int, int] (default: (5, 5))) – Size of the figure.
legend_pos (tuple[float, float] (default: (1.6, 1))) – Position of the legend as determined by matplotlib.
point_sizes (tuple[int, int] (default: (15, 150))) – Lower and upper bounds of point sizes.
shapes (Optional[list[str]] (default: None)) – List of matplotlib marker ids.
shape_order (Optional[list[str]] (default: None)) – Order of categories for shapes.
x_label (Optional[str] (default: None)) – Label for the x-axis.
y_label (Optional[str] (default: None)) – Label for the y-axis.
return_fig (bool (default: False)) – Whether to return the figure.
**kwargs – Additional arguments for seaborn.scatterplot.
- Return type:
- Returns:
If return_fig is True, returns the figure, otherwise None.
Examples
>>> # Example with EdgeR
>>> import pertpy as pt
>>> adata = pt.dt.zhang_2021()
>>> adata.layers["counts"] = adata.X.copy()
>>> ps = pt.tl.PseudobulkSpace()
>>> pdata = ps.compute(
...     adata,
...     target_col="Patient",
...     groups_col="Cluster",
...     layer_key="counts",
...     mode="sum",
...     min_cells=10,
...     min_counts=1000,
... )
>>> edgr = pt.tl.EdgeR(pdata, design="~Efficacy+Treatment")
>>> edgr.fit()
>>> res_df = edgr.test_contrasts(
...     edgr.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo")
... )
>>> edgr.plot_volcano(res_df, log2fc_thresh=0)
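How pval_thresh and log2fc_thresh split points into the three color classes (non-DE, up, down) can be sketched in plain Python. Strict inequalities and the rectangular, non-s_curve threshold are assumptions, as are the helper names and the toy rows. P-values are assumed to be strictly positive so the log transform is defined.

```python
import math


def classify(log_fc, p_value, pval_thresh=0.05, log2fc_thresh=0.75):
    """Label a variable as 'up', 'down' or 'non-DE' using rectangular thresholds."""
    if p_value < pval_thresh and log_fc > log2fc_thresh:
        return "up"
    if p_value < pval_thresh and log_fc < -log2fc_thresh:
        return "down"
    return "non-DE"


def volcano_coordinates(rows, p_col="p_value", effect_size_col="log_fc", **thresholds):
    """Return (x, y, label) per row, with x = effect size and y = -log10(p)."""
    return [
        (
            row[effect_size_col],
            -math.log10(row[p_col]),  # assumes p > 0
            classify(row[effect_size_col], row[p_col], **thresholds),
        )
        for row in rows
    ]


# Hypothetical results rows using the default column names.
rows = [
    {"variable": "GENE_A", "log_fc": 1.0, "p_value": 0.01},
    {"variable": "GENE_B", "log_fc": -1.0, "p_value": 0.01},
    {"variable": "GENE_C", "log_fc": 0.5, "p_value": 0.001},
    {"variable": "GENE_D", "log_fc": 1.0, "p_value": 0.5},
]
points = volcano_coordinates(rows)
relaxed = volcano_coordinates(rows, log2fc_thresh=0)
```

With `log2fc_thresh=0`, as in the example above, any significant variable with a non-zero fold change is labelled up or down.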