Visualizing data with Panpipes

This tutorial guides you through the visualization of single-cell data using Panpipes and describes the steps run by the pipeline in greater detail. The vis workflow can be run after any of the other Panpipes workflows. It visualizes the data as, for example, UMAP and PCA embeddings, bar plots, and more. It expects a MuData object as input.

For all the tutorials, we append the --local option, which ensures that the pipeline runs on the computing node you are currently on: either your local machine or an interactive session on a compute node of a cluster.

Directories and data

In this tutorial, we use the same subset of the teaseq dataset that we used in the Ingesting, Preprocessing, Integration, and Clustering tutorials. The data contains three modalities: RNA, ATAC, and protein. We use the MuData object obtained after running the ingest, preprocess, and clustering workflows.

For the RNA modality, multiple QC metrics as well as the PCA and UMAP embeddings are stored in the mdata.mod["rna"] slot:

Input MuData, RNA slot

The same holds for ATAC:

Input MuData, ATAC slot

and protein:

Input MuData, protein slot

We will work in the teaseq/vis directory and save the input MuData object in teaseq/vis/data:

mkdir teaseq teaseq/vis teaseq/vis/data
cd teaseq/vis

You can download the MuData object we will use for this tutorial here and save it to teaseq/vis/data.

After creating the directories and downloading the data, the folder structure looks as follows:

teaseq
├── vis
│   └── data
│       └── teaseq_clustered.h5mu
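Before configuring the pipeline, you may want to confirm that the input file is in place and peek at the slots shown above without opening Python. An .h5mu file is an HDF5 file that stores each modality as an AnnData group under /mod/<name>. The sketch below relies on three assumptions. It assumes the HDF5 command-line tools (h5ls) are installed. It assumes the modalities are named rna, atac, and prot, matching the pipeline's output folders. It also assumes you run it from the directory that contains teaseq/.

```shell
# Sketch: check the input MuData file and list the embeddings stored per modality.
# Assumptions: run from the directory containing teaseq/, h5ls (HDF5 tools) installed,
# modality names rna, atac and prot.
input="teaseq/vis/data/teaseq_clustered.h5mu"

if [ ! -f "$input" ]; then
  echo "missing $input: download the MuData object into teaseq/vis/data first" >&2
elif ! command -v h5ls >/dev/null 2>&1; then
  echo "found $input (install the HDF5 tools to inspect it with h5ls)"
else
  for mod in rna atac prot; do
    echo "== $mod: obsm (embeddings) =="
    # Each modality is an AnnData group under /mod/<name>; embeddings live in obsm
    h5ls "$input/mod/$mod/obsm"
  done
fi
```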

Edit yaml file

To create the pipeline.log and pipeline.yml files, run panpipes vis config in teaseq/vis (you may need to activate the conda environment first with conda activate pipeline_env). Then edit pipeline.yml, or simply replace it with the yaml file we provide. In the yaml file, you specify which categorical and continuous variables to plot and on which embeddings.
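The configuration step can be wrapped in a small guard so that it fails with a clear message when the environment is not active. This is a sketch that assumes you are in teaseq/vis and have already run conda activate pipeline_env in the same interactive shell. The activation is left out of the snippet because conda activate usually does not work in a non-interactive script without extra setup.

```shell
# Sketch: generate pipeline.yml and pipeline.log in the current directory (teaseq/vis).
# Assumes the pipeline_env conda environment is already active in this shell.
if command -v panpipes >/dev/null 2>&1; then
  panpipes vis config
  # Both files should now exist next to each other
  ls -l pipeline.yml pipeline.log
else
  echo "panpipes not on PATH: run 'conda activate pipeline_env' first" >&2
fi
```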

If you use the provided yaml file for this tutorial, you also need the CSV files of custom markers, paired markers, and paired metrics, which you can download as well.
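If you would rather write your own custom markers file, the sketch below creates a small one. The column layout (mod, feature, group) is an assumption, so compare it with the custom markers file provided for this tutorial before using it. The file name custom_markers.csv, the genes, and the group name are illustrative only. The group name mirrors the Tcellmarkers plots shown later.

```shell
# Hypothetical custom markers file. The columns mod,feature,group are an assumption;
# check them against the CSV provided with this tutorial.
cat > custom_markers.csv <<'EOF'
mod,feature,group
rna,CD3E,Tcellmarkers
rna,CD4,Tcellmarkers
rna,CD8A,Tcellmarkers
EOF

# Sanity check: one header line plus three marker rows
wc -l custom_markers.csv
```

Each feature needs to match a variable name of the modality given in the mod column. Otherwise there is nothing to plot for it. You would then point the custom markers entry of pipeline.yml to this file.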

Run Panpipes

In teaseq/vis, run panpipes vis make full --local to visualize your data.

After the pipeline has run successfully with the provided yaml file, the vis folder contains one folder per modality, in this case rna, atac, and prot.
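To check that a run produced output for every modality, you can list the folders from teaseq/vis. This is a minimal sketch that assumes the folder names rna, atac, and prot described above. It only counts files and makes no assumption about the plot file formats.

```shell
# Sketch: run in teaseq/vis after `panpipes vis make full --local` has finished.
missing=0
for mod in rna atac prot; do
  if [ -d "$mod" ]; then
    # Count every file written below the modality folder
    echo "$mod: $(find "$mod" -type f | wc -l) files"
  else
    echo "$mod: not found" >&2
    missing=$((missing + 1))
  fi
done
echo "missing modality folders: $missing"
```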

Each of these folders contains the embeddings (in our case PCA and UMAP) coloured by the continuous variables. In this example, the PCA and UMAP of the RNA modality are coloured by rna:total_counts:

PCA, RNA, total_counts

UMAP, RNA, total_counts
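The label rna:total_counts follows the MuData convention for the global obs table. Columns that come from a single modality are prefixed with the modality name and a colon. The sketch below splits such a label into its two parts using plain shell parameter expansion. The label value is just the example from this section.

```shell
# Split a MuData global obs label of the form <modality>:<column>
label="rna:total_counts"
mod="${label%%:*}"    # everything before the first colon: rna
column="${label#*:}"  # everything after the first colon: total_counts
printf 'modality=%s column=%s\n' "$mod" "$column"
```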

The embeddings are also coloured by the categorical variables specified in the yaml file. In this example, the PCA and UMAP embeddings of the RNA modality are coloured by Leiden clustering, doublet detection results, and sample ID:

PCA, RNA, categorical

UMAP, RNA, categorical

Besides the embedding plots, the pipeline can also generate (stacked) bar plots for categorical variables and violin plots for continuous variables.

The pipeline also plots the custom markers specified in the custom markers CSV file. The embeddings are coloured by feature expression, and dot plots and matrix plots are generated as well:

UMAP, RNA, Tcellmarkers

Dotplot, RNA, Tcellmarkers

Note: We find that keeping the suggested directory structure (one main directory per project, with each step in its own folder) is useful for project management. You can of course organize your directories as you prefer and change the paths accordingly in the pipeline.yml config files!