Ingesting 10X Visium data with Panpipes

Let’s run through an example of reading 10X Visium data into MuData objects and computing QC metrics with Panpipes. This tutorial describes in detail the steps run by the pipeline.

For all the tutorials, we will append the --local flag, which ensures that the pipeline runs on the computing node you’re currently on, i.e. your local machine or an interactive session on a cluster node.

Create directories and download data

Let’s create a main spatial directory and, inside it, an ingestion directory containing a data directory.

mkdir spatial
cd spatial
mkdir ingestion ingestion/data

In this tutorial, we will use two 10X Visium datasets: a Human Lymph Node dataset and a Human Heart dataset. To download the two datasets into the data folder, use the following commands:

cd ingestion/data 
mkdir V1_Human_Heart V1_Human_Lymph_Node

# download human heart count matrix and spatial info 
cd V1_Human_Heart
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Heart/V1_Human_Heart_filtered_feature_bc_matrix.h5
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Heart/V1_Human_Heart_spatial.tar.gz
tar -xf V1_Human_Heart_spatial.tar.gz

# download human lymph node count matrix and spatial info 
cd ../V1_Human_Lymph_Node
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Lymph_Node/V1_Human_Lymph_Node_filtered_feature_bc_matrix.h5
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Lymph_Node/V1_Human_Lymph_Node_spatial.tar.gz
tar -xf V1_Human_Lymph_Node_spatial.tar.gz

Inside the ingestion directory, you should now have a directory with all the data you downloaded:

data
├── V1_Human_Heart
│   ├── V1_Human_Heart_spatial.tar.gz
│   ├── V1_Human_Heart_filtered_feature_bc_matrix.h5
│   └── spatial
│       ├── aligned_fiducials.jpg
│       ├── detected_tissue_image.jpg
│       ├── scalefactors_json.json
│       ├── tissue_hires_image.png
│       ├── tissue_lowres_image.png
│       └── tissue_positions_list.csv
└── V1_Human_Lymph_Node
    ├── V1_Human_Lymph_Node_spatial.tar.gz
    ├── V1_Human_Lymph_Node_filtered_feature_bc_matrix.h5
    └── spatial
        ├── aligned_fiducials.jpg
        ├── detected_tissue_image.jpg
        ├── scalefactors_json.json
        ├── tissue_hires_image.png
        ├── tissue_lowres_image.png
        └── tissue_positions_list.csv

Please note that the data folder must be structured as expected by the squidpy.read.visium function.
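If you want to verify the layout before running the pipeline, you can try reading one sample directly with squidpy. This is a minimal sanity check run from the spatial/ingestion directory, assuming squidpy is installed in your environment; it is not part of the pipeline itself:

# quick check that squidpy can parse the folder layout
import squidpy as sq

adata = sq.read.visium(
    "data/V1_Human_Heart",
    counts_file="V1_Human_Heart_filtered_feature_bc_matrix.h5",  # file carries the sample prefix
)
print(adata)  # AnnData with spot coordinates in .obsm["spatial"]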

Edit submission and yaml file

In spatial/ingestion, create a submission file like the one we provide. For this tutorial, you can use the provided example file (sample_file_qc_spatial.txt, as listed in the output tree below).

In general, the spatial submission file expects the following columns:

sample_id spatial_path spatial_filetype spatial_counts spatial_metadata spatial_transformation

For 10X Visium datasets, only the first four columns need to be specified. With Panpipes, you can ingest multiple spatial slides by adding one line per slide to the submission file. For each slide, the pipeline creates one MuData object. Detailed information about the submission file is provided in the usage guidelines.
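For illustration, a tab-separated submission file for the two samples in this tutorial might look as follows (the spatial_filetype value shown here is an assumption; check the usage guidelines for the accepted labels):

sample_id            spatial_path              spatial_filetype  spatial_counts
V1_Human_Heart       data/V1_Human_Heart       visium            V1_Human_Heart_filtered_feature_bc_matrix.h5
V1_Human_Lymph_Node  data/V1_Human_Lymph_Node  visium            V1_Human_Lymph_Node_filtered_feature_bc_matrix.h5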

Next, in spatial/ingestion, call panpipes qc_spatial config (you may need to activate the conda environment with conda activate pipeline_env first!). This will generate a pipeline.log and a pipeline.yml file.

Modify the pipeline.yml or simply replace it with the one we provide. Make sure to specify the correct path to the submission file. If you’re using the provided example yaml file, you may need to add the path of the conda environment in the yaml.
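As a rough sketch, the settings you will most likely edit look something like the excerpt below. The exact key names may differ between Panpipes versions, so treat these as assumptions and follow the comments in your generated pipeline.yml:

# illustrative excerpt of pipeline.yml (key names may vary by version)
condaenv: /path/to/envs/pipeline_env         # conda environment the pipeline should use
submission_file: sample_file_qc_spatial.txt  # path to the submission file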

Run Panpipes

In spatial/ingestion, run panpipes qc_spatial make full --local to ingest your Visium datasets.

In spatial/ingestion you should now have the following files:

ingestion
├── data
├── figures
│   └── spatial
│       ├── spatial_spatial_total_counts.V1_Human_Heart.png
│       ├── spatial_spatial_total_counts.V1_Human_Lymph_Node.png
│       ├── violin_obs_total_counts_sample_id.V1_Human_Heart.png
│       ├── violin_obs_total_counts_sample_id.V1_Human_Lymph_Node.png
│       ├── violin_var_total_counts.V1_Human_Heart.png
│       └── violin_var_total_counts.V1_Human_Lymph_Node.png
├── logs
│   ├── make_mudatas_V1_Human_Heart.log
│   ├── make_mudatas_V1_Human_Lymph_Node.log
│   ├── qcplot.V1_Human_Heart.log
│   ├── qcplot.V1_Human_Lymph_Node.log
│   ├── spatialQC_V1_Human_Heart.log
│   └── spatialQC_V1_Human_Lymph_Node.log
├── qc.data # MuDatas with QC metrics
│   ├── V1_Human_Heart_unfilt.h5mu
│   └── V1_Human_Lymph_Node_unfilt.h5mu
├── tmp # MuDatas without QC metrics
│   ├── V1_Human_Heart_raw.h5mu
│   └── V1_Human_Lymph_Node_raw.h5mu
├── pipeline.log
├── pipeline.yml
├── sample_file_qc_spatial.txt
├── V1_Human_Heart_cell_metadata.tsv # Metadata, i.e. .obs
└── V1_Human_Lymph_Node_cell_metadata.tsv # Metadata, i.e. .obs

In the qc.data folder, the final MuData objects with computed QC metrics are stored. MuData objects without QC metrics are also available, stored in the tmp folder. The metadata of the final MuData objects is additionally extracted and saved as tsv files, V1_Human_Heart_cell_metadata.tsv and V1_Human_Lymph_Node_cell_metadata.tsv. Using the provided example yaml file, the first rows and columns of the V1_Human_Heart_cell_metadata.tsv file should look as follows:

                     spatial:in_tissue   spatial:array_row   spatial:array_col   spatial:sample_id   spatial:MarkersNeutro_score   spatial:n_genes_by_counts
AAACAAGTATCTCCCA-1   1                   50                  102                 V1_Human_Heart      0.46748291571753986           1924
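You can also inspect the resulting objects directly in Python; a minimal sketch, assuming the mudata package is available in your environment:

# load one QC'ed MuData object and look at the per-spot metadata
import mudata as mu

mdata = mu.read("qc.data/V1_Human_Heart_unfilt.h5mu")
print(mdata)                        # overview of modalities
print(mdata["spatial"].obs.head())  # QC metrics live in the "spatial" modality's .obs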

With the plots in spatial/ingestion/figures/spatial you can now decide on cutoffs for filtering. The plots include visualizations of the spatial embeddings, as well as violin plots:

[Figures: spatial embedding colored by total_counts, and violin plot of total_counts]

Next: filtering and preprocessing using panpipes preprocess_spatial

Note: In this workflow, we have decided to process individual ST (spatial transcriptomics) sections instead of concatenating them at the beginning, as you saw for cell-suspension datasets. This is because the workflows for processing multiple spatial transcriptomics slides (especially concerning normalization, dimensionality reduction, and batch correction) are still experimental. With Panpipes, you can group multiple samples and process them one by one with the same choice of parameters. In the future, we will implement the advanced functionalities of SpatialData to deal with multi-sample ST datasets.

Note: We find that keeping the suggested directory structure (one main directory per project, with the individual steps in separate folders) is useful for project management. You can of course customize your directories as you prefer; just change the paths accordingly in the pipeline.yml config files!