Create aggregated data for a certain cluster — generate_aggregated

Function to generate aggregated inputs for a certain cluster. generate_aggregated_datasets takes sparse data as input and aggregates binary accessibility scores (or gene expression) per cell cluster. A group of cells is kept only if it does not overlap any existing group by more than the max_overlap fraction of its cells (default 0.8).

Usage

generate_aggregated_datasets(
  object,
  cell_coord,
  rna_assay = "RNA",
  atac_assay = "peaks",
  k_neigh = 50,
  atacbinary = TRUE,
  max_overlap = 0.8,
  seed = 123,
  verbose = TRUE
)

Arguments

object: A Seurat object.
cell_coord: A similarity matrix or dimensionality reduction (e.g., PCA, UMAP) used for identifying neighbors.
rna_assay: Character. Name of the assay containing gene expression data.
atac_assay: Character. Name of the assay containing peak (chromatin accessibility) data.
k_neigh: Integer. Number of neighboring cells to aggregate per group (default is 50).
atacbinary: Logical. Should the aggregated scATAC-seq matrix be binarized?
max_overlap: Numeric. Maximum allowed overlap ratio between two aggregated groups (default is 0.8).
seed: Integer. Random seed.
verbose: Logical. Should progress messages and warnings be printed?

Value

A matrix or sparse matrix containing aggregated accessibility or expression values.
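
The aggregation described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy data, the greedy selection order, the use of Euclidean distance for neighbors, and binarizing with `> 0` are all hypothetical choices. Only the names k_neigh, max_overlap, cell_coord and seed are borrowed from the arguments above.

```r
# Illustrative sketch only; not the code behind generate_aggregated_datasets.
set.seed(123)

# Toy inputs (hypothetical): 200 cells in a 2-D embedding and a
# binary peaks-by-cells matrix with 30 peaks.
n_cells <- 200
cell_coord <- matrix(rnorm(n_cells * 2), ncol = 2)
counts <- matrix(rbinom(30 * n_cells, 1, 0.1), nrow = 30)

k_neigh <- 10
max_overlap <- 0.8

# k nearest neighbors of every cell (a cell is its own nearest neighbor).
d <- as.matrix(dist(cell_coord))
knn <- t(apply(d, 1, function(x) order(x)[seq_len(k_neigh)]))

# Visit cells in random order and keep a candidate group only if it shares
# at most max_overlap * k_neigh cells with every group kept so far.
kept <- integer(0)
for (i in sample(n_cells)) {
  shared <- vapply(
    kept,
    function(j) length(intersect(knn[i, ], knn[j, ])),
    integer(1)
  )
  if (all(shared <= max_overlap * k_neigh)) kept <- c(kept, i)
}

# Sum the counts over the cells of each kept group (features x groups).
agg <- vapply(
  kept,
  function(i) rowSums(counts[, knn[i, ], drop = FALSE]),
  numeric(nrow(counts))
)

# Assumed meaning of atacbinary = TRUE: any nonzero aggregate becomes 1.
agg_binary <- (agg > 0) * 1
```

Each column of `agg` corresponds to one retained group of k_neigh cells. Lowering max_overlap makes the filter stricter and so yields fewer, more distinct groups.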