This function generates preprocessed input data for the SCEG-HiC framework. The output can be passed directly to calculateHiCWeights and Run_SCEG_HiC. It can produce either aggregated (clustered) data or non-aggregated data for a single cell type.

Usage

process_data(
  object,
  aggregate = TRUE,
  celltype = NULL,
  rna_assay = "RNA",
  atac_assay = "peaks",
  cellnames = NULL,
  atacbinary = TRUE,
  k_neigh = 50,
  max_overlap = 0.8,
  size_factor_normalize = TRUE,
  reduction.name = NULL,
  seed = 123,
  verbose = TRUE
)

Arguments

object

A Seurat object.

aggregate

Logical. Whether to generate aggregated (clustered) data. If FALSE, a specific celltype must be provided.

celltype

Character. A specific cell type to subset for non-aggregated data. Required if aggregate = FALSE.

rna_assay

Character. Name of the assay containing gene expression data. If aggregate = TRUE, this should contain raw expression counts. If aggregate = FALSE, this should contain normalized expression values.

atac_assay

Character. Name of the assay containing chromatin accessibility data. If aggregate = TRUE, this should contain raw accessibility counts. If aggregate = FALSE, this should contain normalized accessibility values.

cellnames

Character vector. Name(s) of one or more metadata columns used to group the cells. Default is the current cell identities.

atacbinary

Logical. Should the aggregated scATAC-seq matrix be binarized?

k_neigh

Integer. Number of cells to aggregate per group (default is 50).

max_overlap

Numeric. Maximum allowed overlap ratio between two aggregated groups (default is 0.8).

size_factor_normalize

Logical. Whether to normalize accessibility values using size factors.

reduction.name

Character. Name of the dimensionality reduction to use for extracting cell coordinates for aggregation.

seed

Integer. Random seed, so that steps involving random sampling of cells are reproducible.

verbose

Logical. Should progress messages and warnings be printed?
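
To make the interplay of k_neigh and max_overlap above more concrete, here is a minimal base-R sketch of one way an overlap filter between aggregated groups could work. The helpers overlap_ratio and filter_groups are hypothetical names introduced for illustration, and the greedy strategy and the definition of the ratio (shared cells divided by group size) are assumptions; the package's internal procedure may differ.

```r
# Hypothetical helper (not part of SCEG-HiC): fraction of the cells in
# group `a` that also appear in group `b`.
overlap_ratio <- function(a, b) {
  length(intersect(a, b)) / length(a)
}

# Hypothetical helper: greedily keep a group only if its overlap with every
# already-kept group does not exceed `max_overlap`.
filter_groups <- function(groups, max_overlap = 0.8) {
  kept <- list()
  for (g in groups) {
    too_similar <- vapply(
      kept,
      function(k) overlap_ratio(g, k) > max_overlap,
      logical(1)
    )
    if (!any(too_similar)) {
      kept[[length(kept) + 1]] <- g
    }
  }
  kept
}

# Four candidate groups of k_neigh = 5 cells each (cells given as indices).
groups <- list(1:5, 2:6, 4:8, 11:15)

# With max_overlap = 0.5, the group 2:6 shares 4 of 5 cells with 1:5
# (ratio 0.8) and is dropped; the other groups are kept.
kept <- filter_groups(groups, max_overlap = 0.5)
n_kept <- length(kept)
```

Under these assumptions, a lower max_overlap yields fewer, more distinct aggregated groups, while a value near 1 keeps nearly all candidates.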

Value

A list containing preprocessed data for SCEG-HiC:

  • If aggregate = TRUE, returns a list with aggregated RNA and ATAC matrices.

  • If aggregate = FALSE, returns RNA and ATAC matrices corresponding to the selected cell type.
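
As a rough picture of what an aggregated matrix is, the following base-R sketch sums a toy peak-by-cell count matrix over groups of cells, optionally binarizing first. The function aggregate_counts, the toy data, and the choice to binarize each cell before summing are assumptions made for illustration only; they are not taken from the SCEG-HiC source, and the names of the elements in the returned list are not shown here because this page does not state them.

```r
# Toy peak-by-cell count matrix: 3 peaks, 6 cells (made-up numbers).
# matrix() fills column by column, so each row of numbers below is one cell.
counts <- matrix(
  c(0, 2, 1,
    1, 0, 0,
    3, 0, 1,
    0, 0, 2,
    0, 1, 0,
    2, 0, 0),
  nrow = 3,
  dimnames = list(paste0("peak", 1:3), paste0("cell", 1:6))
)

# Two hypothetical aggregated groups of three cells each.
groups <- list(agg1 = 1:3, agg2 = 4:6)

# Hypothetical helper: sum counts over the cells of each group.
# With binarize = TRUE, each cell contributes 0 or 1 per peak (assumption).
aggregate_counts <- function(counts, groups, binarize = TRUE) {
  if (binarize) {
    counts <- (counts > 0) * 1
  }
  vapply(
    groups,
    function(idx) rowSums(counts[, idx, drop = FALSE]),
    numeric(nrow(counts))
  )
}

agg_bin <- aggregate_counts(counts, groups, binarize = TRUE)
agg_raw <- aggregate_counts(counts, groups, binarize = FALSE)
```

Each result has one row per peak and one column per aggregated group, so the number of columns shrinks from the number of cells to the number of groups.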

Examples

data(multiomic_small)

# Aggregated data with paired scRNA-seq and scATAC-seq
SCEGdata <- process_data(multiomic_small, k_neigh = 5, max_overlap = 0.5)
#> Generating aggregated data
#> Aggregating cluster 0
#> Sample cells randomly.
#> There are 11 samples
#> Aggregating cluster 1
#> Sample cells randomly.
#> There are 11 samples

# Non-aggregated data for a single cell type ("1")
SCEGdata <- process_data(multiomic_small, aggregate = FALSE, celltype = "1")