This function prepares input data for the SCEG-HiC framework. Its output can be passed directly to calculateHiCWeights and Run_SCEG_HiC. It can generate either aggregated (clustered) data or non-aggregated, cell-type-specific data.
Usage
process_data(
object,
aggregate = TRUE,
celltype = NULL,
rna_assay = "RNA",
atac_assay = "peaks",
cellnames = NULL,
atacbinary = TRUE,
k_neigh = 50,
max_overlap = 0.8,
size_factor_normalize = TRUE,
reduction.name = NULL,
seed = 123,
verbose = TRUE
)
Arguments
- object
A Seurat object.
- aggregate
Logical. Whether to generate aggregated (clustered) data. If FALSE, a specific celltype must be provided.
- celltype
Character. The cell type to subset for non-aggregated data. Required if aggregate = FALSE.
- rna_assay
Character. Name of the assay containing gene expression data. If aggregate = TRUE, this assay should contain raw expression counts; if aggregate = FALSE, it should contain normalized expression values.
- atac_assay
Character. Name of the assay containing chromatin accessibility data. If aggregate = TRUE, this assay should contain raw accessibility counts; if aggregate = FALSE, it should contain normalized accessibility values.
Character vector. Name(s) of one or more metadata columns used to group the cells. If NULL (the default), the current cell identities are used.
- atacbinary
Logical. Should the aggregated scATAC-seq matrix be binarized?
- k_neigh
Integer. Number of cells to aggregate per group (default is 50).
- max_overlap
Numeric. Maximum allowed overlap ratio between two aggregated groups (default is 0.8).
- size_factor_normalize
Logical. Whether to normalize accessibility values using size factors.
- reduction.name
Character. Name of the dimensionality reduction whose cell coordinates are used for aggregation.
- seed
Integer. Random seed.
- verbose
Logical. Should progress messages and warnings be printed?
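How k_neigh, max_overlap, atacbinary and size_factor_normalize interact during aggregation can be illustrated with a small base-R sketch. This is not the SCEG-HiC implementation. It assumes a Cicero-style scheme: each cell's k_neigh nearest neighbours in an embedding form a candidate group, cells are visited in random order, and a candidate is kept only if it shares at most max_overlap of its cells with every group already kept. The function name aggregate_knn_sketch, its arguments, and the column-total normalization used in place of size factors are all hypothetical.

```r
# Hypothetical sketch of overlap-constrained KNN aggregation.
# None of these names belong to the SCEG-HiC API.
aggregate_knn_sketch <- function(coords, counts, k_neigh = 50,
                                 max_overlap = 0.8, binarize = TRUE,
                                 size_factor_normalize = TRUE, seed = 123) {
  # coords: cells x dimensions matrix (e.g. embedding coordinates)
  # counts: features x cells count matrix, same cell order as coords
  stopifnot(nrow(coords) == ncol(counts), k_neigh <= nrow(coords))
  set.seed(seed)
  n <- nrow(coords)
  if (binarize) counts <- (counts > 0) * 1  # presence/absence only

  # k nearest neighbours of every cell (a cell is its own nearest neighbour)
  d <- as.matrix(dist(coords))
  nn <- matrix(unlist(lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k_neigh)])),
               nrow = n, byrow = TRUE)

  # Visit cells in random order; keep a neighbourhood only if its overlap
  # ratio with every previously kept neighbourhood is <= max_overlap.
  kept <- list()
  for (i in sample.int(n)) {
    grp <- nn[i, ]
    overlaps <- vapply(kept, function(g) length(intersect(g, grp)) / k_neigh,
                       numeric(1))
    if (all(overlaps <= max_overlap)) kept[[length(kept) + 1]] <- grp
  }

  # Sum counts within each kept group: result is features x groups
  agg <- do.call(cbind, lapply(kept, function(g) rowSums(counts[, g, drop = FALSE])))
  colnames(agg) <- paste0("agg_", seq_along(kept))

  if (size_factor_normalize) {
    # Assumed stand-in for size factors: rescale every column to the mean total
    totals <- colSums(agg)
    scale <- mean(totals)
    totals[totals == 0] <- 1  # avoid division by zero for empty groups
    agg <- sweep(agg, 2, totals, "/") * scale
  }
  agg
}
```

For example, with k_neigh = 5 and max_overlap = 0.5, two kept groups may share at most two of their five cells (2/5 = 0.4 is allowed, while 3/5 = 0.6 exceeds the limit), which is why lowering max_overlap yields fewer, more distinct aggregated groups.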
Value
A list containing preprocessed data for SCEG-HiC:
- If aggregate = TRUE, a list with aggregated RNA and ATAC matrices.
- If aggregate = FALSE, RNA and ATAC matrices for the selected cell type.
Examples
data(multiomic_small)
# Aggregated data with paired scRNA-seq and scATAC-seq
SCEGdata <- process_data(multiomic_small, k_neigh = 5, max_overlap = 0.5)
#> Generating aggregated data
#> Aggregating cluster 0
#> Sample cells randomly.
#> There are 11 samples
#> Aggregating cluster 1
#> Sample cells randomly.
#> There are 11 samples
# Single-cell type data for cell type "1"
SCEGdata <- process_data(multiomic_small, aggregate = FALSE, celltype = "1")