This function runs the SCEG-HiC model to infer gene-enhancer (gene-peak) links using preprocessed single-cell multi-omics data and bulk average Hi-C data as a prior. It calculates a partial correlation matrix via the weighted graphical lasso (wglasso) approach and selects gene-peak pairs accordingly.
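As a rough illustration of what a partial correlation matrix is, the sketch below derives one from a precision (inverse covariance) matrix in base R. It is not the SCEG-HiC implementation: the data are simulated, and the precision matrix comes from an unpenalized inverse of the sample covariance, whereas wglasso estimates a sparse precision matrix under a weighted L1 penalty. Only the final conversion step, rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj), is the standard identity.

```r
# Toy data: 50 observations of 4 variables (simulated, not SCEG-HiC input).
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
X[, 2] <- X[, 1] + 0.5 * X[, 2]  # make variables 1 and 2 strongly dependent

# Precision matrix: plain inverse of the sample covariance.
# Assumption: no penalty is applied here, unlike the weighted graphical lasso.
Theta <- solve(cov(X))

# Partial correlation: -Theta_ij / sqrt(Theta_ii * Theta_jj), unit diagonal.
pcor <- -cov2cor(Theta)
diag(pcor) <- 1

round(pcor, 3)
```

Under a lasso-type penalty many off-diagonal entries of the precision matrix are shrunk to exactly zero, which is consistent with the zero scores that appear in the example output on this page.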

Usage

Run_SCEG_HiC(
  SCEGdata,
  HiCWeights,
  method = "wglasso",
  focus_gene,
  normalizeMethod = "1",
  cutoff = NULL,
  verbose = TRUE,
  alpha = NULL
)

Arguments

SCEGdata

A list containing data preprocessed by SCEG-HiC.

HiCWeights

A data frame of raw bulk average Hi-C interaction weights. Output of calculateHiCWeights().

method

The method used for gene network reconstruction. Default is "wglasso".

focus_gene

A character vector of gene symbols to focus on.

normalizeMethod

Method used to normalize the average Hi-C scores. Default is "1".

  • "1": Normalize by rank scores.

  • "2": Normalize by -log10 transformation.

  • "3": Binarize the scores (values > 0 become 1; values ≤ 0 become 0).

  • "4": Normalize by min-max scaling.
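The four options can be pictured with a small base-R sketch. The helper `normalize_scores()` is hypothetical and is not part of SCEGHiC; the package's internal formulas may differ (for example in how ties, zeros, or the direction of the log transform are handled), so treat each branch as one plausible reading of the description above.

```r
# Hypothetical helper, NOT the SCEGHiC implementation.
# x: numeric vector of Hi-C scores for one gene.
normalize_scores <- function(x, method = c("1", "2", "3", "4")) {
  method <- match.arg(method)
  switch(method,
    # "1": rank scores scaled to (0, 1]; ties receive their average rank.
    "1" = rank(x, ties.method = "average") / length(x),
    # "2": -log10 transform; assumes strictly positive x (zeros give Inf).
    "2" = -log10(x),
    # "3": binarize; values > 0 become 1, values <= 0 become 0.
    "3" = as.numeric(x > 0),
    # "4": min-max scaling to [0, 1]; undefined (NaN) if all values are equal.
    "4" = (x - min(x)) / (max(x) - min(x))
  )
}

scores <- c(0, 0.2, 0.5, 1)
normalize_scores(scores, "1")
normalize_scores(scores, "3")
normalize_scores(scores, "4")
normalize_scores(scores[scores > 0], "2")
```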

cutoff

Threshold for selecting gene-peak pairs. Default is NULL. The recommended value depends on whether the input data were aggregated during preprocessing: if aggregate = TRUE, we recommend cutoff = 0.01; if aggregate = FALSE, we recommend cutoff = 0.001.

verbose

Logical. Should progress messages and warnings be printed? Default is TRUE.

alpha

Multiplicative scaling factor for the penalty parameter, so that the effective penalty is alpha * rho. Default is NULL.

Value

A data.frame with one row per gene-peak pair, containing the correlation value and metadata for that pair. In the example on this page the columns are gene, peak, and score.

Examples

data(multiomic_small)
SCEGdata <- process_data(multiomic_small, k_neigh = 5, max_overlap = 0.5)
#> Generating aggregated data
#> Aggregating cluster 0
#> Sample cells randomly.
#> There are 11 samples
#> Aggregating cluster 1
#> Sample cells randomly.
#> There are 11 samples
fpath <- system.file("extdata", package = "SCEGHiC")
gene <- c(
  "TRABD2A", "GNLY", "MFSD6", "CTLA4", "LCLAT1",
  "NCK2", "GALM", "TMSB10", "ID2", "CXCR4"
)
weight <- calculateHiCWeights(
  SCEGdata,
  species = "Homo sapiens",
  genome = "hg38",
  focus_gene = gene,
  averHicPath = fpath
)
#> Processing chromosome chr2...
#> Found 10 TSS loci on chr2.
#> Calculating Hi-C weights for gene TRABD2A...
#> Calculating Hi-C weights for gene GNLY...
#> Calculating Hi-C weights for gene MFSD6...
#> Calculating Hi-C weights for gene CXCR4...
#> Calculating Hi-C weights for gene CTLA4...
#> Calculating Hi-C weights for gene LCLAT1...
#> Calculating Hi-C weights for gene NCK2...
#> Calculating Hi-C weights for gene ID2...
#> Calculating Hi-C weights for gene GALM...
#> Calculating Hi-C weights for gene TMSB10...
#> Finished calculating Hi-C weights for all genes.
results_SCEGHiC <- Run_SCEG_HiC(SCEGdata, weight, focus_gene = gene)
#> Total predicted genes: 10
#> Running model for gene: TRABD2A
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.43"
#> Running model for gene: GNLY
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.19"
#> Running model for gene: MFSD6
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.22"
#> Running model for gene: CXCR4
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.14"
#> Running model for gene: CTLA4
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.17"
#> Running model for gene: LCLAT1
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.41"
#> Running model for gene: NCK2
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.25"
#> Running model for gene: ID2
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.13"
#> Running model for gene: GALM
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.11"
#> Running model for gene: TMSB10
#> [1] "The optimal penalty parameter (rho) selected by BIC is: 0.44"
results_SCEGHiC[results_SCEGHiC$gene == "CTLA4", ]
#>                           gene                     peak       score
#> chr2_203623664_203623982 CTLA4 chr2_203623664_203623982  0.05790189
#> chr2_203730093_203731243 CTLA4 chr2_203730093_203731243  0.25250722
#> chr2_203786816_203787404 CTLA4 chr2_203786816_203787404  0.00000000
#> chr2_203932737_203935679 CTLA4 chr2_203932737_203935679 -0.12877374
#> chr2_203945543_203945939 CTLA4 chr2_203945543_203945939  0.00000000
#> chr2_203992800_203993979 CTLA4 chr2_203992800_203993979  0.47307452
#> chr2_204109915_204110871 CTLA4 chr2_204109915_204110871  0.00000000
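
One possible way to post-process the result is sketched below in base R. To stay self-contained, the data frame is rebuilt by hand from the CTLA4 rows printed above instead of calling Run_SCEG_HiC(); the names `ctla4`, `nonzero`, and `ranked` are illustrative. Dropping zero scores and ranking by absolute value are assumptions about what a user might want, not steps prescribed by the package.

```r
# CTLA4 rows copied from the example output above.
ctla4 <- data.frame(
  gene = "CTLA4",
  peak = c(
    "chr2_203623664_203623982", "chr2_203730093_203731243",
    "chr2_203786816_203787404", "chr2_203932737_203935679",
    "chr2_203945543_203945939", "chr2_203992800_203993979",
    "chr2_204109915_204110871"
  ),
  score = c(0.05790189, 0.25250722, 0, -0.12877374, 0, 0.47307452, 0)
)

# Keep only pairs with a nonzero score.
nonzero <- ctla4[ctla4$score != 0, ]

# Rank the remaining pairs by the magnitude of their score (assumed ordering).
ranked <- nonzero[order(abs(nonzero$score), decreasing = TRUE), ]
ranked
```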