
For each focus gene, this function identifies the gene's promoters and enhancers and calculates the bulk average Hi-C contacts between the gene and each of its enhancers.

Usage

calculateHiCWeights(
  SCEGdata,
  species,
  genome,
  focus_gene,
  averHicPath,
  TSSwindow = 1000,
  upstream = 250000,
  downstream = 250000,
  verbose = TRUE
)

Arguments

SCEGdata

Preprocessed data for SCEG-HiC.

species

Character string specifying the species name. Supported values are "Homo sapiens" or "Mus musculus".

genome

Character string specifying the genome assembly. Supported values are "hg38", "hg19", "mm10", or "mm9".

focus_gene

Character vector of gene names to focus on.

averHicPath

Path to the bulk average Hi-C data.

TSSwindow

Numeric specifying the number of base pairs to extend upstream and downstream around each TSS to define promoters. Default is 1000 bp (total window size of 2000 bp).

upstream

Numeric specifying the number of base pairs upstream of each TSS to define enhancers. Default is 250,000 bp (250 kb).

downstream

Numeric specifying the number of base pairs downstream of each TSS to define enhancers. Default is 250,000 bp (250 kb).

verbose

Logical; whether to print progress messages and warnings. Default is TRUE.
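
The window arguments above (TSSwindow, upstream, downstream) can be pictured with a small sketch. The helper `tss_windows()` and the coordinate used here are hypothetical and not part of SCEGHiC; the sketch assumes strand-agnostic windows centred on a single TSS, which may differ from how the package handles strand or removes promoter peaks from the enhancer set.

```r
# Hypothetical helper (not part of SCEGHiC): derive the promoter window and
# the enhancer search region for one TSS position.
# Assumption: windows are strand-agnostic and clipped at position 1.
tss_windows <- function(tss, TSSwindow = 1000, upstream = 250000, downstream = 250000) {
  list(
    promoter = c(start = max(1, tss - TSSwindow), end = tss + TSSwindow),
    enhancer_region = c(start = max(1, tss - upstream), end = tss + downstream)
  )
}

w <- tss_windows(tss = 1000000) # illustrative coordinate, not a real TSS
w$promoter        # TSS +/- 1000 bp, i.e. a 2000 bp window
w$enhancer_region # 250 kb on either side of the TSS
```

Under these assumptions, peaks overlapping the promoter window would be treated as promoters, and peaks elsewhere inside the enhancer region would be treated as enhancers of the gene.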

Value

A named list where each element corresponds to a focus gene and contains:

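  • promoters: character vector of promoter regions (may be empty, as for CTLA4 in the example below).

  • enhancers: character vector of enhancer regions, named in chr_start_end format.

  • contact: table of bulk average Hi-C contacts between the gene and its enhancers, with columns element1 (gene), element2 (enhancer), and score.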

Examples

data(multiomic_small)
SCEGdata <- process_data(multiomic_small, k_neigh = 5, max_overlap = 0.5)
#> Generating aggregated data
#> Aggregating cluster 0
#> Sample cells randomly.
#> There are 11 samples
#> Aggregating cluster 1
#> Sample cells randomly.
#> There are 11 samples
fpath <- system.file("extdata", package = "SCEGHiC")
gene <- c(
  "TRABD2A", "GNLY", "MFSD6", "CTLA4", "LCLAT1",
  "NCK2", "GALM", "TMSB10", "ID2", "CXCR4"
)
weight <- calculateHiCWeights(
  SCEGdata,
  species = "Homo sapiens",
  genome = "hg38",
  focus_gene = gene,
  averHicPath = fpath
)
#> Processing chromosome chr2...
#> Found 10 TSS loci on chr2.
#> Calculating Hi-C weights for gene TRABD2A...
#> Calculating Hi-C weights for gene GNLY...
#> Calculating Hi-C weights for gene MFSD6...
#> Calculating Hi-C weights for gene CXCR4...
#> Calculating Hi-C weights for gene CTLA4...
#> Calculating Hi-C weights for gene LCLAT1...
#> Calculating Hi-C weights for gene NCK2...
#> Calculating Hi-C weights for gene ID2...
#> Calculating Hi-C weights for gene GALM...
#> Calculating Hi-C weights for gene TMSB10...
#> Finished calculating Hi-C weights for all genes.
weight[["CTLA4"]]$promoters
#> character(0)
weight[["CTLA4"]]$enhancers
#> [1] "chr2_203623664_203623982" "chr2_203682042_203682957"
#> [3] "chr2_203730093_203731243" "chr2_203786816_203787404"
#> [5] "chr2_203830381_203830563" "chr2_203932737_203935679"
#> [7] "chr2_203945543_203945939" "chr2_203992800_203993979"
#> [9] "chr2_204109915_204110871"
head(weight[["CTLA4"]]$contact)
#>    element1                 element2        score
#>      <char>                   <char>        <num>
#> 1:    CTLA4 chr2_203623664_203623982 0.0009179376
#> 2:    CTLA4 chr2_203682042_203682957 0.0007176351
#> 3:    CTLA4 chr2_203730093_203731243 0.0013151936
#> 4:    CTLA4 chr2_203786816_203787404 0.0013789783
#> 5:    CTLA4 chr2_203830381_203830563 0.0038276615
#> 6:    CTLA4 chr2_203932737_203935679 0.0019456916
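
As a follow-up to the output above, the contact table can be ranked by score and its region identifiers split into coordinates. This is a minimal sketch in base R: the data.frame below copies the six rows printed by head() and stands in for the real return value, and splitting on "_" assumes chromosome names contain no underscore.

```r
# Values copied from the head() output above; a plain data.frame stands in
# for the contact table returned by calculateHiCWeights().
contact <- data.frame(
  element1 = "CTLA4",
  element2 = c(
    "chr2_203623664_203623982", "chr2_203682042_203682957",
    "chr2_203730093_203731243", "chr2_203786816_203787404",
    "chr2_203830381_203830563", "chr2_203932737_203935679"
  ),
  score = c(
    0.0009179376, 0.0007176351, 0.0013151936,
    0.0013789783, 0.0038276615, 0.0019456916
  ),
  stringsAsFactors = FALSE
)

# Split "chr_start_end" identifiers into numeric coordinates.
# Assumption: the chromosome name itself contains no underscore.
parts <- do.call(rbind, strsplit(contact$element2, "_", fixed = TRUE))
contact$start <- as.integer(parts[, 2])
contact$end <- as.integer(parts[, 3])

# Rank enhancers from strongest to weakest contact with the gene.
ranked <- contact[order(contact$score, decreasing = TRUE), ]
ranked[, c("element2", "start", "end", "score")]
```

With the real result, the same steps would apply to weight[["CTLA4"]]$contact after converting it with as.data.frame().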