scGTM Quickstart • scGTM

Introduction

Modeling single-cell gene expression trends along cell pseudotime is a crucial analysis for exploring biological processes. However, model interpretability and flexibility are both indispensable for understanding biological processes. While existing methods either provide trends too complex to interpret using nonparametric models or use interpretable but restrictive models.\(\\\)

scGTM is designed as an application to capture interpretable gene expression trends along cell pseudotime and reveal molecular insights underlying the biological processes. scGTM also can flexibly accommodate common distributions for modeling gene expression counts.\(\\\)

Here are the explanations on the parameters of scGTM:\(\\\)

t: A numeric vector of the input normalized pseudotime data of a given gene,length equals the numbers of cells\(\\\)

y1: A vector of integers, representing the input expression counts of a given gene,length equals the numbers of cells\(\\\)

gene_name: A single string vector, indicates the gene name used in the model, default=NULL\(\\\)

marginal: A string of the distribution name. One of Poisson, ZIP, NB,ZINB, and Gaussian.default=ZIP\(\\\)

iter_num: A single integer vector, indicates max number of iteration used in the PSO algorithm that estimates model parameters\(\\\)

hill_only: A logical vector, determine whether the curve is hill only or not\(\\\)

seed: A numeric variable of the random seed, affecting parametric fitting of the marginal distribution.default=123\(\\\)

suppressPackageStartupMessages(library(stats))
suppressPackageStartupMessages(library(BiocParallel))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(tibble))
suppressPackageStartupMessages(library(cowplot))
suppressPackageStartupMessages(library(scGTM))

In this quickstart guide, we demonstrate the basic functionality of the scGTM package. scGTM package allows users to specify the () inference method, but we will use () as the example in our analysis.

Read in the reference data

gyrus <- read.csv(file = 'gyrus_sce.csv')

#Example of gene Tmsb10 from gyrus dataset


scGTM(t=gyrus$pseudotime, 
      y1=gyrus$Tmsb10, 
      marginal="Poisson",
      hill_only = FALSE)
#> The need of transformation:  FALSE
#> We are estimating gene with marginal Poisson .
#> Best parameter estimation:
#>  mu , k1 , k2 , t0:
#>  4.28 7.7 11.41 0.7 3.4 
#> The 95% confidence interval of the activation time t0:
#>  t0 : ( 0.683 ,  0.718 )
#> 
#> The 95% CIs for activation strength k1 and k2:
#>  k1 : ( 7.068 ,  8.342 )
#>  k2 : ( 8.905 ,  13.92 )
#> $negative_log_likelihood
#> [1] 8181.274
#> 
#> $mu
#> [1] 4.283738
#> 
#> $k1
#> [1] 7.704972
#> 
#> $k2
#> [1] 11.4126
#> 
#> $t0
#> [1] 0.7001891
#> 
#> $phi
#> [1] NA
#> 
#> $sd
#> [1] NA
#> 
#> $alpha
#> [1] NA
#> 
#> $beta
#> [1] NA
#> 
#> $t0_lower
#> [1] 0.683
#> 
#> $t0_upper
#> [1] 0.718
#> 
#> $t0_std
#> [1] 0.008898592
#> 
#> $k1_lower
#> [1] 7.068
#> 
#> $k1_upper
#> [1] 8.342
#> 
#> $k1_std
#> [1] 0.3247922
#> 
#> $k2_lower
#> [1] 8.905
#> 
#> $k2_upper
#> [1] 13.92
#> 
#> $k2_std
#> [1] 1.279397
#> 
#> $mu_lower
#> [1] 4.171
#> 
#> $mu_upper
#> [1] 4.396
#> 
#> $mu_std
#> [1] 0.05741533
#> 
#> $Fisher
#> [1] "Non-singular"
#> 
#> $Transform
#> [1] 0
#> 
#> $Design_para
#> [1] NA

#Visualization

plot_result(para = c(4.28,7.7,11.41,0.7), 
            t = gyrus$pseudotime, 
            color = c('red', 'darkviolet', 'orange', 'darkgreen'), 
            marginal = "Poisson", 
            flag = FALSE, 
            y1 = gyrus$Tmsb10, 
            hill_only = FALSE,
            gene_name = "Tmsb10")