To download and process a GEO dataset, use the `prepare_geo` function. It returns a list containing count data, sample information, and gene data.
To prepare TCGA RNA-seq data obtained with the R package `TCGAbiolinks`, use the `prepare_tcga()` function. It returns a list containing count data for all samples and unstranded FPKM data for tumor samples, together with sample and feature information.
Three functions cover the limma workflow: `dgeList`, `dprocess_dgeList`, and `limmaFit`.
library(TCGAbiolinks)
library(SummarizedExperiment)
library(data.table) # provides fifelse() and %ilike%

# Query, download, and load the TCGA-CHOL gene expression data
query <- GDCquery(
  project = "TCGA-CHOL",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification"
)
GDCdownload(query = query)
data <- GDCprepare(query = query)

# Extract counts, FPKM, sample information, and feature information
lt <- prepare_tcga(data)
# Label each sample as Tumor or Normal based on its sample type
lt$all$sampleInfo[["group"]] <- fifelse(lt$all$sampleInfo$sample_type %ilike% "Tumor", "Tumor", "Normal")

# limma workflow
x <- dgeList(lt$all$exprCount, lt$all$sampleInfo, lt$all$featuresInfo)
x <- dprocess_dgeList(x, "group", 10)
efit <- limmaFit(x, "group")
CHOL.DEGs <- limma::topTable(fit = efit, coef = 1, number = Inf)
For a detailed walkthrough of this process, see the article "RNA-seq analysis is as easy as 1-2-3 with limma, Glimma and edgeR".
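The internals of `dgeList`, `dprocess_dgeList`, and `limmaFit` are not shown here, so the following is only a sketch of the standard edgeR/limma-voom steps described in that article, run on simulated counts. Whether the wrappers follow exactly these steps, and whether the `10` passed to `dprocess_dgeList` corresponds to a minimum count threshold, are assumptions.

```r
library(edgeR)
library(limma)

set.seed(1)
# Simulated counts: 200 genes x 6 samples (illustrative data only)
counts <- matrix(
  rnbinom(200 * 6, mu = 50, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:6))
)
group <- factor(rep(c("Normal", "Tumor"), each = 3))

# Build the DGEList, drop lowly expressed genes, and apply TMM normalization
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y, group = group, min.count = 10)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# voom transformation, linear model, and Tumor-vs-Normal contrast
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
v <- voom(y, design)
fit <- lmFit(v, design)
contr <- makeContrasts(Tumor - Normal, levels = design)
fit <- eBayes(contrasts.fit(fit, contr))

# One row per retained gene, with logFC, t, P.Value, adj.P.Val, ...
degs <- topTable(fit, coef = 1, number = Inf)
head(degs)
```

Because the counts are simulated without any true group effect, few or no genes should come out as significant; the point is only to show the order of the steps.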
Subsequently, visualize the differentially expressed genes using the `plotVolcano` function.
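The arguments of `plotVolcano` are not documented in this section, so the sketch below does not call it. It draws a comparable volcano plot with plain `ggplot2` on a simulated table that mimics `limma::topTable` output; the column names `logFC` and `adj.P.Val` follow limma, while the thresholds (|logFC| > 1, adjusted p < 0.05) are illustrative assumptions.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for a topTable result (illustrative data only)
degs <- data.frame(
  logFC = rnorm(500),
  adj.P.Val = runif(500)
)

# Flag genes passing both the fold-change and significance thresholds
degs$status <- ifelse(
  degs$adj.P.Val < 0.05 & abs(degs$logFC) > 1,
  "significant", "not significant"
)

p <- ggplot(degs, aes(x = logFC, y = -log10(adj.P.Val), colour = status)) +
  geom_point(alpha = 0.6) +
  geom_vline(xintercept = c(-1, 1), linetype = "dashed") +
  geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
  labs(x = "log2 fold change", y = "-log10(adjusted p-value)")
print(p)
```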
You can install the `r4msigdb` package to obtain MSigDB gene sets for pathway enrichment analyses such as GO and KEGG. For more details about this package, see r4msigdb. To get GO pathways in MSigDB:
The core function is derived from the `fgsea` package, with slight modifications to improve the appearance of the plot.
Execute the fgsea analysis.
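library(fgsea)
# Example gene sets and ranked statistics shipped with fgsea
data(examplePathways)
data(exampleRanks)
fgseaRes <- fgsea(pathways = examplePathways,
                  stats = exampleRanks,
                  minSize = 15,
                  maxSize = 500)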
plotGSEA(
  fgseaRes,
  pathways = examplePathways,
  pwayname = "5991130_Programmed_Cell_Death",
  stats = exampleRanks,
  save = FALSE
)
#> Warning in fsort(stats, TRUE): New parallel sort has not been implemented for
#> decreasing=TRUE so far. Using one thread.
To achieve optimal visualization, the plot is saved for review.
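The example above uses the ranks bundled with `fgsea`. To run it on your own results, `stats` has to be a named numeric vector of gene-level statistics. A minimal sketch of building one from a `topTable`-style data frame follows; using the moderated t-statistic as the ranking metric is a common choice but an assumption here, and the gene names are hypothetical placeholders that would need to use the same identifiers as the pathways.

```r
# Stand-in for a limma::topTable result (hypothetical genes and values)
degs <- data.frame(
  t = c(2.5, -1.2, 0.3, -3.1),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Named numeric vector, sorted from most up- to most down-regulated
stats <- sort(setNames(degs$t, rownames(degs)), decreasing = TRUE)
stats
```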
Perform the Over-Representation Analysis (ORA).
foraRes <- fora(examplePathways, genes=tail(names(exampleRanks), 200), universe=names(exampleRanks))
Examine the results.
# Adjust the pathway position on the y-axis based on the adjusted p-value (padj)
foraRes[, pathway := factor(pathway, levels = rev(pathway))]
plotORA(data = foraRes[1:8], x = -log10(padj), y = pathway, size = overlap, fill = 'constant')