Project Name

RNAseq Data from Differential Expression to Biological Insights

The results of this RNAseq analysis are presented across six comprehensive sections, each focusing on a different aspect of gene expression and pathway analysis. The findings are supported by a variety of visualizations and interactive tools designed to enhance understanding and facilitate deeper exploration of the data.

Differentially Expressed Genes (DEP) Analysis is the first section, summarizing gene expression differences between two classes. It features both standard and interactive volcano plots that highlight genes with significant expression changes. The interactive version allows users to click on specific genes to view detailed expression boxplots between the classes. Additionally, a detailed table provides metrics such as fold change and p-values, giving a thorough overview of the differential expression results.

Sample Clustering Analysis follows, utilizing the top differentially expressed genes (DEP) to explore the structure of the dataset through heatmaps, PCA, and UMAP. These methods collectively provide insights into sample clustering, expression patterns, and potential outliers. An interactive feature allows users to adjust the number of top DEPs used in the analysis, ensuring the results are tailored to specific research questions.

Over Representation Analysis (ORA) identifies and ranks significantly enriched pathways and Gene Ontology (GO) categories in specific gene sets. The analysis is divided across several pathway databases, including Reactome and WikiPathways. Results are presented with options to view all enriched pathways and GO terms or collapse highly overlapping ones, providing flexibility in interpretation. Interactive visualizations and a dynamic network graph further facilitate the exploration of how genes overlap and interact within these pathways and GO terms.

Gene Set Enrichment Analysis (GSEA) ranks pathways based on their enrichment across a ranked list of genes, offering insights into the biological mechanisms underlying gene expression differences. Like the ORA section, GSEA results can be viewed in full or with collapsed overlaps. Interactive tools allow users to delve into the relationships and overlaps between pathways and their associated genes, providing a comprehensive view of pathway enrichment.

Enriched Pathway Graph presents a curated table of pathways identified as significantly enriched from GSEA and ORA, summarizing their statistical significance, source database, and regulation direction. The table also highlights pathways with consistent regulation across GSEA and ORA. Users can select pathways to visualize individual graphs with gene upregulation and downregulation information, enabling an in-depth exploration of the most relevant biological pathways for the study.

The Protein-Protein Interaction Network section illustrates the complex interactions between proteins encoded by the top differentially expressed genes. This network visualization, enhanced by data from the STRING database, highlights clusters of tightly interacting proteins, offering insights into molecular mechanisms and potential therapeutic targets. Interactive features allow users to explore detailed information on individual proteins, making this section a powerful tool for understanding the broader biological context of the findings.

At the bottom of the results, there is a Parameter Setting section, where users can adjust analysis parameters and re-run the analysis. This section also provides the option to download a full report of the analysis, including all data, tables, and figures, ensuring users have comprehensive access to the results and the ability to tailor the analysis to their specific needs.

Differentially Expressed Genes (DEP)

Differentially Expressed Genes (DEP) Analysis

This section presents the DEP analysis, summarizing gene expression differences between two classes. There are two versions of the volcano plot: a standard version and an interactive version. The standard volcano plot highlights genes with significant expression changes, using color gradients to indicate the magnitude and significance of these changes. The interactive volcano plot allows users to click on individual genes to view a floating window with a boxplot detailing gene expression between the two classes. The ordered boxplot showcases the top differentially expressed genes, arranged by statistical significance, providing a clear view of expression patterns across classes. Additionally, Table PX_DEP provides detailed metrics on gene expression, including fold change, p-values, and statistical significance, facilitating a deeper understanding of differential expression results.

Labeled

Volcano Plot of DEP Between Two Classes

This volcano plot illustrates the differential gene expression between the two classes. The x-axis represents the log2 fold change (log2FC), indicating the magnitude of expression difference, while the y-axis shows the negative logarithm of the p-value (-log10 p-value), reflecting the statistical significance of the expression differences. Genes significantly upregulated in either class are displayed. The color gradient, from blue to red, signifies the increasing level of significance, with red dots representing highly significant changes in gene expression. The dotted lines demarcate customary thresholds for significance in terms of p-value (horizontal) and fold change (vertical). Key genes are labeled if they meet two criteria: notable fold changes and significant p-values. This ensures that the labeled genes are not only statistically significant but also have a biological impact due to their expression level change. Users can adjust these thresholds via the Parameter Setting section at the page's bottom, allowing for a customized analysis that targets genes with potential as biomarkers or therapeutic targets.
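The coordinates and labeling rule described above can be sketched as follows. This is a minimal illustration, not the report's actual implementation: the function name, the input tuple format, and the use of "greater than or equal to" for the fold change cutoff are assumptions, while the default cutoffs (FC 2, p-value 0.01) are the defaults listed in the Parameter Setting section.

```python
import math

def volcano_points(genes, fc_cutoff=2.0, p_cutoff=0.01):
    """Compute volcano plot coordinates and a label flag per gene.

    genes: iterable of (name, log2fc, pvalue) tuples (hypothetical format).
    """
    lfc_cutoff = math.log2(fc_cutoff)  # FC threshold expressed on the log2 scale
    points = []
    for name, log2fc, pvalue in genes:
        points.append({
            "gene": name,
            "x": log2fc,                 # x-axis: log2 fold change
            "y": -math.log10(pvalue),    # y-axis: -log10 p-value
            # A gene is labeled only if it passes both criteria.
            "label": abs(log2fc) >= lfc_cutoff and pvalue < p_cutoff,
        })
    return points

# Made-up example genes: only GENE_A passes both thresholds.
demo = [("GENE_A", 2.5, 1e-6), ("GENE_B", 0.3, 1e-4), ("GENE_C", -1.8, 0.2)]
for point in volcano_points(demo):
    print(point)
```

GENE_B is highly significant but has a small fold change, and GENE_C has a large fold change but is not significant, so neither is labeled.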

Ordered Boxplot of Top DEP Between Two Classes


This figure presents an array of boxplots, each corresponding to one of the top genes that demonstrate the most significant differential expression between the two classes. The genes are ordered by ascending p-value, as shown above each plot, to emphasize the statistical significance of their differential expression. Genes are colored by class, with the y-axis quantifying gene expression on a log2 counts per million (CPM) scale. Data points within the boxplots reveal individual expression levels, offering insight into the variance and distribution for each class. This ordered arrangement not only highlights the genes with the strongest differential expression but also aligns them according to the confidence in these differences, providing a powerful visualization for users to identify and prioritize genes for further analysis.

Table. PX_DEP

Definitions and interpretations of table columns

Gene: Lists gene names and corresponding Ensembl Gene IDs, linking each gene to its unique identifier in the Ensembl database.
log2FC: Log2 fold change between two classes. Positive values indicate upregulation, negative values indicate downregulation, with magnitude reflecting the extent of differential expression.
log2CPM: Log2 Counts Per Million, showing the average log-transformed expression level of a gene, normalized by library size. Higher values indicate higher expression, adjusted for sequencing depth.
Pvalue: The probability of observing the data under the null hypothesis. Lower p-values suggest stronger evidence for differential expression between classes.
FDR: Adjusted p-value using the False Discovery Rate to control for multiple comparisons. Lower values indicate higher confidence in differential expression.
TestStatistic: Value from the statistical test (e.g., Welch’s t-test) comparing gene expression between classes. Higher absolute values indicate stronger differential expression; the sign shows direction (up or downregulation).
S2N: Signal-to-Noise ratio, comparing mean expression differences between groups relative to variability. Higher values indicate more reliable differential expression; positive values suggest upregulation, negative suggest downregulation. Commonly used in GSEA.

At the bottom of the table, a "View" button lets users explore the selected genes in detail, using boxplots to examine expression variability between classes.
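For readers who want to see how three of the columns above relate to each other, here is a minimal sketch that computes log2FC, a Welch's t statistic, and S2N for a single gene. It assumes the inputs are log2 CPM values, so the log2 fold change is taken as the difference of class means; the function name and example values are hypothetical, and the report's own pipeline may compute these quantities differently.

```python
import math
from statistics import mean, stdev

def gene_metrics(class_a, class_b):
    """Per-gene metrics from log2 CPM values of two classes (illustrative)."""
    mean_a, mean_b = mean(class_a), mean(class_b)
    sd_a, sd_b = stdev(class_a), stdev(class_b)  # sample standard deviations
    n_a, n_b = len(class_a), len(class_b)

    # On the log2 scale, a difference of means corresponds to a log2 ratio.
    log2fc = mean_a - mean_b
    # Welch's t statistic: does not assume equal variances in the two classes.
    welch_t = (mean_a - mean_b) / math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # Signal-to-noise: mean difference over the sum of standard deviations.
    s2n = (mean_a - mean_b) / (sd_a + sd_b)
    return {"log2FC": log2fc, "TestStatistic": welch_t, "S2N": s2n}

# Made-up expression values for one gene.
metrics = gene_metrics([8.0, 9.0, 10.0], [5.0, 6.0, 7.0])
print(metrics)
```

All three values share the same sign, which is why each can indicate the direction of regulation.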

Sample Clustering

Sample Clustering Analysis

This section provides a comprehensive view of sample similarities and differences based on gene expression profiles. Through hierarchical clustering, principal component analysis (PCA), and Uniform Manifold Approximation and Projection (UMAP), this section enables the identification of patterns that may distinguish between different sample classes. By applying these clustering techniques, the analysis aids in uncovering underlying biological variation, assessing sample homogeneity, and validating experimental groupings, all of which are crucial for downstream analyses. Users can adjust the analysis results by modifying the number of top DEP, allowing for tailored exploration of sample clustering dynamics.

Sample Clustering Heatmap Using Top DEP

This heatmap displays the normalized expression levels of various top differentially expressed genes (x-axis) across multiple samples (y-axis), with hierarchical clustering applied to both axes to highlight patterns and relationships in the data. Expression levels are color-coded, ranging from blue (low expression) to red (high expression).

Principal Component Analysis (PCA) of Samples Using Top DEP

This PCA plot depicts the distribution of various samples according to the top n DEPs. Each symbol represents a sample, with its color indicating class. The axes correspond to the first two principal components, capturing variability within the dataset as noted by the percentage of variance explained in parentheses. The separation along the principal components may reflect the genetic underpinnings that distinguish the two classes.

Over Representation Analysis (ORA)

Gene Set Enrichment Analysis (GSEA)

The Gene Set Enrichment Analysis (GSEA) conducted in this section identifies and ranks pathways based on their enrichment across a ranked list of genes. This analysis specifically utilizes pathways from the Reactome and WikiPathways databases, helping to uncover which pathways are most associated with differential gene expression between distinct classes, thereby offering insights into underlying biological mechanisms.
The results are presented with two display options: (a) "All," which shows all enriched pathways, and (b) "Overlap Collapsed," which condenses highly overlapping pathways using the collapsePathwaysGSEA function from the fgsea library. The GSEA itself is conducted using the fgsea function from the same library, ensuring a thorough and comprehensive analysis of gene set enrichment.
Users can explore these results in detail through the "PX_GSEA_results" table, which provides key metrics such as p-values, adjusted p-values (padj), and leading-edge genes that drive the enrichment. At the bottom of the table, a "View" button allows users to generate two key visualizations: a Pathway Inclusion and Overlap Heatmap, which shows the relationships and overlaps between selected pathways and their associated genes, and a Gene Set Enrichment Analysis (GSEA) plot, which graphically represents the enrichment score for each pathway. These interactive visualizations offer a comprehensive understanding of how specific pathways are enriched and their potential biological relevance.

All
Overlap Collapsed
GSEA Plot for Selected Pathways

Pathway Enrichment Analysis as Determined by Gene Set Enrichment Analysis

The two plots display the pathways upregulated in each of the two classes: the first plot shows pathways enriched in one class, and the second shows those enriched in the other. Pathways are ranked by Normalized Enrichment Score (NES), with p-values and adjusted p-values (padj) indicating statistical significance. Genes within each pathway are shown as ticks along the center line, sorted by the significance of their differential expression with direction; ticks above the line indicate positive correlation, and those below indicate negative correlation. The length of the ticks reflects the strength of correlation, while gaps indicate genes not included in the pathway.

Table. PX_GSEA_results

Definitions and interpretations of table columns

pathway: The identifier or name of the pathway being analyzed, typically linked to a specific biological process or function. Clicking the name expands the row and provides a link to external details.
pathway_db: The database from which the pathway originates, such as Reactome or WikiPathways.
pathway_description: A brief description of the pathway, detailing its biological role or the processes it involves.
pval: The p-value for the pathway's enrichment, indicating the probability that the observed enrichment is due to chance. Lower values suggest more statistically significant enrichment.
padj: The adjusted p-value using the Benjamini-Hochberg procedure, which accounts for multiple testing corrections. This helps control the false discovery rate, making it more reliable than the raw p-value.
log2err: The expected error for the standard deviation of the p-value logarithm, as reported by fgsea. It reflects the uncertainty of the estimated p-value rather than of the enrichment score itself.
ES: The Enrichment Score, a measure of the degree to which the pathway is overrepresented at the top (or bottom) of the ranked list of genes. A positive score indicates enrichment in one class, while a negative score indicates enrichment in the other class.
NES: The Normalized Enrichment Score, which adjusts the ES for the size of the gene set, allowing comparisons across gene sets of different sizes. It indicates the strength of enrichment normalized for pathway size. Pathways are ranked by NES.
size: The number of genes in the pathway that overlap with the genes ranked by differential expression. This reflects the pathway's coverage in the data.
Regulation: Indicates whether the pathway is upregulated in the specific class or comparison.
leadingEdge: The subset of genes within the pathway that contribute most to the enrichment score. These are typically the genes at the top or bottom of the ranked list that drive the pathway's enrichment. Clicking this expands the row to show more details, with each gene linkable for further exploration.

At the bottom of the table, there is a "View" button that, when clicked, generates two visualizations. The first is a Pathway Inclusion and Overlap Heatmap, which visually displays the relationships between the selected pathways and their associated genes, highlighting both overlapping and unique gene-pathway associations. The second is a Gene Set Enrichment Analysis (GSEA) plot, showing the enrichment score curve for the selected pathways and indicating where the most significant gene contributions are along the ranked list. These graphs provide a detailed view of how the selected pathways interact with the gene expression data.
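The padj column above is described as a Benjamini-Hochberg adjustment. As a minimal sketch of what that procedure does (written from the standard definition, not taken from the report's code; the function name is hypothetical), each p-value is scaled by the number of tests divided by its rank, and monotonicity is then enforced from the largest p-value downward:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that adjusted values never increase as raw p-values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four pathways.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The adjusted values are never smaller than the raw p-values, which is why filtering on padj is the more conservative choice.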

Pathway Enrichment Analysis as Determined by Gene Set Enrichment Analysis

This plot depicts a ranked list of pathways based on their association with the difference between the two classes. For each pathway, genes are represented as ticks along the center line, sorted according to their correlation with the class difference; those above the line correlate positively, while those below correlate negatively. The length of the ticks reflects the magnitude of correlation. Areas without ticks indicate genes not included in the respective pathway. Pathways are ordered on the y-axis by their level of significance, with the most significant at the top.

Table. PX_GSEA_collapsed_results

GSEA Plot for Selected Pathways

Pathway Inclusion and Overlap Heatmap

This visualization displays the relationship between various biological pathways (rows) and individual genes (columns). Green bars indicate gene participation within the respective pathways, elucidating both overlapping and exclusive gene-pathway associations. Pathways are hierarchically clustered based on gene membership similarity. Note: Pathways with complete inclusion relationships may not cluster adjacently if other pathways with shared gene memberships influence the overall distance calculations. The clustering considers the presence or absence of all genes across the pathways, which may affect the proximity of even completely inclusive pathways within the dendrogram.
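The note about inclusion relationships can be made concrete with a small sketch. The report does not state which distance is used for clustering, so the Jaccard distance below is an assumption, as are the function names and the made-up pathways. The example shows that a pathway fully contained in a larger one can still be farther from it than a third pathway that merely overlaps heavily.

```python
def membership_matrix(pathways):
    """Build a binary pathway-by-gene matrix from {pathway: [genes]}."""
    all_genes = sorted(set().union(*pathways.values()))
    rows = {
        name: [int(gene in set(members)) for gene in all_genes]
        for name, members in pathways.items()
    }
    return all_genes, rows

def jaccard_distance(genes_a, genes_b):
    """1 - |intersection| / |union| of two gene sets (assumed metric)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical pathways: "small" is completely included in "big".
pathways = {
    "big": ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"],
    "small": ["g1", "g2"],
    "other": ["g1", "g2", "g3", "g4", "g5", "g6", "g9"],
}
genes, matrix = membership_matrix(pathways)
print(genes)
print(matrix["small"])
print(jaccard_distance(pathways["big"], pathways["small"]))  # 0.75
print(jaccard_distance(pathways["big"], pathways["other"]))
```

Under this metric "big" sits closer to "other" than to its own subset "small", so the two would not necessarily appear next to each other in the dendrogram.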

Gene Set Enrichment Analysis (GSEA) across Selected Pathways

The GSEA plot demonstrates the enrichment of specific pathways based on a ranked list of genes and their differential expression between classes. The x-axis represents the rank of all genes in the dataset, while the y-axis shows the running enrichment score (ES) for the selected pathway. The green line indicates the cumulative ES; its peak marks the point of maximum enrichment within the ranked gene list. The vertical black lines beneath the ES line represent genes that are part of the pathway, positioned according to the level and direction of differential expression. The red dashed line indicates the maximum positive ES for pathways upregulated in one class or the minimum negative ES for those upregulated in the other class, marking the points of strongest enrichment within the gene list.
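The running enrichment score plotted on the y-axis can be illustrated with a short sketch of the classic weighted running-sum statistic. This follows the commonly described GSEA formulation and is not the fgsea implementation used by the report; the function name, the `weight` parameter, and the example data are assumptions for illustration.

```python
def running_enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Return (curve, es) for one pathway.

    ranked_genes: list of (gene, metric) sorted by metric, descending.
    gene_set: genes belonging to the pathway.
    """
    members = set(gene_set)
    # Each pathway gene contributes in proportion to |metric| ** weight.
    hits = [abs(m) ** weight if g in members else 0.0 for g, m in ranked_genes]
    hit_total = sum(hits)
    n_miss = sum(1 for g, _ in ranked_genes if g not in members)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need both pathway and non-pathway genes with signal")

    running, curve = 0.0, []
    for (gene, _), hit in zip(ranked_genes, hits):
        if gene in members:
            running += hit / hit_total   # step up at a pathway gene
        else:
            running -= 1.0 / n_miss      # step down otherwise
        curve.append(running)
    # ES is the point of largest deviation from zero, keeping its sign.
    es = max(curve, key=abs)
    return curve, es

# Made-up ranked list: the pathway genes A and B sit at the top.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
curve, es = running_enrichment_score(ranked, {"A", "B"})
print(curve, es)
```

Because both pathway genes sit at the top of the list, the curve peaks early and then decays back to zero, which is the shape expected for a pathway upregulated in the first class.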

Enriched Pathway Graph

Enriched Pathway Graph from GSEA and ORA

This section presents a curated table of biological pathways that have been identified as significantly enriched based on gene expression data. The table outlines each pathway's source database, a detailed description, and their respective statistical significance from Gene Set Enrichment Analysis (GSEA) and Over Representation Analysis (ORA), including p-values and adjusted p-values (p-adj). Additionally, the table provides information on the direction of regulation for each pathway in different biological conditions, such as ones upregulated in a class identified by GSEA, with corresponding ORA statistics when available, or vice versa. Pathways where regulation is consistent across GSEA and ORA are noted with 'Identical' under 'Regulation Match,' ensuring users can easily discern pathways with coherent patterns of regulation. Users can select between one and twenty pathways to visualize individual pathway graphs with colored gene nodes for gene regulation in a class. Pathway overlapping and inclusion information is depicted in a heatmap. This interactive tool allows for an in-depth exploration of key pathways relevant to the study's phenotype or condition of interest.

Table. PX_EnrichedPathways

Definitions and interpretations of table columns

pathway: The name or identifier of the pathway being analyzed, linked to a specific biological process or function. Clicking the name expands the row and provides a link to external details.
pathway_db: The database source of the pathway, such as Reactome or WikiPathways.
pathway_description: A brief summary of the pathway, providing an overview of its biological role.
NES_GSEA: The Normalized Enrichment Score from GSEA, indicating the strength of pathway enrichment. Higher absolute values represent stronger enrichment. Rows are ranked by this column.
size_GSEA: The number of genes in the pathway that overlap with the ranked gene list in GSEA.
Regulation_GSEA: The direction of regulation in the GSEA analysis, showing whether the pathway is upregulated or downregulated in the analyzed class.
padj_ORA: The adjusted p-value from ORA, corrected for multiple comparisons. Lower values indicate higher confidence in pathway significance.
pval_ORA: The p-value from ORA, indicating the statistical significance of pathway enrichment. Lower values suggest stronger evidence for enrichment.
pval_GSEA: The p-value from GSEA, indicating the statistical significance of pathway enrichment. Lower values suggest stronger evidence for enrichment.
padj_GSEA: The adjusted p-value from GSEA, corrected for multiple comparisons. Lower values indicate higher confidence in pathway significance.
Regulation_Match: Indicates whether the regulation direction (upregulated or downregulated) is identical between the ORA and GSEA analyses.

At the bottom of the table, a "View" button generates a pathway diagram for each selected pathway. For human RNAseq data, both Reactome and WikiPathways diagrams are produced, while for mouse RNAseq data, only WikiPathways diagrams are available. These visualizations illustrate the expression levels and regulatory status of genes within the selected pathways, using color coding to highlight differences between classes. This provides an intuitive understanding of how specific genes contribute to the biological processes under study across different conditions.

Pathway Graph for Selected Pathways

Representation of Pathway Gene Regulation by Class

This pathway diagram illustrates the expression levels of genes in a class when compared to the other class. Genes upregulated are highlighted in varying shades of red, with color depth corresponding to the magnitude of log2 fold change (log2FC). Conversely, genes downregulated are indicated in shades of blue. A gene depicted without color signifies a low average expression or variation across samples, indicating minimal involvement in the class difference. For Reactome pathways, grey indicates genes that are not a part of this specific pathway but are included in a higher-level parental pathway. This color-coding provides a clear visual distinction of gene regulation within the pathway in relation to the response observed.
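The color coding described above can be sketched as a simple mapping from log2FC to a shade of red or blue. This is only an illustration of the idea: the saturation point `max_abs`, the use of white for uncolored genes, and the function name are assumptions, and the actual diagrams may use a different color scale.

```python
def log2fc_to_hex(log2fc, max_abs=3.0):
    """Map a log2 fold change to a hex color (illustrative scale).

    Positive values become red, negative values blue, and the shade
    deepens with |log2FC| until it saturates at max_abs (assumed value).
    None stands for a gene that is left uncolored.
    """
    if log2fc is None or log2fc == 0:
        return "#FFFFFF"
    intensity = min(abs(log2fc) / max_abs, 1.0)
    # Fade the other two channels from 255 (pale) down to 0 (saturated).
    fade = round(255 * (1.0 - intensity))
    if log2fc > 0:
        return f"#FF{fade:02X}{fade:02X}"   # upregulated: shades of red
    return f"#{fade:02X}{fade:02X}FF"       # downregulated: shades of blue

for value in (3.0, 0.5, None, -0.5, -3.0):
    print(value, log2fc_to_hex(value))
```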

Protein-Protein Interaction (PPI) Analysis

Protein-protein Interaction Network

This protein-protein interaction network illustrates the connections between the proteins encoded by the top N differentially expressed genes between the classes. Nodes represent individual proteins, with edges denoting physical or functional interactions. Clusters of nodes, or cliques, represent groups of tightly interacting proteins and point to functional modules within the network. The network topology shows the complexity and interdependency of protein interactions, providing insights into molecular mechanisms and potential targets for therapeutic intervention. Each protein node is clickable to display details retrieved from the STRING database.

Number of DEP by Fold Change and P-value

The stacked bar chart shows the number of differentially expressed genes (DEP) at a predefined p-value threshold, stratified by class. Fold change (FC) values are plotted on the x-axis against the count of DEP on the y-axis. Each bar is split into two segments, one for the genes upregulated in each class, according to whether the FC is above or below 1. The bars cover FC values from 1.1 to 10. The chart highlights how gene counts decrease as the FC magnitude increases. A golden arrow marks the FC threshold applied during the analysis, that is, the minimum change required for a gene to be considered differentially expressed. The horizontal dashed line marks the selected number of DEP to include in the Over Representation Analysis (ORA). If the actual DEP count falls below this line, adjusting the FC threshold is recommended to ensure a sufficient gene pool for ORA; otherwise, the existing DEP count is used. Users can modify three parameters (FC cutoff, p-value cutoff, and the target number of DEP) to update the graph and guide a more tailored analysis.
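The counts behind such a chart can be sketched as follows. This is a hypothetical illustration: the FC grid, the function names, and the input format are assumptions (the source only states that the bars span FC values from 1.1 to 10), while the default p-value cutoff of 0.01 and ORA target of 300 come from the parameter defaults listed in this report.

```python
import math

# Hypothetical FC grid; the report only states that it spans 1.1 to 10.
FC_GRID = [1.1, 1.2, 1.5, 2, 3, 4, 5, 10]

def count_dep_by_fc(genes, p_cutoff=0.01, fc_grid=FC_GRID):
    """Count DEP per FC cutoff, split by direction.

    genes: list of (log2fc, pvalue) tuples.
    Returns {fc: (n_up_in_first_class, n_up_in_second_class)}.
    """
    counts = {}
    for fc in fc_grid:
        cutoff = math.log2(fc)
        passing = [lfc for lfc, p in genes if p < p_cutoff and abs(lfc) >= cutoff]
        counts[fc] = (sum(lfc > 0 for lfc in passing), sum(lfc < 0 for lfc in passing))
    return counts

def enough_for_ora(counts, fc_cutoff, target=300):
    """True if the DEP count at fc_cutoff reaches the ORA target (dashed line)."""
    up, down = counts[fc_cutoff]
    return up + down >= target

# Made-up genes: the last one fails the p-value cutoff at every FC level.
genes = [(1.5, 0.001), (-2.2, 0.005), (0.2, 0.0001), (3.0, 0.5)]
counts = count_dep_by_fc(genes)
print(counts)
print(enough_for_ora(counts, fc_cutoff=2))
```

With only a handful of example genes the target of 300 is not met, which corresponds to the situation where the chart suggests adjusting the FC threshold.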

Gene Expression Parameters

Gene Expression Analysis Parameters

Overview: In gene expression analysis, several key variables can significantly influence the outcome and interpretation of the results, in particular the identification of differentially expressed genes (DEP) and the downstream analyses.

Parameters:

Method for identifying DEP: Three methods are supported: Welch’s t-test (default), Wilcoxon rank-sum test, and edgeR’s quasi-likelihood F-test (glmQLFTest); see Help for details.
Fold Change (FC) threshold for DEP: FC is a measure of the change in expression level of a gene between two conditions. It impacts almost all analyses by setting the threshold for considering a gene as differentially expressed.
P-value threshold for DEP: The p-value measures the statistical significance of the observed changes in gene expression. Any gene whose p-value, as determined by the statistical test, is smaller than the threshold is considered a DEP.
Number of Top DEP for Visualizations and Analyses:
In Volcano plot: Decides how many of the most differentially expressed genes to display.

In Boxplot: Decides how many of the most significantly differentially expressed genes to display.

In Heatmap/PCA/UMAP: Influences the resolution of clustering and pattern detection in these dimensionality reduction and visualization methods.

For ORA: Affects the scope of pathway and GO term enrichment analysis by determining the gene set size.
FDR Threshold for ORA and GSEA: Sets the maximum False Discovery Rate (FDR) at which pathways or GO terms are retained in the results, filtering out less significant findings.
Gene Ranking Metric:
Signal2Noise (Default): This metric is used as the default ranking metric in Broad Institute’s GSEA software. It calculates the difference in means between two groups, normalized by the sum of their standard deviations. This ratio helps in identifying genes that have large differences in expression relative to the variability within each group.

TestStatistic: This metric is the Welch’s t-test statistic, the Wilcoxon rank-sum statistic, or the F statistic from edgeR’s quasi-likelihood F-test, depending on which method is used to identify DEP.
Default Value Usage: When uncertain about which thresholds to set for FC, p-value, and FDR, it is often recommended to use the default settings provided by the analysis software, which are based on standard practices.

Practical Insights: By carefully selecting these variables, researchers can tailor their gene expression analyses to their specific experimental needs and ensure that the interpretations they draw are both robust and relevant to their biological questions.

Method for identifying DEP:
(default is Welch’s t-test)

Fold Change (FC) threshold for DEP:
(default is 2, allowing 1.1 to 10)

P-value threshold for DEP:
(default is 0.01, allowing 0.001 to 0.05)

Number of top DEP in Boxplot/Volcano plot:
(default is 20, allowing 1 to 100)

Number of top DEP in Heatmap/PCA/UMAP:
(default is 20, allowing 2 to 1000)

Number of top DEP for ORA:
(default is 300, allowing 50 to 1000)

FDR threshold for ORA and GSEA:
(default is 0.05, allowing 0.1, 0.05, 0.001)

Gene ranking metric for GSEA:
(default is Signal2Noise)
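The defaults and allowed ranges listed above can be summarized in a small configuration sketch. The class, field names, and method keys below are hypothetical and do not reflect the tool's actual interface; only the default values and ranges are taken from the parameter list.

```python
from dataclasses import dataclass

# Hypothetical keys for the three supported methods and two ranking metrics.
ALLOWED_METHODS = ("welch", "wilcoxon", "edger_qlf")
ALLOWED_METRICS = ("Signal2Noise", "TestStatistic")
ALLOWED_FDR = (0.1, 0.05, 0.001)

@dataclass
class AnalysisParameters:
    dep_method: str = "welch"
    fc_threshold: float = 2.0
    p_threshold: float = 0.01
    top_dep_plots: int = 20        # Boxplot / Volcano plot
    top_dep_clustering: int = 20   # Heatmap / PCA / UMAP
    top_dep_ora: int = 300
    fdr_threshold: float = 0.05
    ranking_metric: str = "Signal2Noise"

    def validate(self):
        """Return a list of problems; an empty list means all values are allowed."""
        problems = []
        if self.dep_method not in ALLOWED_METHODS:
            problems.append("dep_method must be one of " + ", ".join(ALLOWED_METHODS))
        if not 1.1 <= self.fc_threshold <= 10:
            problems.append("fc_threshold must be between 1.1 and 10")
        if not 0.001 <= self.p_threshold <= 0.05:
            problems.append("p_threshold must be between 0.001 and 0.05")
        if not 1 <= self.top_dep_plots <= 100:
            problems.append("top_dep_plots must be between 1 and 100")
        if not 2 <= self.top_dep_clustering <= 1000:
            problems.append("top_dep_clustering must be between 2 and 1000")
        if not 50 <= self.top_dep_ora <= 1000:
            problems.append("top_dep_ora must be between 50 and 1000")
        if self.fdr_threshold not in ALLOWED_FDR:
            problems.append("fdr_threshold must be 0.1, 0.05, or 0.001")
        if self.ranking_metric not in ALLOWED_METRICS:
            problems.append("ranking_metric must be Signal2Noise or TestStatistic")
        return problems

print(AnalysisParameters().validate())                 # defaults are valid
print(AnalysisParameters(fc_threshold=20).validate())  # FC outside 1.1 to 10
```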
