The results of this RNAseq analysis are presented across six comprehensive sections, each focusing on a different aspect of gene expression and pathway analysis. The findings are supported by a variety of visualizations and interactive tools designed to enhance understanding and facilitate deeper exploration of the data.
Differentially Expressed Genes (DEP) Analysis is the first section, summarizing gene expression differences between two classes. It features both standard and interactive volcano plots that highlight genes with significant expression changes. The interactive version allows users to click on specific genes to view detailed expression boxplots between the classes. Additionally, a detailed table provides metrics such as fold change and p-values, giving a thorough overview of the differential expression results.
Sample Clustering Analysis follows, utilizing the top differentially expressed genes (DEP) to explore the structure of the dataset through heatmaps, PCA, and UMAP. These methods collectively provide insights into sample clustering, expression patterns, and potential outliers. An interactive feature allows users to adjust the number of top DEPs used in the analysis, ensuring the results are tailored to specific research questions.
Over Representation Analysis (ORA) identifies and ranks significantly enriched pathways and Gene Ontology (GO) categories in specific gene sets. The analysis is divided across several pathway databases, including Reactome and WikiPathways. Results are presented with options to view all enriched pathways and GO terms or collapse highly overlapping ones, providing flexibility in interpretation. Interactive visualizations and a dynamic network graph further facilitate the exploration of how genes overlap and interact within these pathways and GO terms.
Gene Set Enrichment Analysis (GSEA) ranks pathways based on their enrichment across a ranked list of genes, offering insights into the biological mechanisms underlying gene expression differences. Like the ORA section, GSEA results can be viewed in full or with collapsed overlaps. Interactive tools allow users to delve into the relationships and overlaps between pathways and their associated genes, providing a comprehensive view of pathway enrichment.
Enriched Pathway Graph presents a curated table of pathways identified as significantly enriched from GSEA and ORA, summarizing their statistical significance, source database, and regulation direction. The table also highlights pathways with consistent regulation across GSEA and ORA. Users can select pathways to visualize individual graphs with gene upregulation and downregulation information, enabling an in-depth exploration of the most relevant biological pathways for the study.
The Protein-Protein Interaction Network section illustrates the complex interactions between proteins encoded by the top differentially expressed genes. This network visualization, enhanced by data from the STRING database, highlights clusters of tightly interacting proteins, offering insights into molecular mechanisms and potential therapeutic targets. Interactive features allow users to explore detailed information on individual proteins, making this section a powerful tool for understanding the broader biological context of the findings.
At the bottom of the results, there is a Parameter Setting section, where users can adjust analysis parameters and re-run the analysis. This section also provides the option to download a full report of the analysis, including all data, tables, and figures, ensuring users have comprehensive access to the results and the ability to tailor the analysis to their specific needs.
This section presents the DEP analysis, summarizing gene expression differences between two classes. There are two versions of the volcano plot: a standard version and an interactive version. The standard volcano plot highlights genes with significant expression changes, using color gradients to indicate the magnitude and significance of these changes. The interactive volcano plot allows users to click on individual genes to view a floating window with a boxplot detailing gene expression between the two classes. The ordered boxplot showcases the top differentially expressed genes, arranged by statistical significance, providing a clear view of expression patterns across classes. Additionally, Table PX_DEP provides detailed metrics on gene expression, including fold change, p-values, and statistical significance, facilitating a deeper understanding of differential expression results.
This volcano plot illustrates differential gene expression between the two classes. The x-axis shows the log2 fold change (log2FC), indicating the magnitude of the expression difference, while the y-axis shows the negative base-10 logarithm of the p-value (-log10 p-value), reflecting statistical significance. Genes significantly upregulated in either class are displayed. The color gradient, from blue to red, indicates increasing significance, with red dots representing highly significant changes in gene expression. The dotted lines mark customary thresholds for p-value (horizontal) and fold change (vertical). Key genes are labeled only if they meet both criteria: a notable fold change and a significant p-value. Labeled genes are therefore statistically significant and also show an expression change large enough to be of potential biological interest. Users can adjust these thresholds in the Parameter Setting section at the bottom of the page to focus the analysis on genes with potential as biomarkers or therapeutic targets.
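As a rough illustration of how one point on a volcano plot is derived, the sketch below converts a fold change and a p-value into plot coordinates and applies the two labeling criteria. The function name, the gene names, and the default cutoffs (fold change of 2, p-value of 0.05) are assumptions made for this example, not the report's actual settings.

```python
import math

def volcano_point(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (x, y, labeled) for one gene.

    fc_cutoff and p_cutoff are illustrative defaults (assumptions),
    not the thresholds used by the report.
    """
    x = math.log2(fold_change)   # x-axis: log2 fold change
    y = -math.log10(p_value)     # y-axis: -log10 p-value
    # A gene is labeled only when it passes BOTH thresholds.
    passes_fc = abs(x) >= math.log2(fc_cutoff)
    passes_p = p_value <= p_cutoff
    return x, y, passes_fc and passes_p

# Hypothetical genes: (fold change, p-value)
genes = {"GENE_A": (4.0, 0.001), "GENE_B": (1.2, 0.0001), "GENE_C": (0.2, 0.2)}
for name, (fc, p) in genes.items():
    x, y, labeled = volcano_point(fc, p)
    print(f"{name}: log2FC={x:.2f}, -log10(p)={y:.2f}, labeled={labeled}")
```

In this toy input only GENE_A passes both criteria: GENE_B is highly significant but changes too little, and GENE_C changes strongly but is not significant.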
This figure presents an array of boxplots, each corresponding to one of the top genes that demonstrate the most significant differential expression between the two classes. The genes are ordered by ascending p-value, as shown above each plot, to emphasize the statistical significance of their differential expression. Genes are colored by class, with the y-axis quantifying gene expression on a log2 counts per million (CPM) scale. Data points within the boxplots reveal individual expression levels, offering insight into the variance and distribution for each class. This ordered arrangement not only highlights the genes with the strongest differential expression but also aligns them according to the confidence in these differences, providing a powerful visualization for users to identify and prioritize genes for further analysis.
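For readers unfamiliar with the log2 CPM scale on the y-axis, the following sketch shows one common way to compute it from raw read counts in a single sample. The added offset (`prior_count`) is an assumption to avoid taking the logarithm of zero; the report may use a different normalization or offset.

```python
import math

def log2_cpm(counts, prior_count=1.0):
    """Convert raw counts for one sample to log2 counts per million.

    counts: dict mapping gene name -> raw read count.
    prior_count: assumed offset added to the CPM value before the log,
                 so that genes with zero reads map to log2(prior_count).
    """
    library_size = sum(counts.values())
    return {
        gene: math.log2(count / library_size * 1e6 + prior_count)
        for gene, count in counts.items()
    }

# Hypothetical sample with 1,000 reads in total
sample = {"GENE_A": 250, "GENE_B": 750, "GENE_C": 0}
for gene, value in log2_cpm(sample).items():
    print(f"{gene}: {value:.2f} log2 CPM")
```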
Definitions and interpretations of table columns
At the bottom of the table, a "View" button lets users explore the selected genes in boxplots that show the expression variability between classes.
This section provides a comprehensive view of sample similarities and differences based on gene expression profiles. Using hierarchical clustering, principal component analysis (PCA), and Uniform Manifold Approximation and Projection (UMAP), it helps identify patterns that may distinguish the sample classes. These techniques aid in uncovering underlying biological variation, assessing sample homogeneity, and validating experimental groupings, all of which are crucial for downstream analyses. Users can change the number of top DEPs used as input, allowing a tailored exploration of how the samples cluster.
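The effect of the adjustable "top DEPs" setting can be sketched as a simple selection step that runs before clustering. The example assumes genes are ranked by ascending p-value; the report does not state the exact ranking criterion, and the function and gene names are hypothetical.

```python
def top_deps(p_values, n):
    """Return the n genes with the smallest p-values.

    p_values: dict mapping gene name -> p-value.
    Ranking by p-value is an assumption made for this sketch.
    """
    return sorted(p_values, key=p_values.get)[:n]

# Hypothetical differential expression results
results = {"GENE_A": 0.001, "GENE_B": 0.04, "GENE_C": 0.0002, "GENE_D": 0.3}
print(top_deps(results, 2))  # genes passed on to the heatmap, PCA, and UMAP
```

A smaller n focuses the clustering on the strongest class differences, while a larger n lets weaker signals, and more noise, influence the result.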
This heatmap displays the normalized expression levels of various top differentially expressed genes (x-axis) across multiple samples (y-axis), with hierarchical clustering applied to both axes to highlight patterns and relationships in the data. Expression levels are color-coded, ranging from blue (low expression) to red (high expression).
This PCA plot depicts the distribution of various samples according to the top n DEPs. Each symbol represents a sample, with its color indicating class. The axes correspond to the first two principal components, capturing variability within the dataset as noted by the percentage of variance explained in parentheses. The separation along the principal components may reflect the genetic underpinnings that distinguish the two classes.
The Over Representation Analysis (ORA) performed in this section identifies and ranks pathways and Gene Ontology (GO) categories that are significantly enriched in specific gene sets. The analysis is divided across various pathway databases, including Reactome, WikiPathways, and Gene Ontology categories (Biological Process, Cellular Component, Molecular Function), highlighting those most associated with upregulated genes in distinct conditions or classes. The results are presented with two display options: (a) "All," which shows all enriched pathways or GO terms, and (b) "Overlap Collapsed," which collapses highly overlapping pathways or GO terms using the collapsePathwaysORA function from the fgsea library. The ORA itself is performed using the fora function from the same library.
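To make the statistic concrete, the sketch below computes an over-representation p-value with a one-sided hypergeometric test, which is the conventional basis for ORA. The fora function named above belongs to the R fgsea package; this Python code is only an illustration of the underlying test, not a port of fora, and all names in it are hypothetical.

```python
from math import comb

def ora_p_value(pathway, query, universe):
    """One-sided hypergeometric test for over-representation.

    Returns (overlap, p), where p is the probability of drawing at least
    `overlap` pathway genes when sampling len(query) genes from the universe.
    """
    universe = set(universe)
    pathway = set(pathway) & universe   # restrict both sets to the universe
    query = set(query) & universe
    N, K, n = len(universe), len(pathway), len(query)
    k = len(pathway & query)
    # Sum the upper tail P(X >= k) of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)

# Hypothetical example: 10 measured genes, a 3-gene pathway, 3 query genes
universe = [f"G{i}" for i in range(10)]
overlap, p = ora_p_value({"G0", "G1", "G2"}, {"G0", "G1", "G5"}, universe)
print(overlap, round(p, 4))
```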
Interactive visualizations allow users to filter results by FDR thresholds, view the significance of each pathway, and explore overlapping genes, all for better understanding of relevant biological processes. For instance, if a pathway related to immune response is found to be enriched, this could suggest an underlying mechanism of disease progression or treatment response.
The table "PX_ORA_results" allows users to browse and select pathways or GO terms based on their significance, overlap, and regulation status. After selecting up to 20 items from this table, the "View" button provides a deeper exploration. Users can then access a Dynamic Network Graph showing gene and gene set associations and a Heatmap displaying gene involvement across different pathways and GO terms. This feature offers a clear visual representation of how genes overlap and interact within the selected pathways or GO terms.
The interactive bar plot delineates the results of an Over Representation Analysis (ORA), showcasing pathways and GO categories that are significantly upregulated in either class. The plot is segregated into terms from Reactome, WikiPathways, GO-Biological Process (GO-BP), GO-Cellular Component (GO-CC), and GO-Molecular Function (GO-MF), if available, with each bar's length representing the negative logarithm (base 10) of the False Discovery Rate (FDR). Users can interact with the color-coded square bar to select or deselect specific terms, and customize the visualization by filtering results based on different FDR thresholds, enabling a tailored analysis of the data's statistical robustness.
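The bar lengths and the FDR filter can be illustrated as follows. The sketch assumes the FDR is a Benjamini-Hochberg adjusted p-value, which the text does not state explicitly; the term names and the 0.05 threshold are hypothetical.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical terms and raw p-values
terms = ["Term 1", "Term 2", "Term 3", "Term 4"]
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
threshold = 0.05  # illustrative FDR filter
for term, q in zip(terms, fdr):
    if q <= threshold:
        print(f"{term}: FDR={q:.3f}, bar length={-math.log10(q):.2f}")
```

Because the bar length is -log10(FDR), a longer bar means a smaller FDR, and tightening the threshold removes the shortest bars first.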
Definitions and interpretations of table columns
At the bottom of the table, a "View" button lets users explore the selected pathways and Gene Ontology (GO) terms in two visualizations. The Dynamic Network Graph of Gene and Gene Set Associations displays the relationships between genes and the gene sets (pathways or GO terms) they belong to. The Heatmap Representation of Gene Involvement in Pathways and Gene Ontology Terms shows the overlap between different pathways and GO terms.
This interactive force-directed network graph depicts the associations between genes and their corresponding pathways or Gene Ontology (GO) terms. Large nodes, each marked by a unique color, denote either pathways or GO terms, whereas small nodes represent individual genes. Genes linked to a single term share the term's color, while those associated with multiple terms are highlighted in red. Users can click on any node to spotlight its connections, revealing associated nodes and their interrelations. Additionally, nodes can be clicked and dragged to manually adjust the graph's layout, offering a dynamic exploration of the network. It is crucial to understand that not all associated genes are displayed; only those selected as top DEP are shown. Therefore, a pathway or GO term may have additional associated genes that are not visualized here. The proximity of nodes and the thickness of lines are solely representational and do not indicate the strength of the relationship between the nodes.
The heatmap displays the association between specific genes (y-axis) and selected pathways or GO terms (x-axis). Each row represents a unique gene, and each column represents a pathway or GO term. A green cell indicates the presence of the gene within the associated pathway or GO term, while a white cell indicates its absence. Hierarchical clustering on both axes groups genes and pathways with similar patterns of association, thereby providing insight into the gene-pathway network structure within the dataset.
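The data behind such a heatmap is a binary gene-by-gene-set membership matrix. The sketch below builds one and also lists genes that belong to more than one selected set, which correspond to the genes highlighted in red in the network graph. All gene and gene-set names are hypothetical.

```python
def membership_matrix(gene_sets, genes):
    """Build a binary matrix: rows are genes, columns are gene sets.

    1 marks a gene present in the set (a green cell), 0 marks absence (white).
    """
    set_names = list(gene_sets)
    rows = [[1 if gene in gene_sets[name] else 0 for name in set_names]
            for gene in genes]
    return set_names, rows

def shared_genes(gene_sets, genes):
    """Return genes that belong to more than one of the selected gene sets."""
    return [g for g in genes
            if sum(g in members for members in gene_sets.values()) > 1]

# Hypothetical selection of one pathway and one GO term
gene_sets = {"Pathway X": {"G1", "G2"}, "GO term Y": {"G2", "G3"}}
genes = ["G1", "G2", "G3", "G4"]
names, matrix = membership_matrix(gene_sets, genes)
print(names)
for gene, row in zip(genes, matrix):
    print(gene, row)
print("Shared:", shared_genes(gene_sets, genes))
```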
The two plots display pathways upregulated in two classes. The first plot shows pathways enriched in one class, while the second plot shows those enriched in the other. Pathways are ranked by Normalized Enrichment Score (NES), with p-values and adjusted p-values (padj) indicating statistical significance. Genes within each pathway are shown as ticks along the center line, sorted by the significance of their differential expression with direction; ticks above the line indicate positive correlation, and those below indicate negative correlation. The length of the ticks reflects the strength of correlation, while gaps indicate genes not included in the pathway.
Definitions and interpretations of table columns
At the bottom of the table, there is a "View" button that, when clicked, generates two visualizations. The first is a Pathway Inclusion and Overlap Heatmap, which visually displays the relationships between the selected pathways and their associated genes, highlighting both overlapping and unique gene-pathway associations. The second is a Gene Set Enrichment Analysis (GSEA) plot, showing the enrichment score curve for the selected pathways and indicating where the most significant gene contributions are along the ranked list. These graphs provide a detailed view of how the selected pathways interact with the gene expression data.
This plot depicts a ranked list of pathways based on their association with drug response. For each pathway, genes are represented as ticks along the center line, sorted according to their correlation with drug response; those above the line positively correlate, while those below indicate a negative correlation. The extent of the ticks reflects the magnitude of correlation. Areas without ticks indicate genes not included in the respective pathway. Pathways are ordered on the y-axis by their level of significance, with the most significant at the top.
This visualization displays the relationship between various biological pathways (rows) and individual genes (columns). Green bars indicate gene participation within the respective pathways, elucidating both overlapping and exclusive gene-pathway associations. Pathways are hierarchically clustered based on gene membership similarity. Note: Pathways with complete inclusion relationships may not cluster adjacently if other pathways with shared gene memberships influence the overall distance calculations. The clustering considers the presence or absence of all genes across the pathways, which may affect the proximity of even completely inclusive pathways within the dendrogram.
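The note about inclusion relationships can be illustrated with a small distance calculation. The sketch assumes a Jaccard distance on gene membership; the distance measure actually used for the clustering is not stated, and the pathway contents are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two gene sets (0 = identical, 1 = disjoint)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical pathways: `small` is fully contained in `large`,
# while `similar` shares most, but not all, genes with `large`.
small = {"G1", "G2"}
large = {f"G{i}" for i in range(1, 11)}              # G1 .. G10
similar = {f"G{i}" for i in range(1, 10)} | {"G11"}  # G1 .. G9 and G11

print("small vs large:  ", jaccard_distance(small, large))
print("large vs similar:", round(jaccard_distance(large, similar), 3))
```

Although the small pathway is completely included in the large one, the large pathway is much closer to the similar-sized pathway, so the two nested pathways would not necessarily sit next to each other in the dendrogram.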
The GSEA plot demonstrates the enrichment of specific pathways based on a ranked list of genes and their differential expression between classes. The x-axis represents the rank of all genes in the dataset, while the y-axis shows the running enrichment score (ES) for the selected pathway. The green line traces the running ES, and its peak marks the point of maximum enrichment within the ranked gene list. The vertical black lines beneath the ES line represent genes that are part of the pathway, positioned according to the level and direction of their differential expression. The red dashed line indicates the maximum positive ES for pathways upregulated in one class, or the minimum negative ES for pathways upregulated in the other class, marking the point of strongest enrichment within the gene list.
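The running enrichment score can be sketched as a weighted walk down the ranked gene list: the score rises at each pathway gene and falls at each non-pathway gene. The example follows the classic weighted running-sum formulation with weight 1; this is an assumption, and the report's actual implementation may differ in details. Gene names and scores are hypothetical.

```python
def running_enrichment_score(ranked_genes, scores, pathway, weight=1.0):
    """Return (curve, es) for one pathway.

    ranked_genes: genes sorted from most upregulated to most downregulated.
    scores: ranking statistic for each gene, in the same order.
    curve: running enrichment score after each gene.
    es: the value on the curve with the largest absolute deviation from zero.
    """
    in_set = [gene in pathway for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if hit else 0.0
                   for s, hit in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(in_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("pathway must contain some, but not all, ranked genes")
    running, curve = 0.0, []
    for w, hit in zip(hit_weights, in_set):
        # Step up (weighted) on a pathway gene, step down otherwise.
        running += w / total_hit if hit else -1.0 / n_miss
        curve.append(running)
    return curve, max(curve, key=abs)

# Hypothetical ranked list where the pathway genes sit at the top
curve, es = running_enrichment_score(["a", "b", "c", "d"], [2, 1, -1, -2], {"a", "b"})
print([round(v, 3) for v in curve], round(es, 3))
```

A positive ES reached near the top of the list corresponds to a pathway upregulated in one class; a negative ES reached near the bottom corresponds to the other class.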
This section presents a curated table of biological pathways that have been identified as significantly enriched based on gene expression data. The table outlines each pathway's source database, a detailed description, and their respective statistical significance from Gene Set Enrichment Analysis (GSEA) and Over Representation Analysis (ORA), including p-values and adjusted p-values (p-adj). Additionally, the table provides information on the direction of regulation for each pathway in different biological conditions, such as ones upregulated in a class identified by GSEA, with corresponding ORA statistics when available, or vice versa. Pathways where regulation is consistent across GSEA and ORA are noted with 'Identical' under 'Regulation Match,' ensuring users can easily discern pathways with coherent patterns of regulation. Users can select between one and twenty pathways to visualize individual pathway graphs with colored gene nodes for gene regulation in a class. Pathway overlapping and inclusion information is depicted in a heatmap. This interactive tool allows for an in-depth exploration of key pathways relevant to the study's phenotype or condition of interest.
Definitions and interpretations of table columns
At the bottom of the table, a "View" button generates a pathway diagram for each selected pathway. For human RNAseq data, both Reactome and WikiPathways diagrams are produced, while for mouse RNAseq data, only WikiPathways diagrams are available. These visualizations illustrate the expression levels and regulatory status of genes within the selected pathways, using color coding to highlight differences between classes. This provides an intuitive view of how specific genes contribute to the biological processes under study across different conditions.
This pathway diagram illustrates the expression levels of genes in a class when compared to the other class. Genes upregulated are highlighted in varying shades of red, with color depth corresponding to the magnitude of log2 fold change (log2FC). Conversely, genes downregulated are indicated in shades of blue. A gene depicted without color signifies a low average expression or variation across samples, indicating minimal involvement in the class difference. For Reactome pathways, grey indicates genes that are not a part of this specific pathway but are included in a higher-level parental pathway. This color-coding provides a clear visual distinction of gene regulation within the pathway in relation to the response observed.
This protein-protein interaction network illustrates the connections among the proteins encoded by the top N differentially expressed genes between the classes. Nodes represent individual proteins, and edges denote physical or functional interactions. Clusters of nodes, or cliques, represent groups of tightly interacting proteins and highlight functional modules within the network. The network topology shows how interdependent these proteins are, providing insights into molecular mechanisms and potential targets for therapeutic intervention. Each protein node can be clicked to display details retrieved from the STRING database.
The stacked bar chart shows the number of differentially expressed genes (DEP) in each class at a predefined p-value threshold. Fold Change (FC) values are plotted on the x-axis against the count of DEP on the y-axis. Each bar is split into two segments, one for genes upregulated in each class. A gene is assigned to a segment according to whether its FC is above or below 1, and the bars span FC values from 1.1 to 10. The chart highlights the trend of decreasing gene counts with increasing FC magnitude. A prominent golden arrow indicates the FC threshold applied during the analysis, that is, the minimal change required for a gene to be considered significantly differentially expressed. The horizontal dashed line corresponds to the selected cutoff for the number of DEP to be included in the Over Representation Analysis (ORA). If the actual DEP count falls below this line, adjusting the FC threshold is recommended to ensure a sufficient gene pool for ORA; otherwise, the existing DEP count is used. Users can modify three critical parameters (FC cutoff, p-value cutoff, and the target number of DEP) to update the graph and guide a more tailored analysis.
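The counts behind each stacked bar can be reproduced with a few lines of code. The sketch below counts significant genes at several FC cutoffs, split by direction, and then looks for the largest cutoff that still meets a target number of DEP. The symmetric treatment of downregulated genes (FC at or below 1/cutoff), the p-value cutoff of 0.05, and all names are assumptions for illustration.

```python
def count_deps(genes, fc_cutoffs, p_cutoff=0.05):
    """Count significant genes per FC cutoff, split by direction.

    genes: list of (fold_change, p_value) pairs; FC > 1 is taken to mean
           upregulated in one class, FC < 1 upregulated in the other.
    Returns a list of (cutoff, n_up_first_class, n_up_second_class).
    """
    rows = []
    for cutoff in fc_cutoffs:
        up_first = sum(1 for fc, p in genes if p <= p_cutoff and fc >= cutoff)
        up_second = sum(1 for fc, p in genes if p <= p_cutoff and fc <= 1 / cutoff)
        rows.append((cutoff, up_first, up_second))
    return rows

def largest_cutoff_meeting_target(rows, target):
    """Return the largest FC cutoff whose total DEP count reaches the target."""
    passing = [cutoff for cutoff, a, b in rows if a + b >= target]
    return max(passing) if passing else None

# Hypothetical genes: (fold change, p-value)
genes = [(3.0, 0.01), (1.5, 0.02), (0.4, 0.001), (0.9, 0.03), (5.0, 0.2)]
rows = count_deps(genes, fc_cutoffs=[1.1, 2.0])
print(rows)
print(largest_cutoff_meeting_target(rows, target=3))
```

When no cutoff reaches the target, the helper returns None, which corresponds to the case where the existing DEP count is used as is.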