Our tool will generate a graph using user-specified gene sets, find the maximal k-partite cliques within this graph to discover relationships between genes, gene sets, and ontology terms, and visualize this information in a chord diagram.
- K-partite graph: a graph whose vertices may be partitioned into k disjoint and independent sets
- The most common k-partite graphs are bipartite graphs and tripartite graphs
- K-partite clique: a k-partite graph containing at least one vertex from each partite set, and all possible inter-partite edges
- Maximum clique: the clique of largest size in the graph
- Maximal clique: a clique that is not properly contained in any other clique
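The definitions above can be checked mechanically. The following is a minimal sketch, not the tool's implementation: the function name `is_k_partite_clique` and the toy vertex labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_k_partite_clique(parts, edges):
    """Return True if `parts` (one vertex set per partite set) forms a
    k-partite clique under the undirected edge set `edges`."""
    # Every partite set must contribute at least one vertex.
    if any(len(p) == 0 for p in parts):
        return False
    # Every pair of vertices from different partite sets must be joined.
    for p, q in combinations(parts, 2):
        for u in p:
            for v in q:
                if frozenset((u, v)) not in edges:
                    return False
    return True


# Toy tripartite graph (hypothetical labels): a* in A, b* in B, c* in C.
edges = {
    frozenset(e)
    for e in [("a1", "b1"), ("a1", "c1"), ("b1", "c1"), ("a2", "b1")]
}

print(is_k_partite_clique([{"a1"}, {"b1"}, {"c1"}], edges))        # True
print(is_k_partite_clique([{"a1", "a2"}, {"b1"}, {"c1"}], edges))  # False: a2-c1 is missing
```

In this toy graph, {a1, a2} and {b1} form a biclique, but adding a2 to the triclique fails because a2 has no edge to c1.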
Bipartite vs. Tripartite cliques
The intersection of maximal bipartite cliques does not yield the same results as partitioning one partite set of a bipartite graph.
The graph on the right shows a maximal biclique in partite sets A and B. The red vertices indicate the vertices of the maximal biclique. The graph on the left shows a maximal triclique in the same graph. The red vertices represent a maximal triclique. Note that these vertices are NOT a maximal biclique.
This demonstrates the importance of tricliques in finding meaningful relationships between genesets. After imputing all possible edges between the newly partitioned cliques, the maximal triclique gives the relationship between the genesets.
The first step in the analysis of functional genomics data as we have described above is the generation of a k-partite graph. We have created algorithms that generate graphs from gene set data based on exact gene overlap and Jaccard threshold values. The existence of vertices and edges in the maximal triclique is determined by one of these methods, which is chosen by the user on the Analyze Genesets page.
Note that in this example, the entire collection of gene sets from A and B are shown for clarity. The Triclique Viewer tool will not display these extraneous gene sets.
Exact Gene Overlap
When the user selects the option for Exact Gene Overlap, an edge in the tripartite clique is drawn between a gene set in A and a gene set in B if at least one gene g belongs to both gene sets.
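The exact-overlap rule can be sketched as follows. This is an illustrative sketch only: the function `exact_overlap_edges`, the dictionary layout, and the gene and gene set names are assumptions, not the tool's actual code or data.

```python
def exact_overlap_edges(project_a, project_b):
    """Map each (gene set in A, gene set in B) pair that shares at least
    one gene to the set of shared genes.

    project_a, project_b: dict of gene set name -> set of gene symbols.
    """
    edges = {}
    for name_a, genes_a in project_a.items():
        for name_b, genes_b in project_b.items():
            shared = genes_a & genes_b
            if shared:  # an edge exists only when the overlap is non-empty
                edges[(name_a, name_b)] = shared
    return edges


# Hypothetical toy data.
project_a = {"setA1": {"G1", "G2", "G3"}, "setA2": {"G7"}}
project_b = {"setB1": {"G2", "G3", "G4"}, "setB2": {"G9"}}

print(exact_overlap_edges(project_a, project_b))
```

Here only setA1 and setB1 are connected, through the shared genes G2 and G3; the other pairs have no gene in common and receive no edge.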
Jaccard Threshold Values
When the user selects the option for Jaccard Threshold Values, an edge in the tripartite clique is drawn if two gene sets in A and B have a Jaccard similarity coefficient greater than some user-specified threshold x, i.e. J(A,B) > x.
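The Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below illustrates the threshold rule under that standard definition; the function names, the data layout, and the choice to return 0.0 for two empty sets are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets.

    Returns 0.0 when both sets are empty (an assumption of this sketch).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_edges(project_a, project_b, threshold):
    """Map each (gene set in A, gene set in B) pair whose Jaccard
    similarity is strictly greater than `threshold` to that similarity."""
    edges = {}
    for name_a, genes_a in project_a.items():
        for name_b, genes_b in project_b.items():
            score = jaccard(genes_a, genes_b)
            if score > threshold:  # strict inequality: J(A,B) > x
                edges[(name_a, name_b)] = score
    return edges


# Hypothetical toy data: 2 shared genes out of 4 distinct genes.
a = {"G1", "G2", "G3"}
b = {"G2", "G3", "G4"}
print(jaccard(a, b))  # 0.5
print(jaccard_edges({"setA": a}, {"setB": b}, 0.1))
```

Because the comparison is strict, a pair scoring exactly the threshold value receives no edge, and with a threshold of 0.0 two gene sets are connected only if they share at least one gene.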
Both the Exact Gene Overlap option and the Jaccard Threshold option display their results to the user as a chord diagram. The interpretation of the chord diagram differs slightly between the two options and is explained in the sections below. Common to both types of chord diagrams are:
- Each partition is shown as a different color in the chord diagram
- Users may click on the partitions or the chords themselves to be linked to additional information
Exact Gene Overlap
The chord diagram for the Exact Gene Overlap option displays the relationship between genes and gene sets.
Each partition except one represents a gene set, and the last partition represents a set of one or more genes which make up the intersection between the displayed gene sets. Geneweaver takes the projects chosen by the user, treats the relevant gene sets within each project as partitions, and creates another partition containing the intersection genes themselves.
In the example below, TricliqueTest1 is a gene set in the first project, TricliqueTest2 is a gene set in the second project, and the genes FOXP2 and TANK are the intersection of TricliqueTest1 and TricliqueTest2. Note that the genes and gene sets displayed do not represent the entirety of the genes or gene sets within the chosen projects; rather, the chord diagram displays only those elements contained within the triclique.
The width of each chord represents the relative frequency of the gene in the gene set.
Jaccard Threshold Results
The chord diagram for the Jaccard Threshold option displays the relationship between gene sets within projects.
Each partition represents a project, containing one or more gene sets. Geneweaver takes the projects chosen by the user, finds the relevant gene sets with a Jaccard similarity above the user-given threshold, and displays this information as a chord diagram.
In the example below, three projects (red, blue, and yellow) contain the gene sets J1, J2, and J3; J2, J4, and J5; and J1 and J4, respectively. The gene sets with a Jaccard similarity above the threshold show a chord between them; otherwise, no chords are shown (as in the case of gene set J5).
The width of each chord represents the Jaccard similarity score between the two gene sets.
Investigating a Link between Depression and Alcoholism (Exact Gene Overlap Option)
Let's say we wanted to investigate a possible link between depression and alcoholism in humans. Using Geneweaver's Search functionality, we can find genesets related to depression and alcoholism, and add them to projects, as below:
Shown below are the results when we run the Triclique Viewer tool on the above projects using the Exact Gene Overlap option.
With the Exact Gene Overlap option, we are investigating the relationship between genes and gene sets. The blue project partition, which represents the project "Depression," contained a gene set relating to SNPs associated with Alcoholism in Women (Female Human GWAS Alcohol). The yellow project partition, which represents the project "Anxiety," contained a gene set also relating to alcoholism SNPs (EtOH SNPs). These two gene sets are different (24 vs. 9 genes, respectively), but the genes common to both gene sets (i.e. the exact gene overlap) are shown in the red partition. The width of each chord represents the relative frequency of the gene in the gene set.
Investigating Links between Depression, Anxiety, and Alcoholism (Jaccard Threshold Option)
Let's say we wanted to investigate any possible links between depression, anxiety, and alcoholism in humans, but we weren't as concerned about the existence of an exact gene overlap. Using Geneweaver's Search functionality, we can find genesets related to depression, anxiety, and alcoholism, and add them to projects, as below.
Shown below are the results when we run the Triclique Viewer tool on the above projects using the Jaccard Threshold option with a threshold value of 0.1. Note that different results may be obtained if a different threshold value is used.
The red project partition, which represents the project "Anxiety," contained a gene set relating to longevity whose Jaccard similarity score with the other gene sets shown was above the 0.1 threshold. These other gene sets came from the blue project partition ("Alcoholism Studies") and the yellow project partition ("Depression Studies"). Again, the width of each chord corresponds to the Jaccard similarity score of the two gene sets it connects. For instance, the Jaccard similarity score between the NESDA Sample gene set and the Longevity gene set was higher than that between the NESDA Sample gene set and the Long-term Depression gene set, as indicated by the thicker chord.
In contrast with the Exact Gene Overlap option, a chord between two gene sets does not necessarily represent the presence of a common gene, but rather that the two gene sets are similar enough (based on the given Jaccard threshold).
How to use Triclique Viewer
The Triclique Viewer tool is accessed from the Analyze Genesets page.
Exact Gene Overlap Option
To use the Triclique Viewer tool with the Exact Gene Overlap option, select two projects from your projects list, and ensure that the Exact Gene Overlap option is selected, as shown below.
If you do not select exactly two projects, an error message will be displayed and you will be redirected to the Analyze Genesets page. Similarly, if the genesets for the selected projects have no intersection, a message will be displayed and you will be redirected to the Analyze Genesets page.
Jaccard Threshold Option
To use the Triclique Viewer tool with the Jaccard Threshold option, select three projects from your projects list, and ensure that the Jaccard Threshold option is selected, as shown below. Also choose a threshold value from the drop-down menu of threshold values (defaults to 0.0).
If you do not select exactly three projects, an error message will be displayed and you will be redirected to the Analyze Genesets page. Similarly, if the genesets for the selected projects have no maximal triclique, a message will be displayed and you will be redirected to the Analyze Genesets page.
The Triclique Viewer tool uses an algorithm that runs in O(3^(n/3)) time. Because this running time grows exponentially with the size of the graph, execution cannot complete in a timely manner on projects and genesets that are too large or too numerous. If you select genesets that are too large, or if you select too many genesets, the Triclique Viewer tool will not run, the error message above will be displayed, and you will be redirected to the Analyze Genesets page.
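For intuition about where an exponential bound of this kind comes from: a graph on n vertices can contain up to 3^(n/3) maximal cliques, so any algorithm that lists them all can take that long in the worst case. The sketch below uses the Bron-Kerbosch algorithm with pivoting, a well-known maximal clique enumeration method with that worst-case bound. It is an assumption that this resembles what the tool does; the source does not name its algorithm, and the sketch enumerates ordinary maximal cliques rather than tricliques. The function name and toy graph are hypothetical.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph as a frozenset.

    adj: dict mapping each vertex to the set of its neighbours.
    Uses Bron-Kerbosch with pivoting (illustrative sketch only).
    """
    def expand(r, p, x):
        # r: current clique; p: candidates to extend r; x: already explored.
        if not p and not x:
            yield frozenset(r)  # nothing can extend r, so r is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy graph: triangle a-b-c plus a pendant edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
for clique in maximal_cliques(adj):
    print(sorted(clique))
```

On this toy graph the maximal cliques are {a, b, c} and {c, d}; the order in which they are printed may vary.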