Clustering Tool (Beta 1.0)
Why Use the Clustering Tool
Clustering is one of the most powerful tools in bioinformatics. Where classification draws boundaries that are too strict for the data, clustering gives the user a more graded view of how items relate to one another. A strength of the Clustering Tool is that it offers a wide variety of algorithms, each of which can produce different clusters from the same data.
Understanding the Clustering Tool
- Figure 1: An example of a Cluster Graph and the tree produced from it. The tree is based on Jaccard Dissimilarity: the lower the dissimilarity between two items, the closer together they appear on the tree.
- Jaccard Dissimilarity is also known as Jaccard Distance. It is the complement of Jaccard Similarity (1 - Similarity).
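The relationship between Jaccard Similarity and Jaccard Dissimilarity can be sketched in a few lines of Python. This is only an illustration of the formula, not the tool's own code. The function names and the gene symbols in the example sets are hypothetical, and treating two empty sets as identical is an assumption made here.

```python
def jaccard_similarity(a, b):
    """Return |A intersect B| / |A union B| for two collections of gene identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(union)


def jaccard_dissimilarity(a, b):
    """Jaccard Distance: the complement of Jaccard Similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical GeneSets, for illustration only.
geneset_a = {"Drd2", "Comt", "Bdnf", "Slc6a4"}
geneset_b = {"Drd2", "Comt", "Maoa"}

# 2 shared genes out of 5 distinct genes: similarity 0.4, dissimilarity 0.6.
print(jaccard_dissimilarity(geneset_a, geneset_b))
```

Two identical sets have a dissimilarity of 0, and two sets with no genes in common have a dissimilarity of 1.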
Using the Clustering Tool
- Access the Clustering Tool through the My Projects tab under the Analyze Genesets option.
- From there, check the projects (or individual GeneSets if you use the green '+' sign) that you wish to see in the Cluster, then set the Options you want before you begin.
- If homology is set (included), the tree produced by the cluster will reflect a separation of species. Otherwise (excluded), it will ignore speciation.
- Method specifies the algorithm used to generate the clusters. This option is very technical and should be reserved for users who understand the implications of each algorithm.
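- Ward uses Ward's Method, which at each step merges the pair of clusters that gives the smallest increase in within-cluster variance.
- Single uses single-linkage clustering (also called nearest-neighbor clustering), where the distance between two clusters is the smallest distance between any pair of their members.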
- Centroid uses centroid-linkage clustering (UPGMC), a hierarchical method where the distance between two clusters is the distance between their centroids.
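- McQuitty uses McQuitty's method (WPGMA), a hierarchical clustering method in which the distance from a newly merged cluster to any other cluster is the simple average of the two merged clusters' distances to it.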
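- Average refers to average-linkage clustering, or UPGMA. Unlike single linkage, it takes the average of all pairwise distances between the members of two clusters as the distance between those clusters.
- Complete refers to complete-linkage clustering (also called farthest-neighbor clustering). Every data point starts as its own cluster, and clusters are fused step by step, with the distance between two clusters taken as the largest distance between any pair of their members.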
- Median refers to median-linkage clustering (WPGMC), a variant of centroid linkage in which the two merged clusters contribute equally to the new centroid regardless of their sizes.
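To make the effect of the Method option more concrete, here is a minimal Python sketch of agglomerative clustering over Jaccard distances, covering only the Single, Complete, and Average options. It is not the tool's implementation. The function names, the `LINKAGE` table, and the example GeneSets are all hypothetical, and the naive pairwise search is chosen for readability rather than speed.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# How each method reduces member-to-member distances to one cluster distance.
LINKAGE = {
    "single": min,                            # nearest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs (UPGMA)
}


def agglomerate(genesets, method="average"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    combine = LINKAGE[method]
    clusters = [(name,) for name in genesets]  # every item starts as a cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            pairwise = [jaccard_distance(genesets[x], genesets[y])
                        for x in c1 for y in c2]
            d = combine(pairwise)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, d))
    return merges


# Hypothetical GeneSets, for illustration only.
example = {
    "GS1": {"A", "B", "C"},
    "GS2": {"A", "B", "X"},
    "GS3": {"X", "Y"},
    "GS4": {"X", "Y", "Z"},
}

for method in ("single", "complete", "average"):
    print(method, agglomerate(example, method)[-1])
```

In this example all three methods merge the same pairs in the same order, but the height of the final merge differs (0.75 for single, 1.0 for complete, and a value in between for average), which is what changes the shape of the resulting tree.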