Clustering

From GeneWeaver Wiki

This tool is unfinished, and its functionality is subject to change.


Clustering Tool (Beta 1.0)

Why Use the Clustering Tool

Clustering is one of the most powerful tools in bioinformatics. Where classification is too strict to distinguish between data, clustering gives the user a less rigid evaluation, grouping items by how similar they are rather than by fixed categories. A key strength of clustering is the wide variety of algorithms that can be used to produce the clusters; the choice of algorithm shapes the result.

Understanding the Clustering Tool


Figure 1: An example of a cluster graph and the tree it produces. The tree is built from Jaccard dissimilarity: the lower the dissimilarity between two GeneSets, the closer together they appear on the tree.
Jaccard dissimilarity, also known as Jaccard distance, is the complement of Jaccard similarity (1 − similarity), where the Jaccard similarity of two sets is the size of their intersection divided by the size of their union.
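A minimal sketch of the Jaccard distance between gene sets, for illustration only; it is not GeneWeaver's own code. The function names and the example gene symbols are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |A & B| / |A | B| for two gene sets."""
    union = a | b
    if not union:
        # Two empty sets: treat them as identical (distance 0) by convention.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genesets: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard distance for every unordered pair of named gene sets."""
    return {
        (x, y): jaccard_distance(genesets[x], genesets[y])
        for x, y in combinations(sorted(genesets), 2)
    }


if __name__ == "__main__":
    # Hypothetical GeneSets; the gene symbols are illustrative only.
    sets = {
        "GS1": {"Drd2", "Comt", "Bdnf"},
        "GS2": {"Drd2", "Comt", "Htr2a"},
        "GS3": {"Gfap", "Aqp4"},
    }
    for pair, d in distance_matrix(sets).items():
        print(pair, d)
```

GS1 and GS2 share two of their four distinct genes, so their distance is 0.5, while each is at distance 1.0 from GS3 (no shared genes). In a tree like Figure 1, GS1 and GS2 would therefore be joined first.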

Using the Clustering Tool

Access the Clustering Tool from the My Projects tab, under the Analyze Genesets option.
From there, check the projects you want to include in the cluster (or individual GeneSets, using the green '+' sign), and set any Options you want before you begin.

Options

Homology

If Homology is set (included), the tree produced by the clustering will separate GeneSets by species. Otherwise (excluded), species differences are ignored.

Method

Method selects the clustering algorithm. This option is technical and is best left to users who understand the implications of each algorithm.

Ward

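Ward uses Ward's minimum-variance method: at each step it merges the pair of clusters whose union gives the smallest increase in total within-cluster variance.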
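A minimal sketch of Ward's merge criterion, assuming each cluster member is a numeric vector; this page does not say how the tool represents GeneSets for this method, and the helper names are hypothetical. The sketch computes the increase in within-cluster sum of squares (SSE) two ways: directly, and with the standard closed form |A||B| / (|A| + |B|) · ||c_A − c_B||², where c_A and c_B are the cluster centroids.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def sse(points):
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_merge_cost(a, b):
    """Increase in total within-cluster SSE caused by merging clusters a and b."""
    return sse(a + b) - sse(a) - sse(b)


def ward_merge_cost_closed_form(a, b):
    """Same quantity via |A||B| / (|A| + |B|) * ||c_A - c_B||^2."""
    na, nb = len(a), len(b)
    ca, cb = centroid(a), centroid(b)
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))


if __name__ == "__main__":
    a = [[0.0, 0.0], [0.0, 2.0]]  # illustrative 2-D points, centroid (0, 1)
    b = [[4.0, 0.0]]              # single point, centroid (4, 0)
    print(ward_merge_cost(a, b))              # about 11.333 (34/3)
    print(ward_merge_cost_closed_form(a, b))  # same value
```

Ward's method evaluates this cost for every candidate pair and merges the cheapest one, which tends to produce compact clusters of similar size.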

Single

Single uses single-linkage (nearest-neighbor) clustering: the distance between two clusters is the smallest distance between any member of one and any member of the other.
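A minimal sketch of the single-linkage distance between two clusters. The function name is hypothetical, and the toy example uses numbers with absolute difference as a stand-in for a real distance such as Jaccard distance.

```python
from itertools import product


def single_linkage(a, b, dist):
    """Single linkage: distance of the closest pair (x in a, y in b)."""
    return min(dist(x, y) for x, y in product(a, b))


if __name__ == "__main__":
    a, b = [1, 2], [5, 9]
    print(single_linkage(a, b, lambda x, y: abs(x - y)))  # 3, from the pair (2, 5)
```

Because only the closest pair matters, single linkage can chain long, straggly clusters together through a series of near neighbors.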

Centroid

Centroid uses centroid-linkage hierarchical clustering (UPGMC), in which the distance between two clusters is the distance between their centroids (mean points).
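In centroid-linkage hierarchical clustering, two clusters are compared by the distance between their mean points. A minimal sketch, assuming members are numeric vectors and Euclidean distance (neither is specified on this page); the function names are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def centroid_linkage(a, b):
    """Centroid linkage (UPGMC): Euclidean distance between cluster means."""
    ca, cb = centroid(a), centroid(b)
    return sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5


if __name__ == "__main__":
    a = [[0.0, 0.0], [2.0, 0.0]]  # centroid (1, 0)
    b = [[4.0, 3.0], [4.0, 5.0]]  # centroid (4, 4)
    print(centroid_linkage(a, b))  # 5.0
```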

McQuitty

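McQuitty uses McQuitty's method (WPGMA, weighted average linkage), a hierarchical clustering method in which the distance from a newly merged cluster to any other cluster is the simple average of the two merged clusters' distances to it, regardless of how many members each contains.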
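A minimal sketch of McQuitty's distance update, set beside the size-weighted average-linkage (UPGMA) update for contrast. Here d_ki and d_kj are the distances from some cluster k to the two clusters i and j being merged; the function names are hypothetical.

```python
def mcquitty_update(d_ki, d_kj):
    """WPGMA / McQuitty: distance from cluster k to the merged cluster i+j.

    The two merged clusters count equally, regardless of their sizes.
    """
    return (d_ki + d_kj) / 2


def upgma_update(d_ki, d_kj, n_i, n_j):
    """UPGMA (average linkage), for comparison: weight each side by its size."""
    return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)


if __name__ == "__main__":
    # d(k, i) = 2 with |i| = 3 members; d(k, j) = 6 with |j| = 1 member.
    print(mcquitty_update(2.0, 6.0))     # 4.0
    print(upgma_update(2.0, 6.0, 3, 1))  # 3.0
```

The larger cluster i pulls the UPGMA value toward its own distance, while McQuitty's method ignores cluster size.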

Average

Average uses average-linkage clustering (UPGMA): the distance between two clusters is the mean of all pairwise distances between their members, unlike single linkage, which uses only the closest pair.
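A minimal sketch of the average-linkage distance between two clusters, using the same toy setup as a plain illustration (numbers compared by absolute difference); the function name is hypothetical.

```python
from itertools import product


def average_linkage(a, b, dist):
    """Average linkage (UPGMA): mean of all |a| * |b| member-pair distances."""
    total = sum(dist(x, y) for x, y in product(a, b))
    return total / (len(a) * len(b))


if __name__ == "__main__":
    a, b = [1, 2], [5, 9]
    # Pair distances are 4, 8, 3 and 7, so the mean is 22 / 4.
    print(average_linkage(a, b, lambda x, y: abs(x - y)))  # 5.5
```

For these two clusters the closest pair is 3 apart and the farthest pair 8 apart, so average linkage falls between the single-linkage and complete-linkage values.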

Complete

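Complete uses complete-linkage (farthest-neighbor) clustering. Each data point starts as its own cluster, and clusters are fused step by step until a single tree is formed; the distance between two clusters is the largest distance between any member of one and any member of the other.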
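The merging process described above can be sketched as a naive agglomerative loop with complete linkage. This is an illustration, not the tool's implementation: it recomputes every cluster distance at each step (fine for a handful of points, slow for large inputs), and the function names are hypothetical.

```python
from itertools import combinations, product


def complete_linkage(a, b, dist):
    """Complete linkage: distance of the farthest pair (x in a, y in b)."""
    return max(dist(x, y) for x, y in product(a, b))


def agglomerate(points, dist):
    """Naive agglomerative clustering with complete linkage.

    Starts with every point as its own cluster and repeatedly merges the
    closest pair of clusters until one remains. Returns the merge history
    as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(p,) for p in points]
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: complete_linkage(clusters[ij[0]], clusters[ij[1]], dist),
        )
        d = complete_linkage(clusters[i], clusters[j], dist)
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # combinations() yields i < j, so delete j first to keep i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history


if __name__ == "__main__":
    for step in agglomerate([1, 2, 6, 10], lambda x, y: abs(x - y)):
        print(step)
```

The merge history is exactly what a tree like the one in Figure 1 draws: each step joins two branches at the height given by the linkage distance.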

Median

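Median uses median-linkage hierarchical clustering (WPGMC, Gower's method). It is similar to centroid linkage, but when two clusters merge, the new cluster's center is the midpoint of the two old centers, so a small cluster carries as much weight as a large one.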
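In median-linkage hierarchical clustering, the distance update after a merge can be written with the Lance–Williams formula on squared Euclidean distances: d(k, i∪j) = ½·d(k, i) + ½·d(k, j) − ¼·d(i, j). A minimal sketch that checks this formula against the midpoint definition, assuming cluster centers are numeric vectors (this page does not specify the representation); the function names are hypothetical.

```python
def median_update(d_ki, d_kj, d_ij):
    """Median linkage (WPGMC) Lance-Williams update on squared distances."""
    return 0.5 * d_ki + 0.5 * d_kj - 0.25 * d_ij


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def midpoint(p, q):
    """Center of a merged cluster under the median method."""
    return [(x + y) / 2 for x, y in zip(p, q)]


if __name__ == "__main__":
    c_i, c_j, c_k = [0.0, 0.0], [4.0, 0.0], [1.0, 2.0]
    via_formula = median_update(sq_dist(c_k, c_i), sq_dist(c_k, c_j), sq_dist(c_i, c_j))
    via_midpoint = sq_dist(c_k, midpoint(c_i, c_j))
    print(via_formula, via_midpoint)  # 5.0 5.0
```

Both routes give the same squared distance, which is why the method can update its distance matrix without ever storing the cluster centers.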