From GeneWeaver Wiki

If you have a lot of data you would like to enter into GeneWeaver, please contact us so we can work with you to make it faster or easier.

Single Gene Sets

  • First, create a plain text file listing the genes and the measurement used to associate them (correlation, p-value, etc.); see "Prepare Your Data For Upload" below. Gene names can be any of the following identifiers (as of Nov 2008): Entrez Gene IDs; Ensembl Gene, Protein, and Transcript IDs; Unigene IDs; Gene Symbols; species-specific genome database IDs (MGI, RGD, HGNC, ZFIN, or FlyBase); and any of the microarray probesets contained in the listing.
  • When uploading the set, you need to provide the GeneSet name, a short label for figures, and a more detailed description. See curation standards for guidelines.
  • Use the "Accessibility" option to decide who can access your data set. If you do not want to share the data with anyone, click Private. If you only want to share the data with people working on your project(s), select Group, and then select the group from the subsequent list (hold Ctrl and click to select multiple groups to share with). See below for more information on groups.
  • If the Gene Set is from a published paper, enter the PubMed ID into the box and click "Retrieve Info from PM" to fetch the abstract information automatically for future reference.
  • Finally, use the last box on the page to provide ontological annotations that match the data set, then hit the submit button to upload.

Prepare Your Data For Upload

  • Columnar format: Gene, Value
First, make sure that your file has a header row and exactly two columns of data: the gene identifier, followed by the value or score for that gene. For example:
Ensembl ID Correlation
Gene1 0.25
Gene2 0.90
If you don't have any scores, simply put a 1 next to each gene ID. If you have more than two columns, you will have to delete or combine the extras before uploading. For more information on supported Gene Identifiers or score types, see here.
  • Tab-separated plain text
Save your file as tab-separated plain text. This option is available in most spreadsheet software. For example, in Excel 2007, use "Save As..." and, in the dialog that opens, pick "Text (Tab delimited)" next to "Save as type".
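If your gene list lives in a script rather than a spreadsheet, the two-column layout described above can be produced directly. The following is a minimal sketch, not part of GeneWeaver itself: the function name `format_gene_set`, the default header labels, and the output file name are placeholders chosen for illustration.

```python
import csv
import io


def format_gene_set(genes, id_header="Gene", value_header="Value"):
    """Return tab-separated text: one header row, then one 'gene<TAB>value' row per gene.

    `genes` is an iterable of gene IDs or (gene ID, value) pairs. A bare ID
    is given the value 1, as suggested above for sets without scores.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header, value_header])
    for item in genes:
        gene, value = (item, 1) if isinstance(item, str) else item
        writer.writerow([gene, value])
    return buf.getvalue()


def save_gene_set(path, genes, **headers):
    """Write the formatted gene set to `path` as plain text."""
    with open(path, "w", newline="") as handle:
        handle.write(format_gene_set(genes, **headers))


# Example: two scored genes plus one gene without a score.
text = format_gene_set(
    [("Gene1", 0.25), ("Gene2", 0.90), "Gene3"],
    id_header="Ensembl ID",
    value_header="Correlation",
)
print(text, end="")
# save_gene_set("my_gene_set.txt", [("Gene1", 0.25), ("Gene2", 0.90)])
```

Using the `csv` module with a tab delimiter avoids accidentally introducing spaces or extra columns, which are the most common reasons a file stops being a clean two-column table.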

Batch Gene Set Upload

  • Select 'Batch Upload' from 'Manage Gene Sets' to upload your batch file. The file must use the following format:
Metadata (header) format is as follows:
           # comments start with '#'
           : GeneSet Abbreviation starts with ':' (required)
           = GeneSet Name starts with '=' (required)
           + GeneSet Description (required)
           +   starts with '+' and can span multiple lines
           P PubmedID (optional)
           A Public or Private (optional, default Private)
           ! score type starts with '!' (required)
           ! Binary
           ! P-Value < 0.05 
           ! Q-Value < 0.05 
           ! 0.40 < Correlation < 0.90 
           ! 6.0 < Effect < 22.50 
           @ Species Scientific Name (required) (List updated daily)
           @ Mus musculus
           @ Homo sapiens
           @ Rattus norvegicus
           @ Danio rerio
           @ Drosophila melanogaster
           @ Macaca mulatta
           @ Caenorhabditis elegans
           @ Saccharomyces cerevisiae
           % Gene ID Type (required) (List updated daily)
           % Entrez
           % Ensembl Gene
           % Ensembl Protein
           % Ensembl Transcript
           % Unigene
           % Gene Symbol
           % Unannotated
           % MGI
           % HGNC
           % RGD
           % ZFIN
           % FlyBase
           % Wormbase
           % SGD
           % miRBase
           % microarray Affymetrix C. elegans Genome Array
           % microarray Affymetrix Drosophila Genome 2.0
           % microarray Affymetrix HT Human Genome U133A
           % microarray Affymetrix Human 35K Set
           % microarray Affymetrix Human 35K SubA
           % microarray Affymetrix Human 35K SubB
           % microarray Affymetrix Human 35K SubC
           % microarray Affymetrix Human 35K SubD
           % microarray Affymetrix Human Genome U133A
           % microarray Affymetrix Human Genome U133A 2.0
           % microarray Affymetrix Human Genome U133B
           % microarray Affymetrix Human Genome U133 Plus 2.0
           % microarray Affymetrix Human Genome U133 Set
           % microarray Affymetrix Human HG-Focus Target
           % microarray Affymetrix Mouse Exon 1.0 ST
           % microarray Affymetrix Mouse Expression 430A
           % microarray Affymetrix Mouse Expression 430B
           % microarray Affymetrix Mouse Expression 430 Set
           % microarray Affymetrix Mouse Gene 1.0 ST Array
           % microarray Affymetrix Mouse Genome 430 2.0
           % microarray Affymetrix Mouse Genome 430A 2.0
           % microarray Affymetrix Murine 11K Set
           % microarray Affymetrix Murine 11K SubA
           % microarray Affymetrix Murine 11K SubB
           % microarray Affymetrix Murine Genome U74A
           % microarray Affymetrix Murine Genome U74B
           % microarray Affymetrix Murine Genome U74C
           % microarray Affymetrix Murine Genome U74 Set
           % microarray Affymetrix Murine Genome U74 Version 2
           % microarray Affymetrix Murine Genome U74 Version 2
           % microarray Affymetrix Murine Genome U74 Version 2
           % microarray Affymetrix Murine Genome U74 Version 2 Set
           % microarray Affymetrix Rat Exon 1.0 ST
           % microarray Affymetrix Rat Expression 230A
           % microarray Affymetrix Rat Expression 230B
           % microarray Affymetrix Rat Expression 230 Set
           % microarray Affymetrix Rat Genome 230 2.0
           % microarray Affymetrix Rhesus Macaque Genome
           % microarray Affymetrix Yeast Genome 2.0 Array
           % microarray Affymetrix Yeast Genome S98 Array
           % microarray Affymetrix Zebrafish Genome
           % microarray Agilent Mouse G4121A (Toxicogenomics)
           % microarray Agilent Mouse Whole Genome G4122F
           % microarray Illumina Human-6 v2.0
           % microarray Illumina MouseRef-8 v2.0
           % microarray Illumina MouseWG-6 v1.1
           % microarray Illumina MouseWG-6 v2.0
           After the metadata, leave a blank line. Then list all data points, one per
           line, in the following format: gene id <tab> data value <enter>.
           At the end of the data points, leave another blank line. You may then start
           another metadata section, repeating this for every dataset in the same file.
           The 'P', 'A', '!', '@', and '%' entries may be omitted in later sections if
           they are unchanged; each one defaults to the last seen value.
           The ':', '=', and '+' entries are required for every dataset.
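The layout above can also be generated programmatically. The sketch below is illustrative only: the function name `format_batch_section` and the example abbreviations, names, and genes are made up, and it assumes a single space between each marker character and its value, which the format description does not state explicitly.

```python
def format_batch_section(abbreviation, name, description, data,
                         score_type=None, species=None, gene_id_type=None,
                         pubmed_id=None, access=None):
    """Build one gene set section: metadata, blank line, data rows, blank line.

    Entries left as None are skipped, so later sections in a file can rely on
    the values last seen in an earlier section.
    """
    lines = [f": {abbreviation}", f"= {name}"]
    # A description may span several lines; each one gets its own '+' prefix.
    lines += [f"+ {part}" for part in description.splitlines()]
    optional = [("P", pubmed_id), ("A", access), ("!", score_type),
                ("@", species), ("%", gene_id_type)]
    lines += [f"{marker} {value}" for marker, value in optional if value is not None]
    lines.append("")                                  # blank line after metadata
    lines += [f"{gene}\t{value}" for gene, value in data]
    lines.append("")                                  # blank line after data points
    return "\n".join(lines) + "\n"


# The first section sets every entry; the second reuses score type, species,
# and gene ID type from the first by leaving them out.
batch = format_batch_section(
    "GS1", "Example set one", "First line of the description\nsecond line",
    [("Pten", 1), ("Trp53", 1)],
    score_type="Binary", species="Mus musculus",
    gene_id_type="Gene Symbol", access="Private",
)
batch += format_batch_section(
    "GS2", "Example set two", "Another description", [("Brca1", 1)],
)
print(batch, end="")
```

Sections are simply concatenated, since each one already ends with the blank line that separates it from the next.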

An example file can be found here.
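Before uploading, it can help to check how a batch file will be interpreted, in particular which values later gene sets inherit. The reader below is a rough sketch of the rules described above, not GeneWeaver's own parser: the function name `parse_batch` and the dictionary keys are hypothetical, and it assumes that the first blank line after a metadata block starts the data rows and the next blank line ends them.

```python
REQUIRED = {":": "abbreviation", "=": "name"}
INHERITED = {"P": "pubmed_id", "A": "access", "!": "score_type",
             "@": "species", "%": "gene_id_type"}


def parse_batch(text):
    """Parse batch-file text into a list of dicts, one per gene set."""
    gene_sets, inherited = [], {}
    current, in_data = None, False

    def finish():
        nonlocal current, in_data
        if current is not None:
            current["description"] = " ".join(current["description"])
            # Copy the last seen 'P', 'A', '!', '@', '%' values into this set.
            gene_sets.append({**inherited, **current})
        current, in_data = None, False

    for raw in text.splitlines():
        line = raw.strip()
        if in_data:
            if not line:
                if current["genes"]:        # blank line after data ends the set
                    finish()
                continue
            gene, _, value = line.partition("\t")
            current["genes"].append((gene.strip(), value.strip()))
        elif not line:
            in_data = current is not None   # blank line after metadata
        elif not line.startswith("#"):      # '#' lines are comments
            if current is None:
                current = {"description": [], "genes": []}
            marker, value = line[0], line[1:].strip()
            if marker in REQUIRED:
                current[REQUIRED[marker]] = value
            elif marker == "+":
                current["description"].append(value)
            elif marker in INHERITED:
                inherited[INHERITED[marker]] = value
            else:
                raise ValueError(f"unrecognised metadata line: {raw!r}")
    finish()
    return gene_sets


example = (
    ": GS1\n= Set one\n+ First line\n+ second line\n"
    "A Private\n! Binary\n@ Mus musculus\n% Gene Symbol\n\n"
    "A2m\t1\nPten\t1\n\n"
    ": GS2\n= Set two\n+ Other\n@ Homo sapiens\n\n"
    "TP53\t1\n"
)
for gene_set in parse_batch(example):
    print(gene_set["abbreviation"], gene_set["species"], gene_set["score_type"])
```

Tracking whether the reader is in the metadata or the data part of a section, rather than looking only at the first character of each line, keeps gene IDs that happen to begin with 'P' or 'A' from being mistaken for metadata.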
