What is Suspects?
The aim of Suspects is to efficiently automate the first steps of the candidate gene approach.
In more depth: Suspects is a system for matching Gene Ontology terms, InterPro domains and gene expression data, built on top of the PROSPECTR candidate prioritization system. PROSPECTR uses sequence features to rank genes in order of their likelihood of involvement in disease; with Suspects you can drill down further to rank genes involved in specific complex traits and syndromes.
How do I use it?
Go to the search page and enter the markers or coordinates flanking a region of interest. In the next box, enter either the name of the disease phenotype that you are interested in (e.g. hypertension, arthritis, schizophrenia) or a list of genes that you believe have the same phenotypic effect as the gene you are looking for.
Elsewhere on the search page you can choose to search the region around a marker or gene, or to search for genes that fulfil certain criteria (e.g. all genes with dopamine receptor domains, or all genes with a reference to "alcoholism" in their associated literature).
A Flash-based quickstart guide, which gives a brief tour of the interface, is available.
How does it work?
Suspects operates on the assumption that genes involved in a complex trait will belong to similar pathways and should thus be more likely to share domains, annotation and patterns of expression.
The server takes two inputs. The first is the genomic region you are interested in, which you can specify using markers, bands, chromosomal coordinates or genes. The second is a list of genes involved in the same complex disease as the one you are studying. As a shortcut, you may simply enter the name of the disease, and Suspects will find appropriate genes for you from OMIM, HGMD and GAD. This list is known as the "match set".
Suspects retrieves the genes in the requested region and ranks them by their likelihood of involvement in disease, based on their sequence features.
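The following is a minimal sketch of the region-retrieval step. It assumes a hypothetical `Gene` record with 1-based, inclusive coordinates. A gene is kept if it lies on the requested chromosome and overlaps the requested interval. The gene names and coordinates are made up, and the PROSPECTR sequence-feature scoring is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical gene record: symbol plus 1-based, inclusive coordinates.
    symbol: str
    chrom: str
    start: int
    end: int


def genes_in_region(genes, chrom, start, end):
    """Return genes on `chrom` that overlap the inclusive interval [start, end]."""
    return [
        g for g in genes
        if g.chrom == chrom and g.start <= end and g.end >= start
    ]


# Made-up coordinates for illustration, not real gene positions.
genes = [
    Gene("GENE_A", "1", 1_000, 5_000),
    Gene("GENE_B", "1", 9_000, 12_000),
    Gene("GENE_C", "2", 1_000, 5_000),
]
hits = genes_in_region(genes, "1", 4_000, 10_000)
print([g.symbol for g in hits])  # ['GENE_A', 'GENE_B']
```

The overlap test (`start <= end` and `end >= start`) also keeps genes that only partly fall inside the region. This is usually what you want when the region is defined by flanking markers.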
For each gene, Suspects then looks for Gene Ontology terms that are significantly semantically similar to terms associated with genes in the match set. Each gene is scored according to how closely its GO annotation matches the annotation found in the match set. The information content of the terms involved determines how large a score each match earns.
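The FAQ does not name the exact similarity measure, so this sketch uses Resnik's measure, one common information-content-based choice. A term's information content is -log p(t), where p(t) is the fraction of annotations at t or any of its descendants. Two terms are then as similar as their most informative common ancestor. The toy ontology, annotation counts, term names and the `threshold` value are all hypothetical.

```python
import math

# Toy ontology with hypothetical term names: each term maps to its parent terms.
PARENTS = {
    "root": [],
    "signalling": ["root"],
    "receptor_activity": ["signalling"],
    "kinase_activity": ["signalling"],
    "dopamine_receptor": ["receptor_activity"],
    "serotonin_receptor": ["receptor_activity"],
    "metabolism": ["root"],
}

# Hypothetical counts of genes annotated directly to each term.
DIRECT_COUNTS = {
    "dopamine_receptor": 2,
    "serotonin_receptor": 2,
    "kinase_activity": 4,
    "metabolism": 4,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(direct_counts):
    """IC(t) = -log p(t), with p(t) the fraction of annotations at t or below it."""
    totals = {t: 0 for t in PARENTS}
    for term, n in direct_counts.items():
        for a in ancestors(term):
            totals[a] += n
    root_total = totals["root"]
    return {t: -math.log(c / root_total) for t, c in totals.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max((ic[t] for t in common if t in ic), default=0.0)


def go_score(candidate_terms, match_set_terms, ic, threshold=1.0):
    """Sum each candidate term's best match in the match set, skipping weak matches."""
    score = 0.0
    for t in candidate_terms:
        best = max(resnik(t, m, ic) for m in match_set_terms)
        if best >= threshold:
            score += best
    return score


ic = information_content(DIRECT_COUNTS)
print(round(resnik("dopamine_receptor", "serotonin_receptor", ic), 3))  # 1.099
print(go_score(["metabolism"], ["serotonin_receptor"], ic))  # 0.0
```

The two receptor terms share the fairly specific ancestor `receptor_activity`, so they score well. `metabolism` shares only the root, which carries no information, so it falls below the threshold.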
Suspects then looks for InterPro domains shared with the match set. The score given to each gene depends on how significant the match is, which is judged by how often the domain in question occurs in the genome.
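One plausible way to turn domain frequency into a significance-based score is to weight each shared domain by -log of the fraction of genes in the genome that carry it. Rare domains then count for more than common ones. This weighting, the domain names and the counts are illustrative assumptions, not Suspects' published formula.

```python
import math


def domain_score(candidate_domains, match_set_domains, genome_counts, n_genes):
    """Score the domains a candidate shares with the match set.

    Each shared domain contributes -log(fraction of genes carrying it), so a
    rare domain contributes more than a common one (illustrative weighting).
    """
    shared = set(candidate_domains) & set(match_set_domains)
    return sum(-math.log(genome_counts[d] / n_genes) for d in shared)


# Hypothetical genome-wide counts of genes carrying each domain.
GENOME_COUNTS = {"DOMAIN_RARE": 10, "DOMAIN_COMMON": 1000}
N_GENES = 20000

match_set = ["DOMAIN_RARE", "DOMAIN_COMMON"]
rare = domain_score(["DOMAIN_RARE"], match_set, GENOME_COUNTS, N_GENES)
common = domain_score(["DOMAIN_COMMON"], match_set, GENOME_COUNTS, N_GENES)
print(round(rare, 2), round(common, 2))  # 7.6 3.0
```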
Finally, Suspects compares each gene's expression profile with the profiles from the match set using Spearman's rank-order correlation coefficient (rho). Scores depend on how strongly any matching profiles are correlated.
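Spearman's rho is the Pearson correlation of the ranks of two profiles, with tied values given their average rank. The sketch below has no dependencies. The expression values, which could stand for levels across four tissues, are invented.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: rho is undefined, treated here as no correlation
    return cov / (sxx * syy) ** 0.5


# Invented expression levels for one candidate and one match-set gene.
candidate = [5.0, 1.2, 3.3, 8.1]
match_gene = [4.0, 0.5, 2.9, 9.9]
print(spearman_rho(candidate, match_gene))  # 1.0
```

Because only ranks matter, the two profiles correlate perfectly even though their absolute levels differ. This is why a rank-based measure suits expression data from different platforms or scales.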
A weighted average of these scores is then calculated and a ranked list of genes is displayed. Genes near the top of the list are, in theory, better candidates than those further down.
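The final combination step might look like the sketch below. The weights are placeholders chosen for illustration. Each component score is assumed to have already been scaled to the range 0 to 1, and missing evidence counts as zero.

```python
# Placeholder weights chosen for illustration only.
WEIGHTS = {"sequence": 1.0, "go": 2.0, "domain": 1.5, "expression": 1.0}


def combined_score(components, weights=WEIGHTS):
    """Weighted average of per-evidence scores; missing evidence counts as 0."""
    total_weight = sum(weights.values())
    return sum(w * components.get(k, 0.0) for k, w in weights.items()) / total_weight


def rank_genes(scores_by_gene, weights=WEIGHTS):
    """Return (gene, combined score) pairs, best candidate first."""
    combined = {g: combined_score(c, weights) for g, c in scores_by_gene.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Invented component scores, each assumed to be pre-scaled to 0..1.
scores = {
    "GENE_A": {"sequence": 0.8, "go": 0.9, "domain": 0.0, "expression": 0.5},
    "GENE_B": {"sequence": 0.6, "go": 0.1, "domain": 0.2, "expression": 0.3},
    "GENE_C": {"sequence": 0.4},
}
print([gene for gene, _ in rank_genes(scores)])  # ['GENE_A', 'GENE_B', 'GENE_C']
```

Treating missing evidence as zero favours well-annotated genes. Keep this bias in mind when reading the ranked list.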
Disease genes are far better annotated than other genes, and you should bear this in mind when interpreting results. Different types of match are weighted differently; the weights assigned are arbitrary.
The server will sometimes time out when dealing with complex queries. If this happens, try restricting the match set.
Alternatively, you could try Endeavour at the Katholieke Universiteit Leuven.
Please contact us for up-to-date citation information. If you only rank on sequence features, you may cite:
Euan J Adie, Richard R Adams, Kathryn L Evans, David J Porteous and Ben S Pickard. Speeding Disease Gene Discovery by Sequence Based Candidate Prioritization. BMC Bioinformatics. 2005 Mar 14;6(1):55.