News
December / January 2005
- Changed the way that training sets are assembled. Consolidated features and added new analyses (mainly more species). Added GO, InterPro and expression data from Ensembl.
- You can now search for matches in functional annotation using the SUSPECTS server. You may specify a list of genes (by symbol or identifier) or a disorder already in the database. Matches are made on the basis of significant semantic similarity. Genes are still ranked using their sequence features, but are then given a bonus for sharing InterPro domains, GO terms or similar expression profiles with a given set of "target" genes.
- Retrained the classifier using data from Ensembl Mart v27.
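
The re-ranking described in the SUSPECTS item above (sequence-based score first, then a bonus for annotation shared with the "target" genes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Gene` class, the `rank_with_bonus` function, the `bonus_per_match` weight and the identifiers in the example are all hypothetical, and plain counting of shared terms stands in for the semantic-similarity matching that the server actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    symbol: str
    sequence_score: float  # score from the sequence-based classifier
    annotations: set = field(default_factory=set)  # GO terms, InterPro domains, ...


def rank_with_bonus(candidates, targets, bonus_per_match=0.5):
    """Rank candidates by sequence score plus a bonus per shared annotation.

    bonus_per_match is an arbitrary illustrative weight, not a value
    taken from SUSPECTS or Prospectr.
    """
    # Pool every annotation carried by any target gene.
    target_terms = set().union(*(t.annotations for t in targets))

    def combined(gene):
        shared = gene.annotations & target_terms
        return gene.sequence_score + bonus_per_match * len(shared)

    return sorted(candidates, key=combined, reverse=True)


# Placeholder genes and identifiers, for illustration only.
targets = [Gene("TARGET1", 0.0, {"GO:0005515", "IPR000001"})]
candidates = [
    Gene("GENE_A", 1.0, {"GO:0000001"}),
    Gene("GENE_B", 0.8, {"GO:0005515", "IPR000001"}),
    Gene("GENE_C", 0.2, set()),
]
ranked = rank_with_bonus(candidates, targets)
print([g.symbol for g in ranked])
```

In this toy example GENE_B starts with a lower sequence score than GENE_A but shares two annotations with the target, so the bonus moves it to the top of the list.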
What is Prospectr?
It can be shown that genes implicated in disease share certain patterns of sequence-based features, such as larger gene lengths and broader conservation through evolution.
Prospectr (PRiOrization by Sequence & Phylogenetic Extent of CandidaTe Regions) is an alternating decision tree which has been trained to differentiate between genes likely to be involved in disease and genes unlikely to be involved in disease. Using sequence-based features such as gene length, protein length and the percent identity of homologs in other species as input, a classification can be obtained for a gene of interest.
The alternating decision tree outputs a classification ("likely to be involved in disease" or "unlikely to be involved in disease"), a score (which is a measure of confidence in the classification) and a breakdown of which factors contributed most to that score.
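
To make the three outputs concrete, here is a minimal sketch of how an alternating decision tree is evaluated in general: every decision node whose parent prediction node is reached adds the value of the branch it selects, the sign of the total gives the class, and its magnitude serves as the confidence. The tree below is invented for illustration; its feature names, thresholds and prediction values are assumptions and are not those of the trained Prospectr classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Prediction:
    value: float  # contribution added when this node is reached
    splitters: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    name: str
    test: Callable[[Dict[str, float]], bool]
    if_true: Prediction
    if_false: Prediction


def score(node: Prediction, gene: Dict[str, float],
          contributions: List[Tuple[str, float]]) -> float:
    """Sum the prediction values on every path the gene satisfies."""
    total = node.value
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.test(gene) else splitter.if_false
        contributions.append((splitter.name, branch.value))
        total += score(branch, gene, contributions)
    return total


# Hypothetical tree: all thresholds and values are made up.
tree = Prediction(-0.125, [
    Splitter("gene_length > 50000", lambda g: g["gene_length"] > 50000,
             Prediction(0.5, [
                 Splitter("mouse_identity > 80", lambda g: g["mouse_identity"] > 80,
                          Prediction(0.25), Prediction(-0.25)),
             ]),
             Prediction(-0.25)),
    Splitter("protein_length > 500", lambda g: g["protein_length"] > 500,
             Prediction(0.25), Prediction(-0.125)),
])

gene = {"gene_length": 60000, "mouse_identity": 85, "protein_length": 300}
contributions: List[Tuple[str, float]] = []
total = score(tree, gene, contributions)
label = "likely" if total > 0 else "unlikely"
print(label, total)
for name, value in contributions:
    print(f"  {name}: {value:+}")
```

The `contributions` list plays the role of the per-factor breakdown: each entry records which test was evaluated and how much the chosen branch added to or subtracted from the score.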
Given this score we can also roughly estimate how much more or less likely it is that a particular gene is involved in human hereditary disease.
What can it be used for?
Prospectr can be used to enrich lists of genes found at a suspected disease locus. Given a list of genes, Prospectr will return a ranked list ordered by the likelihood of involvement in disease.
Tests on an independent data set of genes taken from the Human Gene Mutation Database suggest that Prospectr will, on average, enrich a list of ~200 genes two-fold 74% of the time, five-fold 33% of the time and twenty-fold 8% of the time. 95% of the time the list was enriched one-and-a-half-fold; that is to say, the target gene was in the top three-quarters of the ranked list.
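
As a rough guide to what the two-, five- and twenty-fold figures mean for a list of about 200 genes, the arithmetic can be sketched as below. It assumes that fold enrichment is the list size divided by the rank of the target gene; that definition, and the function names `fold_enrichment` and `rank_cutoff`, are assumptions made here for illustration, and the exact definition used in the published evaluation may differ.

```python
import math


def fold_enrichment(list_size: int, target_rank: int) -> float:
    """Assumed definition: list size divided by the 1-based rank of the target."""
    if not 1 <= target_rank <= list_size:
        raise ValueError("target_rank must be between 1 and list_size")
    return list_size / target_rank


def rank_cutoff(list_size: int, fold: float) -> int:
    """Lowest rank that still gives at least `fold` enrichment."""
    return math.floor(list_size / fold)


for fold in (2, 5, 20):
    print(f"{fold}-fold: target gene within the top {rank_cutoff(200, fold)} of 200")
```

Under this assumption, two-fold enrichment of a 200-gene list corresponds to the target gene ranking in the top 100, five-fold to the top 40 and twenty-fold to the top 10.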
How can I use it?
To search a particular locus, use the search page. To download a flat file containing gene IDs and scores, go to the download section, where you can also download standalone binaries and Perl scripts.
More information & citing Prospectr
For more information, refer to
BMC Bioinformatics 2005, 6:55 doi:10.1186/1471-2105-6-55
(Free full-text available)