Perhaps it was frustration over the slow pace of progress in identifying complex disease genes, or maybe the fact that we live in an era where Big Science has become routine, or even the rapid improvements and cost reductions in the enabling genetic technologies. Whatever it was, someone woke up one morning and said “How about solving seven genetic disorders at once?”. The results of this seeming pipe dream reached fruition recently in the form of a titanic Nature paper and its gargantuan accompanying supplementary online data. Of relevance to this blog is the fact that bipolar disorder is on the list of diseases, which also includes coronary artery disease, hypertension, Crohn’s disease, rheumatoid arthritis, and type 1 and type 2 diabetes.
Genome-wide association (GWA) is the name of the procedure used here: not a new technique as such, but a new scale at which to apply the familiar case-control association study approach that I have mentioned previously. Instead of the hundreds of cases and handful of markers tested in previous case-control association studies, GWA experiments use thousands of samples (for the bipolar study it was 2000 cases and 3000 controls) and over half a million markers, the idea being that it is both an objective and a statistically rigorous screen. The grand scale of this approach cuts both ways: the greater sensitivity and coverage are accompanied by a greater risk of false positives. With so many markers tested, some will appear to be associated by chance alone when in reality they are not. To this end, the statisticians have been busy trying to figure out the thresholds which have to be achieved to separate the real from the spurious. Here, the statistics are pretty much reduced to rank ordering of the p-values associated with each marker and some comparison between diseases.
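To put rough numbers on the false-positive problem, here is a minimal Python sketch. It assumes a round figure of 500,000 markers, treats them as independent tests (real markers are correlated, so this is a simplification) and uses a conventional 0.05 significance level. The function names are hypothetical and the Bonferroni correction is simply the most familiar textbook choice, not necessarily the threshold the study's statisticians settled on.

```python
def expected_false_positives(n_markers, alpha):
    """Expected number of markers passing threshold alpha by chance alone,
    assuming no marker is truly associated and the tests are independent."""
    return n_markers * alpha


def bonferroni_threshold(n_markers, family_alpha=0.05):
    """Per-marker p-value threshold that keeps the chance of even one
    false positive across all markers at or below family_alpha."""
    return family_alpha / n_markers


def rank_markers(p_values):
    """Rank markers from smallest (most significant) p-value to largest."""
    return sorted(p_values.items(), key=lambda item: item[1])


N_MARKERS = 500_000  # assumption: round figure for "over half a million"

# Hypothetical marker names and p-values, for illustration only.
toy_p_values = {"snp_a": 3e-4, "snp_b": 2e-8, "snp_c": 0.2}

print("Expected chance hits at p < 0.05:", expected_false_positives(N_MARKERS, 0.05))
print("Bonferroni per-marker threshold:", bonferroni_threshold(N_MARKERS))
for name, p in rank_markers(toy_p_values):
    print(name, p, "passes" if p < bonferroni_threshold(N_MARKERS) else "fails")
```

The point of the sketch is only the order of magnitude: at an uncorrected 0.05 level, tens of thousands of markers would look "associated" by chance, which is why the thresholds have to be so much stricter than in a small candidate-gene study.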
The Wellcome Trust coughed up £8 million to fund this study but it is not the first to reach publication.
The Malhotra group published a schizophrenia GWA earlier this year (admittedly with a rather small sample size) which identified one candidate gene, CSF2RA (colony stimulating factor 2 receptor, alpha), in the X/Y pseudoautosomal region.
More relevant to the Nature paper is last month’s paper from the McMahon group detailing the results of their bipolar disorder GWA.
In that paper 461 cases and 563 controls from the US population and 772 cases and 876 controls from the German population were tested, but using a pooling protocol rather than individual genotyping. Positive findings from this first stage (1887/550,000 SNP markers) had fulfilled criteria such as being of reasonably common frequency, having a reasonable strength of effect and being located near a gene. These criteria are somewhat arbitrary, especially the last one (regulatory mutations have been found up to 2,000,000 base pairs away from genes), but are a necessary start in terms of cost feasibility at the small-lab scale. The positive findings were then replicated through individual genotyping of a large set of German samples and the surviving SNP markers identified.
Before we look at the results I have to register my concerns over the use of entirely family-derived samples in the US population group. Not only do I believe such samples are inappropriate for use in a protocol designed to find low penetrance general risk factors (see a previous post), but I also think that the fact that the German sample was only 13% family-derived meant that it was not an ideal comparative study group.
However, having said this, it is up to me to explain how the experiment came up with positive findings. Of the 1887 US positive SNPs, 88 were replicated between the two geographic populations, and a proportion of a subset of these also survived being genotyped individually. My current thoughts are that perhaps the success of these studies derives just as much from the power of the controls (not subject to familial influences but necessarily reduced in population risk alleles).
80 genes were identified and, of these, Diacylglycerol Kinase Eta (DGKH) and SORCS2 seem to contain the most positive SNPs each and reasonable odds ratio values (a measure of their strength of effect). The former of these genes can be connected to the lithium-sensitive phosphatidyl inositol pathway (thus providing a potential link to a commonly used treatment regime) whereas the latter is much less well characterised.
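For readers unfamiliar with odds ratios, the following Python sketch shows how one is calculated from a simple two-by-two table of allele counts. The counts are entirely hypothetical (they are not taken from either paper), the function names are my own, and the confidence interval uses the standard Woolf (log) approximation as an illustrative choice rather than the method used in the studies.

```python
import math


def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    return (case_risk * control_other) / (case_other * control_risk)


def woolf_confidence_interval(case_risk, case_other, control_risk, control_other, z=1.96):
    """Approximate 95% confidence interval for the odds ratio (Woolf's log method)."""
    log_or = math.log(odds_ratio(case_risk, case_other, control_risk, control_other))
    std_err = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    return math.exp(log_or - z * std_err), math.exp(log_or + z * std_err)


# Hypothetical allele counts, for illustration only.
counts = dict(case_risk=300, case_other=700, control_risk=250, control_other=750)

print("Odds ratio:", round(odds_ratio(**counts), 3))
low, high = woolf_confidence_interval(**counts)
print("Approximate 95% CI:", round(low, 3), "to", round(high, 3))
```

An odds ratio of 1 means the allele is equally common in cases and controls; the further above 1, the stronger the apparent effect. The modest values typical of complex disease are part of why such large samples are needed.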
So how does this compare with the mother of all GWAs from Wellcome? Well, there is no clear evidence of large-scale overlap between the McMahon and Wellcome results although, to be fair, the papers were published so close together that no comparisons were actually formally carried out. Even though DGKH and SORCS2 are absent from the top-ranking gene list there are some very interesting points of overlap (see below). But before that, some bad news: of the seven diseases tested, bipolar disorder was, on the face of it, among the least productive. For many of the others, previously suspected genes were nicely confirmed and those fields now also have a set of novel genes to analyse, some intriguingly spanning disease boundaries. Bipolar disorder failed to have any such big-hitting genes identified. All we are left with are some moderately associated genes. Before I go into the properties of those genes, we must address the possible reasons for the lack of dramatic success. Again, I am not sure of the sample selection criteria for the study in terms of familial versus sporadic cases but, perhaps more importantly, the other diseases studied all have bona fide quantifiable diagnostic criteria; we just don’t have this in psychiatry, where it’s not possible to take a reading like blood pressure, lung capacity, blood sugar levels etc. So I think there’s clearly an issue of non-homogeneity of the bipolar classification. This is a very hard problem to solve: the irony is that perhaps the only biomarkers for psychiatric illness will be the genetic markers that we have yet to clarify. Despite this rather circular problem, I’m very much against the notion, proposed in some quarters, that psychiatric disorders have some special genetic qualities that render them immune to such genetic approaches.
I’ve had a look at the markers which show strong/moderate and moderate association with bipolar disorder in the Wellcome study. The paper did very little in the way of overspeculation on the function of the identified genes. Dynactin 5 was mentioned because it interacts with one of our lab’s key genes, DISC1. KCNC2, which encodes a potassium channel, GABRB1 and GRM7, both encoding neurotransmitter receptors, and SYN3, a synaptic protein, were also briefly discussed. This was a very general paper and I guess there wasn’t the space to expand beyond these. For your amusement, I’ve annotated the top-most genes in a pretty haphazard way. For those in the field, it might prove useful to be able to quickly scan down this list for any of interest.
The list starts off with the chromosome number, then the SNP marker i.d. (rs number) and finally the rough description of genes in the region. If you type the SNP i.d. or gene name into the human genome browser dialogue box and click ‘submit’, you’ll be able to look at the genomic locale and link out to other information on the gene (especially OMIM for neat biographies of the genes). See how many of the associations are nowhere near known genes (or occasionally near ‘ests’ which are uncharacterised possible genes).
Strong or moderate associations
1 rs2989476 Nothing near
2 rs4027132 LIPIN1 and est
2 rs7570682 est
2 rs1375144 DPP10, dipeptidyl peptidase 10 isoform long
2 rs11888446 Nothing near
2 rs4673905* DNA polymerase-transactivated protein 6 (DNAPTP6) mRNA
2 rs2953145 ANKMY1, DUSP28, MPEPL1, CAPN10, GPR35
3 rs4276227 CMTM8, CKLF-like MARVEL transmembrane domain containing…chemokine like
3 rs9834970 Serine/threonine-protein kinase DCAMKL3 (Doublecortin-like and CAM kinase-like 3) and KIAA0342 protein (Fragment).
3 rs683395 LAMP3, lysosomal-associated membrane protein 3
6 rs6458307 TBCC (beta-tubulin cofactor C), KIAA0240
6 rs6901299 TRDN, triadin1 calcium receptor interacting
7 rs1405318 KIAA0960 protein
8 rs2609653 Nothing near
9 rs10982256 DFNB31= CASK-interacting protein CIP98 isoform 1, =whirler mouse mutant
14 rs10134944 SLC35F4, solute carrier family 35 member F4,
14 rs11622475 TDRD9, tudor domain containing 9,
16 rs420259 PALB2, (partner and localizer of BRCA2) and DCTN5 (dynactin5)
16 rs1344484 quite a way from CHD9, chromodomain helicase DNA binding protein 9
20 rs3761218 CDC25B, cell division cycle 25B isoform 2,
X rs975687 CAPN6, calpain6
Moderate strength associations
1 rs10888879 PARS2 (prolyl-tRNA synthetase), ttc22 (tetratricopeptide repeat domain 22)
1 rs10889189 Nothing near
1 rs4916031 AK3L1 (adenylate kinase 3-like 1 isoform 7)
1 rs6691577 LRRC7 (leucine rich repeat containing 7)
1 rs1776905 Nothing near
1 rs10779279 ESRRG (estrogen-related receptor gamma isoform 2)
1 rs12070036 zinc finger protein 678
2 rs2049674 TMEM17 quite a way away
2 rs17029753 Nothing near
2 rs13386690 DPP10, dipeptidyl peptidase 10 isoform long
2 rs4407218 not in database
2 rs4673905* DNA polymerase-transactivated protein 6 (DNAPTP6) mRNA
3 rs1485171 GRM7, metab glut receptor
3 rs6762678 ZNF659, zinc finger 659
3 rs711715 Nothing near
3 rs4858594 THRB, thyroid hormone receptor beta
3 rs33460 CCK1(cholecystokinin preproprotein), lyzl4(lysozyme-like 4)
3 rs13074575 PTPRG1, protein tyrosine phosphatase receptor type G
4 rs7680321 GABRB1, gaba receptor
4 rs1996755 DKFZp586K0717
5 rs5009031 Nothing near
5 rs1428006 Nothing near
5 rs17701996 FBL3B/FBXL21 (F-box and leucine-rich repeat protein 21…ubiquitin ligase), LECT2 (leukocyte cell-derived chemotaxin 2 precursor) cluster
5 rs999580 Nothing near
6 rs365237 NHLRC1 (malin ubiquitin ligase), tpmt1(thiopurine S-methyltransferase), AOF1 (amine oxidase (flavin containing) domain 1), DEK oncogene
6 rs6926599 ests
6 rs17739564 TRDN, triadin1 calcium receptor interacting
6 rs6906574 MOXD1, monooxygenase DBH-like 1 isoform 2, senescence protein
6 rs2763025 SYNE1, = synaptic nuclear envelope protein 1=nesprin 1 isoform longer,
7 rs2286492 FAM126A = down-regulated by Ctnnb1 = myelination gene involved in congenital cataract
8 rs2875734 Nothing near
8 rs16919670 Nothing near
8 rs9643449 Nothing near
8 rs10097578 ZNF706 quite a way away
8 rs1993980 TRAP25, TRAP/Mediator complex component TRAP25, thyroid hormone receptor-associated protein 6
9 rs7030123 Nothing near
9 rs1573257 PAX5, paired box 5
9 rs10993698 SYK, spleen tyrosine kinase
9 rs4978927 SVEP1, sushi, von Willebrand factor type A, EGF and pentraxin domain containing 1
9 rs10982246 DFNB31= CASK-interacting protein CIP98 isoform 1, =whirler mouse mutant
10 rs788261 Nothing near
10 rs10826258 Nothing near
10 rs1866437 similar type of clusters as
10 rs7896131 HHEX1 AND EXOC1 quite a way away
10 rs2096285 PTPRE, protein tyrosine phosphatase receptor type E
11 rs858719 ZBTB44, BTB (POZ) domain containing 15
12 rs7136898 SOX5, SRY (sex determining region Y)-box 5 isoform b
12 rs17309820 Nothing near
13 rs4770394 Nothing near
13 rs2806922 KIAA0853=znf protein?
13 rs12584910 Nothing near
14 rs221703 DHRS2=dehydrogenase/reductase (SDR family) member 2
14 rs17108400 FLJ43028 fis
14 rs17113911 Nothing near
14 rs10146912 KLHDC1=kelch domain containing 1
14 rs3784005 FLVCR2=feline leukemia virus subgroup C cellular
14 rs10438244 FLJ25257 fis,
15 rs7163502 TBC1D21=TBC1 domain family member 21
16 rs1420239 Nothing near
16 rs4567706 Nothing near
16 rs12149894 Nothing near
16 rs7184080 Nothing near
16 rs10220973 FLJ43761 fis
17 rs203466 AKAP10. A-kinase anchor protein 10 precursor
18 rs7243929 Nothing near
18 rs1893146 Nothing near
19 rs12979795 ZNF490
19 rs7408169 not found in database
19 rs2061332 ZNF224/ZNF225
19 rs7248493 ZNF274
20 rs4815603 CENPB, CDC25B
20 rs6031991 KCNS1=potassium voltage-gated channel
21 rs2833193 Nothing near
22 rs11089599 SYN3=synapsin III isoform IIIc
22 rs16997510 CSF2RB=colony stimulating factor 2 receptor beta
There are also a few interesting points about the list which were not properly covered by the paper: a number of the SNPs pick up the same four genes which are:
DPP10, dipeptidyl peptidase 10
TRDN, triadin1 calcium receptor interacting
DFNB31= CASK-interacting protein CIP98
CDC25B, cell division cycle 25B isoform 2
Even more exciting is that one of these genes, DFNB31, together with the GRM7 gene, is to be found in both GWA bipolar studies. The DFNB31 gene is especially provocative because it has been implicated in deafness/blindness previously. Hard to see how that relates to bipolar disorder until you realise that a skin condition and a form of deafness can be caused by the same gene AND when you read abstracts like this.
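For anyone who wants to check the four repeated genes for themselves, here is a small Python sketch that tallies genes picked up by more than one distinct SNP. The entries are a hand-transcribed subset of the two lists above, with the gene annotation simplified to a single name per SNP (for example, rs4815603 is annotated with both CENPB and CDC25B in the list but is recorded here under CDC25B only); the function name is hypothetical.

```python
from collections import defaultdict

# (list, chromosome, SNP id, gene) transcribed from the lists above.
# Gene labels are simplified to one name per SNP for illustration.
HITS = [
    ("strong", "2", "rs1375144", "DPP10"),
    ("moderate", "2", "rs13386690", "DPP10"),
    ("strong", "6", "rs6901299", "TRDN"),
    ("moderate", "6", "rs17739564", "TRDN"),
    ("strong", "9", "rs10982256", "DFNB31"),
    ("moderate", "9", "rs10982246", "DFNB31"),
    ("strong", "20", "rs3761218", "CDC25B"),
    ("moderate", "20", "rs4815603", "CDC25B"),
    # The same SNP appears in both lists (marked * above), so it counts once.
    ("strong", "2", "rs4673905", "DNAPTP6"),
    ("moderate", "2", "rs4673905", "DNAPTP6"),
    ("moderate", "3", "rs1485171", "GRM7"),
    ("moderate", "4", "rs7680321", "GABRB1"),
]


def genes_with_multiple_snps(hits):
    """Return {gene: sorted SNP ids} for genes hit by two or more distinct SNPs."""
    snps_by_gene = defaultdict(set)
    for _list_name, _chromosome, snp, gene in hits:
        snps_by_gene[gene].add(snp)
    return {
        gene: sorted(snps)
        for gene, snps in snps_by_gene.items()
        if len(snps) > 1
    }


for gene, snps in sorted(genes_with_multiple_snps(HITS).items()):
    print(gene, ", ".join(snps))
```

Counting distinct SNP ids rather than list entries matters here: DNAPTP6 appears twice only because the same marker is in both lists, so it is not an independent second hit.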
You heard it here first (actually, SRF has a broadly similar perspective) - perhaps these genes will be the Next Big Things in bipolar disorder genetics, surely a clear justification for Big Science.