The challenge
I’m open to comments from anyone who thinks the following isn’t correct…
So you have a lab, and DNA samples donated with consent by individuals diagnosed with a psychiatric illness and by a matched group without illness. How would you go about telling us something about the genetics of psychiatric illness? You’d think the strategy would already be well worked out…a simple recipe of experimentation and analysis that would give you the answer.
To borrow from W. Somerset Maugham:
“There are three rules for cloning complex disease genes. Unfortunately, no one knows what they are.”
The problem is circular…until we have a few of these genes in the bag, we won’t know the best way to get them. Here, ‘in the bag’ means actual causal mutations identified, not merely inferred involvement.
In this post, I will tell you what I think you should NOT do (the horse-flogging bit) and offer some suggestions on what you should do (the horses-for-courses bit), given the nature of your starting materials.
Don’t try this at home (or in the lab)
Genome-wide linkage studies…or, more precisely, multiple-family genome-wide linkage studies. In essence, linkage looks for chromosomal regions that the affected members of a family consistently share. The implication is that these regions contain the faulty gene. The snag comes when you go beyond a single family and instead ask which chromosomal regions are linked with the disease in all families, or most…or even a detectable proportion. It hasn’t worked, and doesn’t work, once you go beyond the single family. An Aesop-style fable may help you see why (although it should be blindingly obvious).
The King of Clonia loved his orchard and the bountiful fruit it produced each year. It covered a sizeable area of his palace grounds and consisted of a multitude of fruiting tree species collected and nurtured from all over his Kingdom. One day, in a fit of ill-advised enthusiasm, he issued an edict to his court alchemists.
“Identify the purest essence of each individual fruit so that I might bottle them all as a gift to the Queen”.
The Alchemists debated over this task and finally came up with this solution. They would collect one example of each fruit, place it in a single barrel and grind them all up. Then they would extract, filter and fractionate the essences in one glorious process.
“That’s crazy”, cried one dissenting Alchemist, “you are just making the problem more difficult…how will you tell orange essence from apple or from plum?”
“Fool!”, they replied witheringly, “we have the power to detect all the essences simultaneously…the court arithmetician has decreed it so”.
They failed of course, but as is the wont of those convinced by statistical models, they decided that scaling up was the answer.
“More fruit for the barrel!!” went out the command. “Still more fruit!!” came the cry as they failed again.
The moral of the story became apparent when the ‘fool’ took the fruit of a single, large tree and extracted its essence as required. Rather than praise his efforts, the other Alchemists accused him of plucking low-hanging fruit and of producing an essence that was not relevant to the orchard as a whole.
“Your essence looks nothing like ours…and doesn’t even look like the one you got from another big tree”, they said.
“Exactly! That’s the point! Isn’t it good!”, he exclaimed excitedly.
“Actually, we think that means you have made a mistake. Or the problem is intractable. Or that fruit beetles must be invoked as the guardians of the essence and must be factored into future extractions”, the Alchemists said without a hint of irony.
The King placed all of them in a specially commissioned barrel and left them to stew in their own juices.
The end.
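Stripped of the fruit, the multiple-family linkage problem can be put into numbers with LOD scores. The sketch below is a deliberate simplification: it assumes phase-known, fully informative meioses and uses made-up recombinant counts for one family whose illness is caused by a gene at the tested locus (0 recombinants in 10 meioses), pooled with three families whose illness is caused elsewhere (5 recombinants in 10 meioses each, i.e. no linkage). The function names are hypothetical, and real multi-family analyses use richer likelihoods than this simple summed LOD.

```python
import math

def xlog10(count: int, p: float) -> float:
    """count * log10(p), using the convention 0 * log10(0) = 0."""
    if count == 0:
        return 0.0
    if p == 0.0:
        return -math.inf
    return count * math.log10(p)

def lod(theta: float, recombinants: int, meioses: int) -> float:
    """LOD = log10[theta^R * (1 - theta)^(N - R) / 0.5^N] for one family
    (assumes phase-known, fully informative meioses)."""
    return (xlog10(recombinants, theta)
            + xlog10(meioses - recombinants, 1.0 - theta)
            - meioses * math.log10(0.5))

def max_summed_lod(families: list[tuple[int, int]]) -> tuple[float, float]:
    """Maximise the LOD summed over families, searching theta from 0 to 0.5."""
    grid = [i / 1000 for i in range(501)]

    def total(theta: float) -> float:
        return sum(lod(theta, r, n) for r, n in families)

    best = max(grid, key=total)
    return best, total(best)

# Hypothetical families as (recombinants, meioses).
linked = [(0, 10)]                      # one large family, fully co-segregating
unlinked = [(5, 10), (5, 10), (5, 10)]  # families whose gene lies elsewhere

print(max_summed_lod(linked))             # theta = 0.0, LOD about 3.01
print(max_summed_lod(linked + unlinked))  # theta = 0.375, LOD about 0.55
```

A LOD of 3 is the conventional threshold for declaring linkage, so the single family clears it on its own, while the pooled ‘barrel’ falls far short even though the linked family’s data are still in there.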
Are families bad? A graphical answer
No! They are good. But you have to know how to use them correctly to get any meaningful genetic information out of them. It all comes down to my favourite phrase, ‘genetic architecture’. How many mutations are there for a disease in the population, how common are they, how strong is each one’s effect, and do they run in families or are they ‘sporadic’? The graphs below are my attempt to illustrate a genetic model showing that all of these questions are, in fact, different sides of the same coin…and, what is more, this model can tell us which analysis techniques we should use for a given DNA sample set.

This first graph shows that the amount of each mutation in the population (its frequency) is inversely related to how ‘strong’ its effect is (measures such as odds ratios are just the terms geneticists use for mutation effect strength). This is the critical concept that helps explain the rest. Common mutations have weak effects and rare mutations often have strong effects…with every combination in between, as shown by the wide yellow band.
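For readers who haven’t met odds ratios, here is one way an allelic odds ratio can be computed from a 2x2 table of allele counts in cases and controls. The counts are invented for illustration, the function name is hypothetical, and the 95% confidence interval uses Woolf’s log-based approximation; none of these come from the post.

```python
import math

def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of the risk allele in cases divided by its odds in controls,
    with an approximate 95% confidence interval (Woolf's method)."""
    odds_ratio = (case_risk / case_other) / (control_risk / control_other)
    # Standard error of log(OR) from the four allele counts.
    se_log = math.sqrt(1 / case_risk + 1 / case_other
                       + 1 / control_risk + 1 / control_other)
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (low, high)

# Hypothetical counts: 1,000 alleles (500 people) in each group.
or_, ci = allelic_odds_ratio(case_risk=340, case_other=660,
                             control_risk=300, control_other=700)
print(f"OR = {or_:.2f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

An odds ratio of about 1.2 sits at the weak-effect end of the band; with 500 people per group its interval still just includes 1, a hint of why weak effects need large samples.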

The pretty obvious fact (you would think) is that if a mutation has a strong effect, then it will be apparent in most individuals it is passed on to. Hence, it will show up as a family history of illness. In terms of ascertainment bias (shiny things grab our attention), it will highlight a family, catching the eye of a physician collecting DNA samples for a gene-hunting exercise. Weak-effect mutations will probably have to work together to push an individual down the route of illness, but that’s OK, as they are generally common and likely to be inherited together by chance in unlucky people (note the word ‘inherited’ here…the process is no different from that in the nominally familial cases). But these individuals won’t be part of the families; they will appear as ‘sporadic’ cases, caused by the random convergence of population risk factors.
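The two routes to illness described above can be put into rough numbers. The sketch below uses a deliberately crude model with hypothetical parameters that do not come from the post: for the weak-effect route, 20 independent common risk loci (risk-allele frequency 0.3) with illness only when someone carries at least 20 risk alleles out of their 40; for the strong-effect route, a rare dominant mutation with penetrance 0.8 against a 1% background risk. Both function names are illustrative.

```python
from math import comb

def polygenic_tail(n_loci: int, risk_freq: float, threshold: int) -> float:
    """P(a person carries >= threshold risk alleles), treating the 2 * n_loci
    alleles as independent draws with probability risk_freq (binomial model)."""
    n_alleles = 2 * n_loci
    return sum(comb(n_alleles, k) * risk_freq**k * (1 - risk_freq)**(n_alleles - k)
               for k in range(threshold, n_alleles + 1))

def relative_risk_dominant(penetrance: float, baseline: float) -> float:
    """Approximate illness risk for a child or sibling of a carrier of a rare
    dominant mutation: half inherit it (risk = penetrance), half do not
    (risk = baseline). Assumes heterozygous carriers and no de novo mutation."""
    return 0.5 * penetrance + 0.5 * baseline

unlucky = polygenic_tail(n_loci=20, risk_freq=0.3, threshold=20)
print(f"Weak-effect route: {unlucky:.2%} of people cross the threshold by chance")

relative = relative_risk_dominant(penetrance=0.8, baseline=0.01)
print(f"Strong-effect route: a carrier's child or sibling has {relative:.1%} risk")
```

Under these made-up settings roughly 0.6% of people cross the threshold purely through the chance convergence of common alleles, with nothing in the family to flag them, whereas the strong-effect mutation leaves about 40% of a carrier’s close relatives ill: exactly the kind of pedigree that catches a physician’s eye.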
Methods are horses and DNA sets are courses

So this figure is the crunch. We have this model, but how do we apply it? Well, each experimental technique or method of data analysis has its strengths and weaknesses. I think researchers should be very, very careful when looking at the DNA sample set in front of them and deciding what they want to do. If it is a familial-heavy set (left-hand side of the graph), then you would want to do genome-wide linkage studies in single large families, the related technique of following the co-segregation of candidate gene alleles and illness through a family or, finally, deep resequencing. This final technique is not applied much, as it requires selecting a candidate gene and then sequencing it in many DNA samples from individuals with a family history. The idea is that you may be lucky and hit a rare, strong-effect (‘highly penetrant’ in genetic-speak) mutation.
If, on the other hand (right-hand side of the graph), you have a predominantly sporadic DNA sample set, then this opens up the possibility of case-control association studies. This technique compares the frequency of candidate gene alleles (‘flavours’ of the same gene) between people with the illness and unaffected people. If a difference is seen, then that allele indicates the presence of a mutation nearby. This approach requires the alleles to be reasonably common in the population; otherwise they will not be detectable. Hence, case-control association studies and familial (rare-allele) samples shouldn’t be combined…if you want anything out the other end. This is entirely analogous to the multiple-family genome-wide linkage problem…here the issue is the number of different fruit types combined in the barrel: you won’t taste the bad apple in your fruit smoothie.
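The allele-frequency comparison described here is commonly done with a chi-square test on allele counts. The sketch below pairs that test with a normal-approximation power calculation to show why rare alleles slip through. The counts, the sample sizes, the odds ratios and the significance level of 5e-8 (the conventional genome-wide threshold) are illustrative assumptions, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square statistic (1 df) for a 2x2 allele-count table, with its p-value."""
    table = [[case_risk, case_other], [control_risk, control_other]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = sum((table[i][j] - row_sums[i] * col_sums[j] / total) ** 2
               / (row_sums[i] * col_sums[j] / total)
               for i in range(2) for j in range(2))
    # With 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def case_allele_freq(p_control, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency and the allelic OR."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)

def allelic_power(p_control, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allele-frequency comparison (normal
    approximation; each diploid person contributes two alleles; the negligible
    opposite tail is ignored)."""
    p_case = case_allele_freq(p_control, odds_ratio)
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls
    p_bar = (p_case * a_case + p_control * a_ctrl) / (a_case + a_ctrl)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / a_case + 1 / a_ctrl))
    se_alt = math.sqrt(p_case * (1 - p_case) / a_case
                       + p_control * (1 - p_control) / a_ctrl)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p_case - p_control) - z_crit * se_null) / se_alt)

chi2, p_value = allelic_chi_square(340, 660, 300, 700)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")

# Same hypothetical study size (5,000 cases, 5,000 controls) for both alleles.
common = allelic_power(p_control=0.30, odds_ratio=1.2, n_cases=5000, n_controls=5000)
rare = allelic_power(p_control=0.001, odds_ratio=3.0, n_cases=5000, n_controls=5000)
print(f"power, common weak allele: {common:.2f}")
print(f"power, rare stronger allele: {rare:.2f}")
```

Under these assumptions the common, weak allele is detected roughly 70% of the time while the rarer, stronger one is almost never detected, which is the quantitative version of ‘the alleles need to be reasonably common’.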
We are entering the age of the whole-genome case-control association study, where big science does away with the subjective choice of candidate genes and just screens everything. While the rather poor coverage of each gene (and its constituent linkage disequilibrium, or LD, blocks) might sometimes work against the aims of the experiment (a story for another time, perhaps), let’s hope that nobody uses the wrong DNA sample set and then looks puzzled when results are lacking…
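One reason coverage matters, and one more reason rare alleles and association studies sit badly together: a genotyped marker only stands in for an untyped mutation to the extent that the two are in linkage disequilibrium, usually measured as r². Even when the two variants are as tightly associated as their frequencies allow (maximal |D|), r² is capped by how different those frequencies are. The sketch below computes that ceiling from the standard formulas; the allele frequencies and function names are hypothetical.

```python
def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), with D = p_AB - pA * pB."""
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def max_r_squared(p_a: float, p_b: float) -> float:
    """Largest r^2 attainable for two biallelic sites with allele
    frequencies p_a and p_b, i.e. when |D| is as large as possible."""
    d_pos = min(p_a * (1 - p_b), (1 - p_a) * p_b)  # upper bound when D > 0
    d_neg = min(p_a * p_b, (1 - p_a) * (1 - p_b))  # upper bound on |D| when D < 0
    d_max = max(d_pos, d_neg)
    return d_max * d_max / (p_a * (1 - p_a) * p_b * (1 - p_b))

# A common marker (frequency 0.3) tagging variants of different frequencies.
for p_variant in (0.3, 0.1, 0.01):
    print(f"variant frequency {p_variant}: max r^2 = {max_r_squared(0.3, p_variant):.3f}")

# Every copy of a 1% variant sits on a haplotype carrying the marker allele.
print(f"r^2 = {r_squared(p_ab=0.01, p_a=0.3, p_b=0.01):.3f}")
```

Even with every copy of the rare variant sitting on the marker’s haplotype, r² is only about 0.02, so the marker’s association signal is a faint echo of the variant’s effect.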