One of the rewards of the avalanche of genome information is the finding that there are associations between genotype and diseases or behaviour patterns: the genotype-phenotype association.
However, one should take such findings with a pinch of salt until:
1. The report has been repeated, and hence is more likely to be true (i.e. replication).
2. Rigorous statistical analyses have been performed.
3. The study information is sufficient to make a judgement on the work.
4. The data presented are sufficient to allow independent statistical analysis.
5. The methodologies used are proper.
6. The control population is truly a control population and is sufficiently large to allow a meaningful comparison.
An expert working party has published in Nature a set of criteria for accepting studies of genotype-phenotype associations assessed by genome-wide or candidate-gene approaches:
• Statistical analyses demonstrating the level of statistical significance of a finding should be published, or at least made available, so that others can attempt to reproduce the reported results
• Explicit information should be provided about the study's power to detect a range of effects
• The study should be epidemiologically sound, with careful accounting for potential biases in selection of subjects, characterization of phenotypes, comparability of environmental exposures (when possible) and underlying population structure in cases and controls
• Phenotypes should be assessed according to standard definitions provided in the report
• Associations should be consistent (within the range of expected statistical fluctuation) and reported for the same phenotypes across study subgroups or across similar phenotypes in the entire study group
• Significance should not depend on altering the quality control methods beyond standard approaches that could change inclusion or exclusion of large numbers of samples or loci
• Measures to assess the quality of genotype data should include results of known study sample duplicates or publicly available samples
• The results for concordance between duplicate samples (if applicable), as well as completion and call rates per SNP and per subject, should be disclosed, along with rates of missing data
• A subset of notable SNPs should be evaluated with a second technology that verifies the same result with excellent concordance, because no technology is error-free
• Associations with nearby SNPs in strong linkage disequilibrium with the putatively associated SNP should be reported (and should be similar)
• The results of replication studies of previous findings should be reported even if the results are not significant
• Testing for differences in underlying population structure in case and control groups should be performed and reported
• Appropriate correction for multiple comparisons across all statistical tests examined should be reported
• Comparison to genome-wide thresholds should be described. Similarly, for Bayesian approaches, the choice of prior probabilities should be described
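To give a feel for why correction for multiple comparisons matters, here is a minimal sketch in Python of a Bonferroni correction. This is one simple and conservative method, chosen for illustration; it is not necessarily the approach the working party has in mind. The SNP names, p-values and number of tests are all made up. With roughly one million tests and an overall error rate of 0.05, the per-test threshold works out at 5e-8, a figure often quoted as a genome-wide threshold.

```python
# Illustrative sketch only: the SNP names, p-values and number of tests
# below are hypothetical and do not come from any real study.

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05, n_tests=None):
    """Return the names whose p-value is below the Bonferroni threshold.

    p_values maps a SNP name to its p-value. If n_tests is not given,
    the number of entries in p_values is used as the number of tests.
    """
    n = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, n)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Three hypothetical SNPs picked out of one million tested.
example_p_values = {"snp_a": 3e-9, "snp_b": 4e-6, "snp_c": 0.01}
hits = significant_after_correction(
    example_p_values, alpha=0.05, n_tests=1_000_000
)
print(hits)  # only snp_a falls below 0.05 / 1,000,000 = 5e-8
```

A p-value of 0.01 looks impressive for a single test, but among a million tests about ten thousand results that small are expected by chance alone, which is why only the first of the three hypothetical SNPs survives.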
Not easy stuff, but do not accept any of these widely publicised associations without asking: is this sensible?
NCI-NHGRI Working Group on Replication in Association Studies (2007). Replicating genotype-phenotype associations. Nature, vol. 447, 655-660.
- Martin Eastwood