Thesis
Methods to jointly analyze multiple phenotypes
- Abstract:
Linear mixed models (LMMs) have re-emerged as a central tool in statistical genetics. Fixed effects capture genetic variants tested for association. Random effects leverage aggregate relatedness while remaining agnostic to specific genetic mechanisms, naturally modeling heritability and controlling for polygenic background and confounding from population or family structure in genome-wide association studies (GWAS). Multiple random effects can partition heritability amongst many biologically meaningful variance components (VCs).
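For reference, the textbook single-trait LMM with $V$ variance components can be written as below. This is the standard formulation, not necessarily the thesis's own notation. The symbols are introduced here for illustration: $X$ holds fixed-effect covariates including the tested variant, $K_k$ is the relatedness matrix for the $k$-th VC, and $V = 1$ recovers the usual single-kinship LMM.

```latex
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \textstyle\sum_{k=1}^{V} \mathbf{g}_k + \boldsymbol{\varepsilon}, \\
\mathbf{g}_k &\sim \mathcal{N}\big(\mathbf{0},\, \sigma_k^2 K_k\big), \qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\big(\mathbf{0},\, \sigma_e^2 I_N\big), \\
h^2 &= \frac{\sum_{k} \sigma_k^2}{\sum_{k} \sigma_k^2 + \sigma_e^2}.
\end{aligned}
```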
Concurrently, genetic studies have begun to analyze multiple traits. This can improve power by adding data and can inform the path from genotype to phenotype, e.g. with graphical models, pleiotropy detection or endophenotyping. Multi-trait analyses are natural for biobanks and high-throughput phenotypic measurements like gene expression, medical images or metabolites.
Following these advances, this thesis develops three multi-trait mixed models. phenix imputes missing phenotype data by modifying probabilistic matrix factorization to incorporate genetic relatedness. General linear mixed models (GLMMs) generalize and unify multi-VC, multi-trait and likelihood-penalized mixed models. Finally, compressive mixed models (CMMs) combine the two, obtaining the imputation and computational benefits of phenix and the heritability estimation and multi-VC capabilities of GLMMs.
phenix essentially always outperforms all competitors in imputation and can improve GWAS power. GLMMs accurately estimate heritability despite (measured) confounders, can improve phenotype prediction, and increase gene-based, multi-trait association signal. CMMs regularly improve prediction, scale to thousands of phenotypes, and can uncover plausible GWAS hits entirely missed by LMMs. Altogether, multi-trait mixed models are invaluable for intrinsically multi-trait tasks, like phenotype imputation and low-rank decomposition, and, surprisingly, can be much faster than LMMs; however, I find only small and inconsistent benefits for single-trait-oriented objectives like heritability estimation and out-of-sample prediction.
The challenges in this thesis are primarily computational. Naively, multi-trait approaches model an N × P matrix of P phenotypes measured on N samples as a long Gaussian vector, inducing prohibitive $O(N^3P^3)$ computations. Fortunately, the parsimonious matrix normals underlying mixed models enable simpler $O(N^3+P^3)$ expressions. This is summarized by a new decomposition for positive semidefinite tensor products that, under a crucial assumption, facilitates cheap evaluation of ubiquitous low-level operations like multiplication.
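The drop from $O(N^3P^3)$ to $O(N^3+P^3)$ can be illustrated with a well-known special case. Consider a two-component model, $\mathrm{vec}(Y) \sim \mathcal{N}(0, C \otimes K + E \otimes I_N)$, with one genetic term and one noise term. Simultaneously diagonalizing both Kronecker terms means a linear solve and a log-determinant need only one N × N and two P × P eigendecompositions. This is a minimal sketch under stated assumptions, not the thesis's new decomposition or its phenix/GLMM/CMM code. The function name `kron_solve` and the symbols C (genetic trait covariance), E (noise trait covariance) and K (relatedness) are hypothetical, and E is assumed positive definite.

```python
import numpy as np


def kron_solve(K, C, E, Y):
    """Solve (C ⊗ K + E ⊗ I_N) vec(X) = vec(Y); return (X, log|Sigma|).

    Illustrative sketch (hypothetical helper): K is an N x N PSD relatedness
    matrix, C and E are P x P trait covariances (E assumed positive definite),
    Y is an N x P phenotype matrix, and vec() stacks columns (Fortran order).
    """
    N = K.shape[0]
    # Whiten the noise covariance: E = U_E diag(S_E) U_E^T.
    S_E, U_E = np.linalg.eigh(E)
    W = U_E / np.sqrt(S_E)              # W = U_E S_E^{-1/2}, so W^T E W = I_P
    # Diagonalize the whitened genetic covariance and the relatedness matrix.
    S_C, U_C = np.linalg.eigh(W.T @ C @ W)
    S_K, U_K = np.linalg.eigh(K)
    # Sigma = (L ⊗ U_K)(diag(S_C) ⊗ diag(S_K) + I)(L ⊗ U_K)^T with
    # L = U_E S_E^{1/2} U_C; only L^{-1} = U_C^T S_E^{-1/2} U_E^T is needed.
    L_inv = U_C.T @ W.T
    # Uses (A ⊗ B) vec(Z) = vec(B Z A^T) to rotate Y into the joint eigenbasis.
    Z = U_K.T @ Y @ L_inv.T
    D = np.outer(S_K, S_C) + 1.0        # diagonal of the middle factor, as N x P
    X = U_K @ (Z / D) @ L_inv           # divide, then rotate back
    # log|Sigma| = N log|E| + sum(log D), since |det L|^2 = det E.
    logdet = N * np.sum(np.log(S_E)) + np.sum(np.log(D))
    return X, logdet


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, P = 200, 5
    G = rng.standard_normal((N, 50))
    K = G @ G.T / 50                    # toy relatedness matrix
    A = rng.standard_normal((P, P))
    C = A @ A.T                         # toy genetic covariance
    E = np.eye(P)                       # toy noise covariance
    Y = rng.standard_normal((N, P))
    X, logdet = kron_solve(K, C, E, Y)
    # Naive comparison on the full NP x NP covariance, costing O(N^3 P^3).
    Sigma = np.kron(C, K) + np.kron(E, np.eye(N))
    x_naive = np.linalg.solve(Sigma, Y.flatten(order="F"))
    print(np.allclose(X.flatten(order="F"), x_naive))
    print(np.isclose(logdet, np.linalg.slogdet(Sigma)[1]))
```

Once the eigendecompositions are cached, each additional right-hand side costs only $O(N^2P + NP^2)$ in matrix products. That is why restricting to this structure pays off inside iterative likelihood optimization.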
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- UUID:
- uuid:ed466a17-e96f-482b-b164-aa7ceefd94d4
- Deposit date:
- 2017-04-19