Interpretable and predictive embeddings from EHR data using canonical polyadic tensor decomposition
The large volume of electronic health records now available gives modern medicine a great opportunity to uncover hidden connections between diseases and to better predict disease development and potential medication. However, because the data are high-dimensional and sparse, using the records to their full extent is challenging.
In this work, I propose a canonical polyadic tensor decomposition method for the linked electronic health records of the UK Biobank. I show that the resulting embeddings are interpretable at both the patient and the disease level. Additionally, the vectorized patient-topic associations add value to common hazard models for disease prediction.
I find that the accuracy of these predictions is strongly tied to the quality of the decomposition. As the amount of data grows beyond a certain level, the decomposition quality suffers, and with it the significance of the change in prediction relative to the baseline.
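To make the terms above concrete, the following is a minimal sketch of a canonical polyadic (CP) decomposition fitted by alternating least squares, together with one possible quality measure. It is not the implementation used in this work. The three-way layout of the tensor, the synthetic data, the function names (`khatri_rao`, `unfold`, `cp_als`, `relative_fit`), and the choice of relative reconstruction fit as the quality measure are all assumptions made for illustration.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(x, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def reconstruct(factors):
    """Rebuild the three-way tensor as a sum of rank-one terms."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def relative_fit(x, factors):
    """Assumed quality measure: 1 - ||x - x_hat|| / ||x|| (1.0 is a perfect fit)."""
    return 1.0 - np.linalg.norm(x - reconstruct(factors)) / np.linalg.norm(x)


def cp_als(x, rank, n_iter=300, seed=0):
    """Fit a rank-`rank` CP model to a three-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the two other factor matrices fixed and solve for this one.
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors


# Hypothetical toy data: an exactly rank-2 tensor of shape (6, 5, 4).
# In an EHR setting the modes might be patients, diagnoses, and time windows,
# but that layout is an assumption, not something stated in this abstract.
rng = np.random.default_rng(42)
truth = [rng.standard_normal((dim, 2)) for dim in (6, 5, 4)]
x = reconstruct(truth)

factors = cp_als(x, rank=2)
fit = relative_fit(x, factors)
```

In this sketch, each row of the first factor matrix would play the role of one patient's embedding, and `fit` is one simple way to quantify how well the decomposition reproduces the data.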