Interpretation and visualization of non-linear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis.
Publication year
2012
Source
PLoS One, 7, 6, (2012), article e38163
ISSN
Publication type
Article / Letter to editor
Organization
Biophysical Chemistry
Biochemistry (UMC)
Analytical Chemistry
Former Organization
Physical Chemistry/Biophysical Chemistry
Journal title
PLoS One
Volume
vol. 7
Issue
iss. 6
Subject
Analytical Chemistry; Biophysical Chemistry; IGMD 8: Mitochondrial medicine
Abstract
BACKGROUND: In the last decade data fusion has become widespread in the field of metabolomics. Linear data fusion is performed most commonly. However, many datasets display non-linear parameter dependences, and linear methods are bound to fail in such situations. We used proton Nuclear Magnetic Resonance and Gas Chromatography-Mass Spectrometry, two well-established techniques, to generate metabolic profiles of cerebrospinal fluid from individuals with Multiple Sclerosis (MScl). These datasets represent non-linearly separable groups. Thus, a special framework for data fusion is required to extract the relevant information and to combine the datasets.
METHODOLOGY: The main aim is to demonstrate a novel approach to data fusion for classification. The approach is applied to metabolomics datasets from patients at different stages of MScl. It involves data fusion in kernel space and consists of four main steps. In the first step, the significant information is extracted per data source using Support Vector Machine Recursive Feature Elimination, which selects a set of relevant variables. In the second step, the optimized kernel matrices are merged by linear combination. In the third step, the merged datasets are analyzed with a classification technique, namely Kernel Partial Least Squares Discriminant Analysis. In the final step, the variables in kernel space are visualized and their significance is established.
CONCLUSIONS: We find that fusion in kernel space allows for efficient and reliable discrimination of classes (MScl and early stage). This data fusion approach achieves better class prediction accuracy than analysis of the individual datasets and than the commonly used mid-level fusion. The prediction accuracy on an independent test set (8 samples) reaches 100%. Additionally, the classification model obtained on the fused kernels is simpler: a single latent variable was sufficient. Finally, visualization of variable importance in kernel space was achieved.
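The kernel-merging step described in the abstract (a linear combination of per-source kernel matrices) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian (RBF) kernel, the kernel widths `sigma`, the mixing weight `mu`, the helper names `rbf_kernel` and `fuse_kernels`, and the synthetic random data standing in for the selected NMR and GC-MS variables.

```python
import numpy as np


def rbf_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix for the rows of X (assumed kernel type)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def fuse_kernels(K_a, K_b, mu):
    """Weighted linear combination mu * K_a + (1 - mu) * K_b, with 0 <= mu <= 1."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if K_a.shape != K_b.shape:
        raise ValueError("kernel matrices must describe the same samples")
    return mu * K_a + (1.0 - mu) * K_b


# Synthetic stand-ins for the variables kept after feature selection
# (hypothetical sizes; both blocks must contain the same samples in the same order).
rng = np.random.default_rng(0)
n_samples = 12
X_nmr = rng.normal(size=(n_samples, 5))
X_gcms = rng.normal(size=(n_samples, 8))

K_nmr = rbf_kernel(X_nmr, sigma=2.0)     # sigma values are illustrative only
K_gcms = rbf_kernel(X_gcms, sigma=3.0)
K_fused = fuse_kernels(K_nmr, K_gcms, mu=0.5)
```

Because each kernel matrix is n-by-n regardless of how many variables its data source contributes, the two sources can be combined sample by sample without concatenating variables. In practice the weight and the kernel parameters would be tuned, for example by cross-validation, before the fused matrix is passed to a kernel classifier.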
This item appears in the following Collection(s)
- Academic publications [244262]
- Electronic publications [131202]
- Faculty of Medical Sciences [92892]
- Faculty of Science [37138]
- Open Access publications [105225]