Development of tensorial signal processing and machine learning tools tailored to the analysis of urine metabolomics: Application to colorectal cancer screening

Group: Signal and information processing for sensing systems
Group leader: Santiago Marco (smarco@ibecbarcelona.eu)

Project description:

Volatolomics is a rapidly emerging research field that focuses mainly on health-related volatile organic compounds (VOCs) and gases, and it holds the promise of delivering non-invasive diagnostic tools. These compounds can be obtained from body fluids and emissions such as breath, urine, faeces, flatus, wound drains, etc. The main goal of computational biomarker discovery is to identify molecular markers hidden in complex omics data that can be used for screening.
Colorectal cancer (CRC) is the second most frequent neoplasia. It has been shown that the faecal occult blood test (FOBT), applied in the population aged >50 years, reduces CRC mortality by 30%. It is estimated that in a standard population, compliance with CRC screening is ~30-50%, since it is a rather cumbersome test that requires the collection of faeces. It is considered that alternative diagnostic methods based on urine samples will increase enrolment in the screening program.
The analysis of the large volume of complex raw data that comes from urine analysis using spectrometry requires sophisticated signal and data processing approaches. Despite years of research, the analysis of these signals remains an open problem. The analytical instrumentation signals feature a mixture of hundreds of chemical sources (metabolites), with a large dynamic range, in noisy backgrounds. This is particularly relevant for global, untargeted biomarker discovery studies in metabolomics. It is often the case that the discriminant information appears in components with a low signal-to-noise ratio, in the presence of more abundant compounds that carry no information. Due to the physiological variability of living beings, these non-informative compounds may vary strongly, hiding the informative variation of smaller components. In this thesis, tensorial signal processing will be explored for metabolite unmixing.
The host group at IBEC has long-standing experience in the analysis of complex signals from chemical instrumentation for diverse applications in food technology, biomedicine, environmental health, etc. The group has participated in numerous national, European and industrial projects involving the customized development of data processing workflows from raw data to knowledge.

Job position description:

The main role of the PhD candidate will be to analyze highly complex metabolomics data using computational methods. Building on the knowledge of the host group, the candidate will improve our current workflow for the analysis of metabolomics data. The emphasis will be on the analysis of Gas Chromatography – Ion Mobility Spectrometry data and, in part, on Liquid Chromatography – Mass Spectrometry data. Data analysis will combine raw signal preprocessing for feature extraction, followed by biomarker discovery using machine learning techniques. In the algorithm development, the candidate will make use of state-of-the-art techniques from the fields of signal processing, statistics and machine learning. A good intuition for abstract mathematical concepts will help. The focus of the thesis will be the latest analysis techniques for tensorial data (so-called n-way methods) within a full metabolomics workflow. The candidate will be introduced to best practices in predictive model validation and the management of confounding factors. Good programming skills in R are desired, but candidates able to program in MATLAB or Python will also be considered. The chemical interpretation of the results will be carried out in collaboration with analytical chemists who are experts in metabolomics (group members), and the clinical interpretation, as well as the study design, will be handled by Dr. Josep Gumà (head of oncology at Hospital Sant Joan de Reus), co-director of this thesis. This proposal is funded by a National Research Project in collaboration with Universitat Rovira i Virgili and Hospital Sant Joan de Reus. Good communication skills in English (both oral and written) are essential. The candidate should be able to work in an interdisciplinary project at the frontier of data analysis, chemistry and biomedicine. During the project, the candidate will need to collaborate effectively with people from different cultures and backgrounds.