Boosting the copy number analysis of cancer genomes methods
- MOSEN ANSORENA, DAVID
- Ana Maria Aransay Bañares Director
- Begoña M. Jugo Director
Defense university: Universidad del País Vasco - Euskal Herriko Unibertsitatea
Defense date: 7 July 2014
- Julio A. Rozas Liras Chair
- Arkaitz Carracedo Pérez Secretary
- David Posada González Member
- Joaquín Dopazo Blázquez Member
- Iñaki Inza Cano Member
Type: Thesis
Abstract
Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations (CNAs), are structural rearrangements that can critically affect gene expression patterns. In addition, CNA profiles provide insight into cancer discrimination, progression and complexity. Methods for modelling and detecting CNAs with high-throughput technologies need to account for three levels of data characterization: genome variability, cancer and technology. The main genomic variability factors are sequence uniqueness and GC content. Tumor-specific factors are normal cell contamination, tumor heterogeneity and aneuploidy. Lastly, other issues are specific to either single nucleotide polymorphism (SNP) microarrays or high-throughput sequencing (HTS). This thesis presents four novel methodological advances in the characterization and analysis of CNAs. First, it introduces a comprehensive model that characterizes CNA data from SNP microarrays, so that recently developed CNA detection tools can be thoroughly benchmarked on synthetic data derived from this model. The benchmark reveals room for improvement in the segmentation process of some of the tools. Hence, second, it introduces a generic framework for bivariate segmentation of SNP array data and evaluates a concrete implementation of the framework on the same synthetic data. Because lessons learned from CNA analysis on SNP array data are easily ported to HTS, third, it presents an implemented workflow for the analysis of CNAs on HTS data that includes novel methodology for filtering and GC content correction. Fourth, it introduces a software package that performs association analysis between CNAs and cancer-related phenotypes, where the CNAs may have been detected with any high-throughput technology. The package also enables gene enrichment analysis through careful examination of the associated genes and their distribution across the genome.
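The abstract does not spell out the SNP array data model. Still, two of the tumor-specific factors it lists, normal cell contamination and aneuploidy, are commonly captured by a simple two-component mixture of tumor and normal diploid cells. The following minimal Python sketch assumes that textbook mixture and is not the thesis's model. It computes the expected Log R Ratio (LRR) and B allele frequency (BAF) for a SNP that is heterozygous in the normal cells. The function name `expected_signals` and its parameters are hypothetical. The sketch deliberately ignores technology effects such as LRR compression, probe noise and GC waves, and it also ignores tumor heterogeneity.

```python
import math


def expected_signals(total_cn: int, b_alleles: int, purity: float) -> tuple[float, float]:
    """Expected (LRR, BAF) for a SNP that is heterozygous in normal cells.

    Hypothetical sketch assuming a two-component mixture: a fraction `purity`
    of tumor cells carrying `total_cn` copies (of which `b_alleles` carry the
    B allele), and a fraction 1 - purity of normal diploid cells with one A
    and one B allele. Platform-specific effects are not modelled.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    if not 0 <= b_alleles <= total_cn:
        raise ValueError("b_alleles must lie in [0, total_cn]")

    # Average number of copies per cell across the tumor/normal mixture
    mixed_cn = purity * total_cn + (1.0 - purity) * 2.0
    if mixed_cn == 0.0:
        # Homozygous deletion in a pure tumor: no DNA left, signals undefined
        raise ValueError("no DNA at this locus (homozygous deletion, purity 1)")

    # Average number of B-allele copies per cell across the mixture
    mixed_b = purity * b_alleles + (1.0 - purity) * 1.0

    lrr = math.log2(mixed_cn / 2.0)  # log ratio against a diploid reference
    baf = mixed_b / mixed_cn         # share of B-allele signal
    return lrr, baf


if __name__ == "__main__":
    # Hemizygous deletion diluted by 50% normal cells
    print(expected_signals(total_cn=1, b_alleles=0, purity=0.5))
    # Copy-neutral loss of heterozygosity at 60% purity
    print(expected_signals(total_cn=2, b_alleles=0, purity=0.6))
```

In the copy-neutral loss-of-heterozygosity case, LRR stays at 0 while BAF shifts from 0.5 to 0.2. Under this simplified model, such an event is visible only when LRR and BAF are examined together, which is one motivation for segmenting both signals jointly.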