Boosting the copy number analysis of cancer genomes: methods

MOSEN ANSORENA, DAVID

Supervised by:

Ana Maria Aransay Bañares (Supervisor)
Begoña M. Jugo (Supervisor)

Defending university: Universidad del País Vasco - Euskal Herriko Unibertsitatea

Defense date: 7 July 2014

Examination committee:

Julio A. Rozas Liras (Chair)
Arkaitz Carracedo Pérez (Secretary)
David Posada González (Member)
Joaquín Dopazo Blázquez (Member)
Iñaki Inza Cano (Member)

Type: Thesis

Teseo: 117592 DIALNET

Abstract

Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations (CNAs), are structural rearrangements that can critically affect gene expression patterns. CNA profiles also give insight into cancer discrimination, progression and complexity. Methods for modelling and detecting CNAs with high-throughput technologies need to account for three levels of data characterization: genome variability, tumor-specific factors and technology. The main genomic variability factors are sequence uniqueness and GC content. The tumor-specific factors are normal cell contamination, tumor heterogeneity and aneuploidy. Lastly, other issues are specific to either single nucleotide polymorphism (SNP) microarrays or high-throughput sequencing (HTS). Four novel methodological advancements in the characterization and analysis of CNAs are presented. First, a comprehensive model that characterizes CNA data from SNP microarrays is introduced, so that recently developed tools for CNA detection can be thoroughly benchmarked on synthetic data derived from that model. This survey reveals room for improvement in the segmentation process of some of the tools. Hence, second, a generic framework for bivariate segmentation of SNP array data is introduced, and a concrete implementation of the framework is evaluated on the same synthetic data. Lessons learned from CNA analysis on SNP array data carry over readily to HTS. Third, therefore, an implemented workflow for the analysis of CNAs on HTS data is presented, which includes novel methodology for filtering and GC content correction. Fourth, a software package for association analysis between CNAs and cancer-related phenotypes is introduced; the CNAs may have been detected with any high-throughput technology. The package also supports gene enrichment analysis through careful examination of the associated genes and their distribution in the genome.
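
To make the idea of GC content correction on HTS data concrete, here is a minimal sketch of a generic median-based normalization of binned read counts. It is not the novel methodology of the thesis, which the abstract does not describe. The function name `gc_correct`, the parameter `n_strata` and the equal-width GC strata are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def gc_correct(counts, gc_fractions, n_strata=20):
    """Rescale per-bin read counts so every GC stratum shares the overall median.

    counts       -- read count per genomic bin (hypothetical input)
    gc_fractions -- GC fraction of each bin, in [0, 1]
    n_strata     -- number of equal-width GC strata (illustrative choice)
    """
    if len(counts) != len(gc_fractions):
        raise ValueError("counts and gc_fractions must have the same length")

    def stratum(gc):
        # Map a GC fraction to a stratum index; gc == 1.0 goes to the last one.
        return min(int(gc * n_strata), n_strata - 1)

    overall = median(counts)

    # Collect the counts that fall into each GC stratum.
    by_stratum = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        by_stratum[stratum(gc)].append(count)
    stratum_median = {k: median(v) for k, v in by_stratum.items()}

    # Scale each bin by overall median / median of its own stratum.
    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = stratum_median[stratum(gc)]
        corrected.append(count * overall / m if m > 0 else 0.0)
    return corrected
```

After such a correction, bins that differ only in GC content have comparable depth, so remaining differences in read count are more likely to reflect copy number. Real workflows typically also filter low-uniqueness bins first, which this sketch omits.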