Study design
The project has a nested case-control design and uses biosamples from three existing prospective cohorts. The cancer-related studies are based on the Northern Sweden Health & Disease Study (NSHDS) and EPIC-Italy, while the study on neurological and immune diseases in children is based on the Rhea mother-child cohort in Crete. All cohorts are providing biosamples (buffy coat, plasma, erythrocytes and, in some cases, urine), as well as data on dietary and other exposures and clinical outcomes.
The first two cohorts are providing biosamples collected from adult subjects at a time in the past when they were without disease. Samples from subjects who have since developed breast cancer (N=600) or non-Hodgkin lymphoma (NHL; N=300), along with equal numbers of samples from matched, disease-free controls, are being analysed for the biomarkers of interest. For a fraction of the study subjects from NSHDS, repeat samples collected at different times are also being analysed.
The Rhea cohort is providing biosamples from cord blood (plus urine from the pregnant mothers), as well as repeat samples collected at age 2 years, from 600 children who display different scores in clinical tests for the diseases of interest, conducted at age 4 years.
Intermediate biomarkers
Two-stage approach
A two-stage strategy is being employed using –omics technologies:
a) discovery phase: analysis of a limited number of samples (approx. 15-30% of the total) using untargeted or wide-target screening methods to search for new biomarkers;
b) validation phase: analysis of the bulk of biosamples by high-throughput methods targeted on a small set of candidate biomarkers, identified during the first stage.
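The selection of a discovery subset can be sketched as follows. This is only an illustration, assuming that samples are drawn at random and that each case stays together with its matched control; the function name, the pair identifiers and the 20% fraction are hypothetical and not taken from the project protocol.

```python
import random

def draw_discovery_pairs(pair_ids, discovery_fraction=0.2, seed=0):
    """Randomly draw matched case-control pairs for the discovery phase.

    pair_ids: identifiers of matched sets (hypothetical; one id per pair).
    discovery_fraction: assumed share of pairs analysed in the discovery phase.
    seed: fixed seed so that the draw can be reproduced.
    """
    if not 0 < discovery_fraction < 1:
        raise ValueError("discovery_fraction must be between 0 and 1")
    ids = sorted(set(pair_ids))          # remove duplicates, fix the order
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_discovery = round(len(ids) * discovery_fraction)
    return set(ids[:n_discovery])

# Example: 600 matched pairs, 20% of them drawn for the discovery phase.
discovery_pairs = draw_discovery_pairs(range(600), 0.2, seed=1)
```

Sampling whole pairs rather than individual samples keeps the matching intact in the discovery phase.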
Biomarker validation studies
In view of the limited experience with the use of –omics technologies in large-scale population studies, a pilot technical validation study will be carried out on a limited number of biobank samples. It will evaluate intralaboratory reproducibility/repeatability and the influence of sample collection centre, handling prior to freezing and length of storage time, and its results will be used for planning the execution of the remaining analyses.
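One common way to summarise repeatability is the coefficient of variation (CV) of replicate measurements. The sketch below assumes this metric; the pilot study may use other measures (for example intraclass correlation), and the condition names and intensity values are invented purely for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(replicates):
    """Percent CV = 100 * sample standard deviation / mean of the replicates."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(replicates) / m

# Hypothetical replicate intensities of one analyte under two handling conditions.
replicates_by_condition = {
    "frozen_immediately": [10.0, 10.5, 9.5],
    "delayed_freezing": [10.0, 13.0, 7.0],
}
cv_by_condition = {
    condition: coefficient_of_variation(values)
    for condition, values in replicates_by_condition.items()
}
```

Comparing the CV across conditions such as collection centre or storage time gives a first indication of which pre-analytical factors affect a given measurement.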
Biomarker analyses
a) Metabonomics: Full, untargeted analysis will be applied to all samples (i.e. no discovery phase) based on analysis of plasma by reverse-phase C18 LC-QTOF-MS in both positive and negative modes. Subsets of samples will be pooled for acquisition of LC-MS/MS data to facilitate metabolite annotation.
b) Epigenomics: For a selection of cases of each cancer type and corresponding controls, genome-wide analysis using CpG island microarrays will be conducted. In the case of the childhood disease study, this will be done on groups of children representing the highest and lowest clinical scores for each of the three disease endpoints of interest. Based on the outcome of this phase, a limited number of sequences will be selected for subsequent analysis using pyrosequencing on the full set of samples, including repeat samples.
c) Proteomics: The same approach as for epigenomics is being followed, i.e., a selection of samples will be used initially for wide-target proteomic analyses using the Luminex Multianalyte Profiling system, for the simultaneous assay of 47 inflammation-related proteins including cytokines, chemokines, growth and angiogenic factors. Based on the outcome of this search, a limited number of proteins will be selected for further analysis using the same method on the full set of samples.
d) Transcriptomics: The same approach as for epigenomics is being followed, i.e., a selection of samples will be used initially for genome-wide analysis using microarrays, followed by targeted analyses of selected targets on the bulk of samples.
Biomarkers of exposure
PCBs: serum concentrations of a number of PCB congeners, as well as p,p’-DDE, using GC/MS/SIM. If necessary, sample pooling will be considered, in order to obtain information on the presence of a larger range of congeners.
cadmium and lead: erythrocyte concentrations using inductively coupled plasma mass spectrometry (ICP-MS).
PAHs: PAH-DNA adducts in blood leukocytes using a recently developed, ultrasensitive ELISA which utilises antisera raised against benzo[a]pyrene diol epoxide-modified DNA.
phthalates: urinary concentrations of phthalate metabolites, using HPLC/MS/ESI (negative ionization mode) and quantification by isotope dilution.
PBDEs (polybrominated diphenyl ethers): serum or urine concentrations of a number of congeners, using GC/MS (electron capture negative ion mode).
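For the phthalate metabolites, quantification by isotope dilution relies on the ratio of the analyte signal to that of an isotope-labelled internal standard added at a known concentration. The sketch below shows a simple single-point form of that calculation; the function name, parameters and the default response factor of 1.0 are assumptions for illustration, and a real assay would normally use a multi-point calibration curve.

```python
def isotope_dilution_concentration(area_analyte, area_labelled,
                                   conc_labelled, response_factor=1.0):
    """Estimate analyte concentration from the peak-area ratio.

    area_analyte:    peak area of the native analyte.
    area_labelled:   peak area of the isotope-labelled internal standard.
    conc_labelled:   known concentration of the labelled standard in the sample.
    response_factor: relative response of analyte vs. standard
                     (assumed 1.0 here; obtained from calibration in practice).
    """
    if area_labelled <= 0 or response_factor <= 0:
        raise ValueError("area_labelled and response_factor must be positive")
    return (area_analyte / area_labelled) * conc_labelled / response_factor

# Hypothetical example: analyte peak twice as large as a 5.0 ng/mL standard.
concentration = isotope_dilution_concentration(2000.0, 1000.0, 5.0)
```

Because the labelled standard goes through the same extraction and ionisation steps as the analyte, losses and matrix effects largely cancel in the ratio.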
Other exposure data
In addition to the use of biomarkers of exposure, exposure assessment is also being conducted based on the use of traditional, questionnaire-based tools for dietary intake of macro- or micronutrients as well as modern, GIS-based approaches for exposure to air pollutants. Furthermore, a substantial amount of exposure information exists for many of the study subjects, coming from other previous or ongoing projects, including:
dietary intake of a number of carcinogens, as well as biomarkers of genotoxicity (micronuclei, DNA adducts) via the European NewGeneris project,
ambient air pollutants (NOx, PM2.5) through the European project ESCAPE, and
water disinfection by-products via the European HiWate project.
Genotyping
Although a genome-wide analysis of genetic variation would be highly desirable, for budgetary reasons it is not currently included in the project. However, a genome-wide scan is currently ongoing for about 6 000 breast cancers in EPIC, which includes a fraction of the Italian and the NSHDS subjects.
Data management, bioinformatics and systems biology analyses, risk assessment, public health implications
All exposure, biomarker and clinical data will be used to evaluate, using appropriate statistical models, the relationships between external exposures, biomarkers of exposure, intermediate markers and disease phenotypes. Mathematical models that encompass the different stages between exposure and disease will be developed using advanced biostatistical and bioinformatic methods. Initially, emphasis will be given to the relationship between intermediate biomarkers and disease risk, so as to identify common denominators in -omics responses which indicate biomarkers with risk predictivity. Once such biomarkers are identified, their relationships with the various exposures will also be evaluated.
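As a minimal example of a statistical model suited to a 1:1 matched case-control design, the sketch below computes the matched-pair odds ratio for a binary marker (for instance a biomarker dichotomised at a cut-off, which is an assumption made here for simplicity). It is not the project's analysis plan; the function name and the counts in the example are hypothetical.

```python
import math

def matched_pair_odds_ratio(pairs):
    """Odds ratio and approximate 95% CI for 1:1 matched pairs.

    pairs: iterable of (case_positive, control_positive) booleans,
           one tuple per matched case-control pair.
    Only discordant pairs contribute to the estimate.
    """
    pairs = list(pairs)
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if case_only == 0 or control_only == 0:
        raise ValueError("both kinds of discordant pairs are needed")
    odds_ratio = case_only / control_only
    # Standard error of log(OR) based on the discordant-pair counts.
    se_log_or = math.sqrt(1 / case_only + 1 / control_only)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (lower, upper)

# Hypothetical counts: 30 pairs with only the case positive, 15 with only
# the control positive, 55 concordant pairs (which do not affect the estimate).
example_pairs = ([(True, False)] * 30 + [(False, True)] * 15
                 + [(True, True)] * 40 + [(False, False)] * 15)
odds_ratio, confidence_interval = matched_pair_odds_ratio(example_pairs)
```

Continuous biomarkers and additional covariates would call for conditional logistic regression, of which this ratio of discordant pairs is the simplest special case.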
The contribution of both exposure biomarkers and intermediate biomarkers to predicting disease will be estimated and compared with other epidemiological approaches. The overall population impact of the exposures investigated will be estimated with and without the inclusion of biomarkers in causal models. In all cases, information on diet, lifestyle, reproductive history, anthropometric data, etc. will be evaluated as causative and as modulating or confounding factors.