Study Description

The Resource for Genetic Epidemiology Research on Aging (GERA) Cohort was created by an RC2 "Grand Opportunity" grant awarded to the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) and the UCSF Institute for Human Genetics (AG036607; Schaefer/Risch, PIs). The RC2 project enabled genome-wide SNP genotyping of a cohort of over 100,000 adults who are members of the Kaiser Permanente Medical Care Plan, Northern California Region (KPNC), and participants in its RPGEH. The purpose of the RPGEH is to facilitate research on the genetic and environmental factors that affect health and disease by linking clinical data from electronic health records, survey data on demographic and behavioral factors, and environmental data from various sources with genetic data derived from participants' biospecimens.

At the time the RC2 project was awarded in late 2009, the RPGEH had established a cohort of about 140,000 individuals who had answered a detailed survey, provided saliva samples for DNA extraction, and given broad consent for the use of their data in studies of health and disease. To maximize the diversity of the resulting sample, the GERA cohort was formed by including all racial and ethnic minority participants with saliva samples (N = 20,925; 19%); the remaining participants were drawn sequentially and randomly from Non-Hispanic White participants (N = 89,341; 81%). A total of 110,266 participant samples were included to ensure that at least 100,000 would be successfully assayed.
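
The selection rule described above, which takes every minority participant and then fills the remaining slots with randomly drawn Non-Hispanic White participants, can be sketched in Python. This is a minimal illustration, not the RPGEH's actual procedure: the text says White participants were drawn "sequentially and randomly" without describing the sequencing, so the sketch assumes a single simple random draw. The `Participant` record and `select_gera_sample` function are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical participant record used only for this illustration."""
    participant_id: str
    minority: bool  # True for any racial/ethnic minority group


def select_gera_sample(participants, target_total, seed=0):
    """Include every minority participant, then fill the remaining slots
    with a simple random sample of Non-Hispanic White participants.

    Assumption: one random draw stands in for the 'sequential' draw
    mentioned in the study description.
    """
    minorities = [p for p in participants if p.minority]
    whites = [p for p in participants if not p.minority]
    n_white = target_total - len(minorities)
    if n_white < 0 or n_white > len(whites):
        raise ValueError("target_total is incompatible with the available pool")
    rng = random.Random(seed)
    return minorities + rng.sample(whites, n_white)


if __name__ == "__main__":
    # Check the reported counts: 20,925 minority + 89,341 White participants.
    n_minority, n_white = 20_925, 89_341
    total = n_minority + n_white
    print(total, f"{n_minority / total:.0%}", f"{n_white / total:.0%}")  # 110266 19% 81%
```

Fixing the seed makes the draw reproducible, which matters if the selected sample list has to be regenerated later.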

The resulting GERA cohort is 42% male and 58% female. At the time of the RPGEH survey (2007), participants ranged in age from 18 to over 100 years, with an average age of 63 years. The sample is ethnically diverse and generally well educated, with above-average income. Approximately 69% of the participants are married or living with a partner. Length of membership in KPNC averages 23.5 years. UCSF and RPGEH investigators worked with the genomics company Affymetrix to design four custom microarrays, one for each of the four major race-ethnicity groups in the GERA Cohort, as described in detail in Hoffmann et al., 2011a and 2011b. After genotyping and quality control procedures, and after removal of invalid, discordant, or withdrawn samples, about 103,000 participants were successfully genotyped. The resulting genotypic data were linked to survey data and to data abstracted from the electronic medical records. As described below, all RPGEH participants were mailed new consent forms that explicitly discussed placing their data in dbGaP, which is maintained by NIH. About 77% of participants returned completed consent forms, resulting in a final sample of 78,486 GERA Cohort participants with data for deposit into dbGaP.

Origins of the RPGEH GERA Cohort

The goal of the RPGEH GERA cohort was to create a large, multiethnic, and comprehensive population-based resource for research into the genetic and environmental basis of common age-related diseases and their treatment, and into factors influencing healthy aging and longevity. The GERA Cohort consists of more than 100,000 adults who are members of the Kaiser Permanente Medical Care Plan, Northern California Region (KPNC), and participants in its Research Program on Genes, Environment and Health (RPGEH). KPNC is an integrated health care delivery system serving about 3.3 million people in northern California. Its membership is representative of the general population of the 14-county area in which its facilities are located, although the extremes of income at both ends of the spectrum are underrepresented. The RPGEH uses KPNC's longitudinal electronic health records (EHR) to obtain clinical, laboratory, imaging, and pharmacy information on all cohort members, to which personal demographic, behavioral, and health characteristics have been added through member surveys. The GERA Cohort is a subsample of the RPGEH participant cohort and was created through the RC2 award from the NIA, NIMH, and NIH Common Fund, as described above.

GERA Study Design

As described above, the GERA Cohort is a subsample of the longitudinal cohort enrolled in the Kaiser Permanente RPGEH. The RPGEH cohort includes about 400,000 survey participants, of whom about 200,000 have provided broad consent and a saliva or blood sample for use in studies of genetic and environmental factors in health and disease. The GERA Cohort was developed from a mailed survey sent in 2007 to all adult KPNC members who had been members for two years or more. All survey respondents were contacted and asked to complete a consent form; those who completed consent forms were asked to provide a saliva sample. Additional male participants were added to the RPGEH by including the Northern California sample of the California Men's Health Study (CMHS), a cohort of about 40,000 KPNC men aged 45-69 years at the time of the CMHS survey in 2002-2003. CMHS participants contributed about 15,400 saliva samples to the RPGEH and were eligible for inclusion in the GERA Cohort under the same sampling design as the RPGEH cohort as a whole: all minority participants were selected in order to maximize representation of minorities in the GERA Cohort, and Non-Hispanic White participants were selected at random to complete the sample of 110,266 GERA Cohort participants.

GERA Genotypic Data

High-density genotyping was conducted at UCSF using custom-designed Affymetrix Axiom arrays, as described in Hoffmann et al. (2011a; 2011b). To maximize genome-wide coverage of common and less common variants, four arrays were designed, one each for individuals of Non-Hispanic White (EUR), East Asian (EAS), African-American (AFR), and Latino (LAT) race/ethnicity. The arrays share broad SNP overlap; they were designed by applying a hybrid greedy imputation algorithm (Hoffmann et al., 2011b) to 1000 Genomes Project genotype information validated by Affymetrix. However, to capture low-frequency variants specific to particular race-ethnicity groups, SNP content varies between arrays. The genotyping process and results are described in more detail in Genotyping of DNA Samples. The analyses of population structure and the development of principal components for adjusting for population structure are described in Population Structure Analysis.
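
Because SNP content differs between arrays, an analysis that pools participants genotyped on different arrays can directly compare only the SNPs assayed on every array involved. The sketch below shows that bookkeeping. The group-to-array mapping uses the EUR/EAS/AFR/LAT codes from the text and the platform names from the Molecular Data table; `shared_snps` and the example SNP sets are hypothetical. In practice the per-array SNP lists would come from ArrayAnnotationV8.txt, whose column layout is not described here, so parsing is left out.

```python
# Group codes from the text mapped to platform names from the Molecular Data table.
ARRAY_FOR_GROUP = {
    "EUR": "Axiom_KP_UCSF_EUR",
    "EAS": "Axiom_KP_UCSF_EAS",
    "AFR": "Axiom_KP_UCSF_AFR",
    "LAT": "Axiom_KP_UCSF_LAT",
}


def shared_snps(snps_by_array, arrays):
    """Return the SNP IDs present on every array in `arrays`.

    `snps_by_array` maps a platform name to a set of SNP IDs (hypothetical
    input; it would be built from the array annotation file).
    """
    arrays = list(arrays)
    if not arrays:
        return set()
    common = set(snps_by_array[arrays[0]])  # copy so the input is not mutated
    for name in arrays[1:]:
        common &= snps_by_array[name]
    return common
```

For example, with made-up SNP sets `{"rs1", "rs2", "rs3"}` on the EUR array and `{"rs2", "rs3", "rs4"}` on the LAT array, the shared set is `{"rs2", "rs3"}`.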

GERA Phenotypic Data

RPGEH and CMHS Survey Data. The RPGEH and CMHS surveys are the sources of the demographic and behavioral data deposited in dbGaP for the GERA Cohort. Data on common demographic factors such as gender, race/ethnicity, marital status, and education, and on behavioral factors such as smoking, alcohol consumption, and body mass index, have been cleaned, edited, reconciled between the two surveys, and compiled into summary indices where appropriate for deposition into dbGaP. A more complete description of the survey variables is included in Survey Variables Documentation. Note that the terms of use of the GERA Cohort data, as specified in the Data Use Certification (DUC), prohibit using survey variables as outcomes in analyses. For example, a genome-wide association study (GWAS) of education or smoking is not permitted under the DUC. Only health conditions can be used as outcome variables.
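
An analysis pipeline can enforce the DUC outcome rule mechanically by rejecting any analysis whose outcome is a survey variable. This is a minimal sketch: the variable names are hypothetical (the real names are in Survey Variables Documentation and Disease and Conditions Definitions Documentation), and it checks only the outcome, on the assumption that survey variables may still serve as covariates, which should be confirmed against the DUC itself.

```python
# Hypothetical variable names, for illustration only.
SURVEY_VARIABLES = frozenset({"education", "smoking", "alcohol", "bmi", "marital_status"})
HEALTH_CONDITIONS = frozenset({"type2_diabetes", "hypertension", "asthma"})


def validate_outcome(outcome: str) -> None:
    """Raise ValueError unless `outcome` is a health-condition variable."""
    if outcome in SURVEY_VARIABLES:
        raise ValueError(f"{outcome!r} is a survey variable; the DUC forbids it as an outcome")
    if outcome not in HEALTH_CONDITIONS:
        raise ValueError(f"{outcome!r} is not a recognized health-condition variable")
```

Calling `validate_outcome("hypertension")` returns silently, while `validate_outcome("education")` raises, matching the GWAS-of-education example above.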

Health Conditions derived from Kaiser Permanente Electronic Medical Records. Data on the occurrence of health conditions in GERA Cohort participants were derived by summarizing ICD-9-coded diagnoses in Kaiser Permanente's electronic medical records. An algorithm that aggregates specific ICD-9 codes into diagnostic groups for selected conditions is applied to outpatient and inpatient databases; see Disease and Conditions Definitions Documentation for details. A condition is classified as "present" for a participant if two or more diagnoses within its diagnostic category were recorded on separate days. Requiring two or more diagnoses reduces false positives from coding mistakes or rule-out diagnoses; when compared with validated disease registries, this criterion yields high specificity and good sensitivity. ICD-9 codes are entered into the electronic records in several ways. For outpatient visits from 1995 to 2006, the treating physician assigned diagnoses by marking them on an optically scanned list that varied by specialty. Since the introduction of an integrated, fully electronic medical record in 2006, outpatient diagnoses have been entered by physicians and other providers using a pull-down menu. Discharge diagnoses from inpatient stays are specified by physicians and coded by specially trained coders.
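
The "two or more diagnoses on separate days" rule can be sketched as follows. The ICD-9 groupings in `CATEGORY_CODES` are hypothetical examples, not the GERA definitions (those are in Disease and Conditions Definitions Documentation), and the sketch assumes codes are stored as dotted strings, that each code belongs to a single category, and that outpatient and inpatient records have already been pooled into one list per participant.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ICD-9 groupings for illustration only.
CATEGORY_CODES = {
    "type2_diabetes": {"250.00", "250.02"},
    "asthma": {"493.90"},
}


def conditions_present(diagnoses, category_codes=CATEGORY_CODES, min_days=2):
    """Return the categories diagnosed on at least `min_days` distinct dates.

    `diagnoses` is an iterable of (icd9_code, visit_date) pairs for one
    participant. Several diagnoses on the same day count only once, which
    is what makes the rule 'separate days' rather than 'separate codes'.
    """
    # Invert the grouping (assumes each code maps to exactly one category).
    code_to_category = {code: cat for cat, codes in category_codes.items() for code in codes}
    days = defaultdict(set)
    for code, visit_date in diagnoses:
        category = code_to_category.get(code)
        if category is not None:
            days[category].add(visit_date)
    return {cat for cat, d in days.items() if len(d) >= min_days}
```

Two diabetes codes recorded at the same visit do not qualify, but adding a second diabetes diagnosis on a later date does.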

Databases of ICD-9 codes for diagnoses assigned at outpatient visits, or recorded as discharge diagnoses after inpatient stays, are complete and available for all KPNC members back to 1995. Although the average length of KPNC membership among GERA cohort members was 23.5 years as of 2007, not all have been members since 1995, so the history of some conditions, particularly those that are not chronic or recurrent, may be incomplete for some cohort members. The year of first KPNC membership is included among the survey variables, enabling investigators to estimate the number of years of observation for each cohort member.
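
Because coded diagnoses are available only from 1995 onward, a member's usable diagnostic history starts at the later of 1995 and the year of first membership. A minimal sketch, assuming continuous enrollment (a single first-membership year cannot reveal gaps) and a caller-supplied end year, since the end date of the extracted records is not stated here; `years_of_observation` is a hypothetical helper.

```python
EHR_DIAGNOSIS_START = 1995  # ICD-9 diagnosis databases are complete from 1995


def years_of_observation(first_membership_year: int, end_year: int) -> int:
    """Estimate years of coded-diagnosis history for one cohort member.

    Assumes continuous KPNC membership from first enrollment through
    `end_year`; returns 0 if membership starts after `end_year`.
    """
    start = max(first_membership_year, EHR_DIAGNOSIS_START)
    return max(0, end_year - start)
```

With an end year of 2007, a member who joined in 1984 gets 12 years of coded history (capped by the 1995 start), and one who joined in 2001 gets 6.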

RPGEH Access and Collaborations Website and Procedures

The RPGEH maintains a web portal for inquiries and applications for collaboration and data access at https://rpgehportal.kaiser.org/. Applications for collaboration and data use are reviewed by the RPGEH Access Review Committee. For more details, please contact RPGEH through the website.

Study Inclusion/Exclusion Criteria

Inclusion criteria for the GERA Cohort data deposited in dbGaP include all of the following:

  1. Eligible for RPGEH survey
    1. ≥ 18 years of age at time of survey mailing (2007)
    2. KP Northern California Region enrollee for at least 2 years prior to survey
  2. Consented to contribute a biospecimen to RPGEH and returned a saliva sample by the cutoff date for GERA genotyping
  3. All available samples from minorities were included, plus Non-Hispanic Whites selected at random to reach 110,266 participants with extracted DNA whose samples were submitted for genotyping
  4. Successfully genotyped (DQC ≥ 0.82; call rate ≥ 0.97) from extracted DNA
  5. Consented explicitly to have data deposited in NIH-maintained database
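
Because every inclusion criterion must hold, they combine into a single conjunction. The sketch below assumes a hypothetical per-sample record whose field names are illustrative only; the thresholds (age ≥ 18, at least 2 years of enrollment, DQC ≥ 0.82, call rate ≥ 0.97) are the ones listed above, with DQC being the Affymetrix Axiom dish QC metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-sample record; field names are illustrative only."""
    age_at_mailing: int                  # age when the survey was mailed (2007)
    years_enrolled_before_survey: float  # KPNC enrollment before the survey
    saliva_consented_and_returned: bool  # biospecimen consent and kit returned by cutoff
    submitted_for_genotyping: bool       # one of the 110,266 selected samples
    dqc: float                           # Affymetrix Axiom dish QC value
    call_rate: float                     # fraction of SNPs with a genotype call
    dbgap_consent: bool                  # explicit consent to NIH database deposit


def meets_inclusion(c: Candidate) -> bool:
    """True only if the sample satisfies every inclusion criterion."""
    return (
        c.age_at_mailing >= 18
        and c.years_enrolled_before_survey >= 2
        and c.saliva_consented_and_returned
        and c.submitted_for_genotyping
        and c.dqc >= 0.82
        and c.call_rate >= 0.97
        and c.dbgap_consent
    )
```

The QC thresholds are inclusive, so a sample with DQC exactly 0.82 and call rate exactly 0.97 passes.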

Exclusion criteria for the GERA Cohort data deposited in dbGaP include any of the following:

  1. Subject requested withdrawal from study after DNA extraction and genotyping
  2. Validity of the link between biospecimen and study participant questionable because of genotype-phenotype discordance, e.g., between reported gender and genetically inferred sex
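
One common way to detect the sex discordance named in the second exclusion criterion is to compare reported gender with sex inferred from X-chromosome homozygosity. The study description does not say which method GERA used, so this is a sketch under stated assumptions: it takes a precomputed X-chromosome inbreeding coefficient F (as reported by tools such as PLINK's `--check-sex`) and uses PLINK's default cutoffs of F < 0.2 for female and F > 0.8 for male. Treating an in-between F as "questionable" is also an assumption, and the function names are hypothetical.

```python
from typing import Optional


def infer_sex(x_inbreeding_f: float, female_max: float = 0.2, male_min: float = 0.8) -> Optional[str]:
    """Classify genetic sex from the X-chromosome inbreeding coefficient F.

    Males carry one X and look homozygous (F near 1); females are
    heterozygous at many X SNPs (F near 0). Values in between are
    returned as None (unresolved).
    """
    if x_inbreeding_f > male_min:
        return "M"
    if x_inbreeding_f < female_max:
        return "F"
    return None


def should_exclude(withdrew: bool, reported_sex: str, x_inbreeding_f: float) -> bool:
    """True if the sample meets either exclusion criterion.

    Assumption: an unresolved genetic sex is treated as a questionable link.
    """
    if withdrew:
        return True
    genetic_sex = infer_sex(x_inbreeding_f)
    return genetic_sex is None or genetic_sex != reported_sex
```

A participant reported as female with F = 0.95 would be excluded as discordant, while one with F = 0.05 would be retained.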

Molecular Data
Type | Source | Platform | Number of Oligos/SNPs | SNP Batch Id
Whole Genome Genotyping | Affymetrix | Axiom_KP_UCSF_AFR | 893,968 | N/A
Whole Genome Genotyping | Affymetrix | Axiom_KP_UCSF_EAS | 713,412 | N/A
Whole Genome Genotyping | Affymetrix | Axiom_KP_UCSF_EUR | 675,367 | N/A
Whole Genome Genotyping | Affymetrix | Axiom_KP_UCSF_LAT | 818,154 | N/A

Comment (all four arrays): See ArrayAnnotationV8.txt and ArrayAnnotationV8Doc.txt (download through Authorized Access) for the annotated list of SNPs included on each array.

Study History

The Resource for Genetic Epidemiology Research on Aging (GERA) Cohort is part of the larger Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH), a resource developed to facilitate research on the genetic and environmental factors underlying a wide variety of common diseases and healthy aging. To conduct research, the RPGEH links data from participants' electronic medical records (EMR), survey data on demographic and behavioral factors, and environmental data from geographic information system databases with genetic data derived from biospecimens from large numbers of participating Health Plan members.

Survey and Cohort Recruitment. The RPGEH was first funded in 2005 and has built its cohort and data resources continuously since then. Initially, the RPGEH developed electronic disease registries that identify phenotypes by applying algorithms to EMR data. In 2007, the RPGEH mailed a four-page survey to 1.9 million adult (≥ 18 years old) KPNC members who had been members for two years or more, to obtain data on demographic and behavioral factors complementary to the clinical data in the EMR. The survey materials included a cover letter introducing the RPGEH, a two-page list of Frequently Asked Questions, and the survey itself, which asked about demographic factors (such as education, race-ethnicity, income, and marital status), dietary factors, physical activity, smoking, alcohol consumption, and reproductive history and health. Members whose electronic records indicated a preference for written communications in Chinese or Spanish received survey materials in English and in a Chinese or Spanish translation. Approximately 400,000 completed surveys were returned.

Saliva Sample Collection. Beginning in July 2008, survey respondents were asked to sign and return a consent form and an authorization for use and disclosure of protected health information. The consent form authorized broad use of biospecimens, survey data, and data from participants' electronic health records in studies of genetic and environmental influences on health and disease. Respondents who returned completed consent forms were mailed Oragene saliva collection kits; more than 132,000 saliva samples were collected over two years. Completed saliva kits were scanned and archived in a temporary biorepository at the KPNC Division of Research.

In late 2009, the RPGEH added collection of saliva samples from the California Men's Health Study (CMHS), a cohort assembled in 2002-2003 that had been excluded from the RPGEH survey mailing with the intent of later adding CMHS participants to the RPGEH cohort. The CMHS was developed to facilitate research on prostate cancer and other conditions in older men; the study protocol is described in Enger et al., 2006. It enrolled and surveyed more than 40,000 KPNC men, aged 45-69 years, who were members of KPNC during 2002-2003. CMHS men completed two mailed surveys collecting demographic and behavioral data similar to those of the RPGEH survey. Data on analogous variables were reconciled and integrated with the RPGEH cohort data for use in the RPGEH resource. In 2009, the RPGEH collected approximately 15,400 additional saliva samples from CMHS participants.

GERA Genotyping project. In September 2009, the RPGEH and UCSF received a Grand Opportunity grant from NIA, NIMH, and the NIH Director's Office (RC2 AG036607) that enabled genome-wide genotyping of over 100,000 participants, selected from the approximately 147,000 RPGEH participants who had provided consent and saliva samples up to that time. The RC2 grant was jointly awarded to Kaiser Permanente Division of Research and the UCSF Institute for Human Genetics (Schaefer / Risch, PIs). This project formed the GERA Cohort that is the basis for the deposition of data in dbGaP. The aims of the project included extraction of DNA from over 100,000 saliva samples; design of four custom microarrays for genotyping (one for each major race-ethnicity group in the cohort); genotyping of over 100,000 DNA samples; linkage of the resulting data with clinical data from the EMR, survey data, and environmental data sources to enable analysis of genetic and environmental influences on many diseases and conditions; development of tools for provision of tailored datasets for specific research projects; and deposit of data in dbGaP.

Four custom arrays were designed for genotyping, one for each of the four major race-ethnicity groups in the RPGEH cohort: African Americans, East Asians, Latinos, and Non-Hispanic Whites. The number of SNPs and SNP content varied by array, with content designed to maximize genome-wide coverage of low-frequency and more common variants specific to the different race-ethnicity groups, including newly identified SNPs from sequencing projects and SNPs with established associations with disease phenotypes and risk factors. The array designs are described in two publications: Hoffmann et al., 2011a and Hoffmann et al., 2011b. Genotyping was performed at the Genomics Core Facility of the Institute for Human Genetics at UCSF, under the direction of Pui-Yan Kwok, MD, PhD. The DNA extraction and genotyping processes and QC are described separately in Genotyping of DNA Samples. The analyses of population structure and the development of principal components for adjusting for population structure are described in Population Structure Analysis.

Reconsent for dbGaP data deposition. Although the original consent form signed by RPGEH participants provided for sharing of de-identified data with collaborators, it did not provide explicit consent for placement of participants' data in databases with access controlled by NIH or other third parties. To ensure all participants were appropriately consented for placement of data in dbGaP, the RPGEH mailed new consent forms that included a section explaining dbGaP to all participants. Approximately 77% of participants returned the signed, updated consent form. After excluding samples that failed genotyping and small numbers of invalid or duplicate samples, the total number of appropriately consented participants with data for deposition in dbGaP is 78,486. The demographic characteristics of the final GERA cohort for dbGaP are similar to those of the broader GERA genotyped cohort.

Funding. In addition to the NIH funding of the RC2 project that supported the genotyping, the RPGEH has been supported by grants from philanthropic foundations, including the Wayne and Gladys Valley Foundation, the Ellison Medical Foundation, and the Robert Wood Johnson Foundation, as well as support from Kaiser Permanente, for work on disease registries, cohort enrollment, survey collection, and collection of biospecimens.

Study Attribution
  • Principal Investigators
    • Catherine Schaefer, PhD. Kaiser Permanente Research Program on Genes, Environment and Health, Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Neil Risch, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
  • Co-Investigators
    • Elizabeth Blackburn, PhD. Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA.
    • Pui-Yan Kwok, MD, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Yambazi Banda, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Thomas Hoffmann, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Carlos Iribarren, MD, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Mark Kvale, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.
    • Charles Quesenberry, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Sarah Rowell, MPH. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Carol Somkin, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Stephen Van den Eeden, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Larry Walter, MA. Kaiser Permanente Division of Research, Oakland, CA, USA.
    • Rachel Whitmer, PhD. Kaiser Permanente Division of Research, Oakland, CA, USA.
  • Funding Sources
    • RC2 AG036607. National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
  • Funding Source Contact
    • Winifred K. Rossi, MA. National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
  • Genotyping Center
    • Genomics Core Facility, Institute for Human Genetics, University of California, San Francisco, CA, USA.
  • Genotyping Quality Control
    • Mark Kvale, PhD. Institute for Human Genetics, University of California, San Francisco, CA, USA.