Abstract

Protein C is an important endogenous anticoagulant in hemostasis. Deficiencies of protein C due to genetic mutations or a low level of circulating protein C increase the risk of venous thromboembolism. We performed a genome-wide association scan for plasma protein C antigen concentration with approximately 2.5 million single-nucleotide polymorphisms in 8048 individuals of European ancestry and a replication analysis in a separate sample of 1376 individuals in the Atherosclerosis Risk in Communities Study. Four independent loci from 3 regions were identified with genome-wide significance: 2p23 (GCKR, best SNP rs1260326, P = 2.04 × 10−17), 2q13-q14 (PROC, rs1158867, P = 3.77 × 10−36), 20q11 (near and within PROCR, rs8119351, P = 2.68 × 10−203), and 20q11.22 (EDEM2, rs6120849, P = 7.19 × 10−37 and 5.23 × 10−17 before and after conditional analysis, respectively). All 4 loci replicated in the independent sample. Furthermore, pooling the discovery and replication sets yielded an additional locus at chromosome 7q11.23 (BAZ1B, rs17145713, P = 2.83 × 10−8). The regions marked by GCKR, EDEM2, and BAZ1B are novel loci that have not been previously reported for association with protein C concentration. In summary, this first genome-wide scan for circulating protein C concentration identified both new and known loci in the general population. These findings may improve the understanding of physiologic mechanisms in protein C regulation.

Introduction

Protein C, a vitamin K–dependent plasma glycoprotein synthesized in the liver, is one of the most important endogenous anticoagulants.1 Upon activation by the thrombin-thrombomodulin complex, it inactivates factors Va and VIIIa, thereby dampening the coagulation reaction and, consequently, thrombus formation. Hereditary protein C deficiencies, characterized by reduced protein C antigen or activity due to rare genetic mutations, contribute to familial venous thrombosis.2-5 In the general population, both a low level of circulating protein C and common variants in the protein C gene are associated with an increased risk of venous thromboembolism.6-9 Activated protein C also exerts other physiologic effects, including anti-inflammatory and antiapoptotic activity and endothelial barrier stabilization.10 Treatment with activated protein C is effective for patients with severe sepsis and acute organ dysfunction.10 Plasma protein C levels are influenced by genetic factors, with heritabilities of 0.36 and 0.50 in Spanish and Mexican-American families, respectively.11,12 To date, only a few candidate-gene studies of protein C, each focusing on a small number of variants, have been reported.9,13-15 A comprehensive investigation of genomic variants influencing protein C is not available in the literature. We therefore performed a genome-wide association (GWA) scan for plasma protein C concentration with approximately 2.5 million single-nucleotide polymorphisms (SNPs), based on data from a large population of individuals of European ancestry in the Atherosclerosis Risk in Communities (ARIC) study.

Methods

Study population and phenotype measurement

The ARIC study is a longitudinal epidemiologic cohort that recruited, by probability sampling, 15 792 African American and European American adults aged 45 to 64 years in 1987 through 1989 from Forsyth County, NC; Jackson, MS; suburbs of Minneapolis, MN; and Washington County, MD.16 Participants of self-reported European ancestry were recruited from the 3 field centers other than Jackson, which recruited only African Americans. Three follow-up examinations, together with hospital and death surveillance, were conducted to ascertain the development of cardiovascular disease. The ARIC study was approved by the institutional review board of each field center's institution, and participants gave informed consent in accordance with the Declaration of Helsinki.

Baseline measures of demographic and clinical characteristics, including anthropometry, lifestyle variables, medical history, and medication use, were collected by standardized protocols during a home interview and a clinical examination in which fasting blood was drawn. Aliquots of citrated plasma were obtained by centrifugation at 4°C and stored at −70°C; protein C was measured within a few weeks. Protein C antigen was measured at a central laboratory with a commercial enzyme-linked immunosorbent assay (ELISA) kit (Asserachrom Protein C, Diagnostica Stago). The coefficient of variation was 12%; the reliability coefficient (between-subject variance divided by total variance), obtained from repeated testing of individuals over several weeks, was 0.56.17 DNA was extracted from blood samples, and consent was obtained for genetic testing.
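The reliability coefficient is a ratio of variance components. The following is a minimal sketch of how such a ratio can be estimated from repeated measurements, assuming a balanced design (every participant measured the same number of times) and the standard one-way random-effects ANOVA estimators. The estimator actually used in the ARIC variability study17 may differ, and the function name is hypothetical.

```python
from statistics import mean

def reliability_coefficient(measurements):
    """Between-subject variance / total variance from repeated measures.

    `measurements` is a list of per-subject lists of equal length k
    (balanced design assumed). Variance components come from the
    one-way random-effects ANOVA mean squares.
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(m) != k for m in measurements):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 repeats")
    grand = mean(x for m in measurements for x in m)
    subject_means = [mean(m) for m in measurements]
    # Mean square between subjects (df = n - 1).
    ms_between = k * sum((sm - grand) ** 2 for sm in subject_means) / (n - 1)
    # Mean square within subjects (df = n * (k - 1)).
    ms_within = sum(
        (x - sm) ** 2 for m, sm in zip(measurements, subject_means) for x in m
    ) / (n * (k - 1))
    var_between = max((ms_between - ms_within) / k, 0.0)  # truncate at zero
    return var_between / (var_between + ms_within)
```

A value of 0.56 would mean that a little over half of the observed variance in a single protein C measurement reflects stable differences between people rather than assay or day-to-day variation.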

Genotyping and imputation

Details on genotyping, quality control, and imputation have been described elsewhere.18 In brief, genome-wide SNPs were genotyped using the Affymetrix SNP array 6.0 in an initial set of 8861 participants of European ancestry. Individuals were excluded for (1) mismatch between self-reported and genotypic sex; (2) substantial genotype discordance with a previous reference panel; (3) membership in a set of suspected first-degree relatives identified from genome-wide genotype data (all but one member of each set were excluded); or (4) status as a genetic outlier under a principal components approach implemented in EIGENSTRAT,19 leaving 8127 genotyped individuals of European ancestry. Of these, 8052 had protein C measurements and were not using a coumarin-based anticoagulant at the time of protein C measurement (ie, baseline); they constituted the GWA scan discovery set in this study. SNPs with call rates < 90%, minor allele frequencies (MAF) ≤ 1%, or Hardy-Weinberg (HW) equilibrium P < 10−6 were excluded, leaving 602 642 variants for imputation. Imputation was performed with the program MACH version 1.00.16 (http://www.sph.umich.edu/csg/abecasis/MACH/download/)20 using phased data from the haplotype map for Centre d'Etude du Polymorphisme Humain samples of Utah residents with ancestry from Northern and Western Europe (HapMap-CEU), human genome release 21 (build 35). Imputation quality for each SNP was measured as the ratio of the empirically observed variance of the allele dosage to its expected binomial variance under HW equilibrium.21 In addition to the quality control screens above, we excluded SNPs with an imputation quality score < 0.3 or MAF ≤ 1%, leaving a total of 2 461 269 SNPs for the analysis of protein C concentration. SNP physical positions were mapped to HapMap build 36.
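The imputation-quality ratio and the two post-imputation SNP filters can be expressed in a few lines. This sketch follows the definition given in the text (observed dosage variance divided by the binomial variance 2p(1 − p) expected under HW equilibrium), not MACH's own source code; the function names and the handling of monomorphic SNPs are assumptions for illustration.

```python
from statistics import mean, pvariance

def imputation_quality(dosages):
    """Observed / expected variance of allele dosages on a 0-2 scale.

    The expected variance is the binomial variance 2p(1 - p) under
    Hardy-Weinberg equilibrium, with p estimated from the mean dosage.
    """
    p = mean(dosages) / 2.0
    expected = 2.0 * p * (1.0 - p)
    if expected == 0.0:
        return 0.0  # monomorphic SNP: treated as uninformative (assumption)
    return pvariance(dosages) / expected

def passes_qc(dosages, min_quality=0.3, min_maf=0.01):
    """Keep a SNP only if quality >= 0.3 and MAF > 1%, as described in the text."""
    p = mean(dosages) / 2.0
    maf = min(p, 1.0 - p)
    return imputation_quality(dosages) >= min_quality and maf > min_maf
```

Perfectly observed genotypes in HW proportions give a ratio near 1, whereas dosages shrunk toward the mean (an uncertain imputation) give a ratio near 0, which is why a low score flags poorly imputed SNPs.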

In silico replication was conducted in an additional sample of 1376 ARIC participants of European ancestry who were genotyped later, in a second set, with the same Affymetrix array and who were not receiving anticoagulant treatment at baseline. This second set completed genotyping of the whole ARIC population, including any necessary reruns. The 2 sets were not selected on the basis of any phenotypic characteristic but rather on convenience, according to DNA readiness. Quality control screens and imputation were conducted with similar procedures in both sets.

Statistical analysis

Untransformed protein C values were analyzed. Four participants with values > 5.5 standard deviations from the mean were excluded, leaving 8048 participants in the GWA scan discovery set. The distribution of protein C was approximately normal (skewness = 0.61, kurtosis = 0.89). Genetic associations were tested by linear regression in ProbABEL v.0.1-0 (http://mga.bionet.nsc.ru/~yurii/ABEL/),22 with the allele dosage of each SNP as the predictor under an additive genetic model. The analysis was adjusted for age, gender, and field center to reduce nongenetic variation in protein C levels. A linear relationship between age and protein C was assumed, and this assumption held in the ARIC data. Age and gender were significantly associated with protein C (P < .0001) and explained approximately 4% of its variation. An a priori threshold of P < 5.0 × 10−8 was used to define genome-wide significance for SNP associations. When more than 1 significant SNP clustered in a region, we conducted conditional analyses that additionally adjusted for the top SNP in that region; if significant signals remained, the top SNP from that conditional analysis was added to the model, and the process was repeated until no significant signals remained. In addition, linkage disequilibrium (LD) between SNPs, measured by r2, was used to evaluate the independence of associations within a region. Independent SNPs identified in the GWA scan discovery set were tested for replication in the additional sample using the same analytic approach. Finally, the program FASTSNP (http://fastsnp.ibms.sinica.edu.tw/pages/input_CandidateGeneSearch.jsp) was used to predict the impact of the identified variants on the structure and function of the encoded proteins.23
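The outlier rule, the additive-model regression, and the stepwise conditional procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ProbABEL: it fits ordinary least squares through the normal equations, uses a normal approximation for two-sided P values (reasonable for samples in the thousands), and expects the caller to supply covariate columns such as age, sex coded 0/1, and dummy indicators for field center. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def drop_outliers(values, k=5.5):
    """Indices of observations within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) <= k * s]

def _invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def ols_last_term(y, columns):
    """Regress y on an intercept plus `columns`; return (beta, SE, P) of the last column."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    inv = _invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(design, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    se = math.sqrt(sigma2 * inv[-1][-1])
    z = coef[-1] / se
    return coef[-1], se, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P

def conditional_hits(y, covariates, snps, alpha=5e-8):
    """Repeatedly add the strongest remaining SNP (conditional on covariates and
    SNPs already selected) until no SNP reaches alpha; return the selected names."""
    selected = []
    while True:
        best_name, best_p = None, alpha
        for name, dosage in snps.items():
            if name in selected:
                continue
            cols = covariates + [snps[s] for s in selected] + [dosage]
            try:
                _, _, p = ols_last_term(y, cols)
            except ValueError:  # collinear with an already selected SNP
                continue
            if p < best_p:
                best_name, best_p = name, p
        if best_name is None:
            return selected
        selected.append(best_name)
```

Here `snps` maps SNP names to dosage lists for one region; the returned list plays the role of the independent signals retained after conditioning.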

Results

Table 1 presents selected demographic and lifestyle characteristics for the GWA scan discovery and replication samples. Manhattan and quantile-quantile (Q-Q) plots of the P value distribution from the GWA scan are shown in Figure 1 and supplemental Figure 1 (available on the Blood Web site; see the Supplemental Materials link at the top of the online article), respectively. The genomic inflation factor (λ) was 1.04, suggesting negligible inflation of test statistics by population stratification or other technical factors. A total of 504 SNPs from multiple genes exceeded the genome-wide significance threshold of 5 × 10−8, marking 3 regions: chromosomes 2p23 (spanning 204 000 bp), 2q13-q14 (spanning 448 000 bp), and 20q11 (spanning 3.3 million bp). Detailed association results for the 504 SNPs are presented in supplemental Table 1. Details of the top SNP associations in the 3 regions are presented in Table 2.
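The genomic inflation factor compares the median observed association statistic with its expected median under the null hypothesis. A minimal sketch, assuming the usual definition λ = median(observed 1-df chi-square) / 0.4549, computed from two-sided P values; the function name is hypothetical.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Lambda_GC: median observed 1-df chi-square over its null median."""
    nd = NormalDist()
    # Two-sided P -> |z| via the lower tail, which stays finite for tiny P;
    # squaring z gives the 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    null_median = nd.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.4549
    return median(chi2) / null_median
```

Because λ uses the median, a few hundred genuinely associated SNPs barely move it; a value of 1.04 therefore reflects the bulk of the roughly 2.5 million tests.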

Table 1

Characteristics of participants in the GWA scan discovery and replication samples in ARIC in 1987-1989

Characteristic | GWA scan discovery sample | Replication sample
No. of participants | 8048 | 1376
Age, years | 54.3 ± 5.7 | 54.3 ± 5.7
Female, % | 52.9 | 53.2
Body mass index, kg/m² | 27.0 | 26.7
Prevalent coronary heart disease, % | 5.0 | 4.3
Hypertension*, % | 27.0 | 24.9
Diabetes†, % | 8.5 | 9.5
Current smoker, % | 25.1 | 21.8
Current alcohol drinker, % | 66.3 | 61.7
Total cholesterol, mg/dL | 214.7 ± 40.7 | 215.7 ± 41.6
Triglycerides, mg/dL | 137.1 ± 91.6 | 137.8 ± 94.9
Protein C, μg/mL | 3.2 ± 0.6 | 3.2 ± 0.6
Median protein C (IQR), μg/mL | 3.1 (2.8-3.5) | 3.1 (2.8-3.6)

Data are mean ± SD or percentage unless otherwise stated.

* Hypertension was defined as systolic blood pressure ≥ 140 mm Hg, diastolic blood pressure ≥ 90 mm Hg, or treatment for hypertension.

† Diabetes was defined as fasting glucose ≥ 126 mg/dL, nonfasting glucose ≥ 200 mg/dL, self-reported physician diagnosis of diabetes, or treatment for diabetes.

Figure 1

Manhattan plot showing genome-wide −log10 P values against physical position for protein C concentration. The y-axis is truncated at 50; 53 SNPs on chromosome 20 lie above this limit.

Table 2

Top SNP associations for protein C at genome-wide significance level (P < 5.0 × 10−8 based on the GWA scan discovery set or discovery + replication sets)

SNP | Position | Region | Gene | Function | A1/A2 | AFA1 | Discovery β/SE | Discovery P | Discovery Var% | Replication β/SE | Replication P | Imput
rs1260326 | 27584432 | 2p23 | GCKR | cns | C/T | 0.59 | 0.082/0.010 | 2.04 × 10−17 | 0.85 | 0.059/0.023 | .010 | 0.98
rs1158867 | 127893824 | 2q13-q14 | PROC | intron | T/C | 0.58 | −0.123/0.010 | 3.77 × 10−36 | 1.94 | −0.154/0.023 | 7.83 × 10−11 | 0.94
rs1799810 | 127892480 | 2q13-q14 | PROC | utr | A/T | 0.58 | −0.123/0.010 | 4.35 × 10−36 | 1.93 | −0.154/0.023 | 7.83 × 10−11 | 0.94
rs8119351 | 33218064 | 20q11 | Interg | – | G/A | 0.90 | 0.480/0.015 | 2.68 × 10−203* | 10.9 | 0.492/0.035 | 3.42 × 10−41 | 0.99
rs867186 | 33228208 | 20q11.2 | PROCR | cns | T/C | 0.90 | 0.468/0.015 | 2.00 × 10−200* | 10.4 | 0.491/0.035 | 3.02 × 10−41 | –
rs6120849 | 33194048 | 20q11.22 | EDEM2 | intron | C/T | 0.77 | −0.141/0.011 | 7.19 × 10−37† | 1.85 | −0.121/0.027 | 6.70 × 10−6 | –
rs17145713 | 72542746 | 7q11.23 | BAZ1B | intron | C/T | 0.80 | −0.062/0.012 | 2.50 × 10−7‡ | 0.33 | −0.079/0.029 | .007 | 0.99

A1 indicates allele 1 (major allele); A2, allele 2 (minor allele); AFA1, frequency of A1; β, change in protein C level (μg/mL) per additional copy of the minor allele, in both the GWA scan discovery and replication analyses; SE, standard error; Var%, percentage of variance explained by the SNP; Imput, ratio of observed to expected variance as a measure of imputation quality (– for genotyped SNPs); cns, coding-nonsynonymous; utr, within an exon but not translated; and interg, intergenic.

* r2 = 1.0 between rs8119351 and rs867186 in HapMap-CEU.

† P = 5.23 × 10−17 after adjustment for rs867186; r2 = 0.022 for both rs6120849-rs8119351 and rs6120849-rs867186 in HapMap-CEU.

‡ P = 2.83 × 10−8 in the pooled GWA scan of the discovery and replication sets.

Twenty-eight SNPs at the 2p23 region, covering 5 genes (supplemental Figure 2), were associated with plasma protein C levels at P < 5 × 10−8. The strongest signal was for rs1260326, a coding-nonsynonymous SNP in exon 1 of the glucokinase (hexokinase 4) regulatory protein (GCKR or GKRP) gene that encodes a proline-to-leucine substitution (P446L). Each copy of the minor T allele was associated with a 0.082 μg/mL higher plasma protein C concentration (P = 2.04 × 10−17, 0.85% variance explained; Table 2). Adjustment for rs1260326 abolished the associations for the remaining 27 SNPs (smallest P > .05). The association for rs1260326 was confirmed in the replication sample (P = .010; Table 2).
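The "variance explained" figures can be sanity-checked with a textbook approximation: under an additive model with HW genotype proportions, a SNP with per-allele effect β and allele frequency p explains roughly 2p(1 − p)β² of the phenotypic variance. This is an assumed approximation, not necessarily how the Var% column was computed, and the function name is hypothetical.

```python
def variance_explained(beta, allele_freq, phenotype_sd):
    """Approximate percentage of phenotypic variance explained by one SNP.

    Assumes an additive effect and Hardy-Weinberg proportions, so that the
    variance of the allele dosage is 2p(1 - p). Either allele's frequency may
    be passed because 2p(1 - p) is symmetric in p.
    """
    dosage_var = 2.0 * allele_freq * (1.0 - allele_freq)
    return 100.0 * dosage_var * beta ** 2 / phenotype_sd ** 2

# rs1260326 (GCKR): beta = 0.082 per minor allele, AF(A1) = 0.59 (Table 2);
# the protein C SD of 0.6 ug/mL is the rounded value from Table 1.
print(f"{variance_explained(0.082, 0.59, 0.6):.2f}%")
```

This prints 0.90%, close to the reported 0.85%; the small gap plausibly reflects the rounded SD and covariate adjustment.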

At the 2q13-q14 region, 112 SNPs reached genome-wide significance and covered 6 genes (supplemental Figure 3). Of the 112 SNPs, 2 are coding-synonymous and none is nonsynonymous. The strongest association was for a locus marked by rs1158867 (Table 2), which is intronic to the protein C structural gene (PROC). Each copy of the minor C allele was associated with a 0.123 μg/mL lower plasma protein C concentration (P = 3.77 × 10−36, 1.94% variance explained; Table 2). Another SNP, rs1799810, located within an untranslated exonic region of PROC, showed a signal similar to that of rs1158867 (β = −0.123, P = 4.35 × 10−36, 1.93% variance explained). This SNP is in high LD with rs1158867 (r2 = 0.85 and 0.99 in HapMap-CEU and ARIC, respectively). After adjustment for rs1158867, none of the remaining 111 SNPs was significant at the genome-wide level (smallest adjusted P = .03). The associations for both rs1158867 and rs1799810 were strongly replicated in the additional sample (Table 2).
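LD values such as those quoted for ARIC can be approximated from individual-level data. One common approach, assumed here rather than taken from the paper, is the squared Pearson correlation between two SNPs' allele dosages, which approximates haplotype-based r2 under HW equilibrium (HapMap r2 values are usually derived from estimated haplotype frequencies instead). The function name is hypothetical.

```python
from statistics import mean

def dosage_r2(x, y):
    """Squared Pearson correlation between two SNPs' allele dosages (0-2).

    A genotype-based stand-in for haplotype r2; the value does not depend on
    which allele each dosage counts.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: r2 undefined")
    return sxy * sxy / (sxx * syy)
```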

At the 20q11 region, 364 SNPs covering 40 genes exceeded the genome-wide significance threshold of P < 5 × 10−8 (supplemental Figure 4). The top 4 of these SNPs lie within a 0.8-kb window and showed similar signals: rs8119351 (intergenic, P = 2.68 × 10−203; Table 2), rs2069940 (near the 5′ end of the protein C receptor gene, PROCR, also known as the endothelial protein C receptor gene, EPCR; P = 1.24 × 10−201), rs867186 (coding-nonsynonymous in PROCR, S219G substitution, P = 2.00 × 10−200; Table 2), and rs11167260 (intergenic, P = 6.78 × 10−202). The missense variant rs867186 is in high LD with the other 3 SNPs (r2 = 1.0 and 0.95 in HapMap-CEU and ARIC, respectively). Therefore, the signals represented by the 4 SNPs may be attributable to a single signal from rs867186. This SNP was associated with a 0.468 μg/mL higher plasma protein C level per minor C allele and explained 10.4% of its variance (Table 2). In conditional analysis adjusting for rs867186, 37 SNPs (covering 5 genes) remained significant at P < 5 × 10−8 (supplemental Table 2, supplemental Figure 5). Thirty-four of these SNPs were not in LD with rs867186 (r2 < 0.05 in HapMap-CEU); the other 3 were in low LD (r2 = 0.08-0.32 in HapMap-CEU). Among the 37 SNPs, the strongest signal in the conditional analysis was for rs6120849, each minor T allele of which was associated with a 0.141 μg/mL (P = 7.19 × 10−37, 1.85% variance explained) and a 0.089 μg/mL (P = 5.23 × 10−17, 0.74% variance explained) lower protein C level before and after adjustment, respectively. This SNP is intronic to the endoplasmic reticulum (ER) degradation enhancer, mannosidase alpha-like 2 (EDEM2) gene and is not linked with rs867186 (r2 = 0.022 and 0.029 in HapMap-CEU and ARIC, respectively). Notably, rs6120849 is in moderate LD (r2 = 0.54) with a missense variant in EDEM2, rs3746429 (T456A substitution). Rs3746429 was also significantly associated with protein C concentration (P = 1.25 × 10−27 and 3.481 × 10−13 before and after adjustment for rs867186, respectively).
Including both rs867186 and rs6120849 as covariates yielded no further signals at the genome-wide significance level (smallest adjusted P = .000012). Replacing rs867186 with rs8119351 in the conditional analysis above yielded similar results, with a minor change in SNP ranking: the second-ranked SNP, rs6060266 (another intronic variant in EDEM2), became the top SNP, and rs6120849 moved to eighth. Rs6120849, rs6060266, rs3746429, and the top 4 SNPs within or near PROCR were strongly replicated in the additional sample (P < 10−5; Table 2 shows rs8119351, rs867186, and rs6120849). The signals for the 3 EDEM2 SNPs remained significant in the replication analysis after additional adjustment for rs867186 (P = .04, .04, and .005 for rs6120849, rs6060266, and rs3746429, respectively).

Furthermore, a GWA scan of the pooled discovery and replication sets yielded an additional locus at chromosome 7q11.23. Seven SNPs from this region reached genome-wide significance (supplemental Figure 6, supplemental Table 1). Of the 7, 5 are in the bromodomain adjacent to zinc finger domain 1B (BAZ1B) gene and 2 are intergenic. The top 2 SNPs, rs17145713 and rs1178977, are intronic variants in BAZ1B (β ± standard error = −0.063 ± 0.011, P = 2.83 × 10−8, and 0.33% variance explained for both SNPs). All 7 SNPs showed suggestive signals in the GWA scan of the discovery set (P < 2 × 10−6) and were replicated in the additional sample (P < .007; rs17145713 is shown in Table 2).
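The BAZ1B locus was detected by re-running the GWA scan on the pooled individual-level data. A common alternative, shown here only for comparison and not as the authors' method, is a fixed-effect inverse-variance meta-analysis of the two sets' summary statistics. The sketch below applies it to the Table 2 estimates for rs17145713; the function name is hypothetical.

```python
import math

def fixed_effect(betas, ses):
    """Inverse-variance-weighted fixed-effect estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

# rs17145713: discovery -0.062/0.012, replication -0.079/0.029 (Table 2)
beta, se = fixed_effect([-0.062, -0.079], [0.012, 0.029])
print(f"{beta:.3f} +/- {se:.3f}")
```

With the rounded Table 2 inputs this prints -0.064 +/- 0.011, close to the pooled estimate of −0.063 ± 0.011 reported above, as expected when two samples share the same design and covariates.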

Discussion

To the best of our knowledge, this is the first report of a GWA scan for plasma protein C levels in European Americans, based on a GWA scan discovery set of 8048 subjects and an independent replication sample of 1376 subjects. We identified genome-wide significant signals from novel loci (GCKR, EDEM2, and BAZ1B) as well as from candidate genes known to play a role in protein C regulation (PROC and PROCR). Of the 5 independent associations, 4 were replicated in the 1376 ARIC participants not included in the GWA scan discovery set.

The first novel locus is the region marked by variants in the GCKR gene. GCKR has not previously been implicated in protein C regulation, nor have its genetic variants been associated with plasma protein C levels. Interestingly, rs1260326 (P446L), the top GCKR SNP for protein C levels, has been associated in other GWA scans with circulating levels of C-reactive protein (CRP),24 triglycerides,25 fasting glucose,25 and factor VII (FVII) antigen/activity.26 The minor T allele, associated here with a higher plasma protein C level, was associated with higher levels of CRP,24 triglycerides,25 and FVII antigen/activity26 and with lower fasting glucose.25 FASTSNP predicted that the rs1260326 variant disrupts an exonic splicing site, with moderate to high risk. A study of 21 extended Spanish families reported a significant genetic correlation (0.42) between plasma levels of protein C and FVII.27 Therefore, the shared associations of rs1260326 with FVII and protein C could reflect pleiotropic effects of the GCKR gene. The protein encoded by GCKR inhibits glucokinase (hexokinase 4) in liver and pancreatic islet cells. It may also act as an anchor that sequesters glucokinase in the hepatocyte nucleus under fasting conditions, thereby protecting glucokinase from degradation.28 In GCKR knockout mice, hepatocytes lost both glucokinase protein and activity, possibly because nuclear sequestration was disrupted.28 Glucokinase catalyzes the initial step in glucose utilization by the pancreatic β cell and liver, providing glucose-6-phosphate for glycogen synthesis.
Both protein C and FVII are vitamin K-dependent glycoproteins synthesized in the liver, and they share substantial sequence and structural homology.1,27,29 One of the key posttranslational modifications of protein C and FVII is glycosylation at several residues, which requires glucose.30,31 We speculate that GCKR may exert its pleiotropic influence on protein C and FVII by modulating the liver's use of glucose during glycosylation.

Variants in EDEM2 marked the second novel locus for plasma protein C levels. This locus emerged from conditional analysis adjusting for rs867186, a missense variant in PROCR and one of the top SNPs in this region. Rs867186 was directly genotyped in ARIC, is in HW equilibrium, and had a MAF similar to that in HapMap-CEU and other populations of European ancestry.15,26 Therefore, residual signal due to genotyping error in rs867186 is an unlikely explanation for the remaining associations in this region. Moreover, adjustment for rs8119351, which is tightly linked with rs867186 and was imputed with excellent quality (imputation quality score = 0.99), yielded similar results. A FASTSNP search for the top EDEM2 SNP (rs6120849) returned “no known function,” whereas rs3746429, the T456A substitution in moderate LD with rs6120849, was predicted to be a conservative missense variant involved in splicing regulation. EDEM2 has not previously been reported as a candidate gene for protein C level. The EDEM2 protein is a member of the EDEM family involved in ER-associated degradation (ERAD) of glycoproteins, in which misfolded glycoproteins are retrotranslocated from the ER to the cytosol and degraded by the proteasome.32,33 Up-regulation of EDEM2 accelerates ERAD of terminally misfolded glycoproteins.33 In Chinese hamster ovary cells transfected with protein C mutants, cotransfection of EDEM accelerated the degradation of glycosylated protein C.29 EDEM2 may therefore influence protein C levels by modulating its degradation.

Variants in BAZ1B marked the third novel locus, which reached genome-wide significance in the combined analysis of the discovery and replication sets. BAZ1B encodes an enzyme that plays a central role in chromatin remodeling and is also involved in transcriptional regulation. This enzyme has not previously been linked to protein C; it might influence protein C levels by regulating its transcription. Interestingly, the top SNP in this gene (rs17145713) has previously been associated with triglyceride levels,34 suggesting a possible pleiotropic effect. Nevertheless, the signals at the BAZ1B region need to be replicated in independent populations.

The top PROC SNPs identified in our study, rs1158867 and rs1799810, have not been reported in previous genetic studies of plasma protein C levels. However, 2 other SNPs near the 5′ end of PROC, rs1799808 and rs1799809, were previously associated with plasma protein C levels.9 In our study, rs1799808 was associated with protein C to a lesser extent (β = −0.085, P = 6.03 × 10−17) than rs1158867, and its signal was no longer significant after adjustment for rs1158867 (adjusted P = .51). LD between rs1158867 and rs1799808 was modest (r2 = 0.26 in HapMap-CEU). The other reported SNP, rs1799809, was not included in our GWA scan dataset but is in high LD with our top 2 SNPs (r2 = 0.95 with both rs1799810 and rs1158867 in HapMap-CEU). It is therefore unknown whether rs1799809 or the top SNPs identified in our study are responsible for the observed associations at the PROC region. In FASTSNP, the 2 previously reported SNPs (rs1799808 and rs1799809) were predicted to have “no known function,” whereas rs1158867 and rs1799810 were predicted to disrupt a consensus splicing site sequence with moderate risk.

Of the top 4 SNPs that showed similar signals at the 20q11 region, rs867186 is the only functional variant, resulting in a serine-to-glycine substitution at position 219 of the PROCR (EPCR) protein (S219G). This SNP explained 10.4% of the variation in plasma protein C level. Because this variant is tightly linked with the other 3 top SNPs, the signals shown by the first independent locus in this region may be driven mainly by rs867186. FASTSNP predicted the S219G change to be a conservative missense change with similar protein structural characteristics, or to affect splicing regulation with low to moderate risk. The association for rs867186 agrees with a previous report in which rs867186 was significantly associated with plasma protein C in 336 European Americans, explaining 13% of its phenotypic variation.15 Interestingly, in that study the allele that increased plasma protein C level was also strongly and positively associated with plasma soluble EPCR, explaining 75% of its phenotypic variation.15 EPCR binds both protein C and activated protein C and enhances protein C activation by the thrombin-thrombomodulin complex. It has been speculated that PROCR S219G is associated with plasma protein C level because soluble EPCR may stabilize circulating protein C by binding to it15; alternatively, increased shedding of EPCR from the endothelial surface caused by this variant could leave less cell-bound EPCR to bind protein C, leading to higher circulating protein C levels. More interestingly, rs867186 has also been associated with plasma FVII antigen and activity in 2 other studies.26,35

In conclusion, we report the first GWA study of plasma protein C level, in a large sample of European Americans. We identified 5 independent loci associated with plasma protein C levels, marked by GCKR, EDEM2, BAZ1B, PROC, and PROCR. The GCKR, EDEM2, and BAZ1B loci are newly identified and have not previously been reported in association with protein C. Moreover, the top SNPs in GCKR and PROCR have also been associated with FVII antigen/activity in other studies, suggesting pleiotropic effects. These findings may improve understanding of the physiologic mechanisms of protein C regulation and could ultimately inform the prevention and treatment of disorders in which protein C deficiency is implicated.

The online version of this article contains a data supplement.

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Acknowledgments

We thank the University of Minnesota Supercomputing Institute for use of the blade supercomputers. The authors thank the staff and participants of the ARIC study for their important contributions.

This work was supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, and N01-HC-55022, and grants R01-HL-087641, R01-HL-59367, and R01-HL-086694; National Human Genome Research Institute contract U01-HG-004402; and the National Institutes of Health (NIH) contract HHSN268200625226C. The infrastructure was partly supported by grant UL1-RR-025005, a component of the NIH and NIH Roadmap for Medical Research. The Longitudinal Investigation of Thromboembolism Etiology was funded by grant R01-HL59367. Part of the work was supported by grant R01-HL095603.


Authorship

Contribution: W.T., M.C., E.B., and A.R.F. designed the research; N.A., E.B., and A.R.F. collected the data; W.T., S.B., X.K., J.S.P., and A.T. analyzed and interpreted the data; W.T. wrote the manuscript; and all authors edited the manuscript for scientific content.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Weihong Tang, Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, 1300 South Second St, WBOB 300, Minneapolis, MN 55454; e-mail: tang0097@umn.edu.

References

1
Tuddenham
 
EGD
Cooper
 
DN
The Molecular Genetics of Haemostasis and Its Inherited Disorders
1994
New York, NY
Oxford University Press
2
Broekmans
 
AW
Veltkamp
 
JJ
Bertina
 
RM
Congenital protein C deficiency and venous thromboembolism: a study of three Dutch families.
N Engl J Med
1983
, vol. 
309
 
6
(pg. 
340
-
344
)
3
Griffin
 
JH
Evatt
 
B
Zimmerman
 
TS
Kleiss
 
AJ
Wideman
 
C
Deficiency of protein C in congenital thrombotic disease.
J Clin Invest
1981
, vol. 
68
 
5
(pg. 
1370
-
1373
)
4
Emmerich
 
J
Vossen
 
CY
Callas
 
PW
, et al. 
Chronic venous abnormalities in symptomatic and asymptomatic protein C deficiency.
J Thromb Haemost
2005
, vol. 
3
 
7
(pg. 
1428
-
1431
)
5
Lane
 
DA
Mannucci
 
PM
Bauer
 
KA
, et al. 
Inherited thrombophilia: part 1.
Thromb Haemost
1996
, vol. 
76
 
5
(pg. 
651
-
662
)
6
Koster
 
T
Rosendaal
 
FR
Briet
 
E
, et al. 
Protein C deficiency in a controlled series of unselected outpatients: an infrequent but clear risk factor for venous thrombosis (Leiden Thrombophilia Study).
Blood
1995
, vol. 
85
 
10
(pg. 
2756
-
2761
)
7
Folsom
 
AR
Aleksic
 
N
Wang
 
L
Cushman
 
M
Wu
 
KK
White
 
RH
Protein C, antithrombin, and venous thromboembolism incidence: a prospective population-based study.
Arterioscler Thromb Vasc Biol
2002
, vol. 
22
 
6
(pg. 
1018
-
1022
)
8
Smith
 
NL
Hindorff
 
LA
Heckbert
 
SR
, et al. 
Association of genetic variations with nonfatal venous thrombosis in postmenopausal women.
JAMA
2007
, vol. 
297
 
5
(pg. 
489
-
498
)
9
Pomp
 
ER
Doggen
 
CJ
Vos
 
HL
Reitsma
 
PH
Rosendaal
 
FR
Polymorphisms in the protein C gene as risk factor for venous thrombosis.
Thromb Haemost
2009
, vol. 
101
 
1
(pg. 
62
-
67
)
10
Jackson
 
CJ
Xue
 
M
Activated protein C–an anticoagulant that does more than stop clots.
Int J Biochem Cell Biol
2008
, vol. 
40
 
12
(pg. 
2692
-
2697
)
11. Souto JC, Almasy L, Borrell M, et al. Genetic determinants of hemostasis phenotypes in Spanish families. Circulation. 2000;101(13):1546-1551.
12. Warren DM, Soria JM, Souto JC, et al. Heritability of hemostasis phenotypes and their correlation with type 2 diabetes status in Mexican Americans. Hum Biol. 2005;77(1):1-15.
13. Spek CA, Koster T, Rosendaal FR, Bertina RM, Reitsma PH. Genotypic variation in the promoter region of the protein C gene is associated with plasma protein C levels and thrombotic risk. Arterioscler Thromb Vasc Biol. 1995;15(2):214-218.
14. Aiach M, Nicaud V, Alhenc-Gelas M, et al. Complex association of protein C gene promoter polymorphism with circulating protein C levels and thrombotic risk. Arterioscler Thromb Vasc Biol. 1999;19(6):1573-1576.
15. Reiner AP, Carty CL, Jenny NS, et al. PROC, PROCR and PROS1 polymorphisms, plasma anticoagulant phenotypes, and risk of cardiovascular disease and mortality in older adults: the Cardiovascular Health Study. J Thromb Haemost. 2008;6(10):1625-1632.
16. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am J Epidemiol. 1989;129(4):687-702.
17. Chambless LE, McMahon R, Wu K, Folsom A, Finch A, Shen YL. Short-term intraindividual variability in hemostasis factors. The ARIC Study. Atherosclerosis Risk in Communities Intraindividual Variability Study. Ann Epidemiol. 1992;2(5):723-733.
18. Psaty BM, O'Donnell CJ, Gudnason V, et al. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: Design of prospective meta-analyses of genome-wide association studies from 5 cohorts. Circ Cardiovasc Genet. 2009;2(1):73-80.
19. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904-909.
20. Li Y, Abecasis GR. Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006;S79:2290.
21. de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, Voight BF. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17(R2):R122-128.
22. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007;23(10):1294-1296.
23. Yuan HY, Chiou JJ, Tseng WH, et al. FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006;34(Web Server issue):W635-641.
24. Ridker PM, Pare G, Parker A, et al. Loci related to metabolic-syndrome pathways including LEPR, HNF1A, IL6R, and GCKR associate with plasma C-reactive protein: the Women's Genome Health Study. Am J Hum Genet. 2008;82(5):1185-1192.
25. Orho-Melander M, Melander O, Guiducci C, et al. Common missense variant in the glucokinase regulatory protein gene is associated with increased plasma triglyceride and C-reactive protein but lower fasting glucose concentrations. Diabetes. 2008;57(11):3112-3121.
26. Smith NL, Chen MH, Dehghan A, et al. Novel associations of multiple genetic loci with plasma levels of factor VII, factor VIII, and von Willebrand factor: The CHARGE (Cohorts for Heart and Aging Research in Genome Epidemiology) Consortium. Circulation. 2010;121(12):1382-1392.
27. Souto JC, Almasy L, Blangero J, et al. Genetic regulation of plasma levels of vitamin K-dependent proteins involved in hematostasis: results from the GAIT Project. Genetic Analysis of Idiopathic Thrombophilia. Thromb Haemost. 2001;85(1):88-92.
28. Farrelly D, Brown KS, Tieman A, et al. Mice mutant for glucokinase regulatory protein exhibit decreased liver glucokinase: a sequestration mechanism in metabolic regulation. Proc Natl Acad Sci U S A. 1999;96(25):14511-14516.
29. Nishio M, Koyama T, Nakahara M, Egawa N, Hirosawa S. Proteasome degradation of protein C and plasmin inhibitor mutants. Thromb Haemost. 2008;100(3):405-412.
30. Griffin JH, Fernandez JA, Gale AJ, Mosnier LO. Activated protein C. J Thromb Haemost. 2007;5(Suppl 1):73-80.
31. Lazarus RA, Olivero AG, Eigenbrot C, Kirchhofer D. Inhibitors of tissue factor: factor VIIa for anticoagulant therapy. Curr Med Chem. 2004;11(17):2275-2290.
32. Mast SW, Diekman K, Karaveg K, Davis A, Sifers RN, Moremen KW. Human EDEM2, a novel homolog of family 47 glycosidases, is involved in ER-associated degradation of glycoproteins. Glycobiology. 2005;15(4):421-436.
33. Olivari S, Galli C, Alanen H, Ruddock L, Molinari M. A novel stress-induced EDEM variant regulating endoplasmic reticulum-associated glycoprotein degradation. J Biol Chem. 2005;280(4):2424-2428.
34. Talmud PJ, Drenos F, Shah S, et al. Gene-centric association signals for lipids and apolipoproteins identified via the HumanCVD BeadChip. Am J Hum Genet. 2009;85(5):628-642.
35. Ireland HA, Cooper JA, Drenos F, et al. FVII, FVIIa, and downstream markers of extrinsic pathway activation differ by EPCR Ser219Gly variant in healthy men. Arterioscler Thromb Vasc Biol. 2009;29(11):1968-1974.