Epigenome-wide meta-analysis of DNA methylation differences in prefrontal cortex implicates the immune processes in Alzheimer’s disease

Study cohort characteristics

Our meta-analysis included 1030 prefrontal cortex brain samples from four independent cohorts (Table 1), previously described in the ROSMAP14, Mt. Sinai15, London8, and Gasparoni16 methylation studies. To assess the diagnostic utility of DNA methylation as clinical biomarkers, we also compared methylation differences in brain samples with premortem whole blood samples from a subset of subjects in the London cohort. Among the four cohorts, the mean age at death ranged from 73.6 years to 86.3 years, and the percentage of females ranged from 51.8 to 63.5%.

Table 1 Sample characteristics of the brain and blood cohorts included in the meta-analysis.

Meta-analysis identified methylation differences significantly associated with AD Braak stage at individual CpGs and co-methylated genomic regions

Adjusting for estimated cell-type proportions (i.e., the proportion of neurons), age at death, sex, and batch effects, our meta-analysis of single CpGs in the four cohorts identified 3979 statistically significant individual CpGs at 5% false discovery rate (FDR). After eliminating CpGs associated with smoking17 or overlapping with cross-reactive probes18, we obtained 3751 significant CpGs (Supplementary Data 1), of which 47.8% is located in noncoding regions. This proportion is lower than the proportion of noncoding probes (53.2%) on the array (P-value = 2.18 × 10−11). Among the 3751 CpGs, 339 also reached genome-wide significance level (see details “Genomic inflation and sensitivity analysis” section).

The DMRs were identified by both coMethDMR13 and comb-p12 software. In the coMethDMR approach, we tested 40,010 pre-defined genomic regions to identify co-methylated and differentially methylated regions associated with Braak stage, adjusting for estimated neuron proportions, age at death, sex, and batch effects for each cohort separately. Next, we combined the cohort-specific P-values for these genomic regions using inverse-variance weighted regression models for meta-analysis. Alternatively, in the comb-p approach, we used meta-analysis P-values of individual CpGs as input, and comb-p was then used to scan the genome for regions enriched with a series of adjacent low P-values (Fig. 1).

Fig. 1

Workflow of meta-analysis for individual CpGs and DMRs.

The coMethDMR and comb-p based meta-analysis approaches identified 478 and 187 significant DMRs associated with AD Braak stage, respectively, with 143 being identified by both methods. After eliminating those DMRs containing cross-reactive probes18 or smoking-associated probes17, we obtained 119 co-methylated DMRs at 5% FDR (Supplementary Data 2). The average number of CpGs per DMR is 5.16 ± 2.80 CpGs. Notably, 118 out of the 119 DMRs included FDR significant individual CpGs. On the other hand, only 421 out of the 3751 FDR significant individual CpGs overlapped with the FDR significant DMRs. Therefore, methylation differences at individual CpGs and DMRs did not completely overlap, so analyzing both individual CpGs and DMRs provided a more complete picture of the Braak-associated methylome in the prefrontal cortex. Our final set of DNA methylation differences included these 3751 individual CpGs along with the 119 DMRs (Fig. 2). The top 20 most significant CpGs and DMRs are shown in Tables 2 and 3, respectively. As previous studies have noted8,14,19,20, in both individual CpG and DMR analyses, we observed that the majority of the significant methylation differences were hyper-methylated in AD, for which methylation levels were increased as AD stage increased. More specifically, 58.6% of significant CpGs and 73.9% of significant DMRs were hyper-methylated in AD (Supplementary Data 1 and 2).

Fig. 2: Manhattan plot of significant methylation differences in individual CpGs and DMRs identified in meta-analysis using inverse-variance weighted regression models.

The X-axis indicates chromosomes 1–22 and the Y-axis indicates −log10 (P-value), with the horizontal red line indicating a 5% FDR (false discovery rate) adjusting for multiple comparisons.

Table 2 Top 20 most significant differentially methylated CpGs associated with Braak stage in meta-analysis.
Table 3 Top 20 most significant differentially methylated regions (DMRs) associated with Braak stage identified by both coMethDMR and comb-p in meta-analysis.

Enrichment analysis of significant DNA methylation differences in pathological AD highlights immune-related processes and polycomb repressive complex 2 (PRC2)

The probes on the Illumina 450k array are annotated according to their locations with respect to genes (TSS1500, TSS200, 5′UTR, 1st Exon, gene body, 3′UTR, and intergenic) or to CpG islands (island, shore, shelf, and open sea). We tested enrichment of the significant methylation differences associated with pathological AD in these different types of genomic features by analyzing individual CpGs and DMRs separately using Fisher’s exact test. Interestingly, hypermethylation in AD at individual CpGs and DMRs was enriched in different features of genes across the genome (Fig. 3a, b and Supplementary Data 3). Significant hypermethylated individual CpGs were over-represented in CpG island and shore but under-represented in open sea and shelf (Fig. 3a). In contrast, the hypermethylated DMRs were under-represented in CpG island but enriched in open sea and shelf (Fig. 3b). In terms of genic features, the hypermethylated individual CpGs were enriched in 1st Exon, 5′UTR, and gene body but under-represented in intergenic regions and TSS200 (Fig. 3a). On the other hand, the hyper-methylated DMRs were only slightly under-represented in intergenic regions (Fig. 3b). In contrast, there was more agreement between hypomethylated changes at individual CpGs and DMRs. Both significant hypomethylated individual CpGs and DMRs were enriched in open sea, but depleted in CpG islands.

Fig. 3: Enrichment of CpGs significantly associated with AD Braak stage in meta-analysis of individual CpGs and DMRs at 5% FDR.

A two-sided Fisher’s test was used to determine over or under-representation of the significant CpGs in individual CpGs analysis and CpGs mapped within significant DMRs in various a, b genomic features and c, d chromatin states. ***P-value < 0.001, **P-value < 0.01, *P-value < 0.05, uncorrected for multiple comparisons.

In addition, we also compared our results with epigenomic annotations including chromatin states and transcription factor binding sites. Using combinations of histone modification marks, computational algorithms such as ChromHMM21 segment and annotate the genome with different chromatin states (repressed, poised and active promoters, strong and weak enhancers, putative insulators, transcribed regions, and large-scale repressed and inactive domains), which were shown to vary across sex, tissue type, and developmental age22. For our analysis, we used the 15-state ChromHMM annotation of the Roadmap Epigenomics project23. Our enrichment analysis with respect to chromatin states showed that significant hypermethylated DMRs and CpGs were both enriched in flanking active promoter regions (TssAFlnk), enhancers (Enh), Transcr. at gene 5′ and 3′ (TxFlnk), and polycomb repressed regions (ReprPC), but under-represented in promoter regions (TssA), strongly transcribed regions (Tx), and repressed regions (Quies, ReprPCWk). In contrast, hypomethylated DMRs and CpGs were enriched in repressed regions (Quies, ReprPCWk) and weakly transcribed regions (TxWk), but under-represented in promoter regions (TssA, TssBiv) (Fig. 3c, d and Supplementary Data 4). Notably, among the 151 significant CpGs located in the group of Hox genes on chromosome 7, the majority (135 out of 151) were located in polycomb repressed regions, and the rest in bivalent enhancer and bivalent/poised TSS regions.

Similarly, enrichment tests for regulatory elements using the LOLA software24 also supported the potential functional relevance of these significant changes in DNA methylation. In particular, the significant CpGs were enriched in the binding sites of 26 transcription factors and chromatin proteins assayed by the ENCODE project25 (Supplementary Data 5). Notably, the top hits included EZH2 and SUZ12, both are subunits of polycomb repressive complex 2 (PRC2), consistent with the observed enrichment of methylation differences in PRC2 repressed regions (Fig. 3c, d) and previous observations that DNA methylation often interact with PRC2 binding26,27,28. Another top enriched TF is PU.1, which is critical for the differentiation, proliferation, and survival of microglia, the resident macrophages of the brain29. Evidence from a recent GWAS study also suggested PU.1 as a master regulator for a number of genes associated with delayed onset of AD, including TREM2, CD33, and ABCA730.

As pathological AD-associated genes can harbor both significant individual CpGs and significant DMRs, we performed a pathway analysis by considering the significant CpGs and DMRs jointly. The test of KEGG pathways showed that hematopoietic cell lineage, phagosome, Cytokine–cytokine receptor interaction, and chemokine signaling pathways were significantly enriched with methylation differences in pathological AD at 5% FDR (Table 4). Similarly, gene ontology (GO) analysis showed strong enrichment in biological processes involving inflammatory response, immune cell differentiation, and cytokine production, recapitulating the prominence of immune processes in AD31,32. Other significant GO terms involved cellular processes previously shown to be important in AD including cell adhesion, phagocytosis, cell migration, and synapse pruning.

Table 4 Gene set enrichment analysis of significant methylation differences associated with AD Braak stage identified in meta-analysis using Wallenius’ noncentral hypergeometric test which adjusted for different number of CpGs associated with each gene.

Prioritizing significant DNA methylation differences with sample matching

DNA methylation levels are known to be influenced by aging, which is also the strongest risk factor for AD. To prioritize significant methylation differences in pathological AD and minimize confounding effects due to aging, we also explored an alternative strategy by matching each pathological AD case with a control subject of the same sex and age at death in the same cohort. We obtained a total of 346 subjects (173 cases and 173 controls) for the matched sample set from the London (n = 46), Mount Sinai (n = 56), and ROSMAP (n = 244) cohorts. We did not include the Gasparoni cohort for this analysis because too few age and sex-matched samples (n = 12) were present in this dataset. The age and sex matched samples were then analyzed in the same way as described above, except for removing age at death and sex effects in the linear models. We identified a total of 151 CpGs and 32 DMRs that were significantly different after matching cases and control samples by sex and age at death (Supplementary Datas 6 and 7). Among them, 85% (n = 129) of CpGs and 50% (n = 16) of genomic regions overlapped with the significant CpGs and DMRs in our main analysis described above with the same direction of change. In particular, we found that methylation differences at a number of AD-related genes such as HOXA3, SLC44A2, AGAP2, CDH9, and MAMSTR were significant in both analyses.

Correlation of AD-associated CpGs and DMRs methylation levels in blood and brain samples

To evaluate the diagnostic potential of the identified methylation differences, we computed Spearman rank correlations between inter-individual variations of the DNA methylation levels in the brain and blood using the London cohort dataset8, which included 69 pairs of matched brain and blood samples passing quality control (Table 1). The difference between age at pre-mortem blood draw and age at death ranged from 0 to 10 years, with an average of 3.81 ± 2.61 years. We performed both an adjusted correlation analysis based on methylation residuals (left( {r_{{rm{resid}}}} right)), in which we adjusted estimated neuron proportions for brain samples (or estimated blood cell-type proportions), array, age at death (for brain samples) or at blood draw (for blood samples), and sex, and an unadjusted correlation analysis based on beta values (left( {r_{{rm{beta}}}} right)) (Online Methods). The correlation between methylation levels in brain and blood were modest at the majority of CpG sites (mean Pearson r = 0.069, SD = 0.165), which is similar to those reported in Yu et al.33 for CD4+ lymphocytes and other previous reports33,34. Among CpGs mapped within the 119 significant DMRs (n = 728), only 11 showed moderate to strong association in brain and blood in both adjusted and unadjusted analyses (absolute (r_{{rm{beta}}} ge 0.5,;{rm{FDR}}_{{rm{beta}}} < 0.05,) absolute (r_{{rm{resid}}},ge,0.5,{rm{FDR}}_{{rm{resid}}},<,0.05)). Similarly, among the 3751 significant individual CpGs, only 39 showed moderate to strong associations (absolute (r_{{rm{beta}}},ge,0.5,;{rm{FDR}}_{{rm{beta}}},<,0.05,) absolute (r_{{rm{resid}}},ge,0.5,;{rm{FDR}}_{{rm{resid}}},<,0.05)) in brain and blood. Remarkably, all 50 CpGs showed significant positive correlations, corroborating previous analyses34 that also observed a significant negative correlation between brain and blood is relatively rare.

To further validate these brain-blood correlations, we performed an additional analysis using the BeCon software35, which computed correlations between brain and blood methylation levels in an independent cohort with 16 subjects. We compared our results from the London cohort with correlation results between whole blood and Brodmann area 10 (anterior PFC) in BeCon. Among the 50 CpGs described above, 5 CpGs (cg03765423, cg16106427, cg18776287, cg22595230, cg00445443) mapped to HOXA2, intergenic, STK32C, MRPS2, and CENPB genes also exhibited moderate to strong correlation (absolute R > 0.5) in BA10 (Supplementary Data 8). Again, all these BA10-blood correlations were positive.

Correlation of methylation levels of significant CpGs and DMRs in AD with expressions of nearby genes

Among the four cohorts of brain samples, the ROSMAP study had matched RNA-seq gene expression data and DNA methylation data available for 529 samples (428 cases and 101 controls). We therefore evaluated the role of significant DMRs or CpGs by correlating methylation levels of the significant DMRs or CpGs with the expression values of genes found in the vicinity (±250 kb from the start or end of the DMR, or location of CpG). To reduce the effect of potential confounding effects, when testing for methylation-gene expression associations, we first adjusted for age at death, sex, cell-type proportions, and batch effects in both DNA methylation and gene expression levels separately and extracted residuals from the linear models. Then we tested for association between methylation residuals and gene expression residuals, adjusting for Braak stage.

We found that out of the 118 DMRs that were linked to a gene transcript (+/−250 kb), 73 (62%) DMRs were associated with gene expression levels at 5% FDR (Supplementary Data 9). Similarly, out of the 3642 CpGs that were linked to a nearby gene (+/−250 kb), we found 652 (18%) CpGs were associated with gene expression levels at 5% FDR (Supplementary Data 10). Among the significant DMR–RNA and CpG–RNA associations, 41.1% and 49.2% were negative associations, respectively. We next compared the strengths of DMR–RNA with CpG–RNA associations using a generalized estimating equations (GEE) model where values for −log10 (P-value) from each DMR or CpG were treated as clusters (Online Methods). In general, we found the effects of DMRs on gene expression to be larger than those for single CpGs (P = 2.25 × 10−10), consistent with the notion that DMRs often have a more relevant biological role than isolated CpGs.

Correlation and co-localization with genetic susceptibility loci

To identify methylation quantitative trait loci (mQTLs) for the significant DMRs and CpGs, we tested associations between the methylation levels with nearby SNPs, using the ROSMAP study dataset (imputed to the Haplotype Reference Consortium r1.1 reference panel)36, which had matched genotype data and DNA methylation data for 688 samples. To reduce the number of tests, we focused on identifying cis mQTLs located within 500 kb from the start or end of the DMR (or position of the significant CpG)37. Among 166,797 SNPs that are associated with AD, 11,670 were also significantly associated with methylation levels, after correcting for confounding effects age, sex, cell type, batch effects and PCs in methylation data. Among the 119 DMRs and 3751 CpGs significantly associated with Braak stage, 37 DMRs and 1010 CpGs had at least one corresponding mQTL in brain samples, respectively (Supplementary Data 11 and 12).

To evaluate if the significant methylation differences overlap with genetic risk loci implicated in AD, we compared enrichment of significant CpGs and DMRs identified in this study with the 24 LD blocks of genetic variants reaching genome-wide significance in a recent AD meta-analysis3. We found that while no DMRs overlapped with the 24 LD blocks, 24 FDR significant individual CpGs overlapped with genetic variants mapped to the HLA-DRB1, TREM2, NYAP1, SPI1, MS4A2, ADAM10, ACE, and ABCA7 genes (Supplementary Data 13).

Given the observed overlap between AD pathology associated CpGs and AD genetic risk loci, we next sought to determine whether the association signals at the GWAS loci (variant to AD status as determined by clinical consensus diagnosis of cognitive status, and variant to CpG methylation levels) are due to a single shared causal variant or to distinct causal variants close to each other. To this end, we performed a co-localization analysis using the method described in Giambartolomei et al.38. The results of this co-localization analysis strongly suggested39 (i.e., PP3 + PP4 > 0.90, PP4 > 0.8 and PP4/PP3 > 5, Online Methods) that 2 out of the 24 regions included a single causal variant common to both phenotypes (i.e., AD status and CpG methylation levels). The CpGs associated with these causal variants are located on the SPI130 and ADAM1040 genes (Supplementary Data 14).

Genomic inflation and sensitivity analysis

In this study, we chose to use the false discovery rate, instead of the more stringent genome-wide significance threshold, to select significant DNA methylation differences for enrichment analysis. This was motivated by the observation that in brain disorders such as AD, the DNA methylation differences in the epigenome are often found to be modest, so they might be missed by using the more conventional genome-wide significance threshold. To assess the potential inflation in our results, we estimated genomic inflation factors using both the conventional and the bacon method41, specifically proposed for EWAS. As shown by simulation studies41, real datasets41, and theory42, the conventional genomic inflation factor (lambda or λ used interchangeably below) is dependent on the expected number of true associations. Because in a typical EWAS it is expected that small effects from many CpGs might be associated with the phenotype, the genomic inflation factor would overestimate actual test–statistic inflation. To estimate genomic inflations more accurately in EWAS, Iterson et al.41 developed a Bayesian method that estimates inflation in EWAS based on empirical null distributions, which is implemented in the Bioconductor package bacon. The estimated genomic inflation factors for individual cohorts in our study were modest and comparable to other recent EWAS profiling brain tissues (for example, Supplementary Fig. 10 in Viana et al.43 showed that lambdas for different brain regions ranged from 1.02 to 1.23). In our study, lambdas (λ) by conventional approach ranged from 1.002 to 1.249, and lambdas based on the bacon approach (left( {lambda .bacon} right)) ranged from 0.986 to 1.082 (Supplementary Fig. 1). In particular, for the ROSMAP cohort with the largest sample size, genomic inflation factors were close to 1 by both approaches (λ = 1.016, λ.bacon = 1.002) Genomic inflation factors for the meta-analysis were λ = 1.264 and λ.bacon = 1.086.

In addition, we conducted sensitivity analyses to evaluate the impact of inflation on our enrichment analysis results. To this end, we performed inflation correction for single cohort effect sizes and standard errors, and then meta-analyzed the bacon-corrected effect sizes and standard errors. Enrichment analysis was then conducted for the 2767 CpGs that reached FDR significance and the 339 CpGs that reached the 2.4 × 10−7 genome-wide significance level44 (Supplementary Data 15). Our results showed that the functional enrichment results in these sensitivity analyses were largely congruent with our main analysis results (Supplementary Figs. 2–3 and Supplementary Data 16–18). In particular, we still observed significant enrichment in polycomb repressed regions for hyper-methylated CpGs in AD, as well as in binding sites of polycomb repressive complex 2 subunits EZH2 and SUZ12. Moreover, pathway analysis still showed significant enrichment in immune system related pathways (Supplementary Data 19–20) for both FDR significant and genome-wide significant CpGs after bacon correction.

Source Article