- 1.1 Introduction
- 1.2 Beyond DNA: A Broad Picture of Epigenetic Mechanisms
- 1.3 Gene Regulation and the Impact of Chromatin Assembly
- 1.4 Chromatin Structure and the Basis of Epigenetic Mechanisms
- 1.5 Gene Regulation in Chromatin: The Role of Epigenetic Mechanisms
- 1.5.1 Chromatin Remodelling
- 1.5.2 Histone Variants
- 1.5.3 Histone Modification
- 1.5.4 DNA Methylation
- 1.6 Epigenetic Crosstalk – Integrating Histone Modification and DNA Methylation
- 1.7 Summary
CHAPTER 1: Epigenetics – What it is and Why it Matters
Published: 20 Nov 2015
Series: Drug Discovery Series
K. P. Nightingale, in Epigenetics for Drug Discovery, ed. N. Carey, The Royal Society of Chemistry, 2015, pp. 1-19.
Over the last decade there has been a revolution in our understanding of gene regulation, and how patterns of gene expression are established and maintained in eukaryotic cells. We now know that many factors – including the chemical modification of chromatin, many of the proteins involved in packaging DNA, and even where a gene is located in the nucleus – will influence transcriptional activity. These ‘epigenetic’ mechanisms are essential in ensuring short-term gene activity is appropriate for a cell's environment, and that cell-type specific patterns of gene expression are maintained over the longer term. As these are integral aspects of gene regulation, epigenetic mechanisms are inevitably involved in the misregulation of genes in disease, and occasionally act as the initiating step. As such, the promise of ‘epigenetic therapies’, based on drugs that target these processes, is huge. This chapter is aimed at a non-expert reader, and serves as an introduction that will: (i) broadly define epigenetic phenomena; (ii) discuss how genes are regulated in higher organisms, and how epigenetic mechanisms play a crucial role in this, including their deregulation in disease; and (iii) explain how epigenetic processes represent an important new class of targets for clinical intervention.
1.1 Introduction
The last few years have witnessed explosive growth in the broad field of epigenetics, with the emergence of new paradigms, and the discovery of novel regulatory processes and molecules. This offers new approaches to understanding fundamental biology, but also promises new insight into disease processes, and the potential to modulate the genes involved.
What then are epigenetic phenomena? In the broadest sense ‘epigenetics’ describes a layer of information and processes, which act in combination with the DNA sequence to determine an organism’s characteristics (i.e. the ‘phenotype’ – hair colour, height, etc.). This concept is encompassed in the word epigenetics itself, where the Greek prefix ‘epi-’ suggests these are processes that are ‘on top of’ or ‘in addition to’ genetic effects. As such, genetic effects describe a change in gene activity and phenotype, due to a change in the DNA sequence (i.e. when genes are rearranged or mutated). In contrast, an epigenetic change is where gene expression and the phenotype are altered, but without a change in the DNA sequence – although there may be a change in the DNA’s chemical modification or its packaging within the nucleus.
There are three reasons for the intense interest in this field: (i) epigenetic mechanisms are involved in many fundamental areas of biology, and are often underpinned by novel, interesting mechanisms; (ii) epigenetic regulation plays a central role in gene expression, so is involved in (and occasionally responsible for) initiating disease processes; and (iii) a large number of novel enzymes and protein complexes are involved in epigenetic processes, and are potential targets for small-molecule inhibition. To date, two classes of drugs with epigenetic targets have been approved for cancer chemotherapy, but similar ‘epigenetic therapies’ are likely to be developed for a broad range of diseases in the medium term (Figure 1.1).
Genetic and epigenetic processes link the genotype and phenotype. Epigenetic mechanisms are an integral part of gene regulation, and play a role in initiating and maintaining gene activation or silencing, the misregulation seen in many disease processes, and determining the responses of genes to changes in the environment. Epigenetic therapies are designed to impact on gene activity by targeting these processes.
1.2 Beyond DNA: A Broad Picture of Epigenetic Mechanisms
The term ‘epigenetics’ was originally developed by Conrad Waddington in the 1940s to describe the processes taking place during cell differentiation to develop functionally and morphologically distinct cell types (e.g. neurones, red blood cells). In this concept, the ‘epigenetic landscape’ described the network of gene regulatory decisions that a cell takes when progressing from a pluripotent stem cell to a terminally differentiated cell (Figure 1.2). This concept was developed before the recognition that DNA was the genetic material, but its insight remains valid, as mechanisms must exist to allow the hundreds of distinct cell types present in typical multicellular organisms to be expressed from a single genotype. This requires that the common DNA sequence is interpreted differently in different types of cells – to establish a single cell-type specific pattern of gene activity, and maintain this throughout the cell’s lifespan. Similarly, processes like ongoing tissue turnover require the replacement of differentiated cells, so cell-type specific patterns of gene activity must be duplicated and passed onto daughter cells. It is now clear that epigenetic processes underpin the establishment and maintenance of these patterns of gene activity, and the ‘cell memory’ that allows them to be passed through cell division.
Patterns of gene activity change during cell differentiation. Differentiation from a pluripotent stem cell to terminally differentiated cells is associated with changes in the transcriptional activity of multiple genes. For example, differentiation is associated with the silencing of genes associated with the pluripotent state (e.g. Genes A and B), and the up-regulation of genes required in specialised cells (e.g. Genes C, D and F). In contrast, some gene products are required in all cell types (e.g. Gene E; such genes are termed ‘housekeeping genes’), and these are expressed at a constant level.
Epigenetic mechanisms have a role in determining the interactions that link genotype and phenotype at the cellular level, but this influence is also apparent on the human scale. Our inherited parental DNA is the basis of our genetic identity and defines many of our characteristics, but it is also clear that it does not predetermine all the elements of who we are. This is most obvious with identical (i.e. monozygotic) twins – people who were initially a single fertilised egg, but by an accident of nature ultimately became two separate but genetically identical individuals. If the DNA sequence is wholly responsible for determining our physical appearance and personalities, then identical twins would indeed be indistinguishable, but in fact they very rarely are. This suggests there is a role for something other than the genome in determining phenotype. This is typically attributed to the environment an organism is exposed to, and is assumed to be the result of a range of factors in the environment (e.g. nutrition in the womb, environmental toxins, etc.).
One aspect of the phenotype where twin studies show a clear role for the environment is the susceptibility to multifactorial disease. For example, identical twins have an equal chance of developing diabetes mellitus, indicating its strong genetic basis, whereas the concordance between twins for epilepsy or schizophrenia is much lower, showing a greater role for environmental factors in the development of these diseases (Figure 1.3).
Genetic and environmental factors and their impact on disease. Diseases can be wholly genetic or wholly environmental in their origin, but more typically arise due to the combined effect of multiple genes, and the impact of environmental factors on their activity. Epigenetic mechanisms play a role in transmitting environmental signals to determine gene output.
This, and the finding that some forms of epigenetic information associated with individual genes diverge between twins as they age (DNA methylation, discussed later1), suggests that epigenetic processes play a central role in how genes respond to the distinct environmental conditions each twin experiences. As such, epigenetics is believed to play a role in translating the effect(s) of the environment to determine many aspects of cellular and, ultimately, organismal phenotype.
This suggests that epigenetics plays a role in both the minute-by-minute regulation of gene activity in response to changing environmental cues, and the long-term maintenance of patterns of gene expression throughout a cell’s lifespan. In both these cases the pattern of gene expression is determined in response to extra- or intracellular conditions. However, this is not the only way that genes can be regulated – developmental or cellular processes that are determined by stochastic (or random) switches are also maintained by epigenetic mechanisms. This is clearest in the case of X chromosome inactivation, a process driven by the need to balance transcriptional activity at genes located on the X chromosome in males (XY, i.e. 1 X chromosome) and females (XX, i.e. 2 X chromosomes). In most mammals this is achieved by the random silencing of all of the genes on one of the X chromosomes in females, in a process driven by epigenetic changes in the chromatin and DNA.2 Likewise, position effect variegation, the switching of a gene’s activity determined by its proximity to transcriptionally silent regions of the chromosomes,3 is a stochastic ‘on’ or ‘off’ process that is initiated and maintained by epigenetic processes. The next section discusses the mechanistic basis of these processes.
1.3 Gene Regulation and the Impact of Chromatin Assembly
Gene regulation is an inherently complicated process given the large number of proteins involved in the mechanics of transcription. However, the fundamental regulatory decision(s) that drive gene activity: ‘How much of gene product X does the cell need at this point in time?’ and thus, ‘Should gene X be active or silent?’ seem straightforward. The complexity arises due to (i) the size of eukaryotic genomes (i.e. ∼23000 genes in the human genome), (ii) the distinct, coordinated patterns of transcriptional activity required to generate specialised cell types, and (iii) the range of factors in the cellular environment to which cells need to respond. Higher eukaryotes manage this by multiple regulatory steps throughout the transcription–translation process, but for many genes, control is primarily applied at the initial step – the recruitment of RNA polymerase and its subsequent release from the transcriptional start site.
Genes consist of two functionally distinct types of DNA: (i) the coding sequence that defines the amino acid sequence of its protein product, and (ii) the regulatory sequences that ensure it is expressed appropriately. It is this second class of sequences that allow genes to respond to a range of intra- and extracellular factors, and ensure they are transcriptionally active or silent at the right place and time. This control is performed by DNA-binding proteins that recognise specific sequence motifs within the gene-regulatory regions, which are typically located adjacent to the transcription start site (termed ‘promoters’) or at considerable distances from the gene of interest (‘enhancers’). Thus, transcription at individual genes is activated or silenced by the binding of regulatory ‘transcription factors’ (Figure 1.4A), where the act of DNA binding is regulated by signals from the cellular environment (e.g. following steroid hormone binding). In this way the activity of a gene is regulated by a ‘committee’ of transcription factors that interact with RNA polymerase and the general transcription machinery to ensure initiation takes place. As such, gene activity is set at an appropriate level as it reflects the coordinated response to multiple transcription factors, allowing the integration of information from a variety of intra- and extracellular signals (Figure 1.4A).
Gene regulation and the impact of chromatin assembly. (A) Gene regulation modelled on a protein-free DNA template. Gene regulation is driven by a number of gene-specific transcription factors (TF1–3), which bind to sequence-specific binding sites and together ensure the efficient recruitment and release of RNA polymerase. (B) The impact of chromatin on this process. The assembly of the DNA template into chromatin presents a barrier to the binding of regulatory factors and RNA polymerase to their binding sites. This is overcome by epigenetic processes.
Undergraduate biologists are introduced to gene regulation using a simplified model – that transcription takes place on protein-free DNA, suggesting that DNA sequences throughout the genome are equally available for protein binding. This is useful for understanding the protein–DNA interactions that underpin transcription, but ignores a fundamental aspect of gene regulation in eukaryotes – that the DNA template is assembled into chromatin. In vitro experiments show this plays a key regulatory role, as many transcription factors will not bind their binding sites when assembled into chromatin (Figure 1.4B), suggesting this is a highly repressive environment for transcription. This has advantages for gene regulation as: (i) it suppresses ‘adventitious’ transcription (i.e. from spurious RNA polymerase binding at non-genic regions), but also (ii) introduces an unavoidable and powerful regulatory step – that chromatin-mediated repression has to be overcome for transcription to occur. The next section gives an overview of chromatin structure, before describing how epigenetic processes exploit this to generate regulatory steps.
1.4 Chromatin Structure and the Basis of Epigenetic Mechanisms
Cells in higher eukaryotes are characterised by the presence of a nucleus, the division of the genome into chromosomes, and the packaging of the DNA into an organised protein superstructure termed chromatin. This consists of an enormous range of proteins, which contribute either to chromosome structure or to the functional processes that use the DNA template. At its core, chromatin has a straightforward modular structure, based on a single repeating building block – the nucleosome (Figure 1.5). This large (∼100 kD) globular protein complex consists of eight arginine and lysine-rich histone proteins (two each of histones H2A, H2B, H3 and H4), around which 147 base pairs of DNA wrap in almost two complete turns.4 Nucleosomes have a degree of preference for specific DNA sequences, but this is relatively weak, allowing their assembly at ∼200 base pair intervals throughout the genome.5 This regular array is disrupted at regulatory sites (i.e. promoters, enhancers), which often adopt starkly different patterns of transcription factor and nucleosome binding, depending on whether a gene is active or silenced.6 These differences are driven by factor binding, with the resultant assembly of positioned nucleosomes at defined sequences and chromatin looping to allow long-range interaction between distant regulatory sites.
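The figures quoted above support a quick back-of-envelope calculation. The sketch below is illustrative only: it assumes a perfectly regular nucleosome array and a genome of 3 × 10⁹ base pairs (the size given elsewhere in this chapter), and the function names are hypothetical.

```python
# Back-of-envelope nucleosome arithmetic from the figures quoted in the text.
# Assumption: nucleosomes are spaced perfectly regularly, which real chromatin
# is not (regulatory sites disrupt the array).

GENOME_BP = 3e9      # genome size in base pairs, as quoted in this chapter
CORE_DNA_BP = 147    # DNA wrapped around one histone octamer
REPEAT_BP = 200      # approximate nucleosome spacing


def nucleosome_count(genome_bp: float, repeat_bp: float) -> float:
    """Approximate number of nucleosomes in a regularly spaced array."""
    return genome_bp / repeat_bp


def wrapped_fraction(core_bp: float, repeat_bp: float) -> float:
    """Fraction of the DNA that is wrapped on histone octamers."""
    return core_bp / repeat_bp


def linker_length_bp(core_bp: int, repeat_bp: int) -> int:
    """Length of the linker DNA between adjacent nucleosome cores."""
    return repeat_bp - core_bp


if __name__ == "__main__":
    print(f"nucleosomes: {nucleosome_count(GENOME_BP, REPEAT_BP):.2e}")
    print(f"wrapped fraction: {wrapped_fraction(CORE_DNA_BP, REPEAT_BP):.3f}")
    print(f"linker length: {linker_length_bp(CORE_DNA_BP, REPEAT_BP)} bp")
```

On these assumptions the genome carries on the order of ten million nucleosomes, with roughly three quarters of the DNA wrapped on histones at any time, which gives a sense of why chromatin is such a pervasive barrier to factor binding.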
The hierarchy of chromatin structures in the interphase nucleus. This cartoon indicates the nucleosome, the unit of chromatin, and how this is assembled into a higher-order solenoid termed the 30 nm fibre. The molecular detail of the relationship between this structure and the stable topological domains that define the interactions of large regions of chromatin and/or how chromosomal DNA is constrained into ‘territories’ remains unclear.
Much of our current knowledge of the nucleosome comes from structural studies,4 which revealed the histone–histone and histone–DNA interactions that build the central globular core and scaffold the assembled DNA. In contrast, the structure of the histone N- and C-terminal ‘tails’ is unclear – suggesting that these highly conserved domains are flexible, and extend beyond the bound DNA to interact with other components of chromatin. Residues on these domains stabilise the assembly of higher-order structures in chromatin,7,8 initially forming a coiled solenoid termed the 30 nm fibre (Figure 1.5). The structure of this solenoid, and its subsequent refolding to generate as yet poorly defined higher-order chromatin structures, remains an area of active research, as does the nature of the chromosomal loops, which appear to contribute to the condensation of chromatin in metaphase chromosomes.
This focus on the role of the nucleosome in assembling DNA and stabilising higher-order chromatin structures emphasises one of the main roles of chromatin – in packaging the DNA, and enabling the condensation of the human genome (i.e. 3 × 10⁹ bases, approximately 2 metres in length) into the eukaryotic nucleus (typically 10 µm diameter). However, this compaction must also be compatible with the functional processes that use the DNA as a template. This requires that different regions of chromatin are maintained in transcriptionally active or silent states, but there must be mechanisms to allow dynamic changes between these states in response to cellular conditions (i.e. the activation of a silenced gene). Likewise, all areas of the genome must remain accessible to the ongoing processes of replication and repair. Eukaryotic cells have evolved several ways to resolve this apparent tension, by building on and adapting the framework of the repeating nucleosome array by recruiting chromatin-binding proteins in a functionally-sensitive context. This generates specialised chromatin domains that are nonetheless both dynamic and reversible – responding to the functional requirements of the underlying DNA template. Several abundant families of architectural chromatin-binding proteins, including linker histones and high mobility group proteins, are thought to stabilise and regulate the accessibility of chromatin and the formation of higher-order chromatin structures, whereas other protein complexes, notably the Polycomb group (PcG) and Trithorax group (TrxG) proteins, are known to stabilise transcriptionally silent and active chromatin, respectively.
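The scale of this packaging problem can be checked with a short calculation. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA, a standard textbook value that is not stated in this chapter; the function names are hypothetical. On that assumption, 3 × 10⁹ base pairs comes to roughly 1 m, and the two genome copies present in a diploid nucleus to roughly 2 m, in line with the length quoted above.

```python
# Rough estimate of DNA contour length and linear compaction in the nucleus.
# Assumption (not from the chapter): B-form DNA rises 0.34 nm per base pair.

RISE_PER_BP_M = 0.34e-9      # metres per base pair, B-form DNA
HAPLOID_GENOME_BP = 3e9      # genome size quoted in the text
NUCLEUS_DIAMETER_M = 10e-6   # typical nuclear diameter quoted in the text


def contour_length_m(n_bp: float, rise_m: float = RISE_PER_BP_M) -> float:
    """End-to-end length of fully extended DNA of n_bp base pairs."""
    return n_bp * rise_m


def linear_compaction(length_m: float, diameter_m: float) -> float:
    """Ratio of DNA contour length to the diameter of its container."""
    return length_m / diameter_m


if __name__ == "__main__":
    haploid = contour_length_m(HAPLOID_GENOME_BP)
    diploid = contour_length_m(2 * HAPLOID_GENOME_BP)
    ratio = linear_compaction(diploid, NUCLEUS_DIAMETER_M)
    print(f"haploid contour length: {haploid:.2f} m")
    print(f"diploid contour length: {diploid:.2f} m")
    print(f"length / nuclear diameter: {ratio:.0f}")
```

The ratio of contour length to nuclear diameter is of the order of 10⁵, which is a crude measure but illustrates why compaction and accessibility are in tension.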
Finally, an emerging area of research interest focuses on the role that nuclear structure and 3D chromatin interactions play in regulating gene activity. Early light microscopy studies recognised that chromatin is not uniformly distributed in the nucleus, but forms morphologically distinct regions, which were later shown to reflect the function of the underlying DNA (i.e. transcriptionally active euchromatin, silent heterochromatin). More recently, immunofluorescence microscopy has shown that nuclei have a small number of punctate ‘transcription factories’ containing high concentrations of RNA polymerase II, suggesting that the DNA template is mobile, and genes move to fixed structures when transcribed.9 Similar approaches show that individual chromosomes are restricted to defined regions of the nucleus (‘chromosomal territories’), and that transcriptional activation is associated with genes ‘looping out’ from their corresponding territory, confirming that gene regulation is associated with intra-nuclear movement.10 This, and findings that adjacent regions of chromosomes interact to form large (∼880 kb), apparently stable 3D ‘topological domains’,11 stresses that chromatin adopts large, functionally related structures in the interphase nucleus, yet regions of chromatin remain mobile, even within short timescales.
1.5 Gene Regulation in Chromatin: The Role of Epigenetic Mechanisms
The complexity and nature of regulation at individual genes is driven by the cellular requirement for the gene product and the range of environmental factors that influence this demand. This varies from gene to gene, such that the transcription factors acting at specific promoters and enhancers will have evolved to match a gene’s regulatory needs. These factor-binding sites, and the propensity of the DNA sequence to assemble positioned nucleosomes, will ultimately define the promoter architecture and function, but the DNA is also overlaid by several forms of epigenetic information which contribute to gene regulation. As such, genetic and epigenetic information routinely integrate and complement each other at all genes – acting at the same regulatory sequences, often via common protein regulators in coordinated patterns of activity. In this context, distinguishing ‘epigenetic’ from ‘genetic’ processes is misleading, as it does not capture the seamless nature of gene regulation. The next four sections introduce the key epigenetic processes that act at most genes, and discuss how they interact with transcription factors and each other to generate an integrated process of gene control.
1.5.1 Chromatin Remodelling
In vitro experiments show that nucleosomes are sensitive to aspects of the DNA sequence, and adopt tightly defined positions on some sequences.5 This is also seen in vivo, but the promoter architecture at gene-regulatory regions frequently changes upon gene activation, corresponding with the binding of transcription factors.6,12 This initial recruitment phase represents a regulatory step, as many transcription factors cannot bind their recognition sites when these are assembled on nucleosomes. Thus a class of enzymes, chromatin remodelling complexes, is required to rearrange (or ‘remodel’) the positions of occluding nucleosomes to allow factors to bind.13 Several families of remodelling complexes have been identified, which use ATP hydrolysis to reposition or evict nucleosomes, allowing transcription factors to bind their sites in the newly accessible DNA (Figure 1.6). These large, multi-subunit complexes (e.g. NURF: 520 kD complex;14 ACF: 450 kD15) contain a core ATPase subunit that performs the nucleosome remodelling, and additional components that target and/or regulate this activity and recruit the complexes to target promoters by interacting with transcription factors. Other classes of remodellers (human SWI/SNF, dCHD1) are found to co-localise with RNA polymerase, and play a role in facilitating RNA polymerase II elongation through chromatin.16,17 Interestingly, chromatin remodelling is reversible, as in S. cerevisiae the transcriptional repressor Ssn6-Tup1 recruits the remodeller yISW2 to reassemble nucleosome arrays and evict transcription factors, thereby generating transcriptionally repressive chromatin.18
Chromatin remodelling facilitates transcription factor access. Steric factors, and the distortion of DNA associated with nucleosome assembly, prevent many transcription factors gaining access to their binding sites [yellow box]. Chromatin remodelling complexes are large, multi-subunit complexes that use ATP hydrolysis to slide or evict nucleosomes from these sites, and thereby allow factor binding.
1.5.2 Histone Variants
Eukaryotic cells contain a diverse range of highly conserved histone variants, which contain small changes in primary amino acid sequence (e.g. the variant H3.3 differs from H3.1 by 4–5 amino acids), or substantial differences (e.g. macroH2A contains an additional 25 kD C-terminal domain), from the standard ‘canonical’ histone sequence.19 Variants represent a small percentage of the total histones in the cell, but are concentrated in chromatin regions with specific functional roles,20 including marking the inactive X chromosome (macroH2A) or centromeric heterochromatin (CENP-A), or are positioned adjacent to transcriptional start sites (H2A.Z) and at sites associated with transcriptionally active or paused RNA polymerase II (H3.3).
The standard ‘canonical’ histone isoforms are assembled into chromatin in a replication-dependent manner, and are deposited onto the DNA by components of the replication machinery.21 In contrast, histone variants are deposited by variant-specific chaperone complexes in a replication-independent manner,22,23 generating localised regions of chromatin where nucleosomes have distinct physicochemical properties or novel sites for post-translational modification. For example, H2A.Z differs from H2A by an ‘acidic patch’, which appears to make variant-containing nucleosomes more unstable, and thus easily displaced by transcription factors.24 Likewise, H3.3 contains a novel site in the N-terminal tail for post-translational phosphorylation, although the key differences from H3.1/2 appear to be those that permit replication-independent assembly following RNA polymerase II-mediated eviction of nucleosomes.19 In contrast, the extended C-terminal domain of macroH2A is thought to stabilise chromatin and prevent transcription factors gaining access to their binding sites,25 consistent with the transcriptional silencing of the inactive X chromosome. In summary, histone variants are used to pre-set the structural and functional properties of chromatin, to facilitate or preclude subsequent factor binding, and their associated regulatory processes.
As such, chromatin remodelling and histone variants act to modulate local chromatin structure (i.e. altering nucleosome positions, establishing regions of specialised chromatin), and exert functional effects indirectly in a non-specific manner to permit or occlude the binding of a broad range of regulators. There is evidence that some classes of histone modification (i.e. acetylation, phosphorylation) also work in this way, but these seem to be exceptions – typically these modifications act in a specific and directed manner, by recruiting a single protein, or narrow class of partner proteins, to chromatin.
1.5.3 Histone Modification
Histones are post-translationally modified at sites within the central nucleosome core and on the N- and C-terminal ‘tails’, which are subject to unprecedented levels of modification: a large proportion of their residues can be modified by a diverse range of chemical moieties. These domains are highly conserved, yet do not appear to adopt a stable structure4 (although this remains unclear26), suggesting they are dynamic, and evolved to interact with a broad range of proteins that contribute to the structural and functional properties of chromatin. This is consistent with our current understanding of histone modifications – that these act as regulatory ‘marks’ that modulate the ability of the histone tails to interact with other proteins, and regulate their recruitment to chromatin.
How do histone modifications exert effects in chromatin? We can take histone acetylation as an example. Histone acetylation is a highly abundant, extensively characterised modification, with a large number of lysine residues subject to acetylation on all four histone tails, but also within the nucleosome core. Acetylation neutralises the positive charge associated with lysine, and is assumed to reduce histone tail interactions with the DNA and impact on chromatin structure. This has functional consequences as in vitro experiments show histone acetylation facilitates transcription factor binding to nucleosome-bound DNA,27 and reduces the ability of chromatin to form the 30 nm fibre, a transition associated with transcriptional repression.28 Likewise, chromatin immunoprecipitation (ChIP) indicates many acetylated lysine isoforms are associated with transcriptionally active loci (e.g. at promoters and enhancers) and are absent in silenced regions of chromatin. This suggests histone acetylation exerts functional effects by influencing local chromatin structure, but other histone modifications do not seem to work this way, as even small chemical differences with no apparent impact on chromatin structure have functional outcomes.
For example, lysine residues in histones are subject to different degrees of methylation (i.e. me1, me2 or me3), an apparently minimal difference, but the H3K4 mono- and trimethylated isoforms show distinct genomic distributions in cells.29,30 Likewise, the methylation of lysine residues on the histone H3 tail can have functionally opposing roles – methylation of H3K4 locates to active genes (termed an ‘activating mark’), whereas methylation of H3K9 is found in transcriptionally silent chromatin (‘repressive mark’). This suggests that the genomic distribution of a histone modification (and by inference, its function) is determined by both the precise chemical moiety and the residue modified. This is consistent with the ‘epigenetic code’31 and ‘histone code’ hypotheses32 that propose these modifications represent a layer of epigenetic information (or ‘marks’) that determine the functional status of the underlying DNA. As such, they act by playing a ‘signalling’ role – acting as precise marks that are recognised by a class of specialised ‘reader’ protein domains which bind histone modifications with a high degree of modification and residue specificity, and thereby co-recruit functional and/or structural ‘effector’ proteins to the marked region of chromatin33 (Figure 1.7). The last decade has seen the identification and characterisation of many classes of modification-specific reader domains (e.g. acetylation-binding ‘bromodomains’), and the protein complexes that these recruit (e.g. chromatin remodelling complexes, components of the general transcriptional machinery).
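The shorthand used for these marks (histone, residue, position, modification and, for methylation, degree) is regular enough to parse mechanically. The following sketch is purely illustrative: the `HistoneMark` type, the `parse_mark` and `association` functions and the small lookup table are hypothetical names introduced here, and the table encodes only the two examples given in the text (H3K4 methylation as activating, H3K9 methylation as repressive), not a complete ‘code’.

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str            # e.g. "H3"
    residue: str            # one-letter amino acid code, e.g. "K" for lysine
    position: int           # residue number within the histone
    modification: str       # "me" (methyl), "ac" (acetyl) or "ph" (phospho)
    degree: Optional[int]   # 1, 2 or 3 for methylation; otherwise None


# Pattern for names such as "H3K4me3" or "H3K27ac" (illustrative subset only).
_MARK_RE = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(me|ac|ph)([123])?$")

# Associations taken only from the examples in the text; not exhaustive.
_ASSOCIATION = {
    ("H3", "K", 4, "me"): "activating",
    ("H3", "K", 9, "me"): "repressive",
}


def parse_mark(name: str) -> HistoneMark:
    """Split a histone-mark name into its components."""
    match = _MARK_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised histone mark: {name!r}")
    histone, residue, position, modification, degree = match.groups()
    if degree is not None and modification != "me":
        raise ValueError("a degree (1-3) only applies to methylation")
    return HistoneMark(histone, residue, int(position), modification,
                       int(degree) if degree is not None else None)


def association(mark: HistoneMark) -> str:
    """Look up the transcriptional association of a mark, if listed."""
    key = (mark.histone, mark.residue, mark.position, mark.modification)
    return _ASSOCIATION.get(key, "not covered in this sketch")


if __name__ == "__main__":
    for name in ("H3K4me3", "H3K9me3", "H3K27ac"):
        mark = parse_mark(name)
        print(name, mark, association(mark))
```

Keeping the residue and the modification as separate fields mirrors the point made above: both the chemical moiety and the residue it sits on are needed to identify a mark.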
Regulatory components of the epigenetic/histone code. [Upper panel] The abundance of a specific form of histone modification at a site is determined by the recruitment and activity of opposing classes of modification enzymes: ‘writers’ that deposit the mark(s), and ‘erasers’ that subsequently remove them. Modification(s) are recognised by specific ‘reader’ domains that bind them and co-recruit the associated complex to chromatin, thereby targeting a functional or structural activity. [Lower panel] Enzymatic regulation of histone acetylation. Histone acetyl isoforms are regulated by large families of histone acetyltransferases (HATs) and histone deacetylases (HDACs), with bromodomain-containing effector complexes binding these sites. Enzyme activity and modification abundance are modulated by factors in the intra- and extracellular environment.
1.5.3.1 Drivers of Dynamic Gene Regulation: Histone-Modifying Enzymes
Histone modifications are a diverse range of chemical tags, but those that have been characterised to date act as a short-term layer of epigenetic information to regulate local protein recruitment to chromatin. As such, the abundance and distribution of these marks at a gene reflect the ongoing processes that are taking place, and emphasise the regulatory importance of the recruitment and activity of the two or more opposing classes of histone-modifying enzymes that maintain them. This common theme suggests histone-modifying enzymes can be divided into two classes: ‘writers’, which deposit a class of histone modifications (e.g. histone acetyltransferases (HATs)), and ‘erasers’, the enzymes that remove these marks (e.g. histone deacetylases (HDACs); Figure 1.7). These activities are recruited to genes in the context of large co-transcriptional regulator complexes, often by protein–protein interactions with activating or repressive transcription factors, or other components of the transcriptional process (Figure 1.8). Analysis of the epigenetic regulators recruited to steroid hormone responsive genes shows the binding of these enzyme complexes is highly dynamic, changing on a minute-by-minute basis, consistent with the rapid turnover of histone modifications and their associated reader proteins.34,35 For example, histone acetyl isoforms have a half-life measured in minutes, suggesting individual genes are regulated on this timescale. The recent finding that the activity of several classes of HDACs is sensitive to intracellular metabolic intermediates (e.g. NAD+), naturally occurring dietary components (e.g. resveratrol, a component of red wine), or by-products of the intestinal flora (e.g. the fatty acid, butyrate) suggests that these enzymes may have evolved to modulate their activity (and gene expression) in response to these agents (Figure 1.7, lower panel).
This suggests that histone modification represents a second layer of mechanisms that integrate environmental signals and modulate gene activity – in addition to the transcription factors (Figure 1.4A). A detailed discussion of the regulators, reader domains and effector proteins involved in gene regulation, and the implications of this for modulation by small-molecule inhibitors, is contained elsewhere in this volume.
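The balance between writer and eraser activity described above can be illustrated with a deliberately simple two-state model, in which a site is either modified or unmodified. This is a sketch under stated assumptions, not a description of any measured system: the rate constants, the 5-minute half-life and the function names are all hypothetical, and real loci involve many sites, enzymes and feedback loops.

```python
import math


def rate_from_half_life(t_half_min: float) -> float:
    """Convert a half-life (minutes) into a first-order rate constant (per minute)."""
    return math.log(2) / t_half_min


def steady_state_fraction(k_write: float, k_erase: float) -> float:
    """Fraction of sites carrying the mark once writing and erasing balance."""
    return k_write / (k_write + k_erase)


def fraction_at_time(f0: float, k_write: float, k_erase: float, t_min: float) -> float:
    """Analytic solution of df/dt = k_write * (1 - f) - k_erase * f."""
    f_ss = steady_state_fraction(k_write, k_erase)
    return f_ss + (f0 - f_ss) * math.exp(-(k_write + k_erase) * t_min)


# Hypothetical values: a mark removed with a 5-minute half-life,
# and a writer twice as active as the eraser.
k_erase = rate_from_half_life(5.0)
k_write = 2.0 * k_erase

baseline = steady_state_fraction(k_write, k_erase)
# Assumed effect of an eraser inhibitor: eraser activity cut ten-fold.
inhibited = steady_state_fraction(k_write, k_erase / 10.0)
# Writer recruitment lost at t = 0: the mark decays with the eraser half-life.
after_loss = fraction_at_time(baseline, 0.0, k_erase, 5.0)
```

In this toy model the marked fraction depends only on the ratio of the two rates, while the speed of the response depends on their sum. This is one way of seeing why a mark with a half-life of minutes can track changes in enzyme recruitment or activity on the same timescale.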
Gene regulation in chromatin. (Upper panel) Activating transcription factors bind in chromatin and recruit transcriptional co-activator complexes containing chromatin remodellers and/or histone modifying enzymes which deposit ‘activating’ marks on the adjacent chromatin. Note the nucleosome at the transcriptional start site may contain histone variants. (Lower panel) Transcriptional repressors act via recruiting regulatory complexes that deposit ‘repressive’ marks and/or erase activating marks.
1.5.4 DNA Methylation
The other major epigenetic mechanism, DNA methylation, is restricted to metazoans, suggesting that it co-evolved with the increased complexity of higher eukaryotic genomes. Methylation takes place at the 5-carbon of cytosine residues in cytosine–guanine (CpG) sites, a symmetrical sequence found on both strands of the DNA. The human genome has a low incidence of CpG dinucleotides, but is interspersed with regions containing elevated levels of these sequences (termed ‘CpG islands’), which associate with gene-regulatory regions. Typically, CpG sites throughout the genome are methylated, but sites within CpG islands remain non-methylated, unless the corresponding gene is silenced. DNA methylation is associated with long-term transcriptional silencing in processes which are not normally reversed including: (i) imprinting, where maternal or paternal alleles of genes are silenced, (ii) dosage compensation, the inactivation of one X chromosome in females, or (iii) the silencing of developmentally regulated genes with cell differentiation, and/or potentially damaging transposon and virally integrated sequences.
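The notion of a region with 'elevated levels' of CpG sites can be made concrete by comparing the observed number of CpG dinucleotides with the number expected from the base composition. The sketch below uses a commonly quoted heuristic (length of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6); these thresholds are an assumption brought in for illustration and are not taken from this chapter, and the function names are hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content and observed/expected CpG ratio for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides read 5' to 3' on this strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (c * g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str,
                          min_length: int = 200,
                          min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed heuristic thresholds to a candidate region."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_length
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_obs_exp)


# Artificial test sequences, chosen only to exercise the calculation.
cpg_rich = "CG" * 100   # 200 bp, every dinucleotide step is CpG or GpC
cpg_poor = "AT" * 100   # 200 bp with no C or G at all
```

Here `looks_like_cpg_island(cpg_rich)` is true and `looks_like_cpg_island(cpg_poor)` is false. Real genome annotation applies criteria of this kind in sliding windows along the sequence, and the exact thresholds vary between tools.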
DNA methylation and its associated transcriptional repression is assumed to be an irreversible process, but there are biological exceptions when DNA demethylation must occur, notably the global wave of demethylation observed in early development,36 and the experimental reversal of differentiation-associated gene silencing when terminally differentiated cells are reprogrammed to induced pluripotent stem cells.37 The recent finding that methylated cytosine is a substrate for the ten-eleven translocation (TET) proteins,38 which act as methylcytosine oxidases to generate a number of oxidised intermediates including 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC, Figure 1.9), suggests that this may be a route to DNA demethylation, potentially via the dilution of 5mC marks during replication, or removal by base-excision repair.39,40 This remains an area of active research, but the finding that these intermediates are present in nuclei, that TET proteins are recruited to CpG islands, and that their activity correlates with transcriptional activation or repression, suggests that this pathway plays a role in gene regulation. The function of these oxidised 5mC intermediates, and whether they represent distinct epigenetic marks that are able to recruit a subset of the methyl-CpG binding domain (MBD) proteins and/or more specialised binding proteins, has yet to be established.41
Cytosine methylation and TET-mediated oxidation. The methylation of deoxycytosine residues to 5-methylcytosine (5mC) is mediated by DNA methyltransferase (DNMT) enzymes; 5mC is then sequentially oxidised by ten-eleven translocation (TET) enzymes via 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC) to 5-carboxylcytosine (5caC).
1.6 Epigenetic Crosstalk – Integrating Histone Modification and DNA Methylation
A final topic is the level of integration between the different epigenetic processes. This occurs not just between mechanisms that regulate short-term gene expression (e.g. chromatin remodelling enzymes can be recruited by histone modifications), but also in long-term gene silencing via integration of DNA methylation with short-term histone modifications like acetylation and lysine methylation.
This crosstalk is apparent in the mechanism of mark deposition, but also the ways the various marks impact on transcription. DNA methylation is regulated by one of three classes of DNA methyltransferase (DNMT) enzymes, which maintain the pattern of methylation after replication or excision repair by acting at hemi-methylated sites (DNMT1), or deposit de novo methylation at unmethylated CpGs (DNMT3a and DNMT3b). DNMT activity can be recruited to chromatin by heterochromatin protein 1 (HP1), a protein that binds the repressive histone modification H3K9me3. Likewise, the primary functional effect of DNA methylation is via the recruitment of specialised MBD proteins (i.e. MBD1, MBD2, MeCP2 and Kaiso42,43) which co-recruit large repressor complexes. Many of these complexes contain the histone deacetylases HDAC1 and HDAC2, which act on adjacent chromatin and maintain transcriptional silencing (Figure 1.10), although the MBD2 complex also contains Mi2, a chromatin remodeller involved in transcriptional repression.
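The logic of maintenance methylation at hemi-methylated sites, and of the passive dilution of 5mC when maintenance is absent, can be followed in a small deterministic simulation. This is a minimal sketch with hypothetical names and several simplifying assumptions: maintenance is treated as perfectly efficient, de novo methylation and active demethylation are ignored, and each CpG site is reduced to a pair of booleans (one per strand).

```python
from typing import List, Tuple

Site = Tuple[bool, bool]   # methylation state of the two strands at one CpG
Duplex = List[Site]


def methylated_fraction(duplex: Duplex) -> float:
    """Fraction of strand positions that carry a methyl group."""
    return sum(a + b for a, b in duplex) / (2 * len(duplex))


def replicate(duplex: Duplex) -> Tuple[Duplex, Duplex]:
    """Semi-conservative replication: each daughter keeps one parental strand
    (with its marks) and gains a new, unmethylated strand."""
    daughter_a = [(a, False) for a, _ in duplex]
    daughter_b = [(b, False) for _, b in duplex]
    return daughter_a, daughter_b


def maintenance_methylation(duplex: Duplex) -> Duplex:
    """Assumed DNMT1-like step: fully methylate hemi-methylated sites only."""
    return [(True, True) if a != b else (a, b) for a, b in duplex]


def population_methylation(start: Duplex, generations: int, maintain: bool) -> List[float]:
    """Mean methylated fraction across all descendants after each division."""
    population = [start]
    levels = [methylated_fraction(start)]
    for _ in range(generations):
        next_population = []
        for duplex in population:
            for daughter in replicate(duplex):
                if maintain:
                    daughter = maintenance_methylation(daughter)
                next_population.append(daughter)
        population = next_population
        levels.append(sum(methylated_fraction(d) for d in population) / len(population))
    return levels


# Hypothetical locus: three fully methylated CpGs and one unmethylated CpG.
locus = [(True, True), (False, False), (True, True), (True, True)]
with_maintenance = population_methylation(locus, 3, maintain=True)
without_maintenance = population_methylation(locus, 3, maintain=False)
```

With maintenance the mean level stays at 0.75 in every generation and the unmethylated site remains unmethylated, so the pattern as well as the level is inherited. Without it the level halves at each division (0.75, 0.375, 0.1875, 0.09375), which is the passive, replication-dependent dilution referred to in the discussion of demethylation.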
Integrating histone modification and DNA methylation. Heterochromatin protein 1 binds the repressive mark H3K9me3, and co-recruits a complex containing DNMT activity which methylates CpG dinucleotides. This contributes to establishing silencing at these loci via recruiting one of several classes of methyl CpG binding domain proteins, which co-recruit HDAC activity to the locus.
This coordination and integration of histone modification and DNA methylation processes is consistent with multiple examples of large epigenetic regulators containing remodelling and/or DNA or histone (de)modifying enzymes, and contributes to the regulation and maintenance of gene expression patterns over the short and long term.
1.7 Summary
Epigenetic regulation represents an integrated layer of regulatory information, ranging from short-term and highly dynamic processes involved in the minute-to-minute activation or silencing of genes, to the long-term maintenance of these patterns. There are now many examples where mutated epigenetic regulators are known to cause gene deregulation and to initiate disease processes, for example following gene rearrangements of the MLL (mixed lineage leukaemia) histone methyltransferase44 or the MOZ/MYST3 (monocytic leukaemia zinc-finger protein) acetyltransferase, or mutation of components of chromatin remodellers (SNF2).
The realisation that epigenetic regulation is driven by multiple families of regulators, many acting via multi-subunit complexes, to allow tight integration with each other, represents a challenge for developing gene-specific agents. Nonetheless, the emergence of successful epigenetic therapies suggests that this is possible, particularly as only a small proportion of epigenetic regulators appear to have been targeted to date.