- 1.1 Overview: the Biochemistry of DNA Synthesis
- 1.2 Where and When Does DNA Replication Take Place?
- 1.2.1 Cell Cycle Control
- 1.2.2 Origin Clusters and Replication Foci
- 1.2.3 The Replication Timing Programme
- 1.3 Origins of DNA Replication
- 1.4 Licensing of DNA for Replication
- 1.5 Initiation of DNA Replication
- 1.6 Elongation of Replication Forks
- 1.7 Termination of DNA Replication
- 1.8 Replication of Chromatin
- 1.9 Chromatid Cohesion and Segregation
Chapter 1: Conserved Steps in Eukaryotic DNA Replication
Published: 30 Sep 2009
X. Quan Ge and J. Julian Blow, in Molecular Themes in DNA Replication, ed. L. S. Cox, The Royal Society of Chemistry, 2009, ch. 1, pp. 1-21.
1.1 Overview: the Biochemistry of DNA Synthesis
The genome of mammals comprises ∼6 × 10⁹ nucleotides arranged in extremely long linear polymers, the chromosomes. Accurate copying of this amount of genetic information in a biologically relevant time frame (often a few hours, though sometimes as little as a few minutes) requires highly accurate enzyme machines together with complex molecular coordination and feedback.
The DNA template is a long polymer of four types of deoxyribonucleotide arranged as a double-stranded anti-parallel helix (Figure 1.1A). The backbone of each strand consists of phosphodiester linkages between the 3′ and 5′ carbons of deoxyribose. The 1′ carbon of the deoxyribose is linked to one of four different bases: the purines (adenine and guanine) or the pyrimidines (thymine and cytosine). The two strands are held together by hydrogen bonds between complementary bases. The two-ringed heterocyclic purines always base pair with single-ringed pyrimidines, maintaining the linear axis of the helix and avoiding backbone distortion; specifically, guanine forms three hydrogen (H) bonds with cytosine and adenine makes two hydrogen bonds with thymine (Figure 1.1B). Whilst each H bond is relatively weak, the huge number of H bonds in an average mammalian chromosome (>10⁹) ensures that the duplex is extremely stable. As noted by Watson and Crick in 1953,1 each of the two single strands of DNA contains all the information necessary to produce new second strands through complementary base pairing.
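The pairing rules above can be expressed as a short sketch. The Python below is illustrative only: the function names and the example sequence are hypothetical, and it simply applies the A–T (two H bonds) and G–C (three H bonds) rules stated in the text.

```python
# Watson-Crick pairing rules and hydrogen bonds per base pair, as stated in the text.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(template: str) -> str:
    """Return the complementary strand, written 5'->3'.

    The template is read 5'->3'; because the duplex is anti-parallel,
    the complement is reversed so that it is also written 5'->3'.
    """
    return "".join(PAIR[base] for base in reversed(template.upper()))


def hydrogen_bonds(template: str) -> int:
    """Count the hydrogen bonds that hold the duplex together."""
    return sum(H_BONDS[base] for base in template.upper())


strand = "ATGCGC"  # hypothetical example sequence
print(complementary_strand(strand))  # GCGCAT
print(hydrogen_bonds(strand))        # 16
```

Applying `complementary_strand` twice returns the original sequence, which is the sense in which each single strand carries all the information needed to rebuild the other.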
During DNA replication, the two strands are opened up and a nascent strand, complementary to the template strand, is synthesised by a complex of many proteins called the replication fork or replisome. Unwinding of the two strands of DNA to expose bases during template-directed DNA synthesis requires the input of chemical energy to break the hydrogen bonds. This energy is derived from hydrolysis of ATP by helicases, which act as DNA-dependent ATPases that run ahead of the replication fork (Figure 1.2). In eukaryotes, the major replicative helicase is almost certainly the hexameric Mcm2-7 complex2–5 (Figure 1.3, see also Chapter 3). At the same time, torsional stress (positive supercoiling) is caused by unwinding the DNA and this is relieved by topoisomerases (nicking-closing enzymes) (Figure 1.2). Then DNA is synthesised by enzyme-mediated polymerisation of deoxyribonucleotide triphosphates (dNTPs) complementary to the sequence of the exposed bases on the template strand (see Chapter 4). New daughter duplexes thus consist of one parental template strand base paired to a complementary daughter strand. This is semi-conservative replication and was first demonstrated experimentally by Meselson and Stahl,6 who showed that newly synthesised DNA is composed of one template strand plus one nascent strand.
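The consequence of semi-conservative replication for successive generations can be followed with a small simulation. This is a minimal sketch, not a model of the original experiment: the labels "old" and "new" and all function names are assumptions made for illustration.

```python
from collections import Counter


def replicate(duplexes):
    """One round of semi-conservative replication.

    Each duplex is a pair of strand labels. Both strands are kept as
    templates and each is paired with a newly synthesised ("new") strand.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "new"))
        daughters.append((strand_b, "new"))
    return daughters


def classify(duplex):
    """Label a duplex by how many original ("old") strands it contains."""
    return {2: "old/old", 1: "hybrid", 0: "new/new"}[duplex.count("old")]


def composition_after(generations):
    """Count duplex types after the given number of replication rounds."""
    population = [("old", "old")]
    for _ in range(generations):
        population = replicate(population)
    return Counter(classify(d) for d in population)


for n in range(4):
    print(n, dict(composition_after(n)))
```

After one round every duplex is a hybrid of one old and one new strand; in later rounds the number of hybrid duplexes stays at two while fully new duplexes accumulate.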
During polymerisation, nucleophilic attack by the lone pair of electrons on the 3′ hydroxyl (OH) of the deoxyribose at the end of the growing strand onto the α-phosphate (the innermost 5′ phosphate) of an incoming dNTP results in the formation of a phosphodiester bond with the elimination of pyrophosphate (Figure 1.1C). The subsequent, and very rapid, hydrolysis of pyrophosphate to two inorganic phosphates releases energy, and it is this that drives the polymerisation reaction forwards. An important consequence of this reaction is that DNA must always be synthesised in a 5′ to 3′ direction; all known DNA polymerases act 5′–3′ with respect to the newly synthesised (nascent) DNA molecule. However, the double-stranded DNA template is an anti-parallel double helix, yet replication of both template strands is coordinated at a replication fork that moves in a net direction away from the start point (replication origin). Because of this conflict of directionality, DNA can be polymerised in the same direction as the fork is moving on only one strand (the ‘leading strand’). On the other strand (the ‘lagging’ strand), nascent DNA is synthesised in short sections called Okazaki fragments, typically ∼150 nucleotides long in eukaryotes (Figure 1.3). Okazaki fragments are started by short RNA primers which are subsequently removed before the fragments are joined together. Thus the fork can move away from the start site while coordinating synthesis of both nascent strands and without contravening the energy requirements of 5′–3′ synthesis.
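The discontinuous nature of lagging-strand synthesis can be sketched as follows. The code is a simplification under stated assumptions: fragments are given a fixed length of 150 nucleotides (the typical value quoted above, whereas real fragments vary), coordinates are hypothetical, and the function name is not taken from any library.

```python
OKAZAKI_NT = 150  # typical eukaryotic fragment length quoted in the text


def okazaki_fragments(template_nt, fragment_nt=OKAZAKI_NT):
    """Lay out Okazaki fragments on a lagging-strand template.

    Coordinates run from 0 (the origin) to template_nt in the direction
    of fork movement. Each fragment is returned as (primer_at, extends_to):
    the primer is laid down nearer the fork and the fragment is extended
    back towards the origin, i.e. opposite to fork movement.
    """
    fragments = []
    for start in range(0, template_nt, fragment_nt):
        end = min(start + fragment_nt, template_nt)
        fragments.append((end, start))
    return fragments


print(okazaki_fragments(400))
print(len(okazaki_fragments(50_000)))
```

With these assumptions, a fork that travels 50 kb needs 334 fragments on its lagging strand, each of which must be primed, extended, processed and ligated, whereas the leading strand is extended continuously.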
In eukaryotes, the chromosomal DNA is located within the cell nucleus where it is associated with proteins to form a DNA-protein complex called chromatin (see Chapter 10). The basic building block of chromatin is the nucleosome core particle, which contains 147 base pairs of double-stranded DNA wrapped in 1.65 left-handed superhelical turns around the surface of a histone octamer comprising two central H3–H4 dimers flanked by an H2A–H2B dimer on either side (Figure 1.4). A variety of other proteins also bind to DNA and regulate its activity. For replication to occur, pre-existing nucleosomes and other DNA-bound proteins that are located ahead of replication forks need to be transiently disrupted. After fork passage, those proteins are deposited back on parental as well as nascent DNA so that the chromatin status is reproduced in daughter strands7 (see also Chapter 10).
1.2 Where and When Does DNA Replication Take Place?
1.2.1 Cell Cycle Control
In eukaryotes, DNA replication takes place during a distinct phase of the cell cycle called S phase, during which time the entire genome is precisely duplicated (Figure 1.5). The replicated DNA molecules are segregated to the two daughter cells during a subsequent cell cycle phase called mitosis (see Chapter 9). S phase and mitosis are separated by two ‘gap’ phases, G1 and G2. Progression through each stage of the cell cycle is very tightly regulated by a complex interplay of kinases (enzymes that phosphorylate proteins), phosphatases (enzymes that remove phosphate groups from proteins) and proteases (which degrade proteins into shorter polypeptides or constituent amino acids).
During S phase, pairs of replication forks are initiated bidirectionally from chromosomal loci called replication origins. The large size of eukaryotic chromosomes (each of which can be tens or hundreds of megabases long) means that in order for them to be replicated in a reasonable period of time, a large number of replication origins are needed. Although the initiation of a pair of replication forks at a replication origin is a tightly controlled process, each fork will typically then move along the DNA (‘elongate’) until it encounters a fork moving in the opposite direction, at which stage both forks will disassemble (‘terminate’). When DNA is visualised during the S phase, replicated DNA can be observed as a series of ‘bubbles’ with replication origins near their centres (arrowheads in Figure 1.6). The stretch of DNA replicated by forks emanating from a single origin is referred to as a replicon. Replicon sizes can vary significantly, both among different organisms and among different cell types in the same organism. Rapidly dividing cells typically have small replicon sizes: for example, cells in the early Xenopus embryo have an average replicon size of ∼10 kb, whilst mammalian somatic cells typically have replicon sizes of 50–150 kb.8–10
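The need for many origins can be illustrated with some rough arithmetic. The sketch below uses the genome size and replicon sizes quoted in this chapter; the fork rate of 2 kb per minute is an assumed, illustrative value that is not given in the text, and the function names are hypothetical.

```python
import math

GENOME_NT = 6_000_000_000   # ~6 x 10^9 nucleotides (from the text)
ASSUMED_FORK_RATE = 2_000   # nt per minute per fork: an ASSUMPTION, not from the text


def origins_needed(genome_nt: int, replicon_nt: int) -> int:
    """Minimum number of replicons (one origin each) needed to cover the genome."""
    return math.ceil(genome_nt / replicon_nt)


def replication_time_min(length_nt: int, fork_rate: int = ASSUMED_FORK_RATE) -> float:
    """Minutes for two forks moving outwards from one central origin to copy length_nt."""
    return length_nt / (2 * fork_rate)


for replicon in (50_000, 150_000):
    print(replicon, origins_needed(GENOME_NT, replicon))

print(replication_time_min(100_000))       # a 100 kb replicon
print(replication_time_min(100_000_000))   # a 100 Mb chromosome with a single origin
```

Replicons of 50–150 kb imply 40,000–120,000 origins per genome. Under the assumed fork rate a 100 kb replicon is copied in 25 minutes, whereas a 100 Mb chromosome replicated from a single origin would take about 25,000 minutes, which is more than two weeks.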
1.2.2 Origin Clusters and Replication Foci
In metazoans, adjacent origins (typically 2–5) are organised into clusters which initiate synchronously, while different origin clusters are activated at different stages of S phase.11 One or more clusters of origins are organised into a discrete replication focal site, which has been estimated to comprise about 1 Mb of DNA and 6–12 replicons. Each focus is thought of as a factory for DNA replication and contains a range of replication fork proteins (forming so-called replisomes).12 It is possible that replisomes are anchored to a fibrous network within the nucleus (the ‘nuclear matrix’ or ‘nuclear scaffold’) through which multiple replication forks are spooled; alternatively, the physical organisation of the chromosomal DNA into higher order chromatin structures could provide the framework on which replication foci are built.13–15 DNA replication is typically completed in each focus within 30–120 minutes,16 and during this time, live cell imaging of the replication fork protein PCNA (proliferating cell nuclear antigen; see Chapters 3 and 7) has shown that replication foci do not merge, divide or have directional movement,17,18 thus arguing that replication foci are achieved by the coordinated assembly and disassembly of replisomal proteins at sites that are more or less fixed.
1.2.3 The Replication Timing Programme
Eukaryotes replicate their genomic DNA according to a specific temporal programme, with different clusters of origins firing at different times during an S phase that lasts from minutes in yeast to hours in metazoans. Several pieces of evidence have suggested that chromatin context is a critical determinant of origin initiation time. The replication timing programme is re-established in each cell cycle shortly after mitosis.19 Transcriptionally active regions tend to have open chromatin structure and replicate early, whereas gene-poor regions and the more condensed heterochromatin replicate late.20,21 Transcriptional silencing can reprogramme an origin from initiating early to initiating late, as well as promoting a more compact chromatin structure around the region.22
The timing decision point in early G1 (Figure 1.5) is the time when specific regions of the chromosome become programmed to replicate at specific stages of S phase. This takes place at the same time as the repositioning of chromosomes in the nucleus and the formation of immobile structures in the nucleus that restrict chromosome mobility.19,23 It has been proposed that chromatin regulators might be concentrated into subnuclear compartments by a clustering of related chromosomal domains, which may influence the timing of origin firing within a chromatin domain. For example, in yeast, late replicating origins reside close to the nuclear periphery in G1, whereas early replicating origins are apparently randomly localised within the nucleus throughout the cell cycle.24
Many other factors could also contribute to determining the timing of origin firing. For example, in Saccharomyces cerevisiae, the timing of replication at certain origins has been shown to be affected by the origin sequence.25 Precise levels of cyclin-dependent kinase (CDK) activity present at various stages of S phase are important for executing the temporal programme. In budding yeast, two S phase cyclins have differential roles in activation of early and late origins: Clb5 activates both early and late origins, while Clb6 activates only early origins.26
The replication timing programme determines the differential firing time of large sequence blocks containing replication origin clusters, but why has the cell evolved such a sophisticated programme for DNA replication? The grouping of replication forks into factories that are activated at different times might provide an environment whereby newly replicated DNA could be assembled into specific chromatin states, thus maintaining the epigenetic information that is important for regulation of other nuclear activities (such as transcription).10,27 It may also allow for tight regulation feedback, for example blocking firing of late origins when replication from early origins is halted.
1.3 Origins of DNA Replication
The number of origins ranges from a few hundred in a yeast cell to tens of thousands in a human cell. The extent to which conserved DNA sequence elements determine origins differs significantly among eukaryotic species. Replication origins in the budding yeast Saccharomyces cerevisiae contain highly conserved sequence elements called A, B1, B2 and B3 boxes of the autonomously replicating sequence (ARS).28 These conserved DNA sequences are required for binding of the initiator protein ORC (origin recognition complex, see Chapter 2). However, not all DNA segments containing the conserved sequence elements are recognised by ORC in vivo. Other sequences distributed over 100 bp also contribute to replication origin function, possibly by providing binding sites for proteins that can enhance the recruitment of ORC to DNA or by providing DNA sequences that can be easily unwound.29
The origins in most other eukaryotes are much less stringent in terms of sequence requirement. In the fission yeast, Schizosaccharomyces pombe, the required origin sequences are distributed over large DNA segments (500–1000 bp) and are AT rich.30 It appears that it is the number of AT tracts in a given segment of DNA that determines its probability of binding ORC and functioning as an origin of DNA replication.
The nature of origins in metazoans is even less well defined than in yeasts and the origins appear not to contain any consensus sequence. Replication origins occur at frequent and nearly random intervals along metazoan chromosomal DNA, and only a fraction of them are utilised in each cell cycle with a wide variation of efficiency. A typical pattern of origin initiation in metazoans is broad zones containing many relatively inefficient origins, one or a few of which are selected stochastically and the rest are suppressed.31,32 However, at some origins, such as lamin B2 and β-globin origins, replication starts from tightly-defined sites.33,34
Several interacting components may influence the location and efficiency of initiation in any given cell cycle, such as:
- DNA sequences. Sequences rich in AT could facilitate ORC binding or DNA unwinding.
- Local chromatin structure. It has been shown that the positions of nucleosomes near origins are important for origin function.35 Whilst histone acetylation has been shown to affect origin specification in Xenopus and Drosophila,36,37 in mammalian cells, an ATP-dependent chromatin remodelling complex is required for efficient replication of heterochromatin.38
- Transcription. Transcription has been shown to interfere with origin activity and indeed, replication origins are almost never found within actively transcribed DNA.10,20,39,40
- Protein–protein interactions. The presence of other proteins could help recruit ORC and enhance origin efficiency. For example, Abf1 and the Myb protein complex bind to origins and can affect the efficiency of origin utilisation in yeast and Drosophila.41,42
- Origin interference. It has been observed that in an initiation zone, firing of one replication origin appears to inhibit initiation at nearby origins, but is coordinated with neighbouring origins at more distant sites.43 This may suggest some sort of long range interaction between origins.
1.4 Licensing of DNA for Replication
It is essential for a cell to replicate its genome only once per cell cycle and this is regulated by the ability of cells to load the Mcm2-7 protein complex onto the origins (see Chapters 2 and 3). Mcm2-7 form a clamp around DNA and provide helicase activity to separate the double helix ahead of replication forks2–5 (see Chapter 3). During late M and G1 phases of the cell cycle, Mcm2-7 are loaded onto the DNA, which probably involves the clamping of the proteins around origin DNA without activation of their helicase activity (Figure 1.7). This ‘licenses’ the origin for use in the subsequent S phase. Mcm2-7 loading requires the recognition of the origin DNA by the origin recognition complex (ORC) (Figure 1.8). ORC in turn recruits proteins Cdc6 and Cdt1, which load Mcm2-7 onto DNA by hydrolysing ATP (see Chapter 2).44 The complex of ORC, Cdc6, Cdt1 and Mcm2-7 at replication origins is termed the pre-replicative complex or pre-RC. It is not clear whether ORC, Cdc6 and Cdt1 open the Mcm2-7 ring and load it around DNA, or whether they facilitate the assembly of the Mcm2-7 hexamer on DNA from different Mcm subcomplexes present in the nucleoplasm.
As a licensed origin initiates during S phase, the Mcm2-7 complex becomes activated as a helicase, possibly by binding other replication fork proteins including the GINS complex.45,46 Since Mcm2-7 proteins travel with the replication fork,47,48 this means that an origin becomes unlicensed after it initiates. To prevent DNA being replicated a second time in a single cell cycle, it is therefore important to prevent re-licensing of replicated origins during S and G2 phases of the cell cycle. The mechanisms for achieving this vary in different eukaryotes.
In yeasts, CDKs, which are active from late G1 to mid-mitosis, prevent licensing outside late M and G1 phase by ORC inactivation, Cdc6/Cdt1 degradation and Mcm2-7 export.49 In metazoans, the main route by which licensing is prevented during S and G2 is the downregulation of Cdt1 activity. This is brought about both by degradation of Cdt1 protein and by activation of a Cdt1 inhibitory protein, geminin. Cdt1 is degraded at the end of G1 and in early S phase in a process dependent on SCF-class and Cul4-based ubiquitin ligases.50–53 When geminin builds up during S, G2 and M phase, Cdt1 is stabilised by binding to geminin. As a result, licensing is inhibited and Cdt1 is protected from degradation, so that when geminin is degraded in late mitosis and G1, Cdt1 is ready for licensing.49,54
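The once-per-cycle logic of licensing can be summarised as a small state machine. This is a deliberately simplified sketch: the class, method and phase names are hypothetical, and the molecular mechanisms that close the licensing window (CDK activity, Cdt1 degradation, geminin) are collapsed into a single set of permitted phases.

```python
from dataclasses import dataclass

# Cell-cycle phases in which Mcm2-7 can be loaded, as described in the text.
LICENSING_PHASES = {"late M", "G1"}


@dataclass
class Origin:
    licensed: bool = False   # True while inactive Mcm2-7 is loaded at the origin
    times_fired: int = 0

    def try_license(self, phase: str) -> bool:
        """Load Mcm2-7 only during the licensing window."""
        if phase in LICENSING_PHASES:
            self.licensed = True
            return True
        return False

    def try_fire(self, phase: str) -> bool:
        """Initiate in S phase if licensed; firing consumes the licence."""
        if phase == "S" and self.licensed:
            self.licensed = False   # Mcm2-7 travels away with the forks
            self.times_fired += 1
            return True
        return False


def run_cycle(origin: Origin, phases) -> None:
    """At every phase, attempt both licensing and initiation."""
    for phase in phases:
        origin.try_license(phase)
        origin.try_fire(phase)


origin = Origin()
run_cycle(origin, ["G1", "S", "S", "G2", "early M"])
print(origin.times_fired, origin.licensed)
```

Even though initiation is attempted at every phase, the origin fires only once, because the licence is consumed on firing and cannot be renewed until the next late M or G1 phase.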
ORC can load multiple copies of Mcm2-7 complexes onto DNA, and Mcm2-7 are in ∼20-fold excess over replication origins used in S phase.55–60 Cells synthesise DNA at normal rates when the level of Mcm2-7 is reduced61,62 and, in Xenopus egg extracts, normal replication rates are maintained when Mcm2-7 complex is reduced to ∼two per origin.59,60,63,64 This suggests that each of the loaded Mcm2-7 complexes could act at an origin to initiate DNA replication and that up to 90% of them remain dormant in a single S phase. It has recently been shown that a biological role of the excess Mcm2-7 complexes loaded during licensing is to maintain genomic stability.65 When forks stall during DNA replication, the dormant Mcm2-7 complexes can initiate and rescue DNA replication between two stalled forks, allowing the intervening DNA to be replicated. If a replication fork encounters an unfired (dormant) origin, the Mcm2-7 must be removed from it to prevent re-replication from occurring.
1.5 Initiation of DNA Replication
Mcm2-7 is loaded in an inactive form at replication origins during G1 (Figure 1.9A), and then activated to initiate DNA replication during S phase. Activation of Mcm2-7 requires both S phase CDK (cyclin E/A-Cdk2) and Dbf4-dependent kinase (DDK) activity (Figure 1.9B). The catalytic subunits of DDK and CDK are expressed at relatively constant levels during the cell cycle, but the expression of their regulatory subunits (Dbf4 and cyclin respectively) is increased in S phase. One of the known substrates of DDK (Cdc7-Dbf4) is the Mcm2-7 complex, the phosphorylation of which is thought to change its interaction with other replication fork proteins.66 Recently, it has been shown in budding yeast (Figure 1.9B) that S phase CDKs phosphorylate two replication proteins, Sld2 and Sld3.46,67 By contrast, the critical S phase substrates of CDK-cyclin activity in metazoans have not yet been identified. Phosphorylation of yeast Sld2 induces its interaction with Dpb11 and facilitates its association with the GINS complex (Sld5, Psf1, Psf2, Psf3) (Figure 1.9C). At the same time, Sld3 is phosphorylated by CDK and recruited to DNA by binding to Cdc45, where phospho-Sld3 recruits Dpb11 and Sld2. Current evidence suggests that the binding of Cdc45 and GINS to Mcm2-7 activates the helicase activity of Mcm2-7.45,46 Consequently GINS and Cdc45 remain associated with Mcm2-7 and travel with the active replication forks (see Chapter 3).
1.6 Elongation of Replication Forks
Mcm2-7 and its associated proteins unwind the DNA in a bidirectional manner away from origins, and the single-stranded DNA (ssDNA) becomes coated with a binding protein called Replication Protein A (RPA). Via interactions with the helicase and RPA, DNA polymerase α (polα) is loaded onto the template (Figure 1.9D). A subunit of the polα holoenzyme provides primase activity and synthesises short RNA primers (8–12 nucleotides long), which are then extended by the DNA polymerase activity of polα to synthesise a short initiator DNA (iDNA) of about 30 bases. Because polα does not have a proofreading exonuclease activity, the iDNA synthesised only serves as a DNA primer for more extensive DNA synthesis by DNA polymerases with proofreading activity after polymerase switching (Chapter 4). The primer-template DNA structure is recognised and bound by a clamp-loading heteropentameric protein complex, replication factor C (RFC). This promotes structural changes in RFC, which uses the energy from ATP hydrolysis to open the ring of the trimeric sliding clamp PCNA (Chapter 2), and clamp it around the DNA, while at the same time polα is displaced. PCNA acts as a processivity factor for the elongation DNA polymerases polδ and polε by forming a ring that tethers them to the template DNA (Chapters 3, 6 and 7). Current data suggest that polε is on the leading strand and polδ is on the lagging strand.68,69 When the lagging strand polymerase encounters the 5′ end of the adjacent Okazaki fragment, the 5′ end is displaced to form a 5′ flap, which is degraded by the endonuclease FEN1 (Chapter 5). The two Okazaki fragments are then ligated by DNA ligase and the DNA polymerase is recycled to a newly loaded clamp on the lagging strand.70
Recent work in budding yeast indicates that Mcm2-7 associate with a number of proteins at forks to form the ‘replisome progression complex’ (RPC) during elongation,71 perhaps to ensure that fork progression is coordinated with DNA synthesis and other processes. In addition to Cdc45 and GINS, the RPC also contains Ctf4, which is important for establishing cohesion between sister chromatids; Tof1-Csm3, which mediates pausing of forks at DNA replication fork barriers; the checkpoint mediator Mrc1; the histone chaperone FACT; topoisomerase 1; and Mcm10.72 In addition, PCNA acts as a landing pad for other proteins during replication such as the CDK inhibitor p21, cytosine methyltransferase, the chromatin assembly factor CAF-1, DNA ligase, FEN1 and other proteins involved in DNA repair73–75 (see Chapter 3).
1.7 Termination of DNA Replication
Replication forks terminate when they encounter another replication fork coming from the opposite direction. In most cases, this occurs without the need for any special DNA sequences. In some cases, though, replication fork barriers at specific DNA sequences slow replication forks so that these sites are likely to become sites of termination. One such example is in the heavily transcribed ribosomal DNA genes, where the replication fork barrier is positioned to inhibit replication forks from moving through the gene in the opposite direction from transcription.76
The exact mechanism of how replication machinery is displaced from the DNA during termination is poorly understood. At termination, the replication forks must be disassembled. Most of the proteins released from terminated replication forks can be recycled to newly initiating forks. The Mcm2-7 proteins, however, are a special case. They are released from DNA at termination, but are not reloaded onto DNA until the next mitosis in order to prevent DNA from being replicated more than once in a single cell cycle (Figures 1.5 and 1.7). Similarly, if an active fork encounters inactive Mcm2-7 bound to a dormant replication origin, the inactive Mcm2-7 will be displaced from DNA.
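Where two converging forks meet can be estimated with simple kinematics. The sketch below assumes, purely for illustration, that all forks move at one constant rate and never stall; the function name, units (kb and minutes) and numbers are hypothetical. For adjacent origins at positions x1 < x2 firing at times t1 and t2, the forks meet where their travel times balance, at (x1 + x2 + rate × (t2 − t1)) / 2.

```python
from typing import Optional


def termination_site(x1: float, t1: float, x2: float, t2: float,
                     fork_rate: float) -> Optional[float]:
    """Position at which converging forks from two adjacent origins meet.

    Returns None when the meeting point would not lie strictly between the
    origins, i.e. one origin is reached by the other's fork before it can
    fire and is therefore replicated passively.
    """
    if x1 >= x2:
        raise ValueError("expected x1 < x2")
    meet = (x1 + x2 + fork_rate * (t2 - t1)) / 2
    if not (x1 < meet < x2):
        return None
    return meet


# Hypothetical origins 100 kb apart, forks moving at 2 kb per minute.
print(termination_site(0, 0, 100, 0, 2))    # both fire together
print(termination_site(0, 0, 100, 10, 2))   # right-hand origin fires 10 min later
print(termination_site(0, 0, 100, 60, 2))   # right-hand origin is overrun first
```

Simultaneous firing places termination midway between the origins, a delay shifts it towards the later origin, and a long enough delay means the later origin never fires, corresponding to the displacement of inactive Mcm2-7 from a dormant origin described above.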
1.8 Replication of Chromatin
The histones around which the DNA is wrapped (Figure 1.4) have to be displaced from the chromatin as the DNA replication forks pass, and the newly synthesised DNA has to be reassembled into chromatin (see Chapter 10). Nucleosome disruption is likely to be facilitated by ATP-dependent chromatin remodelling enzymes, such as WSTF, which is targeted to replicating DNA through direct interaction with PCNA and in turn recruits the ISWI-type nucleosome-remodelling factor SNF2H.77 At the same time, histone chaperones facilitate the disruption of parental nucleosomes by acting as histone acceptors and hence aid the transfer of the histones onto the nascent strand.78 FACT is complexed with Mcm proteins during fork movement, and facilitates nucleosome disruption and re-deposition of H2A-H2B.79,80 CAF-1 associated with PCNA is aided by Asf1 to deposit H3-H4 onto replicating DNA.81,82
Chromatin also contains epigenetic information in addition to that from the DNA sequence which affects, amongst other things, the level of gene expression. This information is encoded by covalent modifications to histones (such as acetylation and methylation) as well as methylation of cytosine bases. When DNA is replicated, the epigenetic information must be copied too. This is achieved by association of a large number of chromatin-modulating enzymes with PCNA during replication such as DNA methyltransferase I, CAF-1 and histone deacetylase (Chapters 3 and 10). These enzymes either themselves have catalytic activity or can recruit other enzymes implicated in chromatin modification.
1.9 Chromatid Cohesion and Segregation
After replication, it is essential that the sister chromatids are identified and each of them is sent to a different daughter cell. To achieve this, replicated chromatids remain physically attached to each other by cohesion until anaphase, when they are separated by microtubule pulling force. Sister chromatid cohesion is established by cohesin, a complex consisting of at least four proteins (Smc1, Smc3, Scc1, Scc3) that form a ring structure loaded at discrete sites along the entire length of the chromosome in G1 phase. During S phase, the cohesin complex establishes a physical link (cohesion) between replicated sister chromatids with the help of several factors, including Eco1, Ctf4 and Ctf18. Several models have been proposed to explain how cohesin contributes structurally to sister chromatid cohesion, one of which is that the cohesin ring establishes cohesion by embracing both sister chromatids (see Chapter 9).
At the metaphase-to-anaphase transition, the separation of sister chromatids is triggered by the removal of cohesin from chromosomes. This is achieved by activation of a protease called separase, which cleaves the cohesin ring. Separase is inhibited by the protein securin and, at the metaphase-to-anaphase transition, securin is degraded after APC/C-dependent ubiquitination.83 Thus a complex interplay of cell cycle regulatory factors establishes cohesion concomitant with synthesis of sister chromatids and ensures separation only at the metaphase-anaphase transition (Chapter 9).
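The chain of inhibition that releases sister chromatids can be written as a few lines of Boolean logic. This is a minimal sketch with hypothetical names; it treats each component as simply present or absent and ignores kinetics and any additional layers of regulation.

```python
def cohesin_cleaved(apc_c_active: bool) -> bool:
    """Follow the regulatory chain described in the text, one step per line."""
    securin_present = not apc_c_active     # APC/C-dependent ubiquitination leads to securin degradation
    separase_active = not securin_present  # securin inhibits separase
    return separase_active                 # active separase cleaves the cohesin ring


for stage, apc_c_active in [("metaphase", False), ("anaphase onset", True)]:
    state = "separated" if cohesin_cleaved(apc_c_active) else "cohered"
    print(stage, state)
```

Because separase is held in check by an inhibitor that is itself destroyed on APC/C activation, the two negative steps combine so that sister chromatids separate only once the APC/C becomes active.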
The authors are supported by Cancer Research UK grants C303/A4416 (XQG) and C303/A7399 (JJB).
PCNA: proliferating cell nuclear antigen.