Amino Acids, Peptides and Proteins: Volume 43
Search and destroy: versatile proteins offer unique structural solutions against uracil in DNA
Published: 09 May 2019
Beáta G. Vértessy, 2019. "Search and destroy: versatile proteins offer unique structural solutions against uracil in DNA", Amino Acids, Peptides and Proteins: Volume 43, Maxim Ryadnov, Ferenc Hudecz
Despite the simplistic general view of DNA as assembled from only four deoxynucleotides, non-canonical bases constitute a significant part of the genome and play an important role in epigenetics and other physiological processes. Uracil is among the most frequently occurring non-canonical bases in DNA. To maintain normal cell function, proteins involved in uracil-DNA metabolism require specific binding sites that discriminate reliably between nucleobases. Here we present an overview of how this distinction is governed by molecular interactions between side chain and main chain atoms of the protein binding sites and the uracil base.
1 The widening chemical space of DNA
Textbooks traditionally present DNA as a rather simple molecule consisting of deoxyribose, phosphate and four nucleobases: adenine, thymine, guanine and cytosine. This view still dominates despite earlier and recent advances in DNA chemistry. In this textbook scheme adenine pairs with thymine whereas cytosine pairs with guanine according to the Watson–Crick base pairing rules. The double helix model is of course in full agreement with this 4-base-pattern.1 Decades of research in molecular biology and genetics could successfully rely on this model with satisfactory results. Evidence from high resolution structural biology techniques also greatly supported the model of a 4-base-alphabet for DNA. Moreover, sequencing methods, whether traditional Sanger sequencing or next generation sequencing technologies, also strengthened the concept of the 4-base-DNA-alphabet. However, the seemingly perfect agreement between the model with four bases and the actual experimental data was simply due to a built-in bias in the experimental techniques. On the one hand, structural biology approaches, either X-ray crystallography or NMR, were employed on samples containing DNA oligonucleotides conforming to the 4-base-alphabet. In almost all such cases, DNA oligonucleotides were chemically synthesized or produced in biochemical synthesis under conditions wherein only the four bases (adenine, thymine, guanine and cytosine) were present. Evidently, under such conditions, the produced oligonucleotides necessarily contained only the four usual bases.
On the other hand, the results from sequencing also need to be taken with a pinch of salt. Present-day sequencing methods, with very few exceptions, share a built-in bias as well. Namely, these sequencing methods operate by following the synthesis of a new DNA strand upon the template DNA strand that is to be sequenced.2 No matter what the exact detection method may be, all of these synthesis-based techniques are unfortunately indirect. In these procedures, the incorporation of each new nucleobase building block that follows the template strand sequence information is separately detected. In such a way, whenever an adenine is present in the template strand, a thymine-containing building block (i.e. dTTP) will be incorporated into the new strand that is being synthesized, and that position will be recorded as adenine in the template strand. Guanine in the template strand will direct incorporation of a cytosine-containing building block, and so on, according to Watson–Crick base pairing. This process is unfortunately insensitive to the presence of any unusual nucleobase in the template strand. Whether thymine or uracil is present in the template strand, the sequencing procedure dictates incorporation of the very same adenine into the new strand. Likewise, methyl-cytosine will be read as cytosine, methyl-adenine will be read as adenine, and so on. Hence, the information on any unusual nucleobases present in the template strand is completely lost in the sequencing result.
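The loss of information described above can be illustrated with a minimal sketch. It is a toy model only: the one-letter code "m" for 5-methyl-cytosine and the function names are assumptions made for this illustration, and strand polarity is ignored for simplicity.

```python
# Toy model of synthesis-based sequencing (illustrative assumptions only).
# Each template base directs incorporation of its Watson-Crick partner.
# Modified bases pair like their parent base, so they leave no trace.
PARTNER = {
    "A": "T",
    "T": "A",
    "G": "C",
    "C": "G",
    "U": "A",  # uracil pairs with adenine, exactly like thymine
    "m": "G",  # "m" = 5-methyl-cytosine (hypothetical code), pairs like cytosine
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize(template):
    """Return the bases incorporated opposite each template position."""
    return "".join(PARTNER[base] for base in template)


def inferred_template(template):
    """Return the sequence that the method reports for the template.

    The method only observes which canonical building block was
    incorporated, and reports its complement at each position.
    """
    return "".join(COMPLEMENT[base] for base in synthesize(template))


if __name__ == "__main__":
    modified = "AUGmT"   # contains uracil and 5-methyl-cytosine
    canonical = "ATGCT"  # the same sequence written with canonical bases
    print(inferred_template(modified))
    print(inferred_template(canonical))
```

Both templates yield the same reported sequence, which is the point made in the text: the unusual bases direct the same incorporation events as their canonical counterparts.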
Still, in recent years, it has been increasingly recognized that in addition to the usual four bases, additional nucleobases may also be present in the genomic DNA.3,4 It has long been known that the chemical composition of DNA is under constant stress not just from exogenous effects such as radiation, different chemicals, etc., but also within a normal physiological environment due to metabolically produced reactive chemical species (redox moieties, free radicals, etc.).5 Especially for aerobic organisms, where molecular oxygen as well as reactive oxygen species are naturally abundant, chemical reactions of DNA bases occur quite frequently. Mitochondrial DNA is well known to harbor oxidatively modified bases; however, nuclear DNA is not exempt from such modifications either. Methylated bases may be produced upon reaction of DNA with alkylating agents. For a long time, non-orthodox nucleobases (i.e. those that are not constituents of the four-letter-alphabet) have been regarded as mistakes only, which are continuously cleared away via the DNA repair processes and are replaced by the corresponding usual base. However, recent data are emerging that argue for the role of some of these “damaged” bases in normal physiological processes.6
Undisputedly the first non-orthodox nucleobase that gained wide recognition as a means of epigenetic regulation was 5-methyl-cytosine (5meC).7 Other methylated bases were also discovered in bacteria as elements of the restriction methylase-endonuclease system, which acts as an intriguing self-defense pathway against foreign DNA. In more recent years, the discovery of further unusual bases in genomic DNA has been accelerating. The rapid development of research on the widening chemical space of DNA largely profits from highly sophisticated novel mass spectrometry methods that make it possible to analyze the chemical composition of isolated genomic DNA in an unbiased manner.8,9
In parallel with the mass spectrometry analysis of the composition of DNA, methods focusing on detection of various unusual bases at the specific DNA sequence positions have been developed.10,11 As mentioned above, most standard sequencing methods are unfortunately not sufficient for deciphering the unusual bases. Hence, clever chemical and biotechnological methods have been designed for sequencing 5meC and its oxidized variants.
2 Uracil in DNA: occurrence and metabolism
Among the unusual DNA bases, uracil is of special significance.12 Uracil is a normal nucleobase in RNA; in DNA, however, it is replaced by thymine. The exchange of uracil in RNA for thymine in DNA was presumably the result of an evolutionary pressure to increase the stability of information storage. Paradoxically, uracil on its own does not present any grave instability problem, and the need to remove it from the DNA nucleobase set was quite probably due to the chemical instability of cytosine. Namely, cytosine is frequently deaminated into uracil through spontaneous physiological processes.13 This modification is mutagenic since uracil will base pair with adenine in the next replication cycle (leading to a stable point mutation: C:G→U:G→U:A→T:A). The high frequency of cytosine-deamination events leading to uracil (up to several hundred per day in a mammalian-sized genome) required an efficient system to repair this mistake.
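The mutational route C:G→U:G→U:A→T:A can be traced step by step in a short sketch. This is an illustration under simplifying assumptions: a single base pair is followed, no repair takes place, and the polymerase always inserts thymine (never uracil) opposite adenine. The function names are chosen for this example only.

```python
# Toy trace of a deamination-derived point mutation (no repair assumed).
# Base inserted by the polymerase opposite each template base; thymine is
# assumed to be the only partner inserted opposite adenine.
INSERT_OPPOSITE = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(pair):
    """Convert cytosine on the first strand into uracil."""
    first, second = pair
    return ("U" if first == "C" else first, second)


def replicate(pair):
    """Return the two daughter pairs; each keeps one parental strand."""
    first, second = pair
    daughter_a = (first, INSERT_OPPOSITE[first])     # first strand as template
    daughter_b = (INSERT_OPPOSITE[second], second)   # second strand as template
    return daughter_a, daughter_b


if __name__ == "__main__":
    pair = ("C", "G")
    damaged = deaminate(pair)                  # U:G mismatch
    with_uracil, unchanged = replicate(damaged)
    _, mutated = replicate(with_uracil)        # follow the new adenine strand
    print(pair, damaged, with_uracil, mutated)
```

After the first replication round only the daughter that inherited the uracil strand carries the change (U:A), while the other daughter is restored to C:G. The second round fixes the mutation as T:A in one of the granddaughters.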
Uracil-directed excision repair appeared early in evolution and is present in all free-living organisms from Archaea and eubacteria to eukaryotes.14 An immediate and necessary consequence of the uracil-excision repair “innovation” was that the “innocent” adenine-pairing uracils required protection against uracil-excision.15 The evolutionary solution to this problem was to label adenine-pairing uracils with a methyl group, leading to the appearance of thymine in DNA.
In addition to the cytosine deamination events, there is also an alternative route that leads to the appearance of uracil in DNA. This route is made possible by the limited precision characteristic of most DNA polymerases, which cannot reliably distinguish thymine from uracil. Most polymerases do not have a binding pocket for the only structural difference between thymine and uracil (i.e. the methyl group on the thymine ring).12 Consequently, it is only the cellular availability of the dTTP and dUTP building blocks that will determine whether, during DNA synthesis, the polymerase inserts a thymine- or a uracil-containing moiety into the newly synthesized strand opposite to adenine in the template strand. If the concentration of dUTP within the cell is comparable to that of dTTP, DNA will be readily uracilated. During this process, uracil will be inserted to replace thymine, hence the Watson–Crick base pairing with adenine is left unperturbed: this replacement is not mutagenic. However, loss of the methyl group of thymine may still lead to alterations of interactions with DNA binding proteins. Also, physico-chemical characteristics of uracil-DNA as compared to thymine-DNA show some differences.16
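The dependence of uracil incorporation on the nucleotide pool can be made concrete with a small calculation. The sketch below rests on an idealizing assumption that is stronger than what is stated in the text: the polymerase is taken to show no discrimination at all between dUTP and dTTP, so that the chance of inserting uracil opposite adenine equals the share of dUTP in the combined pool. Function names and concentration values are illustrative only.

```python
# Idealized estimate of uracil incorporation opposite template adenines.
# Assumption: the polymerase does not discriminate between dUTP and dTTP.


def uracil_fraction(dutp, dttp):
    """Probability of inserting uracil instead of thymine opposite adenine.

    dutp and dttp are pool concentrations in the same (arbitrary) unit.
    """
    total = dutp + dttp
    if total <= 0:
        raise ValueError("the combined dUTP + dTTP pool must be positive")
    return dutp / total


def expected_uracils(template, dutp, dttp):
    """Expected number of uracils in the strand synthesized on `template`."""
    return template.count("A") * uracil_fraction(dutp, dttp)


if __name__ == "__main__":
    # Hypothetical pools: dUTP kept at 1% of the combined pool versus
    # equal amounts of dUTP and dTTP.
    print(uracil_fraction(1.0, 99.0))
    print(expected_uracils("AAGA", 1.0, 1.0))
```

Under this assumption, a dUTP pool equal to the dTTP pool would replace, on average, every second thymine by uracil, which illustrates why the pool ratio is decisive.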
Uracil-directed excision repair can and will remove both cytosine-deamination-derived and thymine-replacing uracils.17 The repair starts with a uracil-specific recognition process by the enzyme termed uracil-DNA glycosylase (UDG). UDG cleaves the N-glycosidic bond between the nucleobase and the deoxyribose, leaving an abasic site in the DNA. Potentially reflecting the importance and high physiological need for UDG action, this enzymatic activity can be found in many different protein families. To avoid unnecessary degradation, UDGs need to be highly specific for the uracil base. The UDG reaction is followed by the action of the abasic-site-specific AP endonuclease enzyme that cleaves the phosphodiester bond of the DNA. The leftover moieties from the abasic site are removed by a repair polymerase enzyme that also fills in the gap by re-introducing the correct building block (cytosine opposite to guanine or thymine opposite to adenine). Finally, repair is finished by a ligase enzyme.18
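The sequence of enzymatic steps described above can be summarized as a simple state model. This is a schematic sketch, not a mechanistic description: the `Site` record, its fields and the function names are assumptions introduced for illustration, and the polymerase is assumed always to insert the correct building block.

```python
from dataclasses import dataclass
from typing import Optional

# Base that restores the original pair, given the base on the intact strand
# (cytosine opposite guanine, thymine opposite adenine).
CORRECT_PARTNER = {"G": "C", "A": "T"}


@dataclass
class Site:
    base: Optional[str]   # base on the damaged strand; None marks an abasic site
    opposite: str         # base on the intact strand
    nicked: bool = False  # True while the phosphodiester backbone is cleaved


def udg(site):
    """Uracil-DNA glycosylase: remove uracil only, leaving an abasic site."""
    if site.base == "U":
        site.base = None
    return site


def ap_endonuclease(site):
    """AP endonuclease: cleave the backbone at an abasic site."""
    if site.base is None:
        site.nicked = True
    return site


def repair_polymerase(site):
    """Repair polymerase: fill the gap with the correct building block."""
    if site.nicked and site.base is None:
        site.base = CORRECT_PARTNER[site.opposite]
    return site


def ligase(site):
    """Ligase: seal the remaining nick."""
    site.nicked = False
    return site


def excision_repair(site):
    """Apply the four steps in the order given in the text."""
    for step in (udg, ap_endonuclease, repair_polymerase, ligase):
        site = step(site)
    return site


if __name__ == "__main__":
    print(excision_repair(Site("U", "G")))  # deamination-derived uracil
    print(excision_repair(Site("U", "A")))  # thymine-replacing uracil
```

In this model a site that does not contain uracil passes through all four steps unchanged, reflecting the requirement that UDGs be highly specific for the uracil base.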
High levels of dUTP as compared to dTTP in the cellular nucleotide pool drastically misdirect the repair process. Under such circumstances, the polymerase will re-introduce uracils in a repetitive manner. Re-introduced uracils will again be subjected to excision repair. The cycles of such repetitive events will finally lead to hijacking the uracil-excision repair and transforming it into a hyperactive but futile process that may eventually lead to chromosome fragmentation and cell death. It is therefore of vital importance to strictly regulate cellular dUTP pools. The enzyme responsible for this action is dUTPase, which cleaves dUTP into dUMP and inorganic pyrophosphate.12,19,20 The essential character of dUTPase activity has been experimentally proven in various organisms.21–24 The enzymatic activity is found in two highly different protein families: the trimeric and dimeric dUTPases.25,26 To avoid unnecessary and wasteful hydrolysis of high-energy NTPs, dUTPases are strictly specific for dUTP and provide a strong binding pocket for this substrate that also efficiently excludes other NTPs.
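The futile character of repair at high dUTP levels can be given a rough quantitative feel. The sketch below is a toy model that is not taken from the cited literature: it assumes that every repair synthesis step re-inserts uracil with the same probability, independently of earlier rounds, so that the number of excision rounds per site follows a geometric distribution with mean 1/(1 - p). The function name and the example probabilities are illustrative.

```python
# Toy model of futile repair cycling (geometric model, illustrative only).
# Assumption: each repair round re-inserts uracil with probability
# p_reinsert, independently of all previous rounds.


def expected_excision_rounds(p_reinsert):
    """Mean number of excision rounds until thymine is finally inserted."""
    if not 0.0 <= p_reinsert <= 1.0:
        raise ValueError("p_reinsert must lie between 0 and 1")
    if p_reinsert == 1.0:
        return float("inf")  # uracil is always re-inserted: repair never ends
    return 1.0 / (1.0 - p_reinsert)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9):
        print(p, expected_excision_rounds(p))
```

Each excision round involves a transient strand break, so in this picture the number of breaks per uracil site grows steeply as the re-insertion probability approaches one, in line with the chromosome fragmentation mentioned in the text.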
3 Structural solutions for uracil recognition in diverse enzyme families
In parallel to the two potential pathways leading to the appearance of uracil in DNA (i.e. cytosine deamination and thymine replacing incorporation), there are two major routes also to keep uracil out of DNA. Both of these routes rely on specific recognition of uracil. As a preventive measure at the monomeric nucleotide building block level, cellular dUTP pools are efficiently limited through the action of dUTPases. Once in DNA, uracils are recognized by UDGs that initiate excision repair.
Although dUTPases and UDGs therefore share the same function in uracil recognition, the respective enzyme families are quite different, which also reflects their diverse roles in dUTP hydrolysis and in hydrolytic cleavage of the N-glycosidic bond. Uracil offers a set of functional moieties that are exploited in both dUTPases and UDGs. The basic requirements for uracil recognition include accommodation of the uracil ring in an at least partially hydrophobic microenvironment and a polar set of interactions directed at the carbonyl groups and the nitrogen moieties within the uracil ring. The structural solutions of the different enzymes show some clear similarities while also making use of characteristic interactions specific to each enzyme family.
3.1 Incorporation of uracil into the dUTPase active site
The dUTPase enzymes belong to one of two clearly distinct protein families.26 Most dUTPases show a highly conserved β-pleated fold realized in a deeply intertwined homotrimeric oligomer.27–33 The family of trimeric dUTPases is widely dispersed from Archaea to mammalian species. In this arrangement, three active sites are formed within the homotrimer and each active site is built from residues of all three subunits (Fig. 1A). This architecture is strongly held together by swapping β-strands between subunits. The recognition of the uracil moiety of the substrate dUTP is realized within a β-turn of each monomer in such a way that most interactions are realized through main chain atoms (Fig. 1B). The uracil ring is held in place by fixing it to glycine main chain atoms from the two sides of the β-turn (protein interaction partners that constitute main chain atoms are labeled with a gray background). As such, this accommodation pattern is a nice example of a ligand binding site realized through a secondary protein structural motif. The uracil binding pocket is completed by an aromatic interaction (visualized by a transparent green shadow) provided by a phenylalanine residue from a neighboring subunit. All residues participating in uracil recognition in trimeric dUTPases are strongly conserved across the different species.12
In some microorganisms, such as Trypanosomes, Campylobacter and some Gram-negative bacteria, the dUTPase enzymatic activity resides in another protein family.34–36 Here, the functional enzyme shows a homodimeric assembly (Fig. 2A). The protein fold is exclusively set up from α-helices and there is no sequence conservation between the trimeric and dimeric dUTPases. Interestingly, some bacteriophages also encode genes for the dimeric dUTPase, and numerous Staphylococcal phages encode both the trimeric and the dimeric versions. A representative example of dimeric dUTPase is shown in Fig. 2. Panel B presents a close-up view of the uracil-accommodating interactions. These are interestingly less numerous as compared to the case of trimeric dUTPases. Polar moieties of the uracil ring are bound via H-bonding to asparagine and glutamine side chains. In addition, the aromatic interaction from a phenylalanine residue is also present.
3.2 Recognition of uracil in different families of uracil-DNA glycosylases
To date, at least six families of uracil-DNA glycosylases have been identified in the literature.37,38 Despite considerable differences in protein structure, all of these families are suggested to have a joint evolutionary origin.38 All UDGs possess an α/β fold and there are also very well conserved characteristics for binding of uracil. Importantly, the UDG enzymes have an enlarged positively charged surface that provides the initial interaction force with DNA. UDGs bind to DNA first by electrostatic interactions provided by this positively charged patch. In this review, we only discuss those three UDG families where good resolution three-dimensional structural information has already been published for the protein complexed to uracil.
Without doubt, the isoform associated with the highest catalytic activity is the family of UNG enzymes (Fig. 3A). UNGs are capable of cleaving U from both U:A and U:G base pairs and act on both single stranded and double stranded DNA. Uracil binding in these enzymes is shown in Fig. 3B (a representative UNG enzyme is pictured). First we focus on those interactions that are well conserved for all UDG families. Among these, a phenylalanine residue provides aromatic overlap with the uracil ring. The main chain N group of this same residue is also exploited in a polar interaction with one of the carbonyl groups of uracil. An asparagine side chain participates in H-bonding to one of the uracil carbonyl groups and also to an N atom in the uracil ring. A histidine side chain provides an imidazole NH group to bind to the other carbonyl group of the uracil. In addition to this Phe-Asn-His triad, a further conserved feature for UDGs is that a main chain N moiety provides H-bonding to one of the carbonyl groups of the uracil. For UNG, there is also a second aromatic overlap, with a tyrosine side chain.
The SMUG family of UDGs was originally suggested to be mainly active on single-stranded uracil-containing DNA; however, it was later shown to catalyze uracil cleavage from double stranded DNA substrates as well.39,40 Despite very low sequence conservation and a markedly different fold as compared to UNGs (Fig. 4A), SMUG also possesses the Phe-Asn-His triad for uracil binding in the very same pattern (Fig. 4B). In this case, the additional main chain interactor is provided by a methionine residue.
Lastly, it is interesting to observe that two UDG families adopted an iron–sulfur prosthetic group in their protein structure (Fig. 5A).41 The iron–sulfur UDG enzymes (families 4 and 5) most probably use this group for stabilization of the protein structure. It is remarkable that the Phe-Asn-His interaction network is fully conserved in this case as well (Fig. 5B). Additional protein atoms contributing to uracil binding are main chain atoms from two residues.
In summary, uracil accommodation in both dUTPase and UDG families shares an interesting conservation when it comes to the aromatic overlap partner of the uracil ring. In almost all cases (with few exceptions), a phenylalanine ring is involved; tryptophan, tyrosine or histidine is very rarely seen. It is also a shared characteristic that main chain protein atoms usually provide H-bonding interactions to the uracil polar atoms. The most striking example of this pattern is seen in trimeric dUTPases, where polar interactions for the uracil moiety are exclusively provided by main chain atoms from the two opposing β-strands of a well-structured β-turn. This arrangement stabilizes the binding pocket in a secondary structure protein motif.
4 Potential physiological roles for uracil in DNA
As discussed above, uracil is a frequently occurring error in DNA and its appearance induces the highly active and efficient uracil-excision repair. However, in recent years, data have accumulated suggesting that uracil in DNA may also constitute a physiological signal. In various physiological processes and a wide range of organisms, uracil in DNA has been implicated in important cellular functions. It is of special interest to follow how such signaling roles may co-exist with the excision repair process. Basically, two different scenarios provide for such “peaceful co-existence”. On the one hand, inducing uracil-excision repair at well-defined uracil sites is exploited in development. On the other hand, transient or long-term inactivation of the excision-repair system leads to stabilization of uracilated DNA.
Perhaps the best characterized developmental role of uracil presence in DNA concerns the immune system, wherein immunoglobulin genes are diversified via class-switch recombination and somatic hypermutation.42,43 In these pathways, cytosine deamination is specifically induced by activation-induced deaminase enzymes leading to uracil appearance within immunoglobulin genes. Excision-repair of uracil at these sites has been suggested to be involved in generating double-stranded DNA breaks that are further processed to generate a high diversity of immunoglobulin genes. In this case, we observe a finely regulated and site-specific interplay between appearance and removal of uracil in DNA in the development of the immune system.
Periodic inactivation of the defense systems against uracil can be observed in differentiated cells and tissues that do not perform active DNA replication (i.e. non-dividing cell stage). Both the major dUTPase and UDG isoforms are cell-cycle regulated with highly attenuated expression in non-dividing cells.44,45 At this stage, lack of dUTPase leads to an elevated dUTP cellular level, and decrease in UDG activity can allow stable existence of uracilated DNA. It has been described that the HIV retrovirus upon infection of differentiated macrophages will readily use cellular dUTP to build its retro-transcribed viral DNA in a uracilated fashion.46 Uracil-DNA in the retroviral HIV genome has been suggested to regulate integration of the viral genetic information.
Another case for stage-controlled inactivation of the uracil-defense system has been presented for Drosophila melanogaster. Intriguingly, the UNG isoform (present in almost all free-living organisms) is not encoded in the Drosophila genome, and dUTPase expression is greatly down-regulated in larval stages.47 Hence, the conditions for stability of uracilated DNA are present in Drosophila larvae, and in fact, it has been shown that uracil content in larval DNA is highly elevated in all larval tissues, except imaginal disks. In imaginal disks, dUTPase expression is still strong and efficiently prevents synthesis of uracilated DNA.22,48 At the pupation stage, during metamorphosis, larval tissues with high uracil-DNA content are degraded while imaginal disks with sanitized, uracil-free DNA develop into the imago. It has been hypothesized that this intriguing situation may also be present in other holometabolous insects.
Inactivation of dUTPase or UDG enzymes via protein inhibitors constitutes another means to allow uracil presence in DNA.49 Several protein inhibitors against UNG (the major UDG isoform) have been discovered.50 These are mostly encoded in bacteriophages, e.g. in the PBS phages of Bacillus subtilis.51 UNG inhibitors are expressed from the phage genome in the host bacterium cell and allow synthesis of uracilated phage genome for further packaging into viral particles. This strategy is truly powerful and may lead to high uracil content in the phage DNA. Proteinaceous inhibition of dUTPase has also been described recently. It has been shown that a staphylococcal repressor protein (Stl) strongly binds and inhibits dUTPases from diverse sources.52–55
Further investigations of potential additional roles of uracil in DNA are expected to be facilitated by novel techniques to directly determine the uracil content of genomic DNA. Special emphasis is now also placed on generating information about the genome-wide distribution of uracil moieties. As mentioned earlier in this review, usual sequencing methods cannot provide the true chemical sequence and composition of DNA and will only generate simplified information in the context of the four orthodox DNA bases. However, several promising novel methods have been recently published to deal with this problem.56–58
List of abbreviations
UNG: major isoform of uracil-DNA-glycosylase
SMUG: single-strand selective uracil-DNA glycosylase
Supported by the National Research, Development and Innovation Office of Hungary (K119493, NVKP_16-1-2016-0020, 2017-1.3.1-VKE-2017-00002, 2017-1.3.1-VKE-2017-00013, VEKOP-2.3.2-16-2017-00013 to B.G.V., NKP-2018-1.2.1-NKP-2018-00005), and the BME-Biotechnology FIKP grant of EMMI (BME FIKP-BIO).