Nucleic Acids in Chemistry and Biology
- 1.1 The Biological Importance of DNA
- 1.2 The Origins of Nucleic Acids Research
- 1.3 Early Structural Studies on Nucleic Acids
- 1.4 The Discovery of the Structure of DNA
- 1.5 The Advent of Molecular Biology
- 1.6 The Partnership of Chemistry and Biology
- 1.7 The Burgeoning World of RNA
- 1.8 Frontiers in Nucleic Acids Research
- 1.8.1 Sequencing
- 1.8.2 Nucleic Acid Therapeutics
- 1.8.3 Gene Synthesis and Gene Editing
- 1.8.4 Structural Biology of Nucleic Acids
Introduction and Overview†
Published: 24 Jun 2022
G. Michael Blackburn, Martin Egli, Michael J. Gait, Jonathan K. Watts, 2022. "Introduction and Overview†", Nucleic Acids in Chemistry and Biology, G Michael Blackburn, Martin Egli, Michael J Gait, Jonathan K Watts
For over 60 years, scientists have been fascinated and stimulated by the awareness that the study of nucleic acids is central to the knowledge of life. This chapter looks at the biological importance of DNA, the origins of nucleic acids research, early structural studies on nucleic acids and the discovery of the structure of DNA. It then focuses on the advent of molecular biology, the partnership of chemistry and biology and the burgeoning world of RNA. The chapter concludes with a section on frontiers in nucleic acids research.
1.1 The Biological Importance of DNA
From the beginning, the study of nucleic acids has drawn together, as though by a powerful unseen force, a galaxy of scientists of the highest ability.1,2 Striving to tease apart their secrets, these talented individuals have brought with them a broad range of skills from other disciplines, while many of the problems they have encountered have proved to be soluble only by new inventions. Looking at their work, one is constantly made aware that scientists in this field appear to have enjoyed a greater sense of excitement in their work than is given to most. Why?
For over 60 years, such men and women have been fascinated and stimulated by their awareness that the study of nucleic acids is central to the knowledge of life. Let us start by looking at Fred Griffith, who was employed as a scientific civil servant in the British Ministry of Health investigating the nature of epidemics. In 1923, he was able to identify the difference between a virulent, S, and a non-virulent, R, form of the pneumonia bacterium. Griffith went on to show that this bacterium could be made to undergo a permanent, heritable change from non-virulent to virulent type.3 This discovery was a bombshell in bacterial genetics.
Oswald Avery and his group at the Rockefeller Institute in New York set out to identify the molecular mechanism responsible for the change Griffith had discovered, now technically called bacterial transformation. They achieved a breakthrough in 1940 when they found that non-virulent R pneumococci could be transformed irreversibly into a virulent species by treatment with a pure sample of high molecular weight DNA.4 Avery had purified this DNA from heat-killed bacteria of a virulent strain and showed that it was active at a dilution of 1 part in 10⁹.
Avery concluded that ‘DNA is responsible for the transforming activity’ and published that analysis in 1944, just three years after Griffith had died in a London air-raid in WWII. The staggering implications of Avery's work turned a searchlight on the molecular nature of nucleic acids and it soon became evident that ideas on the chemistry of nucleic acid structure at that time were wholly inadequate to explain such a momentous discovery. As a result, a new wave of scientists directed their attention to DNA and discovered that large parts of the accepted tenets of nucleic acid chemistry had to be set aside before real progress was possible. We need to examine some of the earliest features of that chemistry to appreciate fully the significance of later progress.
1.2 The Origins of Nucleic Acids Research
Friedrich Miescher started his research career in Tübingen by looking into the physiology of human lymph cells. In 1868, seeking a more readily available material, he began to study human pus cells which he obtained in abundant supply from the bandages discarded from the local hospital. After defatting the cells with alcohol, he incubated them with a crude preparation of pepsin from pig stomach and so obtained a grey precipitate of pure cell nuclei. Treatment of this with alkali followed by acid gave Miescher a precipitate of a phosphorus-containing substance, which he named nuclein. He later found this material to be a common constituent of yeast, kidney, liver, testicular and nucleated red blood cells.5
After Miescher moved to Basel in 1872, he found the sperm of Rhine salmon to be a more plentiful source of nuclein. The pure nuclein was a strongly acidic substance that existed in a salt-like combination with a nitrogenous base, which Miescher crystallized and called protamine. In fact, his nuclein was really a nucleoprotein, and it fell subsequently to Richard Altmann in 1889 to obtain the first protein-free material, to which he gave the name nucleic acid.6
Following William Perkin's invention of mauveine in 1856, the development of aniline dyes had stimulated a systematic study of the colour-staining of biological specimens. Cell nuclei were characteristically stained by basic dyes, and around 1880 Walter Flemming applied that property in his study of the rod-like segments of chromatin (so called because of their colour-staining characteristic) that became visible within the cell nucleus only at certain stages of cell division. Flemming's speculation that the chemical composition of these chromosomes was identical with that of Miescher's nuclein was confirmed in 1900 by E. B. Wilson7 who wrote:
Now chromatin is known to be closely similar to, if not identical with, a substance known as nuclein which analysis shows to be a tolerably definite chemical compound of nucleic acid and albumin. And thus we reach the remarkable conclusion that inheritance may, perhaps, be effected by the physical transmission of a particular compound from parent to offspring.
While this insight was later to be realized in Griffith's 1928 experiments, all of this work was really far ahead of its time. We have to recognize that, at the turn of the century, tests for the purity and identity of substances were relatively primitive. Emil Fischer's classic studies on the chemistry of high molecular weight, polymeric organic molecules were in question until well into the twentieth century. Even in 1920, it was possible to argue that there were only two species of nucleic acids in nature: animal cells were believed to provide thymus nucleic acid (DNA), whilst nuclei of plant cells were thought to give pentose nucleic acid (RNA).
1.3 Early Structural Studies on Nucleic Acids
Accurate molecular studies on nucleic acids essentially date from 1909 when Levene and Jacobs began a reinvestigation of the structure of nucleotides at the Rockefeller Institute. Inosinic acid, which Liebig had isolated from beef muscle in 1847, proved to be hypoxanthine-riboside 5′-phosphate. Guanylic acid, isolated from the nucleoprotein of pancreas glands, was identified as guanine-riboside 5′-phosphate (Figure 1.1). Each of these nucleotides was cleaved by alkaline hydrolysis to give phosphate and the corresponding nucleosides, inosine and guanosine, respectively. Since then, nucleosides have been characterized as the condensation products of a pentose and a nitrogenous base, while nucleotides are the phosphate esters of one of the hydroxyl groups of the pentose.
Thymus nucleic acid, which was readily available from calf tissue, was found to be resistant to alkaline hydrolysis. It was only successfully degraded into deoxynucleosides in 1929, when Levene and London used enzymes to hydrolyse the deoxyribonucleic acid, followed by mild acidic hydrolysis of the deoxynucleotides. They identified its pentose as the hitherto unknown 2-deoxy-D-ribose.8 These deoxynucleosides contained the four heterocyclic bases adenine, cytosine, guanine and thymine, the last of which corresponds to uracil in ribonucleic acid.
Up to 1940, most groups of workers were convinced that hydrolysis of nucleic acids gave the appropriate four bases in equal relative proportions. This erroneous conclusion probably resulted from the use of impure nucleic acid or from the use of analytical methods of inadequate accuracy and reliability. It led, naturally enough, to the general acceptance of a tetranucleotide hypothesis for the structure of both thymus and yeast nucleic acids, which materially retarded further progress on the molecular structure of nucleic acids.
Several of these tetranucleotide structures were proposed. They all had four nucleosides (one for each of the bases) with an arbitrary location of the two purines and two pyrimidines. These were joined together by four phosphate residues in a variety of ways, among which there was a strong preference for phosphodiester linkages. In 1932, Takahashi showed that yeast nucleic acid contained neither pyrophosphate nor phosphomonoester functions and so disposed of earlier proposals in favour of a neat, cyclic structure which joined the pentoses exclusively through phosphodiester units (Figure 1.2).9 It was generally accepted that these bonded the 5′- and 3′-positions of adjacent deoxyribonucleosides, but the linkage positions in ribonucleic acid were not known.
One property stuck out like a sore thumb from this picture: the molecular mass of nucleic acids was greatly in excess of that calculated for a tetranucleotide. The best DNA samples were produced by Einar Hammarsten in Stockholm. One of his students, Torbjörn Caspersson, showed that this material was larger than protein molecules. Hammarsten's DNA was examined by Rudolf Signer and colleagues in Bern, whose flow-birefringence studies revealed rod-like molecules with a molecular mass of 0.5–1.0 × 10⁶ Daltons (Da).10 The same material provided Astbury in Leeds with X-ray fibre diffraction measurements that supported Signer's conclusion.11 Finally, on the basis of ultracentrifugation studies, Levene estimated the molecular mass of native DNA at between 200 000 and 1 × 10⁶ Da.
Scientists compromised: Masson Gulland, in his Tilden Lecture of 1943, suggested that the concept of nucleic acid structures as polymerized, uniform tetranucleotides was limited, but he allowed that it could ‘form a practical working hypothesis’.
This then was the position in 1944 when Avery published his great work on the transforming activity of bacterial DNA. One can sympathize with Avery's hesitance to press home his case. Levene, in the same institute, and others were strongly persuaded that the tetranucleotide hypothesis imposed an invariance on the structure of nucleic acids which denied them any role in biological diversity. In contrast, Avery's work indicated that DNA was responsible for completely transforming the behaviour of bacteria. It demanded a fresh look at the structure of nucleic acids.
1.4 The Discovery of the Structure of DNA
From the outset, it was evident that DNA exhibited greater resistance to selective chemical hydrolysis than did RNA. So, the discovery in 1935 that DNA could be cut into mononucleotides by an enzyme doped with arsenate was invaluable. Using this procedure, Klein and Thannhauser obtained the four crystalline deoxyribonucleotides whose structures (Figure 1.3) were later put beyond doubt by total chemical synthesis by Alexander Todd12 and the Cambridge school he founded in 1944. Todd established the D-configuration and the glycosylic linkage for ribonucleosides in 1951, but found the chemical synthesis of the 2′-deoxyribonucleosides more taxing. The key to success for the Cambridge group was the development of methods of phosphorylation, for example for the preparation of the 3′- and 5′-phosphates of deoxyadenosine13 (Figure 1.4).
All the facts were now available to establish the primary structure of DNA as a linear polynucleotide in which each deoxyribonucleoside is linked to the next by means of a 3′-to-5′ phosphate diester (Section 2.2.1, Figure 2.10). The presence of only diester linkages was essential to explain the stability of DNA to chemical hydrolysis, since phosphate triesters and monoesters, not to mention pyrophosphates, are more labile. The measured molecular masses for DNA of about 1 × 10⁶ Da meant that a single strand of DNA would have some 3000 nucleotides. Such a size was much greater than that of enzyme molecules, but entirely compatible with Staudinger's established ideas on macromolecular structure for synthetic and natural polymers.14 However, by the mid-point of the twentieth century, chemists could advance no further with the primary structure of DNA. Neither of the key requirements for sequence determination was to hand: there were no methods for obtaining pure samples of DNA with homogeneous base sequence, nor were methods available for the cleavage of DNA strands at a specific base residue. Consequently, all attention came to focus on the secondary structure of DNA.
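The arithmetic behind the estimate of some 3000 nucleotides can be checked with a one-line division. The Python sketch below assumes an average mass of about 330 Da per nucleotide residue, a typical figure for the sodium salt of DNA; that value is our assumption and is not given in the text.

```python
# Assumed average mass of one nucleotide residue in DNA (sodium salt), in daltons.
# This figure is an illustrative assumption, not a value quoted in the chapter.
AVG_RESIDUE_MASS_DA = 330.0


def residues_from_mass(molecular_mass_da: float,
                       residue_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Estimate how many nucleotide residues a single DNA strand of the given mass contains."""
    if residue_mass_da <= 0:
        raise ValueError("residue mass must be positive")
    return molecular_mass_da / residue_mass_da


if __name__ == "__main__":
    # Masses spanning the range reported from flow birefringence and ultracentrifugation.
    for mass in (2.0e5, 5.0e5, 1.0e6):
        print(f"{mass:.1e} Da -> about {residues_from_mass(mass):.0f} nucleotides")
```

With the assumed residue mass, 1 × 10⁶ Da corresponds to roughly 3000 residues; a somewhat different residue mass shifts the figure only slightly.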
Two independent experiments in biophysics revealed that DNA possesses an ordered secondary structure. Using a sample of DNA obtained from Hammarsten in 1938, Astbury and Bell obtained an X-ray diffraction pattern from stretched, dry fibres of DNA. From the rather obscure data they deduced “… A spacing of 3.34 Å along the fibre axis corresponds to that of a close succession of flat or flattish nucleotides standing out perpendicularly to the long axis of the molecule to form a relatively rigid structure.”14 These conclusions roundly contradicted the tetranucleotide hypothesis.
Some years later, Gulland studied the viscosity and flow-birefringence of calf thymus DNA and thence postulated the presence of hydrogen bonds linking the purine–pyrimidine hydroxyl groups and some of the amino groups. He suggested that these hydrogen bonds could involve nucleotides either in adjacent chains or within a single chain, but he somewhat hedged his bets between these alternatives.15 Sadly, Astbury returned to the investigation of proteins and Gulland died prematurely in a train derailment in 1947. Both of them left work that was vital for their successors to follow, but each contribution contained a misconception that was to prove a stumbling block for the next half-a-dozen years. Thus, Linus Pauling's attempt to create a helical model for DNA located the pentose-phosphate backbone in its core and the bases pointing outwards16 — as Astbury had decided. Gulland had subscribed to the wrong tautomeric forms for the heterocyclic bases thymine and guanine, believing them to be enolic and having hydroxyl groups. The importance of the true keto forms was only appreciated in 1952.
Erwin Chargaff began to investigate a very different type of order in DNA structure. He studied the base composition of DNA from a variety of sources, using the new technique of paper chromatography to separate the products of hydrolysis of DNA and one of the first commercial ultraviolet spectrophotometers to quantify their relative abundance.17 His data showed that, although the base composition of DNA varies between species, adenine and thymine are always present in a 1 : 1 ratio, as are guanine and cytosine. This meant that the proportion of purines (A + G) is always equal to the proportion of pyrimidines (C + T). Although the ratio (G + C) : (A + T) varies from species to species, different tissues from a single species give DNA of the same composition. Chargaff's results finally discredited the tetranucleotide hypothesis, because that called for equal proportions of all four bases in DNA.
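Chargaff's regularities can be illustrated with a toy calculation. The Python sketch below takes a hypothetical 20-base strand, builds its complementary partner by assuming A·T and G·C pairing, counts the bases in the resulting duplex and reports the A/T, G/C and purine/pyrimidine ratios together with the G + C fraction. It is a sketch of the bookkeeping only, not a reconstruction of Chargaff's chromatographic method.

```python
from collections import Counter

# Assumed Watson-Crick pairing partners.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base complement of a strand (direction is irrelevant for counting)."""
    return "".join(PAIR[base] for base in strand)


def composition(seq: str) -> dict[str, float]:
    """Mole fraction of each of the four DNA bases in a sequence."""
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


def chargaff_summary(seq: str) -> dict[str, float]:
    """Ratios that Chargaff's rules predict to be 1 for double-stranded DNA."""
    f = composition(seq)
    return {
        "A/T": f["A"] / f["T"],
        "G/C": f["G"] / f["C"],
        "purines/pyrimidines": (f["A"] + f["G"]) / (f["C"] + f["T"]),
        "GC fraction": f["G"] + f["C"],
    }


if __name__ == "__main__":
    strand = "ATGCGCGATTACAGGCTAAT"  # hypothetical sequence
    duplex = strand + complement(strand)  # both strands pooled, as in a hydrolysate
    for name, value in chargaff_summary(duplex).items():
        print(f"{name}: {value:.3f}")
```

For the duplex the three ratios come out at exactly 1, whereas the single strand on its own gives A/T = 6/5 = 1.2, so the 1 : 1 ratios are a property of paired strands rather than of any one chain.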
In 1951, Francis Crick and Jim Watson joined forces in the Cavendish Laboratory in Cambridge to tackle the problem of DNA structure. Both of them were persuaded that the model-building approach that had led Pauling and Corey to the α-helix structure for peptides should work just as well for DNA. Almost incredibly, they attempted no other line of direct experimentation but drew on the published and unpublished results of other research teams in order to construct a variety of models, each to be discarded in favour of the next until they created one which satisfied all the facts.18,19
The best X-ray diffraction results were to be found in King's College, London. There, Maurice Wilkins had observed the importance of keeping DNA fibres in a moist state and Rosalind Franklin had found that the X-ray diffraction pattern obtained from such fibres showed the existence of an A-form of DNA at low humidity which changed into a B-form at high humidity. Both forms of DNA were highly crystalline and clearly helical in structure.20,21 Consequently, Franklin decided that this behaviour required the phosphate groups to be exposed to water on the outside of the helix, with the corollary that the bases were on the inside of the helix.
Watson decided that the number of nucleotides in the unit crystallographic cell favoured a double-stranded helix. Crick's physics-trained mind recognized the symmetry implications of the space-group of the A-form diffraction pattern, monoclinic C2. There had to be local twofold symmetry axes normal to the helix, a feature which called for a double-stranded helix whose two chains must run in opposite directions.
Crick and Watson thus needed merely to solve the final problem: how to construct the core of the helix by packing the bases together in a regular structure. Watson knew about Gulland's conclusions regarding hydrogen bonds joining the DNA bases. This convinced him that the crux of the matter had to be a rule governing hydrogen bonding between bases. Accordingly, Watson experimented with models using the enolic tautomeric forms of the bases (Figure 1.3) and pairing like with like. This structure was quickly rejected by Crick because it had the wrong symmetry for B-DNA. Self-pairing had to be rejected because it could not explain Chargaff's 1 : 1 base ratios, which Crick had perceived were bound to result if you had complementary base-pairing.
Based on advice from Jerry Donohue in the Cavendish Laboratory, Watson turned to manipulating models of the bases in their keto forms and paired adenine with thymine and guanine with cytosine. Almost at once he found a compellingly simple relationship involving two hydrogen bonds for an A·T pair and two or three hydrogen bonds for a G·C pair. The special feature of this base-pairing scheme is that the relative geometry of the bonds joining the bases to the pentoses is virtually identical for the A·T and G·C pairs (Figure 1.5). It follows that if a purine always pairs with a pyrimidine then an irregular sequence of bases in a single strand of DNA could nonetheless be paired regularly in the centre of a double helix and without loss of symmetry.22
Chargaff's ‘Rules’ were straightaway revealed as an obligatory consequence of a double-helical structure for DNA. Above all, since the base sequence of one chain automatically determines that of its partner, Crick and Watson could easily visualize how one single chain might be the template for the creation of a second chain of complementary base sequence.
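The templating idea can be expressed as a single function. This minimal Python sketch assumes ideal A·T and G·C pairing and antiparallel chains, with every strand written 5′→3′; the parent sequence is hypothetical. Copying a strand and then copying the copy regenerates the original, which is the essence of one chain serving as the template for its partner.

```python
# Assumed ideal Watson-Crick pairing partners.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_copy(template_5to3: str) -> str:
    """Return the complementary strand, also written 5'->3'.

    Because the two chains run in opposite directions, the template is
    read in reverse while each base is replaced by its pairing partner.
    """
    return "".join(PAIR[base] for base in reversed(template_5to3))


if __name__ == "__main__":
    parent = "ATGCGTTAC"  # hypothetical sequence
    daughter = template_copy(parent)
    print(daughter)
    print(template_copy(daughter) == parent)
```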
The structure of the core of DNA had been solved and the whole enterprise fittingly received the ultimate accolade of the scientific establishment when Crick, Watson and Wilkins shared the Nobel Prize for physiology or medicine in 1962, just four years after Rosalind Franklin's early death.
1.5 The Advent of Molecular Biology
It is common to describe the publication of Watson and Crick's paper in Nature in April 1953 as the end of the ‘classical’ period in the study of nucleic acids, up to which time basic discoveries were made by a few gifted academics in an otherwise relatively unexplored field. The excitement aroused by the model of the double helix drew the attention of a much wider scientific audience to the importance of nucleic acids, particularly because of the biological implications of the model rather than because of the structure itself. It was immediately apparent that locked into the irregular sequence of nucleotide bases in the DNA of a cell was all the information required to specify the diversity of biological molecules needed to carry out the functions of that cell. The important question now was: What was the key, the genetic code, through which the sequence of DNA could be translated into protein?23
The solution to the coding problem is often attributed to the laboratories in the USA of Marshall Nirenberg and of Severo Ochoa who devised an elegant cell-free system for translating enzymatically synthesized polynucleotides into polypeptides and who by the mid 1960s had established the genetic code for a number of amino acids.24,25 In reality, the story of the elucidation of the code involves numerous strands of knowledge obtained from a variety of workers in different laboratories. An essential contribution came from Alexander Dounce in Rochester, New York, who in the early 1950s postulated that RNA, and not DNA, served as a template to direct the synthesis of cellular proteins and that a sequence of three nucleotides might specify a single amino acid.26 Sydney Brenner and Leslie Barnett in Cambridge later (1961) confirmed the code to be both triplet and non-overlapping.27 From Robert Holley and colleagues in Cornell University, New York,28 and Hans Zachau and colleagues in Cologne,29 came the isolation and determination of the sequence of three transfer RNAs (tRNA) ‘adapter’ molecules that each carry an individual amino acid ready for incorporation into protein and which are also responsible for recognizing the triplet code on the messenger RNA (mRNA). The mRNA species contain the sequences of individual genes copied from DNA (Chapter 5). Gobind Khorana and his group in Madison, Wisconsin, chemically synthesized all 64 ribotrinucleoside diphosphates and, using a combination of chemistry and enzymology, synthesized a number of polyribonucleotides with repeating dinucleotide, trinucleotide and tetranucleotide sequences.30 These were used as synthetic mRNA to help identify each triplet in the code. This work was recognized by the award of the Nobel Prize for physiology or medicine in 1968 jointly to Holley, Khorana and Nirenberg.
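The combinatorics of a triplet, non-overlapping code, and the reason Khorana's repeating polymers simplify decoding, can be sketched in a few lines of Python. The example uses short hypothetical stretches of poly(UC) and poly(UUC) rather than real experimental templates. A four-letter alphabet read three at a time gives 4³ = 64 codons; a repeating dinucleotide read in non-overlapping triplets yields just two alternating codons, and a repeating trinucleotide yields a single codon in each of its three reading frames.

```python
from itertools import product

RNA_BASES = "UCAG"


def all_codons() -> list[str]:
    """Every possible triplet over the four RNA bases."""
    return ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]


def read_triplets(mrna: str, frame: int = 0) -> list[str]:
    """Split an mRNA into non-overlapping triplets starting at `frame`.

    Any incomplete codon left at the 3' end is ignored.
    """
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]


if __name__ == "__main__":
    print(len(all_codons()))  # 64
    poly_uc = "UC" * 6  # hypothetical repeating dinucleotide
    print(read_triplets(poly_uc))
    poly_uuc = "UUC" * 4  # hypothetical repeating trinucleotide
    for frame in range(3):
        print(frame, sorted(set(read_triplets(poly_uuc, frame))))
```

Read this way, poly(UC) alternates between UCU and CUC, while poly(UUC) gives UUC, UCU or CUU depending on the frame, which is why such polymers narrowed down codon assignments so efficiently.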
Nucleic acid research in the 1950s and 1960s was preoccupied by the solution to the coding problem and the establishment of the biological roles of tRNA and mRNA. This was not surprising bearing in mind that at that time the smaller size and attainable homogeneity made isolation and purification of RNA a much easier task than it was for DNA. It was clear that in order to approach the fundamental question of what constituted a gene—a single hereditable element of DNA that up to then could be defined genetically but not chemically—it was going to be necessary to break down DNA into smaller, more tractable pieces in a specific and predictable way.
The breakthrough came in 1968 when Meselson and Yuan reported the isolation of a restriction enzyme from the bacterium Escherichia coli.31 Here at last was an enzyme, a nuclease, which could recognize a defined sequence in DNA and cut it specifically (Section 7.4.2 and Section 13.5.1). The bacterium uses this activity to break down and hence inactivate invading DNA (e.g. from phage). It was soon realized that this was a general property of bacteria, and the isolation of other restriction enzymes with different specificities soon followed. However, it was not until 1973 that the importance of these enzymes became apparent. At this time, Chang and Cohen at Stanford and Helling and Boyer at the University of California were able to construct in a test tube a biologically functional DNA that combined genetic information from two different sources. This chimera was created by cleaving DNA from one source with a restriction enzyme to give a fragment that could then be joined to a carrier DNA, a plasmid. The resultant recombinant DNA was shown to be able to replicate and express itself in E. coli.32
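Sequence-specific cleavage of this kind can be modelled as a string search. The Python sketch below assumes, purely as an example not taken from the text, the well-characterized enzyme EcoRI, which recognizes GAATTC and cuts between the G and the first A; the DNA sequence is hypothetical, and only the top strand is modelled, so the staggered cut on the complementary strand is ignored.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of the recognition site."""
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand `cut_offset` bases into each site and return the fragments."""
    cuts = [start + cut_offset for start in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Assumed example enzyme: EcoRI, site G^AATTC (cut after the first base).
    ECORI_SITE, ECORI_CUT = "GAATTC", 1
    dna = "TTGAATTCAAGGAATTCTT"  # hypothetical sequence with two sites
    print(find_sites(dna, ECORI_SITE))
    print(digest(dna, ECORI_SITE, ECORI_CUT))
```

A sequence without the site is returned as a single uncut fragment, mirroring the fact that the enzyme acts only where its recognition sequence occurs.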
This remarkable demonstration of genetic manipulation was to revolutionize biology. It soon became possible to dissect out an individual gene from its source DNA, to amplify it in a bacterium or other organism (cloning, Section 7.6.1) and to study its expression by the synthesis first of RNA and then of protein (Section 5.4). This single advance by the groups of Cohen and Boyer truly marked the dawn of modern molecular biology.
1.6 The Partnership of Chemistry and Biology
In the 1940s and 1950s the disciplines of chemistry and biology were so separate that it was a rare occurrence for an individual to embrace both. Two young scientists who were just setting out on their careers at that time were exceptional in recognizing the potential of chemistry in the solution of biological problems and both, in their different ways, were to have a substantial and lasting effect in the field of nucleic acids.
One was Frederick Sanger, a product of the Cambridge Biochemistry School, who in the early 1940s set out to determine the sequence of a protein, insulin. This feat had been thought unattainable, since it was widely supposed that proteins were not discrete species with defined primary sequence. Even more remarkably, he went on to develop methods for sequence determination first of RNA and then of DNA (Section 8.2). These methods involved a subtle blend of enzymology and chemistry that few would have thought possible to combine.33 The results of his efforts transformed DNA sequencing in only a few years into a routine procedure. In the late 1980s, the procedure was adapted for use in automated sequencing machines and the 1990s saw worldwide efforts to sequence whole organism genomes. In 2003, exactly 50 years after the discovery of the structure of the DNA double helix, it was announced that the human genome sequence had been completed. The award of two Nobel Prizes to Sanger (1958 and 1980) hardly seems recognition enough!
The other scientist has already been mentioned in connection with the elucidation of the genetic code. Soon after his postdoctoral studies under Alexander Todd and George Kenner in Cambridge, Gobind Khorana became convinced that chemical synthesis of polynucleotides could make an important contribution to the study of the fundamental process of information flow from DNA to RNA to protein. Having completed the work on the genetic code in the mid 1960s, and aware of the sequence of an alanine tRNA recently determined (1965) by Holley and co-workers,28 he then set a new goal: the total synthesis of the corresponding DNA duplex, the gene specifying the tRNA. Like Sanger, he ingeniously devised a combination of nucleic acid chemistry and enzymology, which formed the first general strategy of gene synthesis (Section 7.6.2).34 By the early 1970s, knowledge had become available about the signals required for gene expression, and the newly emerging recombinant DNA methods of Cohen and Boyer allowed a second synthetic gene, this time specifying the precursor of a tyrosine suppressor tRNA (Figure 1.6), to be cloned and shown to be fully functional.32
It is ironic that even up to the early 1970s many biologists thought Khorana's gene syntheses unlikely to have practical value. This view changed dramatically in 1977 with the demonstration by the groups of Itakura (a chemist) and Boyer (a biologist) of the expression in a bacterium of the hormone somatostatin (and later of insulin A and B chains) from a chemically synthesized gene.36 This work spawned the biotechnology industry, and synthetic genes came to be used routinely in the production of proteins. Furthermore, oligodeoxyribonucleotides, the short pieces of single-stranded DNA for which Khorana developed the first chemical syntheses, became invaluable general tools in the manipulation of DNA: for example, as primers in DNA sequencing and in amplification by the polymerase chain reaction, as probes in gene detection and isolation, and as mutagenic agents to alter the sequence of DNA. From the late 1980s, research accelerated into synthetic oligonucleotide analogues as antisense modulators of gene expression in cells and as therapeutic agents (Section 9.3), and into the construction of microarray chips for gene expression analysis.
The availability of synthetic DNA also provided new impetus in the study of DNA and RNA structure. X-ray diffraction (Section 15.4†) played an outsize role in investigations directed at the structure and function of the nucleic acids. Experiments using fibres of calf thymus DNA had led to the discovery of the double helix and the A- and B-forms of DNA (‘structure A and structure B’).20–22 Fibre diffraction was also key to the discovery of the RNA double helix37 and hybridization38 and to deriving the first model of three-stranded RNA.39
In the early 1970s, single-crystal X-ray diffraction techniques were applied to solve the structures of the dinucleoside phosphates ApU40 and GpC,41 by Rich and co-workers in Cambridge, USA. This was followed by the complete structure of yeast phenylalanine tRNA (Figure 2.36), determined independently by Rich and colleagues42 and by Klug and colleagues43 in Cambridge, England. For the first time, the complementary base-pairing between two strands could be seen in greater detail than was previously possible from studies of DNA and RNA fibres. ApU formed a double helix by end-to-end packing of molecules, with Watson–Crick pairing clearly in evidence between the two strands. The tRNA showed not only Watson–Crick pairs, but also a variety of alternative base-pairs and base triples, many of which were entirely novel (Section 2.4.7).
Then in 1978, the structure of synthetic d(pATAT) was solved by Kennard and her group in Cambridge.44 This tetramer also formed an extended double helix, but excitingly revealed that there was a substantial sequence-dependence in its conformation. The angles between neighbouring dA and dT residues were quite different between the A–T sequence and the T–A sequence elements. Soon after, Wang and colleagues discovered that synthetic d(CGCGCG) adopted a totally unpredicted, left-handed Z-DNA conformation.45 This was soon followed by the demonstration of both a B-DNA helix in a synthetic dodecamer by Dickerson and colleagues in California46 and an A-DNA helix in an octamer by Kennard and colleagues,47 all of which decisively put an end to the idea that DNA had an invariant rod-like structure. Clearly, DNA could adopt different conformations dependent on sequence and also on its external environment (Sections 2.2 and 2.3). More importantly, an immediate inference could be drawn that conformational differences in DNA (or the potential for their formation) might be recognized by other molecules. Thus, it was not long before synthetic DNA was also being used in the study of DNA binding to carcinogens and drugs (Chapters 11 and 12) and to proteins (Chapter 13).
These spectacular advances were only possible because of the equally dramatic improvements in methods of oligonucleotide synthesis that took place in the late 1970s and early 1980s. The laborious manual work of the early gene synthesis days was replaced by reliable automated DNA synthesis machines, which within hours could assemble sequences well in excess of 100 residues (Section 7.1.4). Khorana's vision of the importance of synthetic DNA has been fully realized.
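Why reliable chemistry mattered so much for long sequences can be seen from a simple model of stepwise assembly. The Python sketch below assumes that each chain-elongation step succeeds with a fixed, independent coupling efficiency and that an n-mer needs n − 1 couplings (the first residue being pre-attached to the support); the efficiency values are illustrative assumptions, not figures from the text. Under this model the fraction of full-length product falls geometrically with length.

```python
def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Fraction of chains that are full length after (length - 1) couplings.

    Assumes every coupling has the same, independent success probability
    and ignores side reactions such as depurination.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling efficiency must be in (0, 1]")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    # Hypothetical per-step efficiencies, chosen only for illustration.
    for efficiency in (0.98, 0.99, 0.995):
        for length in (20, 50, 100):
            fraction = full_length_fraction(length, efficiency)
            print(f"efficiency {efficiency:.3f}, {length:3d}-mer: {fraction:.1%} full length")
```

At an assumed 99% per step, only about 37% of chains reach 100 residues, and at 98% only about 14%, so small gains in per-step efficiency translate into large gains for long oligonucleotides.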
1.7 The Burgeoning World of RNA
In its early days, molecular biology was a monotheistic religion. Its creed, “Information flows from DNA to RNA to protein”, clearly designated the almighty as DNA.‡ Messenger RNA (mRNA) was seen as a subservient intermediate between the repository of hereditary information and the product of its sole activity, proteins, which as enzymes were the only known biocatalysts. Jim Watson founded an RNA Tie Club in 1954 with the aim of solving the riddle of RNA structure and understanding how RNA built proteins. At one of its meetings in 1954, Francis Crick proposed the existence of an adaptor molecule to facilitate this process, postulating it to be about 5–10 nucleotides long. In 1958, Paul Zamecnik, along with Mahlon Hoagland and Mary Stephenson, identified such adaptors as soluble RNAs, now known as transfer RNA (tRNA), and showed them to be around 100 nucleotides long with a complex role in protein biosynthesis.48 This milestone in molecular biology led in 1961 to the serendipitous discovery by Marshall Nirenberg and colleagues that polyuridylic acid acted as mRNA in a cell-free system to generate exclusively polyphenylalanine.49 They thereby identified UUU as the codon for Phe, the first triplet of the genetic code to be assigned, and Nirenberg shared the 1968 Nobel Prize for the genetic code with Robert Holley and Gobind Khorana.50
For many years, tRNA and mRNA, along with the RNA present in the ribosomes of bacterial and eukaryotic cells, were deemed to define the roles of RNA in cellular activity. Many incremental developments in RNA studies have eroded this simplistic view, two of the most important of which are described here.
The concept of a primordial RNA world was proposed by Alex Rich in 1962 and articulated by Walter Gilbert in 1986. Although RNA-based life may prove not to have been the first form of life, evidence for an RNA world has grown strongly and gained wide acceptance.51,52 The hypothesis proposes that, early in the evolution of life, there was a stage when RNAs served both as the genetic material and as catalysts for a variety of biochemical reactions. A major pillar of this hypothesis was the discovery of the catalytic activity of RNA in 1982 by Tom Cech,53 who coined the term ‘ribozyme’ and shared the 1989 Nobel Prize in Chemistry with Sidney Altman. Over evolutionary time, proteins gradually replaced RNAs as catalysts because of their vastly greater efficiency and the variety of their activities. This proposal has now been supported by a range of experiments, including studies of the prebiotic synthesis of purine and pyrimidine ribonucleotides,54 of the ability of early primitive nucleotide cofactors to assist functional proteins55 and of the pre-DNA signalling activities of riboswitches (Section 6.5.2).
A key question is “How might RNA catalyse its own proliferation?” This was addressed by Bartel and Szostak in a search for an RNA molecule capable of autocatalytic replication, which requires delivering two apparently opposing functions at different times: (i) folding into an RNA polymerase that uses RNA as a template or (ii) unfolding to act as a template for a different RNA replicase molecule. They used an iterative in vitro selection procedure to isolate a new class of ribozyme (Section 184.108.40.206) from a very large pool of random-sequence RNAs. The selection task was to join two RNAs aligned on an RNA template, one bearing a 3′-OH group and the other a 5′-triphosphate, a setup designed to mimic the chemistry of an RNA polymerase. This in vitro evolution of a new ribozyme population yielded a ligation rate 7 × 10⁶ times faster than that of the uncatalysed reaction.56 The resulting “class I” RNA ligase is a robust enzyme with a turnover number (kcat) of 14 min−1 and a Michaelis constant (Km) of 9 µM, obtained from a starting population of approximately 10¹⁵ random-sequence 220-mers (Figure 1.7).
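The kinetic figures quoted for the class I ligase can be put into perspective with a little arithmetic. The Python sketch below (names such as `turnover_rate` are our own, hypothetical choices) computes the catalytic efficiency kcat/Km from the stated kcat and Km, the Michaelis–Menten turnover rate at a substrate concentration equal to Km, and how small a fraction of all possible 220-nucleotide sequences a pool of about 10¹⁵ molecules actually contains. It assumes simple Michaelis–Menten behaviour and uses only the numbers given in the text.

```python
import math

# Values quoted in the text for the "class I" ligase ribozyme.
K_CAT_PER_MIN = 14.0  # turnover number, min^-1
K_M_MOLAR = 9e-6      # Michaelis constant: 9 µM expressed in mol/L
POOL_SIZE = 1e15      # approximate number of random-sequence RNAs in the starting pool
RANDOM_LENGTH = 220   # length of each random-sequence RNA, nucleotides


def turnover_rate(substrate_molar: float) -> float:
    """Michaelis-Menten rate per enzyme molecule, v/[E]0 = kcat*[S]/(Km + [S]), in min^-1."""
    return K_CAT_PER_MIN * substrate_molar / (K_M_MOLAR + substrate_molar)


# Catalytic efficiency (specificity constant), in M^-1 min^-1.
efficiency = K_CAT_PER_MIN / K_M_MOLAR

# log10 of the number of possible 220-mers (4 bases per position) and of the
# fraction of that sequence space present in a pool of ~10^15 molecules.
log10_space = RANDOM_LENGTH * math.log10(4)
log10_fraction = math.log10(POOL_SIZE) - log10_space

print(f"kcat/Km ~ {efficiency:.2e} M^-1 min^-1")                     # kcat/Km ~ 1.56e+06 M^-1 min^-1
print(f"v/[E]0 at [S] = Km: {turnover_rate(K_M_MOLAR):.1f} min^-1")  # v/[E]0 at [S] = Km: 7.0 min^-1
print(f"sequence space ~ 10^{log10_space:.1f}; fraction sampled ~ 10^{log10_fraction:.1f}")
# sequence space ~ 10^132.5; fraction sampled ~ 10^-117.5
```

The last line shows that the starting pool sampled only a vanishingly small fraction of the possible sequences, yet still yielded an efficient ligase.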
If RNA can catalyse its own replication, the question that follows is: “Can a self-sustaining RNA system be capable of Darwinian evolution?” Gerald Joyce explored this question using an RNA enzyme that catalyses the replication of RNA molecules, including the RNA enzyme itself! He used a cross-catalytic system in which two RNA enzymes catalyse each other's synthesis from a total of four component substrates. He found that the molecules reproduced with high fidelity and occasionally gave rise to recombinants that were also capable of replication. Indeed, over many “generations” of selective amplification, new variants arose and became dominant in the population because of their relative fitness under the chosen reaction conditions.57
These two examples are representative of the many studies aimed at establishing the existence of a primordial RNA world, a hypothetical era when RNA served the dual purpose of information and function, being both genotype and phenotype. Today we live in a second RNA world, in which life's biological systems use RNA in multiple active roles. After years in which RNA was relegated to the role of a passive player in gene expression, the growing analysis of noncoding RNAs (Chapter 6) has uncovered multiple roles for RNA across all living systems.
For example, less than 2% of the human genome codes for protein. Most of the rest was long assumed to be “junk DNA” – the messy result of unguided evolution. Seminal examples of small and long noncoding RNAs were discovered in the early 1990s (Sections 6.2 and 6.3) but it was not clear until some 20 years later that the vast majority of the genome is transcribed in some context, and over 80% of the genome is functional – much of this in the form of regulatory noncoding RNA.58 The pervasiveness and importance of RNA-based regulation has even led some to describe the human genome as an RNA machine.59 Therefore, this second RNA world is all around us, and while we do not have all the answers to how it works, our interrogation of this world will continue to refine our understanding of it.
1.8 Frontiers in Nucleic Acids Research
Nucleic acids are unique among molecules, bringing together a wonderfully strategic set of properties. They can be copied (i.e. amplified) using cells or enzymes. They can be extracted from organisms or, just as well, chemically synthesized in native or modified forms. Complementing these two forms of ‘writing’, they are also amenable to ‘reading’: their sequence can readily be determined at very high throughput. When single-stranded, they can fold into complex structures with specific binding and catalytic activities, but in the presence of a complementary strand they generally adopt a duplex structure with broadly predictable properties. Nucleic acids even encode the synthesis of an entirely different class of molecules (i.e. proteins). Basic research on nucleic acids and their application to technological needs therefore together form a wonderfully dynamic field, as each advance in one of the above areas opens new possibilities for exploring the related properties in greater depth. We highlight here four broad areas of nucleic acids research that are undergoing a particularly robust period of expansion and evolution.
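The rule that underlies duplex formation, antiparallel Watson–Crick pairing of A with T and G with C, can be expressed in a few lines. This minimal Python sketch (the function names are illustrative, not from any library) computes the reverse complement of a DNA sequence and tests whether a strand is self-complementary, as are the synthetic oligomers d(CGCGCG) and d(pATAT) discussed earlier in this chapter, so that two identical strands can pair into a duplex. It handles only the four canonical DNA bases.

```python
# Watson-Crick pairing partners for the four canonical DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the 5'->3' sequence of the strand that pairs antiparallel with `seq`."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_self_complementary(seq: str) -> bool:
    """True if two copies of `seq` can pair with each other to form a duplex."""
    return reverse_complement(seq) == seq.upper()


for oligo in ["CGCGCG", "ATAT", "ACGTTT"]:
    print(oligo, reverse_complement(oligo), is_self_complementary(oligo))
# CGCGCG CGCGCG True
# ATAT ATAT True
# ACGTTT AAACGT False
```

Reversing as well as complementing the sequence reflects the antiparallel orientation of the two strands in a duplex.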
1.8.1 Sequencing
Fifty years after Sanger's sequencing of bovine insulin and the discovery of the structure of the DNA double helix, and only 25 years after the first simple bacteriophage genome sequences were obtained, a consortium announced the complete human genome sequence in April 2003.60 The price tag for sequencing the approximately 3 billion bases was between 500 million and a billion US dollars, although the total cost of the human genome project, including sequencing, physical and genetic mapping, technology development, program management and so forth, was probably closer to 3 billion dollars.
Today, advances in sequencing technology (Chapter 8) have lowered the cost of sequencing a whole exome (i.e. the entire protein-coding part of the genome) to less than $1000, and the job takes only a single day! The genomes of thousands of eukaryotic and prokaryotic organisms are now deposited in publicly accessible databases (https://www.ncbi.nlm.nih.gov/genome/). They include model organisms such as mice, nematodes and zebrafish, agriculturally useful plants and animals, and parasites such as Plasmodium falciparum. The sequencing and bioinformatics revolutions have paved the way for major advances in evolutionary biology, medicine, archaeology, ancestral research, criminology and countless other fields. Commercial services provide insight into one's forebears and links to close and distant relatives across the world for a little spit in a tube and a small fee. Few of us would have expected to carry hundreds of variants in our DNA that trace back to the Neanderthals, as revealed by the paleogenetic work of Svante Pääbo, who sequenced the genome of this close relative of Homo sapiens.61 Much older DNA can also be sequenced: the current record (2021) is held by million-year-old DNA from the molars of two species of mammoth, an approach that has allowed detailed insights into evolutionary relationships.62 Genetic genealogy, drawing on DNA sequence information shared on websites, has helped reopen cold cases and identify suspects who had escaped justice for decades.
Precision medicine and personalized medicine programmes are now commonplace in medical centres. Geneticists diagnose patient-specific genetic diseases, while oncological tumour boards use genomic testing of biopsy tissues to identify mutations and decide on the best treatment options for individual cancer patients. As with DNA, RNA sequencing has become much faster and cheaper, and RNA-Seq63 platforms now routinely allow affordable whole-transcriptome sequencing. Indeed, researchers can use RNA sequencing to count transcripts as a means of measuring gene expression levels across the whole transcriptome.
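Counting transcripts becomes a measure of expression only once the raw counts are normalized. One widely used scheme (not the only one) is transcripts per million (TPM), which corrects first for transcript length and then for sequencing depth. The Python sketch below assumes that reads have already been assigned to genes; the function name, gene names, counts and lengths are hypothetical toy values chosen for illustration.

```python
def transcripts_per_million(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Convert raw RNA-Seq read counts to transcripts per million (TPM).

    1. Divide each gene's read count by its transcript length in kilobases
       (reads per kilobase, RPK), so longer transcripts are not over-counted.
    2. Rescale so that the RPK values of all genes sum to one million.
    """
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical toy data: read counts and transcript lengths (bp) for three genes.
counts = {"geneA": 500, "geneB": 500, "geneC": 100}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}

tpm = transcripts_per_million(counts, lengths)
for gene, value in tpm.items():
    print(f"{gene}: {value:,.0f} TPM")
# geneA: 526,316 TPM
# geneB: 263,158 TPM
# geneC: 210,526 TPM
```

Although geneA and geneB receive the same number of reads, geneB's transcript is twice as long, so its TPM is half that of geneA.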
1.8.2 Nucleic Acid Therapeutics
The past decade has seen an explosion in the area of nucleic acid therapeutics (Chapter 9). Chemically modified oligonucleotides with high stability, strong pairing affinity and appropriate pharmacokinetics and pharmacodynamics have been key enablers of clinical success. By the end of 2020, eleven oligonucleotides had been given the green light by US and European regulators,64,65 just over 40 years after the first demonstration of antisense inhibition of gene expression by Zamecnik and Stephenson.66,67 The approved drugs include antisense, aptamer, splice-switching and small interfering RNA (siRNA) oligonucleotides, for a wide range of indications that include wet age-related macular degeneration, spinal muscular atrophy, muscular dystrophy, familial hypercholesterolemia, hereditary transthyretin-mediated amyloidosis, porphyria and primary hyperoxaluria type 1. Andrew Fire and Craig Mello received the Nobel Prize for the discovery of RNA interference in the year the third edition of this book was published. It is encouraging that, before the publication of this fourth edition, no fewer than four siRNA drugs have been approved for clinical use, including the first oligonucleotide drug that may be given to millions of patients.68,69
Great advances have also been made in the synthesis of native and chemically modified mRNAs, such that mRNA-based therapeutics have now become a reality. The COVID-19 pandemic spurred the extremely rapid development of mRNA vaccines that encode the viral spike protein, with two independent mRNA vaccines approved in the US and Europe at the end of 2020, less than a year after sequence information for SARS-CoV-2 became available.70 Virally delivered gene therapies have also shown remarkable progress, with multiple interventions now clinically approved. As platform technologies for the clinical delivery of oligonucleotides, mRNAs and genes continue to demonstrate clinical proof of concept, they will be applied ever more rapidly to new diseases.
1.8.3 Gene Synthesis and Gene Editing
The research and technologies described above all require nucleic acid synthesis, and here, too, there has been significant progress. Just as with sequencing, the cost per nucleotide for nucleic acid synthesis keeps falling. The turnaround time for whole-gene DNA synthesis from a commercial source is often less than a week, and the price is so low that cloning is often no longer worth the effort. Unless a laboratory is engaged in preparing chemically modified oligonucleotides and/or hundreds of different strands per month, there is no need to operate an oligonucleotide synthesizer: reliance on a trusted vendor will save both time and money.
No discussion of tools and technical advances in nucleic acid research would be complete without the clustered regularly interspaced short palindromic repeats–CRISPR-associated protein 9 (CRISPR–Cas9) gene editing tool (Section 9.8),71 one of the most exciting discoveries of the last decade. Its consequences reach far into biology, medicine, genetics, agriculture and beyond, and it raises intricate ethical questions. The 2020 Nobel Prize in Chemistry was awarded to Emmanuelle Charpentier and Jennifer Doudna for showing that this bacterial immune system is guided by an RNA sequence and can be programmed to cleave DNA from any species. First and foremost, this is a triumph for basic research and a reminder that powerful insights in biology often come from focusing our attention on apparently ‘simple’ organisms. Beyond the realm of basic research and tools for in vitro and cell-based applications, CRISPR–Cas9 is likely to have a transformative effect on medicine, with clinical trials already underway for cancer, sickle cell anaemia and other conditions.
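The sense in which Cas9 is ‘programmable’ can be illustrated by the first step of designing a guide RNA: finding candidate target sites in a DNA sequence. For the widely used Streptococcus pyogenes Cas9, a site is a 20-nucleotide protospacer (whose sequence is mirrored in the guide RNA) lying immediately 5′ of an NGG protospacer-adjacent motif (PAM). The Python sketch below scans one strand only and omits the off-target and efficiency scoring that real design tools perform; the function name and example sequence are hypothetical.

```python
import re


def find_spcas9_sites(dna: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Scan one DNA strand (5'->3') for candidate SpCas9 target sites.

    A candidate is `spacer_len` nucleotides immediately followed by an NGG PAM.
    Returns (start index, protospacer, PAM) tuples. The zero-width lookahead
    lets overlapping sites be reported; the reverse strand is not scanned.
    """
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


# Hypothetical 26-nt sequence containing one site: a 20-nt protospacer plus a CGG PAM.
example = "GATTACAGATTACAGATTACCGGTTT"
for start, protospacer, pam in find_spcas9_sites(example):
    print(start, protospacer, pam)
# 0 GATTACAGATTACAGATTAC CGG
```

Changing the 20 nucleotides of the guide is, in principle, all that is needed to redirect Cas9 to a different site, which is what makes the system so readily programmable.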
1.8.4 Structural Biology of Nucleic Acids
Structural biology has undergone a dramatic revolution, enabled by remarkable improvements in cryogenic electron microscopy (cryo-EM) technology (Section 15.5†). Just a decade ago most cryo-EM structures, unless they concerned highly symmetrical viruses, were of low resolution, typically worse than 10 Å. However, multiple advances, among them the development of so-called direct electron detectors, now afford routine high-resolution cryo-EM structures that rival the resolution of X-ray crystal structures (see the book frontispiece for an example of a 3.1 Å cryo-EM model). Cryo-EM now allows the detailed visualization of large multi-protein assemblies and complex nucleic acid–protein machines, such as the ribosome, at single-nucleotide resolution (Sections 5.4 and 13.4.5). Thus, cryo-EM results provide insights into long-standing questions regarding the mechanisms of translation, frameshifting and proof-reading, as well as into molecular recognition, which can lead to new approaches to drug design. In the short term, and with continuing improvements in diffraction technology, X-ray crystallography will remain the chief method for determining the 3D structures of biological macromolecules smaller than 200 kDa (Section 15.4†).72 Cryo-EM will continue to be the dominant technique for solving the structures of proteins and protein–nucleic acid complexes larger than 200 kDa, and will also become increasingly important for analysing the structures of smaller proteins and RNAs of biological interest (Section 15.5†).73
The heady days of the discovery of the double helix and the elucidation of the genetic code are behind us. In their place we have entered even more exciting times when many more explorers now have the opportunity to answer fundamental questions about genetic structure and function, treat disease and build other biotechnological tools.
“You ain't heard nothin' yet” (Al Jolson, The Jazz Singer; July 1927).
Chapter 15 is available at http://pubs.rsc.org/en/content/ebook/978-1-78801-904-0
It is perhaps symbolic that Francis Crick resigned his Fellowship of Churchill College because he wanted it to be a secular institution within the University of Cambridge.