CHAPTER 1: Introduction: Glycome and the Glyco-toolbox
Published: 08 Apr 2019
Special Collection: 2019 ebook collection; Series: Chemical Biology
S. Wang, G. A. Edmunds, L. Li, C. Chen, and P. G. Wang, in Synthetic Glycomes, ed. W. Guan, L. Li, and P. G. Wang, The Royal Society of Chemistry, 2019, pp. 1-14.
Carbohydrates, nucleic acids, and proteins are the three major classes of macromolecules found in mammalian systems. Just as genomics and proteomics are the studies of nucleic acids and proteins, respectively, the term “glycomics” describes the systematic study of the complete repertoire of glycans. Unlike genomics and proteomics, which both have established methods for sequencing, automated synthesis, and amplification, glycomics is comparatively underdeveloped. In this chapter, the challenges, opportunities, and achievements of glycomics and the development of the “glyco-toolbox” will be discussed.
1.1 Introduction
Biologists have divided the study of life into various “-omics”. While the exact boundaries are increasingly ill-defined, an “-omic” can generally be defined as the study of the structures, functions, and, most importantly, the roles of the biomolecules that make up the “-ome” of interest. Genomics is the study of the complete genome of an organism, encompassing DNA, its structure, and how the information in the genetic code is encoded, modified, and passed on. Similarly, proteomics is the study of the entirety of an organism's proteome and its interactions.
Building on these definitions, glycomics is the study of the glycome present in an organism and of its interactions. It is a study not just of individual components, but of the entirety of glycans and how they interact. Glycomics is underdeveloped compared with its sister fields of genomics and proteomics. This discrepancy is largely due to the time these fields have had to mature, the general interest of researchers and funding agencies, and the ease of manipulating the biomolecules of interest. More recently, as the roles of glycans in biological systems have become better understood, researchers have increasingly prioritized developing the field of glycomics. Key to this progress is the development of the synthetic glycome.
With the development of new technologies, the biological functions of carbohydrates were found to span the development, growth, maintenance, and survival of the organism. Clinically relevant studies arose early with the discovery of the human blood groups, when evidence showed that some blood group antigens, such as those of the ABO and P1PK systems, are glycans. The ABO blood group is determined by the glycan structures on the surface of red blood cells. Different modifications of the core glycan structure, the H antigen, are responsible for the different ABO blood groups. When the H antigen is displayed unmodified (blood type O), the plasma contains both anti-A and anti-B antibodies, so a type O recipient can accept only type O blood (Table 1.1). Conversely, a person whose red blood cells display both A and B antigens (blood type AB) has neither anti-A nor anti-B antibodies and can accept blood from all four ABO types. Of course, other blood group systems, such as the Rh system, must also be considered.
Blood type | Type O | Type A | Type B | Type AB |
---|---|---|---|---|
Antigen | H antigen | A antigen | B antigen | A antigen and B antigen |
Antibodies in plasma | Anti-A and Anti-B | Anti-B | Anti-A | None |
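The compatibility rules summarized in Table 1.1 reduce to a set intersection: a red-cell transfusion is ABO-compatible when none of the donor's A/B antigens is recognized by an antibody in the recipient's plasma. A minimal Python sketch, assuming only the ABO system is considered (Rh and other systems are ignored) and tracking only the A and B antigens, since Table 1.1 lists only anti-A and anti-B antibodies:

```python
# ABO compatibility for red-cell transfusion, following Table 1.1.
# Only the A and B antigens are tracked; the shared H core is not modeled.
ANTIGENS = {"O": set(), "A": {"A"}, "B": {"B"}, "AB": {"A", "B"}}
ANTIBODIES = {"O": {"A", "B"}, "A": {"B"}, "B": {"A"}, "AB": set()}


def is_compatible(donor: str, recipient: str) -> bool:
    """True if the recipient's plasma has no antibody against a donor antigen."""
    return not (ANTIGENS[donor] & ANTIBODIES[recipient])


def acceptable_donors(recipient: str) -> list[str]:
    """ABO types a recipient can receive (Rh and other systems ignored)."""
    return [donor for donor in ANTIGENS if is_compatible(donor, recipient)]


for blood_type in ANTIGENS:
    print(blood_type, "can receive", acceptable_donors(blood_type))
```

Running it shows type O receiving only O and type AB receiving all four types, matching the antibody pattern in the table.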
As each “-omic” field matures and progresses, success in research has largely depended on three ideas: access to the desired biomolecules, accurate means of analyzing those biomolecules, and the ability to manipulate and direct these structures toward desirable applications. As more tools for access, analysis, and application are added to each “-omic” toolbox, foundational understandings give way to more complete and developed concepts and designs. To better understand these concepts, we must first understand the large diversity within the glycome and how it affects synthetic efforts.
1.2 Diversity of Glycans
One of the major stumbling blocks in the development of glycomics is the sheer size of the theoretical glycome. This is in large part due to the diversity seen within the glycome. Across all walks of life, the diversity of monosaccharides used as building blocks for glycan sequences is significantly larger than that for the residues in proteins or nucleic acids. Nucleic acids are composed of only four bases, A, T, G, and C for DNA and A, U, G, and C for RNA. Proteins are composed of just 20 amino acids. Both systems have one point for binding to an acceptor and another for receiving the next unit, making these molecules linear in nature. For glycans, there are more than 30 monosaccharides that exist in mammals with a multitude of potential branching points. The nine most common are displayed in Figure 1.1.
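The effect of building-block count and linkage options on sequence space can be illustrated with a toy calculation. The sketch below counts unbranched oligomers of length n: 4^n for nucleic acids and 20^n for peptides, versus a hypothetical glycan model in which each residue is one of nine monosaccharides and each of the n − 1 glycosidic bonds is α or β to one of four acceptor hydroxyls, giving 9^n · 8^(n−1). The glycan model is an assumption for illustration only: it ignores branching, ring size, and modifications, and treats every monosaccharide as having four acceptor hydroxyls.

```python
def nucleic_acid_sequences(n: int) -> int:
    """Linear DNA or RNA strands of length n: four bases per position."""
    return 4 ** n


def peptide_sequences(n: int) -> int:
    """Linear peptides of length n from the 20 standard amino acids."""
    return 20 ** n


def linear_glycan_sequences(n: int, monosaccharides: int = 9,
                            acceptor_positions: int = 4) -> int:
    """Unbranched glycans of length n under a toy model (an assumption).

    Each residue is one of `monosaccharides` building blocks, and each of the
    n - 1 glycosidic bonds is alpha or beta and may target any of
    `acceptor_positions` hydroxyls. Branching, ring size and modifications
    are ignored.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return monosaccharides ** n * (2 * acceptor_positions) ** (n - 1)


for length in (2, 4, 6):
    print(f"n={length}: DNA {nucleic_acid_sequences(length):,}, "
          f"peptide {peptide_sequences(length):,}, "
          f"glycan (toy) {linear_glycan_sequences(length):,}")
```

Even in this simplified, branch-free model, the hexamer count (about 1.7 × 10^10) exceeds the peptide count (6.4 × 10^7) by more than two orders of magnitude.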
Many bacterial glycans show a wide diversity in structures and often feature sugars not found elsewhere in nature. It is widely known that the diversity of carbohydrates is much greater in bacteria than in mammals. Compared with mammalian glycans, the bacterial glycans have a more than ten-fold greater diversity at the monosaccharide level.1
The diversity of a biopolymer is influenced not only by its building blocks but also by the linkages between them. Even if we count only the nine common monosaccharides in mammals, the numerous possible linkages give glycans far greater diversity than nucleic acids and proteins. Nucleic acids and proteins are both linear. Nucleotides are joined by a phosphodiester bond that links the 3′ carbon of one nucleotide's sugar to the 5′ carbon of the next nucleotide's sugar; amino acids are joined by a peptide bond formed by condensation of the carboxyl group of one amino acid with the amino group of another.
Compared with nucleic acids and proteins, glycans have flexible linkages and branched structures because of the diversity of glycosidic bonds. A glycosidic bond is formed between the anomeric carbon of one monosaccharide and a hydroxyl group of another. First, each glycosidic bond is either α or β, depending on the configuration of the anomeric carbon relative to the anomeric reference atom, the stereocenter furthest from C1 (C5 in glucose). In an α-glycosidic bond, the exocyclic oxygen at the anomeric carbon is formally cis, in the Fischer projection, to the oxygen on the reference atom; in a β-glycosidic bond, the two are formally trans. Second, unlike nucleotides and amino acids, which can be joined to one another in only one way, monosaccharides can be linked through any of several hydroxyl groups.
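The two variables described here, the anomeric configuration and the linkage position, are what the condensed notation used in this chapter records (for example, Glcα1–4Glc). The following is a minimal Python sketch of a linkage record that parses and prints that notation. The class name and regular expression are hypothetical helpers, and the parser assumes alphanumeric residue names and positions joined by an en dash.

```python
import re
from dataclasses import dataclass

# Condensed linkage notation, e.g. "Glcα1–4Glc" or "Neu5Acα2–3Gal".
# Assumption: residue names are alphanumeric and positions are single digits.
_LINKAGE = re.compile(r"^([A-Za-z0-9]+)([αβ])(\d)–(\d)([A-Za-z0-9]+)$")


@dataclass(frozen=True)
class GlycosidicBond:
    donor: str              # residue whose anomeric carbon forms the bond
    anomer: str             # "α" or "β"
    donor_carbon: int       # anomeric carbon: 1 for Glc, 2 for Neu5Ac
    acceptor_position: int  # carbon bearing the acceptor hydroxyl
    acceptor: str           # residue providing the hydroxyl

    def __post_init__(self) -> None:
        if self.anomer not in ("α", "β"):
            raise ValueError(f"anomer must be 'α' or 'β', got {self.anomer!r}")

    def condensed(self) -> str:
        """Render the bond back into condensed notation."""
        return (f"{self.donor}{self.anomer}{self.donor_carbon}"
                f"–{self.acceptor_position}{self.acceptor}")

    @classmethod
    def parse(cls, text: str) -> "GlycosidicBond":
        match = _LINKAGE.match(text)
        if match is None:
            raise ValueError(f"unrecognised linkage: {text!r}")
        donor, anomer, carbon, position, acceptor = match.groups()
        return cls(donor, anomer, int(carbon), int(position), acceptor)


alpha_bond = GlycosidicBond.parse("Glcα1–4Glc")
beta_bond = GlycosidicBond.parse("Glcβ1–4Glc")
# Same residues and positions; only the anomeric configuration differs.
print(alpha_bond.anomer, beta_bond.anomer, alpha_bond.acceptor_position)
```

Storing the anomer and both positions as separate fields makes explicit that two linkages can share residues and positions yet still differ.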
For example, if only one kind of nucleotide or amino acid is used, only one nucleic acid or protein of a given length can be formed. For glycans, however, the presence of multiple hydroxyl groups allows numerous possible structures. For instance, both starch and cellulose are glucose polymers, and the major difference between them is the glucose linkage (Figure 1.2). Most linkages in starch are Glcα1–4Glc, whereas cellulose consists exclusively of Glcβ1–4Glc linkages. This subtle difference between α- and β-linkages gives starch and cellulose very different properties. Humans can digest starch into glucose, which serves as an energy source and as a metabolic precursor, whereas humans cannot digest cellulose, which serves as structural support in plants. Glycogen, the energy-storage polysaccharide of humans and other animals, fungi, and bacteria, is also formed from glucose, with α1–4 linkages and α1–6 branch points.
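The contrast drawn here can be checked by enumeration. Joining two copies of the same nucleotide gives a single 3′→5′ dinucleotide, but two D-glucopyranose units can be joined α or β from C1 to any of four acceptor hydroxyls, or anomeric carbon to anomeric carbon. The sketch below assumes pyranose rings only and does not count the mutarotating reducing-end anomer as a separate structure; the "↔" symbol for a 1,1-linkage is a notation choice made here. It yields 11 disaccharides, including the starch linkage Glcα1–4Glc and the cellulose linkage Glcβ1–4Glc.

```python
from itertools import combinations_with_replacement

ANOMERS = ("α", "β")
# Non-anomeric hydroxyls of a D-glucopyranose acceptor (C5 carries the ring oxygen).
ACCEPTOR_HYDROXYLS = (2, 3, 4, 6)


def glucose_disaccharides() -> list[str]:
    """All D-glucopyranosyl-D-glucopyranose disaccharides (pyranose forms only).

    The free anomeric centre of a reducing disaccharide mutarotates, so it is
    not counted as a separate structure.
    """
    reducing = [f"Glc{a}1–{pos}Glc" for a in ANOMERS for pos in ACCEPTOR_HYDROXYLS]
    # 1↔1 linkages join both anomeric carbons; α,β and β,α are the same
    # molecule, so unordered pairs of anomers are enumerated.
    non_reducing = [f"Glc{a}1↔1{b}Glc"
                    for a, b in combinations_with_replacement(ANOMERS, 2)]
    return reducing + non_reducing


disaccharides = glucose_disaccharides()
print(len(disaccharides))  # 11
print(disaccharides)
```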
Rarely are glycan structures left bare. Oligosaccharides in nature commonly display various modifications, including acetylation, sulfation, methylation, and more. These modifications are usually introduced after the oligosaccharide backbone is completed.
As an example, glycosaminoglycans (GAGs) are a class of polysaccharides known for their diversity in size and modification (Figure 1.3). They are often highly sulfated, with sulfate groups at various positions and in various configurations. The exact locations of these sulfates vary from one GAG family to the next and are not consistently repeated along an individual chain. In addition, the hexosamine unit of a GAG can be either acetylated or non-acetylated, so the structural variety within a single biomolecule grows immensely.
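The combinatorial effect of such modifications can be illustrated with a toy count. The sketch below treats each disaccharide repeat as having a few independent present/absent modification sites. The site list is a hypothetical, heparan-sulfate-like example chosen for illustration. Real GAG biosynthesis does not modify sites independently, so the chain count is only an upper bound under this assumption.

```python
from itertools import product

# Hypothetical modification sites per disaccharide repeat (illustrative only).
# "hexosamine N-acetyl" absent means a non-acetylated hexosamine.
SITES = (
    "hexosamine N-acetyl",
    "hexosamine 6-O-sulfate",
    "hexosamine 3-O-sulfate",
    "uronic acid 2-O-sulfate",
)


def repeat_unit_variants(sites=SITES) -> list[dict[str, bool]]:
    """Every present/absent combination of the modification sites."""
    return [dict(zip(sites, states))
            for states in product((False, True), repeat=len(sites))]


def chain_variants(n_repeats: int, sites=SITES) -> int:
    """Sequences for a chain of n repeats if every site varies independently."""
    return len(repeat_unit_variants(sites)) ** n_repeats


print(len(repeat_unit_variants()))  # 16 variants per repeat unit
print(f"{chain_variants(10):,}")    # 1,099,511,627,776 for a 10-repeat chain
```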
1.3 Limited Glycan Backbone Structures
Unlike nucleic acids and proteins, glycans are not synthesized from a template. Instead, glycosyltransferases and other enzymes determine the order and connectivity of glycans. The predominant class of glycosyltransferases is the Leloir glycosyltransferases, named after Luis F. Leloir, the Argentinian scientist who received the 1970 Nobel Prize in Chemistry for his discovery of sugar nucleotides and their role in the biosynthesis of carbohydrates. These glycosyltransferases use sugar nucleotides as donors to transfer a monosaccharide onto an acceptor, with precise regio- and stereoselectivity. Normally, each enzyme is responsible for one specific glycosylation with little promiscuity: a monosaccharide is attached to a specific acceptor at a specific position with a specific anomeric configuration. This has led to the common adage “one enzyme-one linkage”, and a one-enzyme-class/one-saccharide-linkage paradigm applies to almost all individual steps of glycan biosynthesis.2 The possible glycan linkages are shown in Table 1.2.
Donor / acceptor | Glc | Gal | Man | GlcNAc | GalNAc | Fuc | GlcA | Neu5Ac | Xyl |
---|---|---|---|---|---|---|---|---|---|
UDP-Glc | α1–2 (3), α1–3 (4), α1–4 (5), α1–6 (6) | α1–2 (7) | α1–3 (4) | — | — | β1–3 (8) | — | — | — |
UDP-Gal | β1–4 (9) | α1–3 (10), α1–4 (11), β1–3 (12) | — | β1–3 (13), β1–4 (14) | β1–3 (15) | — | — | — | β1–4 (12) |
GDP-Man | — | — | α1–2 (16), α1–3 (16), α1–6 (16) | α1–4 (17), β1–4 (16) | — | — | — | — | — |
UDP-GlcNAc | — | β1–3 (9), β1–6 (18) | β1–2 (19), β1–4 (19), β1–6 (19) | β1–4 (20) | β1–3 (15), β1–6 (15) | β1–3 (21) | α1–4 (22), β1–4 (23) | — | — |
UDP-GalNAc | — | α1–3 (24), β1–3 (25), β1–4 (26) | — | β1–3 (27), β1–4 (28) | α1–3 (15), α1–6 (15) | — | β1–4 (22) | — | — |
GDP-Fuc | — | α1–2 (29) | — | α1–3 (30), α1–4 (29), α1–6 (31) | — | — | — | — | — |
UDP-GlcA | — | β1–3 (12), β1–4 (19) | — | β1–3 (23), β1–4 (22) | β1–3 (22) | — | — | — | β1–3 (19), β1–4 (19) |
CMP-Neu5Ac | — | α2–3 (32), α2–6 (32) | — | — | α2–6 (15) | — | — | α2–8 (33) | — |
UDP-Xyl | α1–3 (34) | — | — | — | — | — | α1–3 (19) | — | α1–3 (34) |
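Because each sugar-nucleotide donor and acceptor pair maps to a small, fixed set of linkages, Table 1.2 behaves like a lookup table. The sketch below encodes two of its rows (UDP-Gal and CMP-Neu5Ac only) as a Python dictionary and lists the disaccharide linkages those donor/acceptor specificities could form. It illustrates the “one enzyme-one linkage” idea and is not a predictor of glycosylation in cells, which also depends on which enzymes are expressed and on the wider acceptor context. The dictionary and function names are hypothetical.

```python
# Two rows of Table 1.2, keyed by (sugar-nucleotide donor, acceptor residue).
# Under the "one enzyme-one linkage" paradigm, each listed linkage reflects a
# distinct transferase specificity (a simplification).
TABLE_1_2_SUBSET = {
    ("UDP-Gal", "Glc"): ["β1–4"],
    ("UDP-Gal", "Gal"): ["α1–3", "α1–4", "β1–3"],
    ("UDP-Gal", "GlcNAc"): ["β1–3", "β1–4"],
    ("UDP-Gal", "GalNAc"): ["β1–3"],
    ("UDP-Gal", "Xyl"): ["β1–4"],
    ("CMP-Neu5Ac", "Gal"): ["α2–3", "α2–6"],
    ("CMP-Neu5Ac", "GalNAc"): ["α2–6"],
    ("CMP-Neu5Ac", "Neu5Ac"): ["α2–8"],
}


def donor_residue(sugar_nucleotide: str) -> str:
    """Strip the nucleotide prefix: 'UDP-Gal' -> 'Gal', 'CMP-Neu5Ac' -> 'Neu5Ac'."""
    return sugar_nucleotide.split("-", 1)[1]


def candidate_products(sugar_nucleotide: str, acceptor: str) -> list[str]:
    """Disaccharide linkages the listed specificities could form; [] if none."""
    residue = donor_residue(sugar_nucleotide)
    linkages = TABLE_1_2_SUBSET.get((sugar_nucleotide, acceptor), [])
    return [f"{residue}{linkage}{acceptor}" for linkage in linkages]


print(candidate_products("UDP-Gal", "GlcNAc"))  # ['Galβ1–3GlcNAc', 'Galβ1–4GlcNAc']
print(candidate_products("CMP-Neu5Ac", "Glc"))  # []
```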
Given the large number of sugars and the variety of possible connections between them, the number of theoretical glycan structures in the mammalian glycome is nearly unlimited. However, the natural glycome is limited by the glycosyltransferases that nature has made available. Humans use 236 glycosyltransferases to create the entire glycome in the human body, which leads to a much more manageable 10 000–20 000 glycans in the natural glycome.35 To simplify matters further, researchers can instead focus on the ∼7000 glycans that make up 90% of the meaningful human glycome.36 Some signature glycan epitopes are shown in Figures 1.4–1.11.
1.4 Access
The tools within the “glyco-toolbox” are still underdeveloped compared with those of other “-omics”. In particular, tools to access large numbers of complex glycans of interest are largely lacking. In contrast, virtually any known peptide sequence can be synthesized by solid-phase peptide synthesis. This technology was pioneered by Merrifield in 1963 and has since matured to the point that researchers can obtain a specific peptide at the push of a button on a benchtop peptide synthesizer.37 For nucleic acids, advances in genomics have lowered the price of gene synthesis to as little as 0.07 US dollars per base pair, making it feasible to obtain any gene of interest. Glycans, by contrast, are prohibitively inefficient to isolate and purify from natural sources because of their inherent complexity and diversity. Numerous chemical methodologies and enzymatic strategies have been developed to access complex glycans and glycoconjugates, but they are usually time-consuming and require years of training and experience in the specific field.
Recently, a number of chemoenzymatic approaches have been reported for accessing large numbers of complex glycans.38–41 For example, Boons' group developed a systematic and efficient strategy for producing diverse asymmetrically branched N-glycans.40 In addition, the Wang Lab developed an efficient chemoenzymatic strategy, Core Synthesis/Enzymatic Extension (CSEE), for the rapid production of various asymmetric N-glycan isomers,41 in which eight core structures were first synthesized chemically and then enzymatically extended into 73 complex N-glycans, including 63 isomers. The CSEE strategy has also been used to assemble other classes of mammalian glycans, such as human milk oligosaccharides42 and tandem-epitope N-glycans.43 Most recently, a library of O-mannosyl glycans was prepared by adapting this strategy.44
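Why asymmetric branching multiplies the number of isomers can be illustrated with a toy enumeration. If the two arms of a biantennary N-glycan (the α1–3 and α1–6 arms) are treated as distinguishable and each arm receives one of k terminal extensions, there are k² products, of which k are symmetric and k² − k are asymmetric; asymmetric products with swapped arms have the same composition and are therefore isomers. The extension names below are hypothetical placeholders. The sketch does not model how CSEE actually differentiates the arms, nor does it reproduce the published 73-glycan library.

```python
from itertools import product

# Hypothetical terminal extensions an enzymatic module could add to one arm.
EXTENSIONS = ("GlcNAc", "LacNAc", "sialyl-LacNAc")


def biantennary_products(extensions=EXTENSIONS) -> list[tuple[str, str]]:
    """(α1–3 arm, α1–6 arm) pairs; the arms are distinct, so order matters."""
    return list(product(extensions, repeat=2))


def isomer_pairs(products) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    """Pairs of products with the same composition but swapped arms."""
    seen = set(products)
    return [(p, (p[1], p[0])) for p in products
            if p[0] < p[1] and (p[1], p[0]) in seen]


glycans = biantennary_products()
asymmetric = [g for g in glycans if g[0] != g[1]]
print(len(glycans), len(asymmetric), len(isomer_pairs(glycans)))  # 9 6 3
```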
Programmed and/or automated chemical synthesis of glycans has also been proposed and developed to address the unavailability of glycans, especially for non-specialists. Two decades ago, Wong developed OptiMer, a computer program that serves as a database search tool and guide for selecting building blocks for the one-pot assembly of a desired oligosaccharide or of a library of individual oligosaccharides. OptiMer evaluates a glycan target and suggests a one-pot synthesis route based on relative reactivity values.45 In addition, Seeberger developed the Glyconeer, a commercially available machine that performs automated chemical synthesis of glycans from a large set of chemical building blocks.46
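The reactivity-based idea behind programmable one-pot synthesis can be sketched as a small search problem. In a one-pot sequence, the most reactive donor, the one with the highest relative reactivity value (RRV), should sit at the non-reducing end so that it couples first, and each subsequent building block should be clearly less reactive than the one before. The Python sketch below is a simplified illustration, not OptiMer's algorithm, data, or scoring. It picks one block per residue so that RRVs fall by at least a chosen factor at every step. The library, the RRV numbers, and the tenfold threshold are hypothetical, and protecting groups, acceptor hydroxyls, and linkage stereochemistry are ignored.

```python
from itertools import product

# Hypothetical building-block library: residue -> [(block id, RRV)].
# The RRVs are invented placeholders, not measured values.
LIBRARY = {
    "Fuc": [("Fuc-1", 72000.0)],
    "Gal": [("Gal-1", 1500.0), ("Gal-2", 40.0)],
    "GlcNAc": [("GlcNAc-1", 25.0), ("GlcNAc-2", 2.0)],
}


def plan_one_pot(target, library=LIBRARY, min_ratio=10.0):
    """Choose one block per residue for a target listed from the non-reducing
    end to the reducing end, so that each block is at least `min_ratio` times
    more reactive than the next. Returns the block ids of the plan with the
    widest minimum reactivity gap, or None if no plan satisfies the rule."""
    best_plan, best_gap = None, 0.0
    for combo in product(*(library[residue] for residue in target)):
        rrvs = [rrv for _, rrv in combo]
        ratios = [a / b for a, b in zip(rrvs, rrvs[1:])]
        if all(ratio >= min_ratio for ratio in ratios):
            gap = min(ratios, default=float("inf"))
            if best_plan is None or gap > best_gap:
                best_plan, best_gap = [block for block, _ in combo], gap
    return best_plan


print(plan_one_pot(["Fuc", "Gal", "GlcNAc"]))  # ['Fuc-1', 'Gal-1', 'GlcNAc-1']
print(plan_one_pot(["Gal", "Fuc"]))            # None
```

Exhaustive search is used only because the toy library is tiny; a realistic building-block database would call for a smarter search.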
Alternatively, enzymes, which control reactions in a stereo- and regioselective manner, have been introduced into automated glycan synthesis. In 2010, Nishimura and coworkers developed an artificial Golgi apparatus containing several recombinant mammalian glycosyltransferases in solution; it enables the synthesis of simple oligosaccharides (e.g., sialyl Lewis X) in a fully automatable manner.47 Most recently, we developed a machine-driven, fully automated system for oligosaccharide synthesis through enzymatic glycosylation.48 The system employs a thermosensitive polymer and a commercially available peptide synthesizer, so that the glycosylation reactions take place in an aqueous environment controlled by the synthesizer. The growing oligosaccharide chain is purified by precipitating the covalently attached polymer and is finally released by a specific cleavage chemistry to afford the final product.
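The logic of polymer-supported enzymatic assembly resembles solid-phase synthesis: the chain stays attached to a handle that can be separated from soluble reagents after every step. The toy model below is a hypothetical sketch rather than the published system. It tracks the fraction of polymer-bound chains after each glycosylation under three assumptions: each step extends a fixed fraction of the full-length chains, unextended chains are not substrates for later enzymes, and precipitation and washing remove only soluble donors and enzymes. Under these assumptions the full-length fraction after n steps is p^n for a per-step conversion p. The acceptor label and step strings are placeholders.

```python
def polymer_supported_assembly(acceptor: str, steps: list[str],
                               conversion: float = 0.95):
    """Toy model of iterative enzymatic glycosylation on a polymer handle.

    acceptor: starting polymer-bound structure (hypothetical label).
    steps: residue-plus-linkage prefixes added in order, e.g. "Galβ1–4".
    Returns the full-length structure and a map of structure -> fraction.
    """
    if not 0.0 <= conversion <= 1.0:
        raise ValueError("conversion must be between 0 and 1")
    full_length = acceptor
    population = {acceptor: 1.0}
    for step in steps:
        # 1) Glycosylation in aqueous solution extends the full-length chains.
        extended = f"{step}{full_length}"
        population[extended] = population[full_length] * conversion
        population[full_length] *= 1.0 - conversion
        full_length = extended
        # 2) A temperature change precipitates the thermosensitive polymer;
        #    washing removes soluble donors and enzymes but not truncated
        #    chains, so `population` is unchanged by this step.
    # 3) A final cleavage would release every chain, truncated ones included.
    return full_length, population


product_name, fractions = polymer_supported_assembly(
    "GlcNAcβ-linker", ["Galβ1–4", "Neu5Acα2–3"])
print(product_name)                       # Neu5Acα2–3Galβ1–4GlcNAcβ-linker
print(round(fractions[product_name], 4))  # 0.9025
```

The exponential decay of the full-length fraction is why near-quantitative conversion at each enzymatic step matters as chains grow longer.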
1.5 Application
It is important to note that as glycan synthetic methodologies develop further, new inroads toward applications will open, in much the same way that therapies targeting genes and peptides have matured over the past few decades. Because of the roadblocks inherent in the study of glycomics, its applications have necessarily been delayed. However, that has not stopped researchers from developing novel diagnostic and therapeutic methodologies.
Glycans play important roles in the expression of mature proteins, in cell adhesion, and in cell signaling, and can even be indicators of disease. For example, certain cancers express modified oligosaccharides on the cell surface.49 It has not escaped researchers' attention that these near-ubiquitous structures could be key markers and handles for diagnostic and therapeutic purposes. For instance, P-selectin glycoprotein ligand-1 (PSGL-1) has been implicated in the recruitment of leukocytes to sites of inflammation and may aid cancer cell metastasis.50 A therapeutic drug based on this glycoprotein's glycan structure may help prevent cancer metastasis and has therefore been the subject of intense research.51–53 As another example, Globo-H, an oligosaccharide that is overexpressed in many cancer cell lines, has been suggested as a potential cancer vaccine.54,55
Glycan microarrays were developed to study the interactions between glycans and glycan-binding proteins (GBPs).56,57 Glycans are immobilized, or “printed”, onto a surface in a regular array. The arrays can then be exposed to various GBPs that may have affinity for the epitopes these glycans display. Most recently, a multiplex glycan bead array (MGBA) was developed that allows simultaneous analysis of 384 samples and up to 500 glycans in a single assay.58 This technology makes high-throughput screening of glycan–GBP binding easier.
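Glycan array binding is often read out as fluorescence per printed spot. The sketch below shows one simple way such data could be turned into a list of binders for a single GBP: average the replicate spots for each glycan and flag those above a fold-over-background threshold. The data, the threshold of 3, and the function names are hypothetical and are not taken from the cited studies. The constant at the top is only the arithmetic implied by the MGBA figures above.

```python
from statistics import mean

# 384 samples x up to 500 glycans = up to 192,000 sample-glycan readouts per MGBA assay.
MAX_READOUTS = 384 * 500


def call_binders(spots: dict[str, list[float]], background: float,
                 fold: float = 3.0) -> list[str]:
    """Glycans whose mean replicate signal exceeds `fold` x background.

    spots maps a glycan name to replicate fluorescence readings for one GBP.
    The fixed-fold threshold is a simple heuristic assumed here; real
    analyses may use other normalizations and statistics.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return sorted(glycan for glycan, readings in spots.items()
                  if mean(readings) > fold * background)


example = {  # invented readings for illustration only
    "sialyl Lewis X": [950.0, 1100.0, 1020.0],
    "lactose": [110.0, 95.0, 130.0],
}
print(call_binders(example, background=100.0))  # ['sialyl Lewis X']
```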
1.6 Conclusion
Glycomics, relatively speaking, is late to the game. Genomics and proteomics have benefited from decades of intense research and extensive funding. On the other hand, glycomics does have a few advantages. The groundwork laid by other biologists can be readily applied to the study of the glycome. New research into the structure and behavior of cells reveals the significance of glycans in biological systems, triggering greater interest in this field from industry and academia alike. The National Institutes of Health (USA) has set up a Glycoscience Common Fund program to facilitate the development of new glycomics tools. The field of glycomics is rapidly catching up with its sister fields, and it is only a matter of time before non-specialists have the same access to the Synthetic Glycome as they do to the synthetic genome and proteome.