
Carbohydrate-binding modules (CBMs) are discrete auxiliary protein modules with a non-catalytic carbohydrate-binding function that exhibit a great diversity of binding specificities. CBMcarb-DB is a curated database that classifies the three-dimensional structures of CBM–carbohydrate complexes determined by single-crystal X-ray diffraction methods and solution NMR spectroscopy. We designed the database architecture and the navigation tools to query the database with the Protein Data Bank (PDB), UniProtKB, and GlyTouCan (universal glycan repository) identifiers. Special attention was devoted to describing the bound glycans using simple graphical representations and numerical formats for cross-referencing to other glycoscience and functional databases. CBMcarb-DB provides detailed information on CBMs and their bound oligosaccharides and features their interactions using several open-access applications. We also describe how the curated information provided by CBMcarb-DB can be integrated with AI algorithms of 3D structure prediction, facilitating structure–function studies. In this chapter, we also discuss the exciting convergence of CBMcarb-DB with the glycan array repository, which serves as a valuable resource for investigating the specific binding interactions between glycans and various biomolecular targets. The interaction of the two fields represents a significant milestone in glycosciences. CBMcarb-DB is freely available at https://cbmdb.glycopedia.eu/ and https://cbmcarb.webhost.fct.unl.pt.

Carbohydrate-binding modules (CBMs) are a class of carbohydrate-binding proteins, defined as non-catalytic protein domains with amino acid sequences ranging from 30 to 200 amino acids.1,2  Currently, the amino acid sequence similarity dictates the classification of CBMs into different families.3  The number of newly identified CBM sequences with putative carbohydrate-binding function is growing fast due to the exponential increase of sequence information derived from microbial genomics, metagenomics and transcriptomics data.4,5  CBMs are typically associated with modular enzymes but can also be isolated from a carbohydrate-active enzyme polypeptide chain or associated with other non-enzymatic proteins.4,6,7  Several CBMs are also classified as lectins, such as Ricinus communis ricin toxin B chain in CBM family 13 and the human lectin malectin in family CBM57.8  Members of CBM family 50 are also known as LysM domains, found in peptidoglycan- and chitin-binding proteins.9  CBMs have been described to exert critical functions in enhancing enzyme catalytic efficiency (e.g. substrate targeting), as carbohydrate-sensing domains or as adhesion molecules (e.g. mediating bacterial adhesion to the host cell surface). These modules show diverse carbohydrate-binding specificities between families and even within the same family. 
Several characterised CBMs are associated with bacterial and fungal systems that recognise non-crystalline cellulose and chitin, insoluble storage polysaccharides, such as starch and glycogen, and different hemicellulosic substrates, such as xylans, mannans, galactans, and soluble α- and β-glucans.3  More recently, attention has also been drawn to CBMs of systems from bacterial pathogens and commensal bacteria of the human microbiome, which can recognise the complex glycosylation of mucin glycoproteins covering the epithelial gastrointestinal layer10  (see the eighth chapter of this volume for a discussion on CBMs from family 32 and binding to mammalian-type glycans). CBMs have also been predicted to play a role in pathogenic and commensal strains of the vaginal microbiome, e.g. CBMs annotated to families 40 and 47, of which characterised members show specificity towards human-type glycans.11 

Although simplistic, the convenient and quite visual classification of CBMs into three types: A, B, and C, according to the mode of interaction with the carbohydrates and the architecture of the binding site1,2  (see Fig. 1), facilitates understanding of the functional relevance of these modules. CBMs from type A have a planar hydrophobic surface decorated by aromatic residues that interact with flat-surface crystalline polysaccharides, such as chitin or cellulose. CBMs from type B or “endo” are classified as CBMs that bind to internal oligosaccharide sequences. These CBMs exhibit a cleft or groove that accommodates oligosaccharide chains with four or more residues and show higher binding affinities with the increase of the oligosaccharide chain length. CBMs from type C, or “exo”-type, are classified as CBMs that recognise the termini of an oligosaccharide sequence, binding in an optimal way to mono-, di- or trisaccharides due to steric restriction in the binding site. Unlike the type B CBMs, type C CBMs do not contain extended grooves in the binding site.1,2  Some CBMs may have more than one type of binding site or a combination of different types, such as CBMs classified in families 6 and 22.

Fig. 1

Schematic representation of the three modes of CBM recognition of carbohydrate substrates. The “planar” approach (classified as Type A) to crystalline polysaccharides is represented by the chitin-binding family 2 CBM from Pyrococcus furiosus (PDB ID 2CWR);12  the “endo” approach (classified as Type B) is represented by the mixed-linked beta-glucan-binding family 11 CBM from Clostridium thermocellum bound to Glcβ1,4Glcβ1,3Glcβ1,4Glcβ1,4Glcβ1,3Glc (PDB ID 6R31)13  and exhibiting the typically extended groove (or cleft) with binding subsites capable of accommodating isolated carbohydrate chains with degrees of polymerisation longer than 4; the “exo” approach (classified as Type C) is represented by the family 9 carbohydrate-binding module from Thermotoga maritima bound to cellobiose (PDB ID 1I82)14  and exhibiting the typical pocket recognising the termini of glycans containing one to three monosaccharide units. Polypeptide chains are represented as blue ribbons, while carbohydrate chains are represented as stick models. The picture was prepared with the program UCSF Chimera.15 


CBMs recognise carbohydrate ligands through various non-covalent interactions, such as hydrogen bonds, hydrophobic interactions, CH–π interactions, van der Waals forces and electrostatic interactions. Water-mediated hydrogen bonding has also been shown to influence protein–carbohydrate affinity.16  The ligand binding site of CBMs is usually composed of aromatic residues that form a flat, groove, or pocket-shaped surface that matches the shape and size of the ligand. Some CBMs can also disrupt the ordered structure of the polysaccharide by inserting tryptophan residues into the substrate, creating additional binding sites for the catalytic domain. The specificity and affinity of CBMs for their ligands depend on the number, type, and arrangement of the amino acids involved in the binding site. Different CBMs may have different mechanisms of ligand recognition depending on their structure, function, and evolution.

Effective bioinformatic tools are nowadays contributing to an increasing number of annotated CBMs which, together with the increasing assignment of carbohydrate-binding specificities, makes CBMs an excellent model for studying the protein–carbohydrate recognition event. CBMs are also valuable candidates for various biotechnological applications, as proven by the generation of engineered CBMs with new and diverse functional properties.17 

As the development of databases is providing valuable information on the proteins involved in carbohydrate recognition, there is a need to establish an integrated web-based platform that brings together structural and functional knowledge on CBMs and their carbohydrate ligands. CBMcarb-DB arises as a novel database dedicated to displaying and analysing the 3D structures of CBM–carbohydrate complexes, providing curated structural data about CBM–carbohydrate binding interactions, carbohydrate conformation, and functional information on binding specificity.

To populate CBMcarb-DB, an initial list of 638 entries containing the PDB18  IDs of CBM–carbohydrate structures was extracted from the CAZy database3  for all families assigned, using a web scraper program written specifically for this purpose. This scraper ran through the publicly available pages from the CAZy website and extracted all the relevant information for our purposes directly from the page’s HTML. This list was checked and filtered for possible duplicated IDs, resulting in 520 unique PDB entries. Each structure was then manually inspected using Mol* viewer,19  through RCSB PDB20  3D View, to confirm the presence of the CBM–carbohydrate complex and the relevance of the protein–ligand interaction. In cases where multiple proteins were present in the same structure, correctly identifying the CBM module and its respective ligand was done by checking the original publication and/or running an alignment search using InterPro.21  When distinct CBMs in complex with ligands from the same or different families occurred in the same structure, a separate entry was created for each CBM–carbohydrate complex. The corresponding information was annotated for each entry manually, cross-referencing with CAZy, PDBsum22  and Yorodumi.23  This analysis resulted in a final list of 362 curated entries that compose the original CBMcarb-DB. The upkeep of CBMcarb-DB will be continuous over time, as the information sources will be searched for new data every two weeks. Currently, information associated with each database entity is added manually. It allows for proper curation and annotation at the expense of a time lag between the date of deposition and the date of release in the database. Later, a more automated strategy for the data input will be explored.
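The duplicate-filtering step described above can be illustrated with a short Python sketch. It is not the authors' scraper: the function name `unique_pdb_ids` and the sample list are hypothetical, and the sketch assumes that PDB IDs should be compared case-insensitively after trimming whitespace.

```python
def unique_pdb_ids(raw_ids):
    """Return PDB IDs in first-seen order, dropping blanks and duplicates."""
    seen = set()
    unique = []
    for raw in raw_ids:
        pdb_id = raw.strip().upper()  # normalise before comparing
        if not pdb_id or pdb_id in seen:
            continue
        seen.add(pdb_id)
        unique.append(pdb_id)
    return unique


# Illustrative input only: the same entries appear more than once.
scraped = ["6R31", "1i82", "6r31 ", "2CWR", "1I82"]
print(unique_pdb_ids(scraped))
```

Keeping the first-seen order makes the filtered list easy to compare against the scraped one during manual inspection.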

Technically, the database was developed with PHP version 7, Bootstrap version 3 and MySQL database version 7. The interface is compatible with all devices and browsers. The pages are dynamically generated to match user-selected search criteria in the query window. Interactive graphics are implemented in JavaScript with the D3.js library, version 3. A tutorial is available on the first page.

The database contains 360 manually curated three-dimensional structures of CBMs in complex with their ligands, determined by X-ray crystallography and NMR methods. In crystal structures, the resolution value is a measure of the quality of the diffraction data collected. It represents the smallest distance between crystal lattice planes that is resolved in the diffraction pattern. High values (e.g., 4 Å) indicate poor resolution and low values (e.g., 1.5 Å) indicate good resolution. The median resolution for X-ray crystallography data in the PDB is 2.05 Å.

The representation of the carbohydrates complies with recommended nomenclatures and formats (the IUPAC condensed form being the reference)24  and with the popular descriptions in common use. Each sequence is also encoded in the machine-readable GlycoCT format25,26  and depicted in SNFG (Symbol Nomenclature for Glycans).27 
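As an illustration of what a machine-readable glycan description makes possible, the following Python sketch splits a linear sequence, written in the notation used in this chapter (e.g. Glcβ1,4Glc), into residues and linkages. It is a simplified, hypothetical helper: it is neither a full IUPAC condensed parser nor a GlycoCT encoder, and it handles only unbranched chains whose residue names contain letters only.

```python
import re

# One residue name followed, optionally, by its linkage to the next residue,
# e.g. "Glcβ1,4". A space after the comma is tolerated.
_TOKEN = re.compile(r"([A-Za-z]+)(?:([αβ])(\d),\s*(\d))?")


def parse_linear_glycan(sequence):
    """Return (residues, linkages) for a linear, unbranched sequence."""
    residues, linkages = [], []
    for name, anomer, donor, acceptor in _TOKEN.findall(sequence):
        residues.append(name)
        if anomer:  # the reducing-end residue carries no linkage
            linkages.append((anomer, int(donor), int(acceptor)))
    return residues, linkages


residues, linkages = parse_linear_glycan("Glcβ1,4Glcβ1,3Glcβ1,4Glcβ1,4Glcβ1,3Glc")
print(len(residues), linkages)
```

The number of residues returned corresponds to the degree of polymerisation of the ligand.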

The content of the whole CBMcarb-DB can be visualised by scrolling through the entries displayed on the Field Search page (see Fig. 2).

Fig. 2

Overview of the content of CBMcarb-DB. Screenshot image taken from the front page of CBMcarb-DB website, depicting the search fields and two examples of results.


Of the 97 CBM families assigned to date (July 2023), protein–carbohydrate structures have been solved for 46 families (see Fig. 3). Interestingly, there is a strong representation of structures of CBMs from family 13 (84), followed by CBMs from families 20 (37), 35 (25), 32 (23), 6 (18), 40 (16), 48 (16), and 50 (10). Glucose (134) and galactose (76) are the monosaccharide types most frequently found in these structures, followed by N-acetyl-glucosamine (31), xylose (28), neuraminic acid (19), mannose (16) and N-acetyl-galactosamine (14). The size of the oligosaccharides complexed with a CBM ranges from monosaccharides to octasaccharides. In terms of degree of polymerisation (DP), DP2 (99) and DP1 (89) are the most represented, and the counts then decrease from DP3 to DP8 (DP3 (58), DP4 (45), DP5 (28), DP6 (16), DP7 (14) and DP8 (3)). These numbers may reflect the biological and biotechnological interest of the community in investigating these CBMs, as well as the difficulty in obtaining crystals and high-resolution structures of proteins in complex with larger ligands.
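The size distribution quoted above can be turned into relative shares with a few lines of Python. The counts are those given in the text; the function name `dp_summary` is a hypothetical helper introduced only for this illustration.

```python
# Number of ligands per degree of polymerisation (DP), as quoted in the text.
dp_counts = {1: 89, 2: 99, 3: 58, 4: 45, 5: 28, 6: 16, 7: 14, 8: 3}


def dp_summary(counts):
    """Return the total count and the percentage share of each DP."""
    total = sum(counts.values())
    shares = {dp: 100.0 * n / total for dp, n in counts.items()}
    return total, shares


total, shares = dp_summary(dp_counts)
short = shares[1] + shares[2]
print(f"{total} ligands tallied; DP1 and DP2 account for {short:.1f}% of them")
```

With these figures, mono- and disaccharides together make up slightly more than half of the tallied ligands.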

Fig. 3

Screenshot image listing the CBM families represented in CBMcarb-DB.


The database can be searched and explored with an advanced search tool handling a range of criteria.

The search can be performed by the protein types included in each CBM family by clicking on the desired panels to expand and access each corresponding entry or clicking on the blue search button to explore all the associated entries (see Fig. 4).

Fig. 4

Screenshot image depicting an example search within family 16 CBMs on the main page of CBMcarb-DB.


The database can be searched and explored with an advanced search tool handling a range of criteria:

  • pdb: PDB ID.

  • cbm family: family classification as annotated in the CAZy database.

  • protein name: attributed name of the protein that includes the CBM module.

  • organism: origin of the protein.

  • resolution: value in Angstrom of the data resolution.

  • carb pdb: type of monosaccharide (3-character PDB codes).

  • carb iupac: oligosaccharide nomenclature according to the IUPAC condensed form.

  • carb length: degree of polymerisation of the oligosaccharide.

This will give access to the search by fields (see Fig. 5).

Fig. 5

View of the webpage depicting the multiple criteria of the advanced search in CBMcarb-DB.


Clicking on any of the panels will display the existing content. The user is invited to select the entry of their choice or to search for specific content by typing in the corresponding panel, for example, typing a PDB ID in the pdb panel, a type of carbohydrate monomer such as “Glc” in the carb iupac panel, or the degree of polymerisation in the carb length panel. It is a multi-field selection process that narrows the search and directs the user to the desired target.
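The multi-field selection can be pictured as successive filters applied to the same set of records. The Python sketch below is only a conceptual model, not the PHP/MySQL implementation of the website: the `Entry` class and the `search` function are hypothetical, the two records are built from the structures cited in Fig. 1, and matching text fields by case-insensitive substring is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    pdb: str
    cbm_family: int
    organism: str
    carb_iupac: str
    carb_length: int


# Two illustrative records taken from the examples shown in Fig. 1.
ENTRIES = [
    Entry("6R31", 11, "Clostridium thermocellum",
          "Glcβ1,4Glcβ1,3Glcβ1,4Glcβ1,4Glcβ1,3Glc", 6),
    Entry("1I82", 9, "Thermotoga maritima", "Glcβ1,4Glc", 2),
]


def search(entries, **criteria):
    """Keep the entries that satisfy every criterion.

    Text fields match on a case-insensitive substring (an assumption);
    numeric fields must be equal.
    """
    def matches(entry):
        for field, wanted in criteria.items():
            value = getattr(entry, field)
            if isinstance(value, str):
                if str(wanted).lower() not in value.lower():
                    return False
            elif value != wanted:
                return False
        return True

    return [entry for entry in entries if matches(entry)]


print([e.pdb for e in search(ENTRIES, carb_iupac="Glc", carb_length=6)])
```

Each additional criterion can only shrink the result set, which is how combining the panels leads the user to a single target entry.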

The selected criteria will lead to the corresponding entries (see Fig. 6).

Fig. 6

Summary view of the field search results using PDB ID 6R31 in the pdb search field.


Clicking on ‘visualise the 3D structure’ yields the detailed page (see Fig. 7).

Fig. 7

Full results from the search on CBMcarb-DB, using PDB ID 6R31 in the pdb search field. More information can be obtained by clicking on the green buttons.


On this page, an interactive view of the CBM–oligosaccharide three-dimensional structure is displayed. The LigPlot+28  diagram, the oligosaccharide sequence in SNFG symbol representation, and the oligosaccharide and CBM structures are also shown, and the user can download the respective images by clicking on them. For CBM complexes with ligands longer than three monosaccharides, an interactive view of the oligosaccharide is also displayed, and the user can download the corresponding PDB file by clicking on the Oligo PDB file link.

Links are also available to access other related information and resources: information about the original publication through PubMed or DOI; the protein sequence through UniProt;29  protein and carbohydrate structure and interaction using the various PDB database sites (RCSB PDB, PDBe, PDBj,30  and PDBsum), SwissModel,31  PLIP32  and PISA.33,34 

CBMcarb-DB cross-references to several other databases that rely on various strategies for visualising the interaction between carbohydrate ligands and their protein environment (see Fig. 7).

Additional information on the oligosaccharide and protein interactions can be obtained from the Protein–Ligand Interaction Profiler (PLIP) server.32  The NGL viewer,35  adapted to SwissModel, displays the interactions identified by the PLIP application, which calculates the atomic-level interactions (hydrogen bonds, hydrophobic contacts, water bridges, etc.) occurring between carbohydrates and proteins. The specific features of the glycans interacting with the surrounding amino acid residues and possible metal ions are shown in 3D. The SwissModel application provides direct access to the PDBsum deployed by the EMBL-EBI, CATH,36  and PLIP.

A cross-link to the PISA application enables the exploration of quaternary structure formation and stability. The potential contribution of the carbohydrate to the formation of quaternary macromolecular complexes requires the evaluation of energetic stability. The structural information relates to the interfaces between the macromolecular entities, the individual monomers, and the resulting assemblies, from which complex stability can be assessed or predicted.

The interest in carbohydrate-binding modules materialises through several databases, such as CAZy, dbCAN3, and CBMDB. The CAZy database covers carbohydrate-active enzymes’ biochemical knowledge, including CBMs. CAZy revolves around amino acid sequence annotation and has grown closely related to NCBI genomes, Swiss-Prot, and UniProt, allowing the unambiguous characterisation of CAZymes via sequence accession numbers.8  dbCAN3 provides search tools and automated CAZyme annotations for newly sequenced genomes.37 

The reported CBMDB integrates more direct data related to CBMs, including sequence similarity searches, pairwise alignment, multiple sequence alignment, structure similarity searches, and phylogenetic visualisation.38  Regarding the prediction of binding sites, CBMDB relies on sequence-based methods, which use information derived from the amino acid sequence of a protein, such as conservation, similarity, or motifs.

While more modest in the amount of information assembled, CBMcarb-DB offers an insightful inspection of the detailed three-dimensional features of CBMs and their interacting carbohydrates. In addition, it supports a structure-based approach, in which information derived from the 3D structure of a protein, such as shape, surface, or energy, is used to predict the binding sites. CBMcarb-DB provides a unique platform to analyse, predict, and interrogate critical structural features of carbohydrate-protein complexes.

Within the overview offered by CBMcarb-DB, the absence of any X-ray crystallography complexes displaying the type A mode of interaction is a striking feature. Such a mode of interaction involves a planar hydrophobic surface of the CBM, decorated by aromatic residues that interact with flat crystalline polysaccharides, such as chitin or cellulose (see Fig. 1). This absence can be rationalised by the complexity of such interactions, which is not favourable for conventional X-ray crystallography investigation, from the standpoint of both co-crystallisation and 3D structure solution. Therefore, the following elements of analysis and discussion will be restricted to the type B and C modes of interaction, with a particular emphasis on type B.

CBMcarb-DB provides information on the specific carbohydrate ligands bound by key amino acid residues of CBMs. The database enables researchers to analyse the interactions between CBMs and carbohydrates, including hydrogen bonding (also water-mediated), van der Waals contacts, and hydrophobic interactions. Understanding the binding interactions is crucial for elucidating the molecular recognition and specificity of CBMs. The most critical driving force mediating protein–carbohydrate interactions is the position and orientation of aromatic amino acid residues (Trp, Tyr and sometimes Phe) within the binding site. These essential planar residues provide a hydrophobic platform for the planar face of sugar rings, an interaction resembling hydrophobic stacking interactions.
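A simple way to locate the aromatic residues lining a binding site is to list those with at least one atom close to the ligand. The Python sketch below illustrates the idea with toy data: the function `aromatic_contacts`, the coordinates, and the 4.5 Å distance cutoff are all assumptions made for this example, and the sketch does not reproduce the analyses performed by LigPlot+ or PLIP.

```python
import math

AROMATIC = {"TRP", "TYR", "PHE"}


def aromatic_contacts(protein_atoms, ligand_atoms, cutoff=4.5):
    """List aromatic residues with any atom within `cutoff` Å of the ligand.

    protein_atoms: iterable of (residue_name, residue_number, (x, y, z))
    ligand_atoms:  iterable of (x, y, z)
    """
    ligand_atoms = list(ligand_atoms)
    hits = set()
    for res_name, res_num, xyz in protein_atoms:
        if res_name not in AROMATIC:
            continue
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            hits.add((res_name, res_num))
    return sorted(hits, key=lambda residue: residue[1])


# Toy coordinates (hypothetical), one atom per residue for brevity.
protein = [
    ("TRP", 10, (0.0, 0.0, 0.0)),
    ("ASP", 12, (1.0, 0.0, 0.0)),
    ("TYR", 22, (3.0, 0.0, 0.0)),
    ("PHE", 40, (20.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 3.5), (4.0, 0.0, 3.5)]
print(aromatic_contacts(protein, ligand))
```

A distance filter of this kind only flags candidate residues; judging whether a ring actually stacks against a sugar face also requires its orientation.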

The construction of CBMcarb-DB follows the principles of findable, accessible, interoperable and reusable (FAIR) biological data. Such organisation is indispensable to feed and train machine learning-based applications that predict different levels of structural organisation and characterise the unique features of the recognition and binding of carbohydrate structures by dedicated proteins, such as CBMs. CBMs play an essential role in enhancing the catalytic efficiency of polysaccharide-degrading enzymes by promoting their proximity and affinity to the substrate and facilitating the enzyme activity. This is particularly notable for insoluble substrates.

CBMcarb-DB facilitates studying conformational changes that occur in CBMs upon binding to carbohydrates. The comparison of the structures of apo (ligand-free) and holo (ligand-bound) CBM offers a way to identify structural rearrangements, including changes in secondary structure elements, loop movements, or global domain motions. Such analyses provide insights into the dynamic nature of CBMs and their adaptation to carbohydrate recognition. Integrating the curated data in the database with other bioinformatics tools provides a thorough understanding of CBM structure–function relationships and the molecular mechanisms underlying carbohydrate recognition. Recent developments in innovative Artificial Intelligence (AI) algorithms, such as AlphaFold,39  RoseTTAFold,40  RaptorX41  and others, open the route to improving the accuracy and efficiency of protein structure prediction.

Comparing known protein structures with AI-predicted models offers several advantages, as illustrated through the success of AlphaFold in the CASP (Critical Assessment of Structure Prediction) competitions.42  AlphaFold can predict protein structures relatively quickly and for a wide range of proteins across diverse proteomes, making it helpful in analysing many proteins in a short period, including those that are difficult to study experimentally. Such progress extends to protein–carbohydrate complexes, for which the definition of true ligand specificity is challenging. Even when the true (natural) ligand is known, these complexes are often difficult to crystallise. AI structure-prediction tools have shown impressive accuracy in predicting protein 3D structures, particularly for proteins with no similar structures in the PDB. Among many other advantages, comparing AlphaFold predictions with known structures can highlight conformational differences, such as loop movements or side-chain orientations. These variations can provide valuable insights not only into protein flexibility and dynamics but also into the mechanisms of ligand recognition.

While considering the direct interaction between CBMcarb-DB curated information and AI algorithms, several approaches are available:

  • AI structure predictions can provide high-quality 3D structures for CBMs that might not have been experimentally determined. They help visualise the CBMs’ binding sites and critical residues involved in carbohydrate recognition, enhancing structural insight into the particular CBM.

  • The predicted 3D structures can be superposed with known carbohydrate-bound CBM structures within the same family in CBMcarb-DB (e.g. the structural comparison of the AI-predicted structure of CBM11 from Microbacterium hydrocarbonoxydans with the CBM11 crystal structure from Clostridium thermocellum illustrated in Fig. 8 and described below). It allows the mapping of ligand-binding sites and provides insights into the specific interactions between CBMs and carbohydrates.

  • By comparing AI-predicted structures with known CBM–carbohydrate complexes in CBMcarb-DB, researchers can analyse the structural features that dictate ligand specificity and affinity. This information is valuable for understanding how CBMs recognise different carbohydrate ligands.

  • AI predictions of apo (ligand-free) and holo (ligand-bound) CBMs can be compared to identify conformational changes upon carbohydrate binding. Integrating this information with CBMcarb-DB data can shed light on the dynamic behaviour of CBMs during ligand recognition.

  • AI predictions can be cross-validated with the curated CBM structures in CBMcarb-DB to assess the accuracy and reliability of the predictions. Any discrepancies can be addressed, and the predicted models can be refined using experimental data from CBMcarb-DB.

  • CBMcarb-DB may provide additional information about the presence of catalytic domains or other functional modules within CBMs (modular organisation of the genome). Integrating this information with AI predictions aids in understanding the structural basis of CBMs’ multifunctional roles.

  • AI predictions can be used to probe the binding of CBMs with carbohydrates that have not been previously characterised. It helps identify potential novel interactions and expands the knowledge of CBM–carbohydrate recognition.

  • Combining AI predictions with curated CBM structures can guide the rational design of mutations in CBMs to investigate the impact of specific amino acids on ligand binding or functional properties.

Fig. 8

Ribbon representation of the crystal structure of the Family 11 CBM from Clostridium thermocellum (CtCBM11) in complex with Glcβ1,4Glcβ1,3Glcβ1,4Glcβ1,4Glcβ1,3Glc (PDB 6R31), in beige, superposed with the AlphaFold-predicted structure of a GH26-associated CBM11 from Microbacterium hydrocarbonoxydans (MhCBM11; A0A0M2HNJ1 in InterPro), in light blue. The alignment was performed using the MatchMaker tool from UCSF Chimera,15  and the rmsd of the superposition is 0.858 Å for 69 matching C-alpha atoms (and 5.830 Å across all 142 C-alpha pairs). The concave side of both CBM11 modules forms the binding cleft, where ligands are accommodated. The carbohydrate atoms and the side chains of the amino acid residues inside the binding cleft of CtCBM11 that interact with the ligand are shown as stick models and labelled in beige characters. Amino acid residues Tyr651, Tyr561, Gln567, Trp628 and Tyr651 from MhCBM11, also shown as sticks and labelled in blue characters, constitute the potential residues involved in ligand recognition. Calcium atoms (green spheres) are surrounded by their coordinating residues, shown as sticks. The picture was prepared with the program UCSF Chimera.15 


The case study of family 11 CBM highlights the insights gained from such analyses and their implications for understanding CBM function and carbohydrate recognition. CBMcarb-DB reports the two crystal structures of the CBM11 from Clostridium thermocellum (CtCBM11) determined in complex with mixed-linked oligosaccharides featuring a β1,3-linkage at the reducing end (tetrasaccharide Glcβ1,4Glcβ1,4Glcβ1,3Glc) and both at the reducing end and at an internal position (hexasaccharide Glcβ1,4Glcβ1,3Glcβ1,4Glcβ1,4Glcβ1,3Glc), as informed by results from glycan microarrays.13 

The following illustrates the usage of CBMcarb-DB through the comparison of the CtCBM11 structures in complex with the mixed-linked ligand with the AlphaFold-predicted structure of a GH26-associated CBM11 from Microbacterium hydrocarbonoxydans (MhCBM11). This bacterium displays hydrocarbon-degrading capabilities, which are of interest for bioremediation applications, especially in the cleanup of oil spills and hydrocarbon-contaminated environments.43 

The reason why hydrocarbon-degrading bacteria may possess family 26 glycoside hydrolases could be related to the presence of complex carbohydrates in the environments where they thrive or to the fact that, as recently found, some members of family 26 glycoside hydrolases exhibit broader substrate specificity and can act on other molecules beyond galactomannan.44,45  The amino acid sequence alignment of the two CBM11 modules indicates 20% primary sequence identity. Fig. 8 shows the superposition of the CtCBM11-hexasaccharide complex (PDB ID: 6R31) with the AF-predicted structure of MhCBM11 (A0A0M2HNJ1 in InterPro), producing an rmsd of 0.858 Å for 69 C-alpha atoms (and 5.830 Å across all 142 C-alpha pairs). Despite the uncertainty on the binding specificity of MhCBM11, the superposition analysis reveals the protein stretches (loops) that may accommodate a carbohydrate ligand. It suggests that residues Tyr651, Tyr561, Gln567, Trp628 and Tyr651 from the CBM11 from M. hydrocarbonoxydans may form binding subsites in the protein, with a putative role in ligand recognition. Predicting the amino acid residues responsible for binding in a yet unsolved family member can significantly advance the rational design of engineered CBM11s produced by recombinant methods.
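The two rmsd values quoted for the superposition differ because one is computed over a subset of well-matching C-alpha pairs and the other over all pairs. The Python sketch below shows how such a pair of figures can arise. It is a simplified illustration under stated assumptions: the coordinates are toy values, the structures are taken to be already superposed, and the 2.0 Å cutoff used to define the well-matching core is hypothetical. It does not reproduce the iterative fitting performed by MatchMaker.

```python
import math


def rmsd(pairs):
    """Root-mean-square deviation over (xyz_a, xyz_b) pairs of superposed atoms."""
    if not pairs:
        raise ValueError("no atom pairs")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


def core_and_full_rmsd(pairs, cutoff=2.0):
    """Return (n_core, core rmsd, n_all, overall rmsd).

    The core contains the pairs separated by at most `cutoff` Å; a ValueError
    is raised if no pair satisfies the cutoff.
    """
    core = [(a, b) for a, b in pairs if math.dist(a, b) <= cutoff]
    return len(core), rmsd(core), len(pairs), rmsd(pairs)


# Toy C-alpha pairs (hypothetical): two close pairs and one divergent loop atom.
pairs = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ((1.0, 0.0, 0.0), (1.0, 0.0, 1.0)),
    ((2.0, 0.0, 0.0), (2.0, 0.0, 7.0)),
]
n_core, core_rmsd, n_all, full_rmsd = core_and_full_rmsd(pairs)
print(f"{core_rmsd:.3f} Å over {n_core} pairs; {full_rmsd:.3f} Å over {n_all} pairs")
```

A low core value combined with a much higher overall value, as in Fig. 8, points to a conserved fold with divergent regions, typically loops.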

One can predict the topology of the CBM binding site from its sequence using various computational methods. These methods typically use sequence- or structure-based features, or a combination of both, to identify and classify the binding sites. Some examples of these methods are:

  • ConCavity: this method combines evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. It can accurately predict 3D ligand binding pockets and individual ligand binding residues.

  • PUResNet: this method uses a deep residual neural network to predict protein–ligand binding sites based on amino acids’ physicochemical properties and spatial arrangement.

    It can handle large and complex protein structures and outperforms existing methods on several metrics.

  • LIGSITE: this method uses a regular Cartesian grid to detect pockets and cavities on the protein surface based on solvent accessibility and geometric criteria. It can identify potential ligand binding sites in protein structures.

  • EASYMIFs and SITEHOUND: these methods use molecular interaction fields (MIFs) to identify probable binding sites based on the interaction energy between the protein and a probe molecule. They can filter and cluster the MIFs to locate the most favourable binding sites.
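To make the grid-based idea behind methods such as LIGSITE more concrete, the following Python sketch places the atoms on a regular Cartesian grid, marks grid points that fall inside an atom as protein, and reports the remaining solvent points that are enclosed by protein on both sides along the Cartesian axes. It is a deliberately simplified illustration and not the published algorithm: it scans only the x, y and z axes, and the grid spacing, the single atom radius, the enclosure threshold and the function name are assumptions chosen for the toy example.

```python
import math
from itertools import product


def pocket_points(atoms, spacing=1.0, radius=1.8, min_enclosed=3):
    """Return coordinates of solvent grid points enclosed by protein along
    at least `min_enclosed` of the three Cartesian axes.

    All parameter values are illustrative assumptions.
    """
    # Bounding box of the atoms, padded by one atom radius.
    lo = [math.floor(min(a[i] for a in atoms) - radius) for i in range(3)]
    hi = [math.ceil(max(a[i] for a in atoms) + radius) for i in range(3)]
    n = [int((hi[i] - lo[i]) / spacing) + 1 for i in range(3)]

    def coord(idx):
        return tuple(lo[i] + idx[i] * spacing for i in range(3))

    all_idx = list(product(range(n[0]), range(n[1]), range(n[2])))
    # A grid point is "protein" if it lies within `radius` of any atom.
    protein = {idx for idx in all_idx
               if any(math.dist(coord(idx), a) <= radius for a in atoms)}

    def enclosed(idx, axis):
        """True if protein is met in both directions along `axis`."""
        for step in (-1, 1):
            j = list(idx)
            j[axis] += step
            while 0 <= j[axis] < n[axis] and tuple(j) not in protein:
                j[axis] += step
            if not 0 <= j[axis] < n[axis]:
                return False  # reached the grid edge without meeting protein
        return True

    return [coord(idx) for idx in all_idx
            if idx not in protein
            and sum(enclosed(idx, axis) for axis in range(3)) >= min_enclosed]


# Toy "protein": six atoms surrounding an empty cavity at the origin.
shell = [(3, 0, 0), (-3, 0, 0), (0, 3, 0), (0, -3, 0), (0, 0, 3), (0, 0, -3)]
pockets = pocket_points(shell)
print(f"{len(pockets)} enclosed grid points; origin included:",
      (0.0, 0.0, 0.0) in pockets)
```

In a real application the atom list would come from a parsed structure file, and the enclosed points would be clustered into candidate pockets before being compared with the ligand positions observed in related CBM complexes.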

The size of the carbohydrates, expressed as the number of constituent monosaccharides, varies from simple mono- and disaccharides to octa- and nonasaccharides; 62 oligosaccharides contain between five and nine monomers. The structural diversity is somewhat limited, as about 80% of the oligosaccharides are polysaccharide fragments made up of 1-4 β-linked monosaccharides (Glc, GlcNAc, Xyl, Man), with rare occurrences of 1-3 β-linkages. Branched structures occur in fragments of xyloglucans. The remaining 20% are 1-4 α-linked Glc residues, either as small linear amylose fragments or as cyclodextrins.

The limited number of distinct disaccharide segments found in complex with CBMs facilitates the analysis of their bound conformations. The distribution of the experimentally observed conformations of such disaccharide segments can be represented schematically by plotting them on computed potential energy surfaces as a function of the glycosidic torsion angles Φ and Ψ. Because the β-d-Glc-1-4–β-d-Glc, β-d-GlcNAc-1-4–β-d-GlcNAc, β-d-Xyl-1-4–β-d-Xyl and β-d-Man-1-4–β-d-Man disaccharides share a similar configuration, all their observed conformations can be displayed on the same potential energy surface. Other potential energy surfaces, computed for β-d-Glc-1-3–β-d-Glc and α-d-Glc-1-4–α-d-Glc, complete the analysis. As expected, the experimentally observed glycosidic torsion angles show some scattering when plotted on the corresponding potential energy surfaces, but they are all located in the lowest energy basins (see Fig. 9).

Fig. 9

Φ and Ψ measured in the 3D structures of oligosaccharides of CBM complexes reported on potential energy surfaces. (a) β-d-Glc-1-4–β-d-Glc, the β-d-GlcNAc-1-4–β-d-GlcNAc, the β-d-Xyl-1-4–β-d-Xyl, the β-d-Man-1-4–β-d-Man. (b) α-d-Glc-1-4–α-d-Glc. (c) β-d-Glc-1-3–β-d-Glc.
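The torsion angles Φ and Ψ used in this analysis are dihedral angles defined by four consecutive atoms across the glycosidic linkage. The Python sketch below computes a signed dihedral angle from Cartesian coordinates and applies it to one linkage. The atom choice (Φ = O5–C1–O4′–C4′ and Ψ = C1–O4′–C4′–C5′ for a 1-4 linkage) is one common convention and is an assumption here, since other definitions are also in use; the function names and the coordinates are invented for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def glycosidic_torsions(o5, c1, o_link, c4, c5):
    """Return (phi, psi) for a 1-4 linkage under the assumed atom convention."""
    phi = dihedral(o5, c1, o_link, c4)   # O5-C1-O4'-C4'
    psi = dihedral(c1, o_link, c4, c5)   # C1-O4'-C4'-C5'
    return phi, psi


# Invented coordinates (Å), chosen only to give easily checked angles.
phi, psi = glycosidic_torsions(
    o5=(1, 0, 0), c1=(0, 0, 0), o_link=(0, 0, 1), c4=(0, 1, 1), c5=(0, 1, 2)
)
print(f"phi = {phi:.1f} deg, psi = {psi:.1f} deg")
```

Applying such a function to every glycosidic linkage of the bound oligosaccharides yields the (Φ, Ψ) points that are overlaid on the potential energy surfaces.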

A detailed inspection of the conformations of the 62 hetero-pentasaccharides and higher oligomers confirms the conclusions drawn above. The finding that the type B binding mode does not induce any significant conformational change in the bound carbohydrate is vital in a predictive context. Starting from a carbohydrate or polysaccharide sequence, several computational tools can provide reliable low-energy 3D structures, including glycan-builder,46  CarbBuilder47  and glycam.org,48  which can serve as starting points in docking simulations.49 

The conclusions derived from the analysis of combining sites of the protein on one side and the conformations of the bound carbohydrate on the other side point towards the absence of any induced-fit effect and indicate favourable conditions to perform molecular docking.

As mentioned above, a challenge to recognising functional CBM–carbohydrate complexes is the identification of the natural carbohydrate ligands for CBMs and the experimental assignment of the binding specificity at the oligosaccharide level. Over the last two decades, carbohydrate (or glycan) microarrays have become instrumental tools for assigning carbohydrate-protein interactions. They provide targets for structural biology approaches and contribute to the elucidation of the function of diverse endogenous and microbial carbohydrate-recognition systems.13,50–55  Examples of glycan microarray analysis of bacterial CBMs are given in Fig. 10, highlighting different oligosaccharide-binding specificities.

Fig. 10

Differing specificities and chain length requirements obtained in the microarray analysis of glucan-binding CBMs from different CAZy families: family 41 and family 4 CBMs from the marine hyperthermophile Thermotoga maritima (TmCBM41 and TmCBM4-2, respectively); family 11 CBM from Clostridium thermocellum (CtCBM11); and family 6 from the aerobic soil bacterium Cellvibrio mixtus (CmCBM6-2); the inset shows the binding epitopes as determined by STD NMR of β1–3-linked d-glucose trisaccharide in the presence of TmCBM4-2 and CmCBM6-2 (dark, medium and light grey circles indicate strong, medium, and weak STD effects, respectively). The glucan-oligosaccharide sequence diversity in the microarray is depicted at the top of the panel. NGL: lipid-linked (neoglycolipid) probe. DP: degree of polymerisation. Reproduced from ref. 56 with permission from the Royal Society of Chemistry.

As observed in the microarray analysis, the carbohydrate chain length requirements correlate with the different modes of interaction of the CBMs with the ligand in solution, as determined by STD-NMR, and with their assigned functional types B and C.57  Such findings and other curated glycan microarray data on CBMs can be visualised through the Imperial College Glycosciences Lab web portal under category G.

In the foreseeable future, CBMcarb-DB will be connected to the experimental glycan microarray data. Integrating CBMcarb-DB with curated glycan microarray data will allow for atomic-level insight into the glycan-binding preferences and natural ligands for CBMs and correlate glycan sequence recognition with molecular determinants at the protein binding site.

In parallel with the implementation of glycan microarrays, there has been a much-needed development of bioinformatic tools and web resources for handling and using glycan microarray data (see Bojar and Lisacek, 2022,58  and references therein, for a comprehensive review of artificial intelligence methods and glycobioinformatics). Examples include the Carbohydrate Micro-Array Analysis and Reporting Tool (CarbArrayART), a distributable software tool for glycan microarray data and metadata storage that supports retrieval, comparison, mining, and sharing of the generated data; the Glycan Array Dashboard (GLAD),59  which displays glycan microarray results and searchable glycan-binding motifs; and DAGR,60  MCAW-DB,61  GlyMDB,62  and CarboGrove,63  which are databases for the interpretation of glycan microarray data that store glycan determinants for diverse recognition systems.

Another significant development has been the creation of an international glycan array repository established under the GlyGen project.64  There is a provision to release curated glycan microarray data on CBMs to this repository (only two entries are recorded for members of family 40). The data can be displayed through the CarbArrayART software interface that handles metadata compliant with the glycan microarray guidelines for Minimum Information Required for A Glycomics Experiment65  for FAIR data.

The accessibility to glycan microarray data also provides the means for developing powerful tools to predict glycan binding. These include MotifFinder, a software tool for predicting glycan-binding motifs,66  and LectinOracle, which combines transformer-based representations for proteins and graph convolutional neural networks for glycans to predict lectin-glycan binding specificities.67 

Bridging CBMcarb-DB with related artificial intelligence and glycobioinformatic tools will provide a structure-informed rationale to complement predictors of glycan-binding specificities for newly identified CBM sequences from available genomics data. This will advance knowledge of glycan function and CBM engineering for biotechnological and biomedical applications. Importantly, these approaches will fine-tune the characterisation of multidomain enzymatic systems targeting complex carbohydrate substrates, e.g. polysaccharides or mucin glycoproteins.

In the ever-expanding field of glycobiology, the study of glycan structures and their interactions with biomolecules is paramount. The binding of carbohydrate ligands to CBMs triggers conformational changes that optimise the recognition and catalytic activity of CAZymes. These changes range from localised adjustments to global domain motions, and their dynamic nature allows for efficient substrate binding and enzymatic activity. Understanding the structural dynamics of CBMs and their conformational changes upon ligand binding is crucial for elucidating the molecular mechanisms underlying carbohydrate recognition and catalysis. Structural biology databases provide unique information about the three-dimensional structures of proteins, nucleic acids, and other biomolecules.

CBMcarb-DB is a valuable resource for researchers investigating CBM structures and their interactions with carbohydrates, facilitating the analysis of 3D structures, the exploration of carbohydrate-binding interactions, and the examination of conformational changes upon ligand binding. By using CBMcarb-DB and integrating its data with other bioinformatics tools, researchers can better understand CBM structure–function relationships and the molecular mechanisms underlying carbohydrate recognition. The database can be used in conjunction with other bioinformatics resources and tools, for instance by cross-referencing the CBM structures with databases such as CAZy to obtain additional information on the associated catalytic domains and substrate specificities. Furthermore, integrating structural analysis with sequence analysis tools enables a comprehensive understanding of CBM structure–function relationships. In addition to 3D structures, CBMcarb-DB provides valuable annotations and information related to CBMs. Further research combining experimental techniques and computational simulations will provide deeper insights into these fascinating processes. The representation of CBM–oligosaccharide structures covered by CBMcarb-DB so far may reflect the biological and biotechnological interest of the community in investigating these CBMs, as well as the difficulty in obtaining crystals and high-resolution structures of proteins in complex with larger ligands. The curated information in CBMcarb-DB will promote an understanding of the structural basis of CBM ligand specificity, which may in turn generate more structural and functional data to supply the database.

In a different approach, glycan microarrays serve as valuable resources for investigating the specific binding interactions between glycans and various biomolecular targets. Glycan microarrays house collections of diverse glycans immobilised on different surfaces, enabling high-throughput screening of glycan-protein interactions. By systematically probing these interactions, scientists can decipher the “glycan code” and unravel the intricate language of glycan recognition. In this chapter, we have explored the integration of structural biology databases and glycan array repositories, offering new opportunities for understanding glycan-mediated recognition events.

The authors would like to thank Doctor Luis Gomes (Department of Informatics, Faculty of Sciences, University of Lisbon) for his kind assistance in extracting the publicly available PDB IDs of CBM carbohydrate structures and Doctor José Braga (UCIBIO, FCT-NOVA) for implementing CBMcarb-DB in the FCT-NOVA web server. This work was funded by the Portuguese Foundation for Science and Technology (FCT-MCTES) through the project grants PTDC/BIA-MIB/31730/2017 and 2022.06104.PTDC, and through the Applied Molecular Biosciences Unit – UCIBIO (UIDP/04378/2020 and UIDB/04378/2020) and the Associate Laboratory Institute for Health and Bioeconomy – i4HB (LA/P/0140/2020). We also acknowledge the project HORIZON-WIDERA-2021-101079417-GLYCOTwinning, financed by European funds, and the COST Action Innogly CA18103 for the E-COST Virtual Mobility Grant to DR (E-COST-GRANT-CA18103-02b37039). The CrossDisciplinary Program Glyco@Alps supported this work within the framework of the ‘Investissement d’Avenir’ program [ANR-15IDEX-02].

1. Boraston A. B., Bolam D. N., Gilbert H. J., Davies G. J., Biochem. J., 2004, vol. 382 (pg. 769-781)
2. Gilbert H. J., Knox J. P., Boraston A. B., Curr. Opin. Struct. Biol., 2013, vol. 23 (pg. 669-677)
3. Drula E., Garron M.-L., Dogan S., Lombard V., Henrissat B., Terrapon N., Nucleic Acids Res., 2022, vol. 50 (pg. D571-D577)
4. Gilbert H. J., Plant Physiol., 2010, vol. 153 (pg. 444-455)
5. Berg Miller M. E., Antonopoulos D. A., Rincon M. T., Band M., Bari A., Akraiko T., Hernandez A., Thimmapuram J., Henrissat B., Coutinho P. M., Borovok I., Jindou S., Lamed R., Flint H. J., Bayer E. A., White B. A., PLoS One, 2009, vol. 4(8) (pg. e6650)
6. Yaniv O., Fichman G., Borovok I., Shoham Y., Bayer E. A., Lamed R., Shimon L. J. W., Frolow F., Acta Crystallogr., Sect. D: Biol. Crystallogr., 2014, vol. 70 (pg. 522-534)
7. Buist G., Steen A., Kok J., Kuipers O. P., Mol. Microbiol., 2008, vol. 68 (pg. 838-847)
8. Alicia Lammerts van Bueren and Elizabeth Ficko-Blean, Carbohydrate-binding modules, https://www.cazypedia.org/index.php/Carbohydrate-binding_modules, (accessed 24 July 2023).
9. T. Ohnuma and T. Taira, Carbohydrate Binding Module Family 50, https://www.cazypedia.org/index.php/Carbohydrate_Binding_Module_Family_50, (accessed 24 July 2023).
10. Etzold S., Juge N., Curr. Opin. Struct. Biol., 2014, vol. 28 (pg. 23-31)
11. Bonnardel F., Haslam S. M., Dell A., Feizi T., Liu Y., Tajadura-Ortega V., Akune Y., Sykes L., Bennett P. R., MacIntyre D. A., Lisacek F., Imberty A., npj Biofilms Microbiomes, 2021, vol. 7 (pg. 49)
12. Nakamura T., Mine S., Hagihara Y., Ishikawa K., Ikegami T., Uegaki K., J. Mol. Biol., 2008, vol. 381 (pg. 670-680)
13. Ribeiro D. O., Viegas A., Pires V. M. R., Medeiros-Silva J., Bule P., Chai W., Marcelo F., Fontes C. M. G. A., Cabrita E. J., Palma A. S., Carvalho A. L., FEBS J., 2020, vol. 287 (pg. 2723-2743)
14. Notenboom V., Boraston A. B., Kilburn D. G., Rose D. R., Biochemistry, 2001, vol. 40 (pg. 6248-6256)
15. Pettersen E. F., Goddard T. D., Huang C. C., Couch G. S., Greenblatt D. M., Meng E. C., Ferrin T. E., J. Comput. Chem., 2004, vol. 25 (pg. 1605-1612)
16. Tschampel S. M., Woods R. J., J. Phys. Chem. A, 2003, vol. 107 (pg. 9175-9181)
17. Armenta S., Moreno-Mendieta S., Sánchez-Cuapio Z., Sánchez S., Rodríguez-Sanoja R., Proteins: Struct., Funct., Bioinf., 2017, vol. 85 (pg. 1602-1617)
18. Mir S., Alhroub Y., Anyango S., Armstrong D. R., Berrisford J. M., Clark A. R., Conroy M. J., Dana J. M., Deshpande M., Gupta D., Gutmanas A., Haslam P., Mak L., Mukhopadhyay A., Nadzirin N., Paysan-Lafosse T., Sehnal D., Sen S., Smart O. S., Varadi M., Kleywegt G. J., Velankar S., Nucleic Acids Res., 2018, vol. 46 (pg. D486-D492)
19. D. Sehnal, A. Rose, J. Koca, S. Burley and S. Velankar, The Eurographics Association, 2018, pp. 29-33.
20. Berman H. M., Nucleic Acids Res., 2000, vol. 28 (pg. 235-242)
21. Blum M., Chang H.-Y., Chuguransky S., Grego T., Kandasaamy S., Mitchell A., Nuka G., Paysan-Lafosse T., Qureshi M., Raj S., Richardson L., Salazar G. A., Williams L., Bork P., Bridge A., Gough J., Haft D. H., Letunic I., Marchler-Bauer A., Mi H., Natale D. A., Necci M., Orengo C. A., Pandurangan A. P., Rivoire C., Sigrist C. J. A., Sillitoe I., Thanki N., Thomas P. D., Tosatto S. C. E., Wu C. H., Bateman A., Finn R. D., Nucleic Acids Res., 2021, vol. 49 (pg. D344-D354)
22. Laskowski R. A., Jabłońska J., Pravda L., Vařeková R. S., Thornton J. M., Protein Sci., 2018, vol. 27 (pg. 129-134)
23. Kinjo A. R., Bekker G.-J., Suzuki H., Tsuchiya Y., Kawabata T., Ikegawa Y., Nakamura H., Nucleic Acids Res., 2017, vol. 45 (pg. D282-D288)
24. McNaught A. D., Adv. Carbohydr. Chem. Biochem., 1997, vol. 52 (pg. 43-177)
25. Herget S., Ranzinger R., Maass K., Lieth C.-W. V. D., Carbohydr. Res., 2008, vol. 343 (pg. 2162-2171)
26. Lütteke T., Bohne-Lang A., Loss A., Goetz T., Frank M., von der Lieth C.-W., Glycobiology, 2006, vol. 16 (pg. 71R-81R)
27. Neelamegham S., Aoki-Kinoshita K., Bolton E., Frank M., Lisacek F., Lütteke T., O’Boyle N., Packer N., Stanley H. P., Toukach P., Varki A., Woods R. J., Darvill A., Dell A., Henrissat B., Bertozzi C., Hart G., Narimatsu H., Freeze H., Yamada I., Paulson J., Prestegard J., Marth J., Vliegenthart J. F. G., Etzler M., Aebi M., Kanehisa M., Taniguchi N., Edwards N., Rudd P., Seeberger P., Mazumder R., Ranzinger R., Cummings R., Schnaar R., Perez S., Kornfeld S., Kinoshita T., York W., Knirel Y., Glycobiology, 2019, vol. 29 (pg. 620-624)
28. Laskowski R. A., Swindells M. B., J. Chem. Inf. Model., 2011, vol. 51 (pg. 2778-2786)
29. The UniProt Consortium, Nucleic Acids Res., 2019, vol. 47 (pg. D506-D515)
30. Kinjo A. R., Bekker G.-J., Wako H., Endo S., Tsuchiya Y., Sato H., Nishi H., Kinoshita K., Suzuki H., Kawabata T., Yokochi M., Iwata T., Kobayashi N., Fujiwara T., Kurisu G., Nakamura H., Protein Sci., 2018, vol. 27 (pg. 95-102)
31. Waterhouse A., Bertoni M., Bienert S., Studer G., Tauriello G., Gumienny R., Heer F. T., de Beer T. A. P., Rempfer C., Bordoli L., Lepore R., Schwede T., Nucleic Acids Res., 2018, vol. 46 (pg. W296-W303)
32. Salentin S., Schreiber S., Haupt V. J., Adasme M. F., Schroeder M., Nucleic Acids Res., 2015, vol. 43 (pg. W443-W447)
33. Krissinel E., J. Comput. Chem., 2010, vol. 31 (pg. 133-143)
34. Krissinel E., Henrick K., J. Mol. Biol., 2007, vol. 372 (pg. 774-797)
35. Rose A. S., Bradley A. R., Valasatava Y., Duarte J. M., Prlić A., Rose P. W., Bioinformatics, 2018, vol. 34 (pg. 3755-3758)
36. Sillitoe I., Dawson N., Lewis T. E., Das S., Lees J. G., Ashford P., Tolulope A., Scholes H. M., Senatorov I., Bujan A., Ceballos Rodriguez-Conde F., Dowling B., Thornton J., Orengo C. A., Nucleic Acids Res., 2019, vol. 47 (pg. D280-D284)
37. Zheng J., Ge Q., Yan Y., Zhang X., Huang L., Yin Y., Nucleic Acids Res., 2023, vol. 51 (pg. W115-W121)
38. Lin X., Xie X., Wang X., Yu Z., Chen X., Yang F., Appl. Sci., 2022, vol. 12 (pg. 7842)
39. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., Bridgland A., Meyer C., Kohl S. A. A., Ballard A. J., Cowie A., Romera-Paredes B., Nikolov S., Jain R., Adler J., Back T., Petersen S., Reiman D., Clancy E., Zielinski M., Steinegger M., Pacholska M., Berghammer T., Bodenstein S., Silver D., Vinyals O., Senior A. W., Kavukcuoglu K., Kohli P., Hassabis D., Nature, 2021, vol. 596 (pg. 583-589)
40. Baek M., DiMaio F., Anishchenko I., Dauparas J., Ovchinnikov S., Lee G. R., Wang J., Cong Q., Kinch L. N., Schaeffer R. D., Millán C., Park H., Adams C., Glassman C. R., DeGiovanni A., Pereira J. H., Rodrigues A. V., van Dijk A. A., Ebrecht A. C., Opperman D. J., Sagmeister T., Buhlheller C., Pavkov-Keller T., Rathinaswamy M. K., Dalwadi U., Yip C. K., Burke J. E., Garcia K. C., Grishin N. V., Adams P. D., Read R. J., Baker D., Science, 2021, vol. 373 (pg. 871-876)
41. Xu J., Proc. Natl. Acad. Sci. U. S. A., 2019, vol. 116 (pg. 16856-16865)
42. Pereira J., Simpkin A. J., Hartmann M. D., Rigden D. J., Keegan R. M., Lupas A. N., Proteins: Struct., Funct., Bioinf., 2021, vol. 89 (pg. 1687-1699)
43. Schippers A., Bosecker K., Spröer C., Schumann P., Int. J. Syst. Evol. Microbiol., 2005, vol. 55 (pg. 655-660)
44. Patel A. B., Patel A. K., Shah M. P., Parikh I. K., Joshi C. G., Biotechnol. Appl. Biochem., 2016, vol. 63 (pg. 257-265)
45. Glasgow E. M., Vander Meulen K. A., Takasuka T. E., Bianchetti C. M., Bergeman L. F., Deutsch S., Fox B. G., J. Mol. Biol., 2019, vol. 431 (pg. 1217-1233)
46. Engelsen S. B., Hansen P. I., Pérez S., Biopolymers, 2014, vol. 101 (pg. 733-743)
47. Kuttel M. M., Ståhle J., Widmalm G., J. Comput. Chem., 2016, vol. 37 (pg. 2098-2105)
48. Kirschner K. N., Yongye A. B., Tschampel S. M., González-Outeiriño J., Daniels C. R., Foley B. L., Woods R. J., J. Comput. Chem., 2008, vol. 29 (pg. 622-655)
49. Marchetti R., Perez S., Arda A., Imberty A., Jimenez-Barbero J., Silipo A., Molinaro A., ChemistryOpen, 2016, vol. 5 (pg. 274-296)
50. Pedersen H. L., Fangel J. U., McCleary B., Ruzanski C., Rydahl M. G., Ralet M.-C., Farkas V., von Schantz L., Marcus S. E., Andersen M. C. F. F., Field R., Ohlin M., Knox J. P., Clausen M. H., Willats W. G. T. T., J. Biol. Chem., 2012, vol. 287 (pg. 39429-39438)
51. Ruprecht C., Bartetzko M. P., Senf D., Dallabernadina P., Boos I., Andersen M. C. F., Kotake T., Knox J. P., Hahn M. G., Clausen M. H., Pfrengle F., Plant Physiol., 2017, vol. 175 (pg. 1094-1104)
52. Rillahan C. D., Paulson J. C., Annu. Rev. Biochem., 2011, vol. 80 (pg. 797-823)
53. Palma A. S., Feizi T., Childs R. A., Chai W., Liu Y., Curr. Opin. Chem. Biol., 2014, vol. 18 (pg. 87-94)
54. Ruprecht C., Geissner A., Seeberger P. H., Pfrengle F., Carbohydr. Res., 2019, vol. 481 (pg. 31-35)
55. Correia V. G., Trovão F., Pinheiro B. A., Brás J. L. A., Silva L. M., Nunes C., Coimbra M. A., Liu Y., Feizi T., Fontes C. M. G. A., Mulloy B., Chai W., Carvalho A. L., Palma A. S., Microbiol. Spectrum, 2021, vol. 9(3) (pg. e0182621)
56. D. O. Ribeiro, B. A. Pinheiro, A. L. Carvalho and A. S. Palma, in Carbohydrate Chemistry: Chemical and biological approaches, ed. A. P. Rauter, T. Lindhorst and Y. Queneau, Royal Society of Chemistry, 2017, pp. 159-176.
57. Palma A. S., Liu Y., Zhang H., Zhang Y., McCleary B. V., Yu G., Huang Q., Guidolin L. S., Ciocchini A. E., Torosantucci A., Wang D., Carvalho A. L., Fontes C. M. G. A., Mulloy B., Childs R. A., Feizi T., Chai W., Mol. Cell. Proteomics, 2015, vol. 14 (pg. 974-988)
58. Bojar D., Lisacek F., Chem. Rev., 2022, vol. 122 (pg. 15971-15988)
59. Mehta A. Y., Cummings R. D., Bioinformatics, 2019, vol. 35 (pg. 3536-3537)
60. Sterner E., Flanagan N., Gildersleeve J. C., ACS Chem. Biol., 2016, vol. 11 (pg. 1773-1783)
61. Hosoda M., Takahashi Y., Shiota M., Shinmachi D., Inomoto R., Higashimoto S., Aoki-Kinoshita K. F., Carbohydr. Res., 2018, vol. 464 (pg. 44-56)
62. Cao Y., Park S.-J., Mehta A. Y., Cummings R. D., Im W., Bioinformatics, 2020, vol. 36 (pg. 2438-2442)
63. Klamer Z. L., Harris C. M., Beirne J. M., Kelly J. E., Zhang J., Haab B. B., Glycobiology, 2022, vol. 32 (pg. 679-690)
64. York W. S., Mazumder R., Ranzinger R., Edwards N., Kahsay R., Aoki-Kinoshita K. F., Campbell M. P., Cummings R. D., Feizi T., Martin M., Natale D. A., Packer N. H., Woods R. J., Agarwal G., Arpinar S., Bhat S., Blake J., Castro L. J. G., Fochtman B., Gildersleeve J., Goldman R., Holmes X., Jain V., Kulkarni S., Mahadik R., Mehta A., Mousavi R., Nakarakommula S., Navelkar R., Pattabiraman N., Pierce M. J., Ross K., Vasudev P., Vora J., Williamson T., Zhang W., Glycobiology, 2020, vol. 30 (pg. 72-73)
65. Liu Y., McBride R., Stoll M., Palma A. S., Silva L., Agravat S., Glycobiology, 2016 (pg. 1-6)
66. Klamer Z., Haab B., Anal. Chem., 2021, vol. 93 (pg. 10925-10933)
67. Lundstrøm J., Korhonen E., Lisacek F., Bojar D., Adv. Sci., 2022, vol. 9 (pg. 2103807)