- 1 Introduction
- 2 Coarse-grained modelling: basic ideas
- 3 Protein representations
- 3.1 Atomistic vs. coarse-grained modelling
- 3.2 Elastic network models
- 3.3 Force-field based approaches
- 3.4 Calibrating a coarse grained model
- 3.5 Integrative multi-scale modelling
- 4 Lipids and membranes
- 4.1 Continuum models
- 4.2 Atomistic models
- 4.3 Coarse-grained models
- 4.4 A prototypical example: membrane curvature
- 4.5 Multi-scale hybrid particle/field models
- 5 Final remarks
Toward accurate coarse-graining approaches for protein and membrane simulations
-
Published:18 Nov 2015
-
Special Collection: 2015 ebook collection
M. Cascella and S. Vanni, in Chemical Modelling: Volume 12, ed. M. Springborg and J. Joswig, The Royal Society of Chemistry, 2015, vol. 12, pp. 1-52.
Download citation file:
In the last few years, several multi-scale approaches aimed at overcoming time- and size-bottlenecks in atomistic simulations have been proposed. Amongst them, coarse-graining methods appear the most promising ones in terms of compromise between computational cost and molecular accuracy. Because of their success, a large family of coarse-grained models addressing different soft matter systems are available in the literature and they are nowadays commonly used to study complex phenomena such as molecular aggregation, protein folding, or DNA assembly. In this chapter, we restrict our analysis on current methodologies used in protein and membrane simulations using multi-scale and coarse-grained potentials. We introduce fundamental concepts in coarse-grained modelling, and provide examples of some of the most widely used forms of Hamiltonians. Among others, we discuss elastic-network models as well as the MARTINI and UNRES force fields for proteins. We also briefly discuss parameterisation protocols for coarse-grained potentials, including iterative Boltzmann inversion and force matching. In the second part of the chapter, we review current approaches to biological membrane simulations, comparing atomistic, coarse-grained and continuum models. Particular attention is given to the MARTINI and SDK potentials. Alongside with presenting the available coarse-grained models, we discuss their current limitations and the challenges to improve their reliability.
1 Introduction
From the smallest biological molecules to complex living organisms, the organisation of the living matter follows highly hierarchical organisation. As depicted in Fig. 1, starting from the Ångström dimensionality, we first encounter atoms and molecules, then oligomers and polymer, like short RNAs or single-domain globular proteins; at larger scales, macromolecular assemblies give rise to cellular organelles and cells, that in turns, in superior organisms, form tissues, organs, and finally the whole body. Likewise, different biological phenomena occur at different size- and time-scales, and therefore can be understood by employing methods of investigation at the most pertinent level of resolution.1
Since the beginning of the informatics revolution in the 50's of the past century, major effort has been put in developing reliable mathematical and physical computational models of complex systems at different resolutions. In bottom-up approaches, the aim is to establish computational models based on fundamental physical principles that are able to predict the behaviour of the system of interest (Fig. 2). At the most fundamental level, quantum mechanics approaches2 can be used to treat relatively small-sized molecular systems (up to∼103 atoms, and for times of the order of 10−12,−10 s).
Quantum mechanical calculations can nowadays reach up even to millions of atoms for static calculations,3,4 also in this case depending on the degree of approximation with respect to the exact theoretical formulation (and according to the complexity of the system of interest).5–13
For larger systems and longer times (∼106 atoms, and for routinely times of 10−8,−7 s up to 10−6,−3 s) molecular models employing explicit representation of atoms (all-atom models, AA hereafter) interacting through parameterised mechanical effective potentials are the most commonly used approach.14–17 Such potentials can be trained on both accurate quantum-mechanical calculations and on large experimental data set18–22 and they can reliably reproduce molecular processes involving non-covalent intermolecular interactions or conformational changes.23
Combination of quantum mechanical and classical methods in a hierarchical structure is often used as a way of treating those biochemical phenomena that require quantum mechanical treatment while keeping a direct coupling with the environment.24 Historically, the first multi-scale model proposed dates back to 1976 by Arieh Warshel and Michael Levitt,25 where the idea of embedding quantum mechanical treatment of a chemically relevant portion of a biological system (like the active site of an enzyme) into a parameterised description of the environment was proposed. In the past decades, a large family of hybrid quantum mechanics/classical mechanics (QM/MM) methods stemmed from this seminal work, and have established what is today recognised as the standard common practice to treat quantum mechanical phenomena in biological systems.26–36 For this fundamental theoretical work Profs. Warshel and Levitt, together with Prof. Martin Karplus, were awarded the Nobel Prize in Chemistry in 2013.
Even though atomistic simulations can now deal with systems as large as tens of millions of atoms, and for simulation times beyond the millisecond,37,38 several biological processes involving large macromolecular complexes require description at time and sizes that go beyond even such dimensionalities. In order to overcome these bottlenecks, in the past decades several groups have been working on the development of reliable Coarse-Grained (CG) models.39–56 Similarly to AA, effective CG potentials can be derived from higher-resolution AA simulations, or from direct match with specific experimental properties of interest. In such approaches the detailed atomic resolution is lost; nonetheless, some information on the topological structure of the molecular assembly is retained, as described in Fig. 3. These models can efficiently represent molecular systems composed by several millions of atoms, for effective times that can reach the second scale; therefore, they are in principle well-adapted to investigate the structure and dynamics of large macromolecular assemblies and multi-phase systems. The large number of reviews published on the subject in recent years highlights the strong interest by the scientific community in this topic (for example: ref. 1, 16, 50, 53, 57–67).
Treatment of very large systems, and for very long times, opens up a completely different view on the understanding of biological systems and phenomena. In fact, it is often the case that the complexity of such systems is irreducible to the fundamental properties of individual or relatively few molecules, but it requires the treatment of a large number of particles. Moreover, biochemical/biophysical processes are often not simply driven by simple thermodynamic equilibrium, but several kinetic effects, like for example diffusional barriers, may play a fundamental role.
As a pivotal example of the power of coarse-graining approaches in modern computational investigations, a very recent study by Marrink and co-workers was able to investigate the lipid composition, dynamics and diffusion in a realistic model of the plasma membrane.68 Biological membranes are extremely complex environments formed by several lipophilic/amphiphilic compounds, which can behave in very different manners according to their specific composition. Building reliable models of such environments necessarily implies the use of large model systems to respect the relative concentration of the different species. Moreover, properties such as lipid lateral diffusion, lipid flip/flop, membrane elasticity and surface tension can be heavily biased in MD simulations by too small boundaries. The study in ref. 68 reported a model made of over 60 different lipid types, in a stoichiometric ratio compatible to that experimentally determined using lipidomics approaches, for a total of ∼20 000 lipid molecules, and simulated for 40 microseconds using coarse-grained potentials.
The major complication present in any multi-scale modelling approach of biological systems is associated to the fact that phenomena characteristic of a certain size/time scale may influence, directly or indirectly, properties that are intrinsic of a different scale. For example, the network of molecular interactions and molecular recognition patterns at interfaces directly influences the dynamics of large macromolecular complexes; on the other hand, the in vivo efficiency of an enzyme in a cell does not solely depend on its catalytic activity, but also on the accessibility to the substrates within the highly crowded cytoplasmic environment.
To date, the computational community struggles in the effort of building up, on one hand, more and more reliable and general models at the different resolution scales; on the other hand, it is becoming increasingly urgent to develop methods that combine and integrate information from multiple resolutions, in order to improve the predictive power of the implemented models.48,49,57,69,70
2 Coarse-grained modelling: basic ideas
The fundamental concepts related to coarse-graining root deeply into developments of statistical mechanics and the study of phase-transitions near the critical point. These studies evidenced how in such regions of the phase diagram the behaviour of the single particles become less relevant, while collective phenomena dominate the global behaviour of the system.71–73 Although not necessarily near critical points, soft-matter systems, and proteins in particular, are usually characterised by a rather complex phase diagram, and at room conditions they often lay in marginally stable regions near several phase transition crossings.74 This suggests that several of the physical properties of a polymer may be understood employing a description that is at a coarser resolution of the atomic one (for example: ref. 75–79).
Restricting our discussion to biological systems, the first CG model for proteins was proposed again by Levitt and Warshel.80 The original model made use of single centroids to describe individual amino acids, and of an elastic network to reproduce the folded state structure and predict the folding kinetics. Through the last 40 years, several coarse grained models have been developed to study polymers, multiphase systems, proteins and nucleic acids, as previously referenced.
A coarse graining operation implies the mapping of a finely grained system formed by N particles and described by a Hamiltonian HN(α), defined by a set of parameters α, onto a second system composed by M<N particles and responding to a new Hamiltonian HM(β), which will depend on another set of parameters β.
The mapping operation requires a transformation connecting the N and M particles of the two systems. Moreover, the mapping Hamiltonian HM(β) must be such that the partition function ZM:
is distributed according to the statistical distribution of the starting system.
In other words:
where Ξ(rN, pN→rM, pM) is the transformation mapping any conformation of the finely grained system into the coarse grained one.
Such transformation is evidently non-trivial for the following reasons:
(i) The mapping transformation is ill defined. In fact, there is no unique mathematical way of defining the mapping from one fine description into a coarse one. For example, a variable number of bodies may be used, yielding different levels of coarsening, or the same number of bodies may be used to address different structural parameters as degrees of freedom of the coarse grained Hamiltonian.
(ii) The functional form of the coarse grained Hamiltonian is ill defined. In fact, there is no universal transformation that defines an analytical potential function for any coarse grained representation. The choice of the functional form for the potential energy is typically made according to the properties of interest that the model should address.
(iii) There is no consensus on how to parameterise the coarse grained Hamiltonian. This is a consequence of the absence of a well-defined functional form for the potential energy. The interaction among coarse-grained bodies can be built to match as rigorously as possible the fine-grained system, or to fit chosen experimental data sets, or to model a specific property. In any case, this leads to potentials that have either limited transferability or reduced reliability.
Ultimately, the challenge in coarse-graining procedures lies in the definition of those finely grained degrees of freedom that will be disregarded in the coarse representation. It is evident that such choice is strictly dependent on the properties of interest that the coarse grained model should investigate. In fact, not all phenomena depend to the same extent on the same degrees of freedom or structural/dynamical parameters. Therefore, it is debatable whether the hunt for a universal coarse-grained model is by itself well placed, and that different models addressing multiple properties should instead be the right approach to coarse graining.
3 Protein representations
3.1 Atomistic vs. coarse-grained modelling
The study of the proteome constitutes one of the most fascinating challenges of molecular and cellular biology. Investigations may address different topics, comprising, among others, the relationship between the amino acid sequence, the structure and the function of individual proteins, protein folding, protein/protein interaction networks, dynamical effects on protein function (like in motor proteins), or interference of protein function by ligand binding.
Different phenomena associated to proteins can span the most different time and size scales. In fact, single protein filaments can have very different lengths (from few tens of amino acids like in small rubredoxins,81 to hundred thousands like in titin82 ) and processes can be as fast as femtoseconds (like in early photo-activation of the visual signal in rhodopsin)83,84 down to several hours, for example in weakly catalysed processes.85 It is therefore unrealistic to imagine the establishment of a universal computational model able to describe so different phenomena at so diverse scales.
In the course of the decades, different computational protocols tackling modelling of proteins at different resolutions have been established. At the most accurate level, we find today quantum mechanics methods (mostly based on Density Functional Theory), used in combination with embedding methods.24 The applicability of such methods is typically restricted to study those biochemical phenomena that strictly require a quantum-mechanical treatment. The most prominent examples include the study of enzymatic activity, photo-activation, or biological electron transfer and redox systems.24 Even though the constant increase of computational power has allowed in recent years the appearance of the first quantum-mechanical studies on whole small proteins, even with explicit quantum mechanical treatment of the solvent,4 it is unlikely that such approaches become feasible on a routinely basis to study biological phenomena occurring at larger time and size dimensionalities.
To date, the standard method to study structural and dynamical properties of proteins that provides the best compromise between accuracy and computational feasibility is classical molecular dynamics using parameterised potentials.86 Within this approach, molecular systems are described starting from their atomic constituents. Atoms are represented as massive point objects; molecular structures are held together making use of mechanical stretching, bending and torsional effective potentials mimicking the binding effect on nuclei by the electronic cloud. The effect of bond polarization and van der Waals forces driving the interaction between non-bonded atoms is typically taken into account by associating an individual point electrostatic charge to any atomic centre, and by defining Lennard-Jones potentials between pairs of atoms. Overall, the general formula of an AA potential takes the following analytical form:
where the first three terms represent the stretching, bending and torsional energies, the set of (qi) parameters define the Coulomb charges associated to each atom present in the system, and the (Aij, Bij) constants are the Lennard-Jones parameters associated to each i, j couple of atoms.
Parameterisation of the several constants defining the function VAA can be achieved in multiple ways, but in general is determined by combining fitting over a series of experimental and high-level quantum mechanical data. In the course of the last decades, several sets of standardised parameters, generally named “force fields”, have been developed.18–22,87–89 These force fields are nowadays quite reliable in predicting molecular properties of most protein systems. In particular, several folding simulations have verified that the global free energy landscape of protein conformations can be reliably reproduced using such potentials.90–92 Force fields are continuously re-parameterised to improve their reliability and transferability; therefore, we advise the reading of specialised literature and reviews to have a clear view on the performance of the different force fields for the calculation of different properties.23
To date, world-leading groups in molecular simulations have pushed molecular dynamics studies of proteins to reach time-scales on the millisecond range, and sizes as big as 107 atoms. These technical progresses allowed to investigate the folding of several globular proteins,37,91,92 to elucidate signal transduction in G protein coupled receptors,93,94 and to achieve structural refinement of low-resolution cryo-EM images of the HIV-1 capsid.38
Nonetheless, these dimensionalities are, on one hand, not accessible to the broad computational scientific community, and at the same time, they are not sufficient to cover the scales pertinent to large in vivo macromolecular complexes.
To date, several research groups are actively working to establish simplified CG potentials capable of reliably representing large macromolecular protein aggregates and the associated slow phenomena, and several reviews available in the literature provide detailed historical retrospective on the field of CG modelling.1,16,50,53,57–67 Here we focus on the present challenges and on what are the currently debated possibilities of pushing the field forward.
3.2 Elastic network models
As explained in the former paragraphs, the CG procedure includes two steps: the definition of a simplified coarse system mapping the more complex and finer initial one, and the definition of a potential energy function for the coarse system capable of mapping the properties of interest of the finely-grained one.
The first model for proteins containing these ingredients dates to one of the fundamental seeding papers of the field of biomolecular simulations by Levitt and Warshel.80 In their paper, they presented a simplified model for protein folding based on the concept of elastic networks. In these approaches, the structure of a folded protein (known from the experiment) is reduced to few centroids, usually one or two per amino acid, placed at the Cα (and Cβ) position. The distances between consecutive Cα in the polymer chain are constrained to the experimental Cα–Cα length over a peptide bond. All Cα's interact with all other Cα's not contiguous in the sequence through an (an-)harmonic potential centred at the experimental distance of the Cα–Cα couple:
These methods accept intrinsically the native folded structure as a global minimum of their energy, and therefore can be studied to explore in a fast and efficient way the folding landscape. In particular, folded structures lay in a well-defined global minimum of the free energy of the protein. Therefore, small fluctuations of the energy should follow a quadratic law. This can be easily implemented in elastic networks, by imposition of a strictly harmonic analytical form of the potential V in eqn (4).
Several studies in the past years have shown how such apparently simple approaches are indeed capable of representing well thermally associated local deformations of protein scaffolds, as well as how they can be used to reproduce principal component analysis coming from more complex all-atom simulations of the same targets.95–105
In 2004, based on previous studies by Bahar and coworkers,95–97 Micheletti et al. worked on an extension of the model of elastic network where one single bead is introduced to mimic the presence of the side-chain (Fig. 4).101 This minimal complication of the model is sufficient to significantly increase the accuracy of the predicted fluctuations.102 In this so-called β-Gaussian model, a protein is reduced to a set of beads representing the Cα and the Cβ of the various amino acids, and its total energy when in a trial configuration Γ takes the analytical form:
Explicitly, the terms in the Hamiltonian are:
where Δij represents the contact matrix between beads i, j in a predefined reference structure and d the separation of two beads in the configuration Γ. In the condition that the reference structure is the folded state of the protein, and thus sitting in the global minimum of the free energy landscape, the potentials V(d) can be approximated by Taylor expansion truncated to its quadratic term:
where the distance d is expressed in terms of its equilibrium value rij and the instantaneous displacement xij:
and (m,n) are indexes that run over the three Cartesian coordinates. Furthermore, the coordinates of the Cβ are embedded into the topological structure of the Cα, according the reconstruction scheme by Park and Levitt:106
where ℓ=3 Å. In this way, the coordinates of the Cα remain as the only degrees of freedom of the system, and thus the Hamiltonian of the system can be written simply as:
where xi,m is the deviation of the ith Cα along the m axis, and is a 3N×3N symmetric matrix. The eigenvalues and the eigenvectors of determine the elastic response of the system. Recently, such models have been also proposed as means of identification of evolutionary patterns in protein sequences.107
Despite their simplicity, elastic network models are based on very solid and intuitive mathematical-physical concepts (like polynomial expansion of the potential energy function from the global minimum), and this has constituted one of the reasons for their success in the literature in the past years. Nonetheless, at the same time, their simplicity restricts their applicability to a relatively small set of studies.
In particular, elastic network models reproduce the local shape of the folding free energy landscape of a protein by an artificial effective potential that does not contain any information on the physical means by which such shape is obtained. Moreover, they intrinsically bias the global minimum to a selected conformation, and thus they are not capable to capture the protein conformational changes and exploration of multiple structural minima that may be required for exploitation of the protein function (for example, in motor proteins, transporters etc.). Finally, these models do not contain any information on the external potential of the proteins; therefore, they cannot be in principle used to explore in detail interactions of proteins with the environment (i.e., with membranes, other proteins or ligands).
3.3 Force-field based approaches
Establishment of CG protocols for proteins having broader applicability than elastic network models requires the definition of potential energy functions containing terms that are based on more general assumptions. The Holy Grail of the present research would be the definition of a universal form for such terms that would guarantee transferability over multiple systems keeping reasonable levels of reliability over several properties of interest.
In the next paragraph, we will discuss the general ideas behind force-field based coarse-graining, and will provide few examples of approaches present in the literature that could be used to push forward the field.
3.3.1 CG molecular Hamiltonian – simple concepts
The coarse-grained mapping of a molecular system can be done at different levels of resolution. We concentrate here on models at a relatively fine level of mapping that keep resemblance of the physicochemical nature of the polymer chain. Within such logic, models incorporate at least one body per amino acid (usually centred at the Cα position), but often they contain also one or multiple bodies to represent the side chains or other components of the backbone.
At this level of mapping, the coarse-grain systems still resemble the topology of the original molecular systems. It is therefore tempting to hypothesise that the CG potential should also resemble the all-atom one:
The expression in eqn (11) is on purpose written in a very general form, because each term present there may or may not be expressed precisely as those of an AA force field (eqn (3)).
The most straightforward modelling, in fact, keeps all terms in eqn (11) analytically identical to atomistic models. Within this model, the Hamiltonian takes the form:
This is at the base, for example, of the MARTINI force field, which is to date one of the most popular and broadly used CG models in the literature.47 In particular, this model uses strictly the same harmonic or periodic analytical form for the stretching, bending and torsional distortion associated to bonds, angles and dihedral structural parameters, and Coulomb and Lennard-Jones like potentials for the non-bonded parts.
This approach has the advantage of being very easily implemented in all molecular dynamics codes, provides a direct estimate of the reduction of the computational requirements with respect to corresponding all-atom systems and allows the definition of highly-transferable single units that can be parameterised independently and used as building blocks to set up calculation of any protein system of interest. The MARTINI force field has been applied to the investigation of numerous processes involving protein-membrane interactions, including membrane proteins dimerization/oligomerization,108–113 recognition of specific lipids by peripheral proteins114–116 or the role of peptides in membrane fusion.117,118
As just mentioned, molecular-type CG force fields have a relatively broad range of applicability, as they were successfully used to reproduce the qualitative behaviour of protein localization at membranes, or the relative displacement of folded subdomains of large protein aggregates upon mechanical stress.61 On the other hand, this description results too simplified to guarantee reliable transferability over different systems and cannot be used to study several properties of crucial interest in protein science. In particular, simple harmonic potentials for the backbone are not adequate for the description of secondary structure elements, making these kind of potential inadequate to study conformational changes in proteins, or folding events (as discussed in detail in the next paragraph). Moreover, the use of bead-types, in similarity with the atom-type concept of atomistic simulations, to calibrate non-bonded interactions for the various CG bodies heavily reduces the chemical variability, and thus reduces the capability to describe subtle but biologically relevant modifications of the interactions at specific sites.
For example, in a recent computational study on the mechanisms of membrane recognition and binding by α-Tocopherol Transfer Protein (α-TTP), simulations with the MARTINI force field were well able to capture the primary mechanism of α-TTP recruiting at the plasma membrane by phosphoinositides through contact formation with an evolutionarily well conserved basic patch at the protein surface.119 On the contrary, CG simulations were not capable of determining the interaction pattern between the membrane and the sequence-variable N-terminal domain, which is known to be crucial for biological discrimination and targeting of different cellular compartments by different members of this protein family. It was instead possible to describe such interactions by atomistic models of the same system.119
Even though the choice of a bead type approach may intrinsically yield a reduced sensitivity of the model to both the chemical variability of the protein sequence and the surrounding environment, it is nonetheless reasonable to pose the question whether CG models of proteins should be meant to capture such effects at all. In fact, on one hand, any coarse grain mapping from an atomistic model intrinsically decreases the resolution to a less-than-chemical detail.
Opposite to this consideration there is the evidence that interactions at the chemical resolution ultimately drive the behaviour of all objects at the coarser one. Therefore, coarse models should be able to reproduce at least the main features of such interactions to guarantee that the properties of interest are reliably reproduced. The struggle between simplification of the representation and transferability/universality of the model is to date not resolved;57,65,66 therefore, it is crucial for the user to understand the physical principles at the base of each model proposed in the literature, and to choose the most appropriate one according to the type of study of interest.
3.3.2 Refining the molecular potential terms
Since several years, several research groups have been investigating the possibility of improving the quality of CG potentials by moving beyond the molecular potential of eqn (12). Different modifications can be applied to both bonded and non-bonded terms.
Bonded-interactions
Figure 5 reports the atomistic representation of the backbone connecting three consecutive amino acids in peptide chain, and the corresponding CG representation that one would get in a hypothetical model where each amino acid is reduced to a single bead located at its Cα position. Assuming as first approximation rigid peptide bonds, relative displacement of the backbone can be achieved only by rotation around the (ϕ, ψ) torsional angles. The well-known Ramachandran plot, reporting the statistical abundance of the (ϕ, ψ) angles in polypeptides and proteins, clearly shows the existence of multiple regions of minimal free energy, corresponding to the different secondary structure elements characterizing folded polypeptides.
In Cα-based CG representations of backbone chains, the two-dimensional Ramachandran space is reduced to one bending parameter (angle γ in Fig. 5).65 The most common secondary structure elements: α-helices and β-sheets correspond to two well-separated values of γ (∼90° for the α-helix and between 120° and 140° for the β-sheet). This simple example is sufficient to explain how a CG potential that aims at exploring thermodynamically accessible protein conformations and/or folding events cannot use a simple harmonic approximation for the angular terms, but it requires more complicated functional forms allowing multiple minima.
For example, a work by Tozzini et al. in 2006 showed that it is possible to map the (ϕ, ψ) space in an analytical way using the angle γ in combination with the dihedral angle ω formed by four consecutive beads56 (Fig. 5).
Non-bonded interactions
Proper choice of bonded interactions can be applied to improve the specific elastic properties of the polymer chain at local sites. Similarly, more sophisticated modelling schemes can be applied to the non-bonded interaction terms of the potential function. In condensed-phase systems, non-bonded terms of the potential function constitute the most expensive part of the calculations. Therefore, it is desirable that these contributions are expressed in computationally affordable functional forms.
In atomistic models, non-bonded interactions are usually divided in electrostatic and van der Waals interactions, represented by Coulomb and Lennard-Jones potentials (eqn (3)). The computational cost of an all-atom simulation is largely due to calculation of the long-ranged electrostatic interactions, which decay with a R−1 power only. It is widely assumed in several CG models that in most cases non-bonded interactions may be represented by shorter-range potentials than the Coulomb one.
The electrostatic potential produced by a group of N charges in a point of space Rj is given by the Coulomb sum:
where ri describes the position of each charge qi. The same potential can be approximated as a function of the only vector R=(Rj − Rq) describing the distance of the point Rj from the centre of charge distribution Rq. This approximation is the well-known expansion in multipoles:
where Q is the total charge of the system, μ is the electrostatic dipole of the charge distribution, and is its electrostatic quadrupole. The expansion extends to higher order multipoles, here not explicitly written. Disregarding the side-chains of charged amino acids, grouping of atoms into CG beads usually leads to grouping of charge distributions of negligible total charge. Therefore, the corresponding long-range potential should decay with a power law >1. In particular, interaction among not charged beads should be analogous to a dipole-dipole potential, and therefore, it should follow a R−6 law. If so, CG electrostatic potentials should have similar asymptotic behaviour as the van der Waals ones, thus, it should be possible to implement the two into one single effective potential. Generalised Vn,m(R) potentials of formula:
have been tested and implemented in the literature. The use of values of n<12 to describe the repulsive part of the potential is usually justified by the fact that CG beads, representing a molecular moiety, are expected to be softer than the electronic cloud of single atoms, while attractive terms usually maintain m=6, consistently with dipole–dipole attraction law.
The use of a potential of the Vn,m(R) form is advantageous because it is analytical and with analytical derivatives, therefore, it guarantees cheap and accurate calculations of the forces, required for integration of the equations of motion via Verlet algorithm, or any other similar one. Nonetheless, such potentials may lead to rather unsatisfactory results.
In fact, proteins are expressed and exploit their function in the condensed-phase. Therefore, long-range interactions between non-bonded groups are never direct, but they are constantly mediated by the solvent. In atomistic simulations, the solvent is present in the system, either by explicit representation of the solvent molecules, or by introducing some form of dielectric continuum response. In any case, a portion of the potential function is directly taking into account the presence of the solvent. In CG representations, the situation is not so straightforward. For example, a simple dielectric response cannot be easily implemented for all those models that incorporate at least to a certain extent electrostatics into effective long-range potentials. The use of explicit CG beads for the solvent, as it is often done, not necessarily leads to reliable solvation models. This is particularly true for solute hydrophilic CG beads, where directional interactions, like H-bonds, which drive the local solvation structures, are missing. Tackling this problem implies the formulation of non-bonded effective potentials using analytical expressions that take into account the screening effect of the solvent. For example, the potential can be calibrated over radial distribution functions, thus incorporating the presence of regions of meta-stability due to the presence of solvation shells (see for example: ref. 51).
3.3.3 Radial vs. non-radial forms of the long-range interactions
So far, we have discussed possible modelling of the non-bonded interactions between CG beads that are strictly dependent on the distance between the two particles. This is straightforward for point or spherically symmetric objects, like atoms. This assumption is in fact not so obvious for CG beads, which are effective representation of molecular fragments.
Let's consider the multipolar expansion in eqn (14). Apart from the monopolar term, which strictly depends on the distance R only, all other terms depend on the relative orientation of the charge distribution generating the potential. Therefore, in the most general case, the long-range potential associated to a CG bead should not be dependent on the distance only.
One can imagine three cases when the formulation of V(R) as a radial potential is valid if:
(i) The interacting particles have a neat electrostatic charge; in that case the dominating term is the Coulomb potential;
(ii) Both particles have either a highly symmetric charge distribution, or are weakly charged. In these cases, low order less symmetric multipoles are zero or negligible, thus the dominating terms are practically spherical, or anyway weaker or shorter-ranged than van der Waals interactions. This occurs for example in hydrophobic groups;
(iii) The internal dynamics of the particles is fast enough, so that the reorientation of the multipoles along the electrostatic field is much faster than translation of the particle, for example, in free solvent particles.
It is evident that these conditions do not apply to neutral polymeric chains containing strongly polar groups. In fact, the same polymer structure imposes topological restraints that prevent free and fast relaxation of the polar moieties. Polypeptide chains fall in this last category, as each peptide bond connecting two consecutive amino acids brings a permanent dipole of about 3.5 D oriented roughly perpendicular to the main-chain direction. The orientation of the backbone dipoles is strictly associated to the Ramachandran angles, and therefore it cannot be assumed to change unless local conformational changes occur.
3.3.4 Example of a Hamiltonian beyond the molecular form: the UNRES potential
The UNRES model by Scheraga and co-workers is one of the most successful CG models for folding prediction available to date.45,46 The possibility of exploring efficiently the conformational space of a protein sequence is achieved by keeping the number of interaction sites as small as possible: typically, one bead for the backbone, located at the peptide bond centres and one spheroid representing the side-chain. Auxiliary non-interacting sites are located at the Cα's, to facilitate the geometric definition of the chain structure (Fig. 6). The potential function of the UNRES models reads as:
This potential introduces several additive contributions to the energy, taking into account the usual stretching, bending and torsional terms Vb(di), V(γi), V(φi), but includes also explicit correlation terms for consecutive torsional angles (φi, φi+1) and the angles (α, β) defining the position and orientation of each side chain group. Moreover, explicit temperature factors are included as screening constants for several terms of the Hamiltonian according to progressive terms of the Kubo cumulant-cluster expansion.120,121
The UNRES potential adds explicit long-range contributions to take into account orientation and alignment of the backbone groups mimicking dipole–dipole interactions, as well as multi-body contact terms to facilitate stabilisation and folding of secondary structure elements. Specifically, contact terms are used to induce the alignment of backbone groups according to experimentally known folded structures, in particular α-helices and β-sheets.
CG terms are calibrated on Potential of Mean Forces. These quantities can be extracted either from statistical distributions from the PDB data bank, or from atomistic MD simulations of model systems.45,46 Currently, the UNRES force field provides two sets of parameters.122 The first is calibrated over both structural and folding thermodynamics of G-related albumin-binding module (PDB code: 1GAB),123 a triple α-helix bundle. The second set of parameters is calibrated over folding data for tryptophan cage and tryptophan zipper 2 (PDB codes 1L2Y, 1LE1),124,125 small protein constructs displaying both α-helical and extended structural elements.
The UNRES model represents one of the most accurate CG force fields for protein folding, as confirmed by several assessments at CASP (Critical Assessment of Techniques for Protein Structure Prediction).126 Recently the UNRES model proved to have good reliability also in predicting protein oligomerisation.127–129
3.4 Calibrating a coarse grained model
Definition of an appropriate functional form for a CG potential constitutes only one step of the problem. In fact, however it is expressed, the CG Hamiltonian will, in general, depend on a set of free parameters that require appropriate calibration. Multiple strategies are present in the literature, that can be grouped into two large families: top-down approaches that match to global thermodynamic quantities, and bottom-up approaches that build parameters from the mechanical properties of corresponding atomistic models.
3.4.1 The Boltzmann inversion method
In top-down methods, the statistical distribution (ξ) of a specific order parameter ξ and the associated Free energy constitutes the typical starting point:
This expression is referred to as Boltzmann inversion, and it is the starting point to build several CG potentials. The main issue related to this approach is that the statistical distributions of different CG parameters ξ are not uncorrelated. Therefore, it is not possible to use additive potential of mean forces derived tout-court from the Boltzmann inversion for the different variables of interest to build a CG potential without introducing significant bias. Separation of genuine contributions from spurious one can be rigorously obtained by the iterative Boltzmann inversion procedure as introduced by Reith et al.130 In this protocol, one starts from an initial potential of mean function U1(ξ), which will generate a probability distribution for the parameter ξ equal to 1(ξ). The potential is then iteratively corrected according to the formula:
where (ξ) is the target distribution of interest. This procedure guarantees that, at convergence, the statistical distribution of ξ is matching the corrected one. Unfortunately, the number of free parameters in a CG Hamiltonian can be quite large, for example, in her review of 2010, Tozzini estimates roughly ∼400 terms only to calibrate bonded interactions.65 Iterative Boltzmann inversion is in fact efficiently used in organic polymer simulations, even though some examples of application to biopolymers are now appearing (for example: ref. 131).
3.4.2 The force-matching method
The force-matching method represents a rigorous way of building up CG parameters bottom-up from atomistic simulations. The procedure introduced by Ercolessi and Adams in 1994132 consists in optimising the free parameters of a mapping Hamiltonian so that the forces acting on all the particles of a system reproduce those acting on the same particle described by the mapped Hamiltonian. For example, a molecular Hamiltonian may be built by matching the forces on atoms to the nuclear forces obtained from a quantum-mechanical calculation.133 The procedure is very general, and it has been widely extended through the years by Voth and co-workers and others as a rigorous protocol to derive CG potentials from AA simulations.43,44,48,49,55,134,135
Briefly, the force matching procedure implies definition of a penalty function:
where Γ is a configuration of the system, α are the set of parameters defining the CG Hamiltonian, F(Γ;α) is the CG force acting on the ith CG bead, F(Γ) is the corresponding force according to the AA potential, and N is a normalisation constant. Minimisation of the penalty function χ2(α) will determine the best set of parameters α.
If the potential is linearly dependent on the parameters α, the problem transforms in an over-conditioned set of linear equations, which can be minimised by several fitting protocols. Typical of such problems, the major issue of the force-matching procedure relies on the fact that minimisation may lead to several equivalent numerical solutions; therefore, solid optimisation algorithms are required, along with the use of restraints to confine the numerical solution on physically sound values. More importantly, the matching procedure requires sampling on a subset of structures typically obtained from AA simulations. As a consequence, parameter sets may be biased toward those structures that are observed in the relatively short time of an AA simulation, and therefore they may lead to inaccuracies for very long simulation times if the system drifts over conformational basins that were not explored during calibration of the potential.
To date, the force-matching procedure has been validated on several systems and at different scales. In fact, Voth and co-workers showed that this method can be reliably applied to diverse biochemical systems, spanning from lipid bilayers44,136,137 and cholesterol/lipid interactions,138 to monosaccharides,139 peptides,140 to large protein assemblies.141–143
3.5 Integrative multi-scale modelling
3.5.1 Hybrid AA/CG Hamiltonian
However accurate CG models can be, several phenomena associated in general to molecular recognition (like, receptor-ligand binding, adhesion at functionalised surfaces, substrate recognition, ligand-induced conformational changes, allosteric response etc.) strictly require an accurate description with atomistic detail. It is therefore crucial to develop multi-scale methods that allow exchange of information at multiple resolutions, thus being capable of exploiting the efficiency of CG descriptions with the accuracy of atomistic ones. Several possible approaches have been proposed. An excellent review is for example given as ref. 57.
As an example, here we present the hybrid all-atom/coarse grained (AA/CG) model introduced by Neri et al. in 2005.144 Similarly to QM/MM, the AA/CG model explores the possibility of building a hybrid multi-scale model where any portion of interest is treated at the more accurate atomistic resolution, while the rest is described at the CG level (Fig. 7). The original formulation as presented in the 2005 paper made use of a combination of the Gromos united-atom force field21 and an elastic network model with anharmonic restraints:
where VAA is the atomistic potential function describing the all-atom region, VCG is the anharmonic elastic network model applied to the CG part:
with Ri,i+1 and d0 being the distance of two consecutive CG particles and the average distance of two Cα's in a peptide bond, respectively; while Rij and bij represent the distances of two non consecutive CG particles, and the corresponding equilibrium distance. Coupling between the atomistic and CG parts is mediated by a 6 Å thick interface shell around the atomistic portion of the system. This region maintains its all-atom details, and interacts through it with the proper all-atom portion; on the other hand, the centre of mass of each residue present in this region is connected to the elastic network and responds mechanically to it.
Different studies proved that the model is capable of describing thermal fluctuations of the protein scaffold with the same accuracy of pure elastic network models. Moreover, the atomistic region responds to the mechanical strain induced by the same fluctuations consistently with what is observed in all-atom simulations.145–147
The absence of direct coupling between the all-atom and the CG parts constitutes the major drawback of the model. In fact, several proteins are characterized by a strong long-range electrostatic potential, which influences local properties like the pKa of titrable groups, the relative orientation of polar/charged moieties, the binding affinity of ligands, or the redox potential. Improvement of the model necessarily passes through the establishment of a reliable CG force field that includes explicit and accurate treatment of long-range electrostatics.
Cascella and co-workers demonstrated that simple structural information, namely, the positions of the Cα is sufficient to reconstruct the electrostatic field of a protein.148 The model makes use of the topological correspondence between the orientation of the dipoles and the CG angle γ between three consecutive Cα's, and by reconstructing the position of the electrostatic multipoles associated to the side-chains of an amino acid based on restraints from chirality and flexibility of the side-chain similarly to the protocol by Park and Levitt.106 Although very simple, the reconstruction of the electrostatic features of a protein is systematically more accurate than any point charge models as, for fields of intensity superior to ≈5.0·10−5 a.u., the relative error with respect to the field generated by all-atom charge distributions is negligible. Moreover, the model is capable of reproducing the strong changes of the macromolecular dipole of a peptide during unfolding from helical to elongated conformations (Fig. 8).148
Dynamic models derived from the same assumption verified that, like in the UNRES model, the presence of the backbone dipoles is sufficient to naturally stabilise secondary structure elements and to predict assembly of secondary structure elements into super-secondary assemblies.39 For example, the model describes accurately the presence of a macromolecular dipole along the axis of an α-helix, which in turn is sufficient to predict stabilisation of helix-coil-helix super-structures by advantageous alignment of the two respective dipoles.39
More recent developments of this approach have begun exploring the possibility of extending the model toward a universal force field with good transferability to all protein sequences. Spiga et al. showed that combination of the electrostatic model for the protein backbone with molecular-type CG potentials yielded systematically reliable predictions for different proteins with different folded structures.55 Nonetheless, ad hoc calibration of the several parameters composing the potential function for each specific protein over atomistic simulations of the same system by force-matching procedure yielded systematically better results than a generic set of parameters calibrated over multiple targets.55 This indicates that inclusion of accurate electrostatics into molecular-like potential functions, although improving their quality, is anyway not enough to achieve full transferability. For that, more sophisticated forms of the potential energy function might be explored.
3.5.2 Integrative experimental–computational approaches
The models presented so far aim at developing protocols that allow the investigation of bio-molecular phenomena from a purely computational perspective. In this respect, potential functions are built from sound physical principles, and experimental data are eventually used for initial calibration of parameters only. In more recent years, different groups have been developing protocols aimed at integrating experimental data with simulations on the fly. These are very pragmatic ways of training simulations to reproduce/interpret/rationalise in the most accurate way the data coming from the experiment. Although they may not be useful to build up transferable models, these may nonetheless produce some outstanding results. Here we present few examples of such approaches.
Integrative experimental-multiscale computational tools for refinement of cryo-EM data
Cryo-electron microscopy has revolutionised the field of protein imaging in the last years.150 This technique presents several advantages with respect to traditional X-ray diffraction spectroscopy. First, it captures samples in an aqueous environment, which is intrinsically more “natural” than a highly concentrated condition like in a packed crystal. Second, it does not require the presence of ionic buffers for co-crystallisation, which may also induce significant local deformation of the protein structure. Third, glasses are obtained by rapid quenching of the solution; therefore, multiple room-temperature accessible conformations of the protein may be captured in one single experiment. Last but not least, the technique allows determination of large macromolecular complexes naturally present in solution but that may not be able to crystallise.
The major drawback of this technique lies in the typical resolutions that can be achieved. Especially for macromolecular complexes, it more commonly remains within 5 and 10 Å, which does not facilitate enough understanding of the specific intermolecular interactions regulating complex formation and functioning, even though outstanding structures down to 3.5 Å resolution are sometimes possible.151
Tama et al. developed in the past years a protocol called molecular dynamics-flexible fitting.152,153 In this procedure, the normal modes of a protein of known structure are computed using simple elastic network model approaches (as described in Section 3.2). Then, the protein scaffold is deformed along the computed normal modes in order to fit the electron density map coming from an EM experiment. Once the structure of the protein that best fit the experimental data is found, the atomistic-detailed structure is mapped back starting from the original high-resolution data. In this way, the different conformations of a protein present in a Cryo-EM data set can be resolved at atomistic resolution.
The same protocol can be adapted to study the conformations of large macromolecular complexes for which the high-resolution structures of the single components are known. The most prominent example is the refinement of the HIV-1 capsid by Zhao et al.,38 which also represents, at present, one of the largest atomistic calculations ever performed.
A similar method has been used in the past few years by Dal Peraro and co-workers to resolve several large trans-membrane multi-protein complexes.149,154–156 Compared to original molecular dynamics-flexible fitting,152,153 Dal Peraro and co-workers employed a Swarm Intelligence Dynamic Modelling search to explore efficiently the large conformational space spanned in protein complexes characterised by dramatically large conformational changes.157
The most prominent example is given by the resolution of the pore-forming toxin aerolysin (Fig. 9).149 Such protein is the prototypic representative of the β-pore-forming toxin class, for which the structure was not known. Determination of low-resolution images by cryo-EM of the heptameric complex posed an even bigger puzzle, as different images were showing significantly different assemblies. Dal Peraro and co-workers proceeded with systematic investigation at both the experimental and computational levels. First, they determined at high-resolution the monomeric component of the toxin; then, atomistic simulations were employed to reveal large-scale motions accessible to these single proteins. Finally, the structure of the complex in its resting state was determined.
Analysis of the large-scale fluctuations revealed that a simple concerted twisting motion of the seven monomers allows formation of a transmembrane β-pore, by extrusion of long β-domains from the core of the assembly. The existence of such conformation was confirmed by the almost perfect matching with the cryo-EM images, revealing the existence of structurally significantly different states.149
Integrative experimental-CG protocol for determination of macro-assemblies
Cellular protein complexes can span over dimensions that are beyond reach even for mid-/low-resolution techniques like cryo-EM, or they may have an intrinsic dynamical nature, thus preventing stable protein co-localisation for sufficient long time to allow direct probing. In this case, there is no possibility of obtaining structural data of the whole structure from single experimental methods. Still, smart combination of experimental insights and simulations can lead to surprisingly outstanding results.
The example that we report here is the determination of the three-dimensional structure of the nuclear-pore complex (NPC) of Saccharomyces cervisiae by Alber et al. achieved in 2007.158,159
This macromolecular assembly localises at the nuclear membrane, and it is crucial for regulation of molecular transport inside/outside the cellular nucleus. The construct is very complex; the NPC is constituted by several units of about 30 different proteins, up to 456 individual molecules, for a total mass of roughly 50 MDa. The different proteins localize dynamically at different regions of the nuclear membrane (nuclear, transmembrane, or cytpoplasmic sides), and arrange in eight rather symmetric units around the pore axis, which is filled by filamentous protein domains providing the docking sites for the transport factors. Determination at the (quasi)-atomistic resolution of the whole assembly by traditional structural biology methods is not feasible, both because of its huge size, and of the dynamical character of the same assembly.
At the time of the work by Alber et al. only low resolution images by cryo-EM were available, and only roughly 5% of the individual protein sequences were structurally characterised at the atomistic level.158
The integrative experimental/computational approach devised by Alber et al.159 makes use of massive data coming from several proteomics techniques capable of capturing any possible structural or protein–protein interaction data. These may include sedimentation analysis to determine coarse shapes of each protein component, immuno-EM to infer localization of proteins in the assembly, affinity purification of tagged proteins to determine pair or oligomeric interactions among protein partners in the construct. Per se, data coming from each single technique may yield very poor information about the global structure of the complex; nonetheless, taken altogether, they should infer sufficient information to reconstruct the global structure of a macromolecular assembly.
In the approach by Alber et al. (Fig. 10), the first step is to translate information from experiments mentioned above (plus others) into spatial restraints. Some restraints are applied to single proteins, to define their shape, volume occupation etc.; others to entire protein groups, to facilitate or destabilise protein proximity in the assembly.
Then, a computational protocol is defined where proteins are represented by a minimal set of spheroidal beads topologically arranged to grossly reproduce the respective shape, and initially randomly distributed in a confined volume. All spatial restraints are combined in one scoring function, which tests how much a given structure is compatible with the collected restraints coming from the experiment. Additional external factors, like global shape or symmetry of the complex are added as well to the scoring function. The structure is then relaxed with respect to the scoring function until a minimum is reached. The procedure can be repeated several times until a statistically meaningful set of structures is found. Finally, all best structures that satisfy similarly well the initial restraints are processed by cluster analysis.
The procedure allows three possible outputs: (i) there is only one clustered structure that respects the initial restraints. In this case, a likely structure for the macromolecular complex has been found. (ii) There are multiple structures respecting initial restraints. Then, either there exist multiple native aggregation states, or the number of restraints used is insufficient to uniquely determine the structure of the complex. In any case, additional experiments adding new restraints should be performed. (iii) The algorithm cannot find structures satisfying the initial restraints. In this case, either the experimental data set is biased, or there is some issue in the interpretation of the experiment in terms of spatial restraints.
This approach is surely extremely fascinating, but it requires a large number of experimental data, possibly coming from diverse sources, so to balance bias, correlation, and eventually spurious information. To date, accumulation of such large quantity (and high quality) of data may still require large investment of time and resources by scientific consortia. Advances in high-throughput proteomics methods may nonetheless facilitate the establishment of this approach in the coming years.
4 Lipids and membranes
In addition to proteins, biomembranes are also a major component of cells. They not only constitute the cell boundary that separates cells from the extracellular milieu, but they are also necessary for intracellular sub-compartmentalization, encapsulating both intracellular organelles (endoplasmic reticulum, Golgi apparatus, mitochondria, lysosomes, endosomes, etc… .) and the cargo vesicles that are used to transport materials between them.
Because of their importance and ubiquity, it is not surprising that large efforts have been made to accurately describe membranes using theoretical and computational approaches, and especially in order to keep up with the exploding wealth of experimental data available on the subject in the last few decades. Yet, chemical and physical properties of membranes are remarkably different from those of proteins, and modelling attempts to accurately describe and predict membrane behaviour have followed a significantly different path than those focusing on proteins.
The physicochemical differences between proteins and membranes originate from their basic constituents. While proteins are composed of amino acids, membranes are made of lipids: amphipathic molecules characterized by having a polar head at one extremity and a highly hydrophobic moiety at the other end, typically in the form of hydrocarbon chains, with the two ends usually connected by a small linker (most frequently glycerol or sphingosin).
Even though the chemistry of lipids can be as rich as one can imagine, thanks to the numerous combinatorial ways in which polar heads and acyl chains can combine, the most glaring property of lipids is their capability of self-assembly into membranes, usually in a bilayer conformation where two lipid monolayers come into contact through their hydrophobic moieties, thus preventing the passage of both water and polar solutes across the bilayer.
This behaviour is at the root of the cellular role of membranes as a protecting barrier, and different theoretical approaches, ranging from mesoscopic to microscopic models, have been developed throughout the years to investigate the unique properties of this fascinating biological structure. The particular choice of a specific methodology (continuum, atomistic, coarse grained) to investigate membrane properties using numerical methods essentially depends on the scale/size and accuracy needed to address a specific question, and in the next sections we will discuss the main characteristics of the various approaches.
4.1 Continuum models
Lipid bilayers are only few nanometres thick, but they can stably span macroscopic lateral scales (Fig. 11). Considering that each lipid occupies a surface area of approximately 0.6–0.7 nm2 and consists of approximately 40–50 heavy atoms, it is evident that an atomistic description of an entire membrane would be rather prohibitive from a computational point of view.
However, for lateral scales larger than the bilayer thickness, membranes can be simply described as a two-dimensional surface in three-dimensional space using the classical formalism of differential geometry. If the internal degrees of freedom of the lipids constituting the membrane are sufficiently disentangled from the large-scale properties of this two-dimensional object, then the energy of our system will depend only on large-scale observables and it will be possible to describe its behaviour with a relatively simple functional.
This assumption relies on scale-separation, and it is at the root of the phenomenological Hamiltonian proposed by Canham in 1970,160 Helfrich in 1973,161 and Evans in 1975,162 which identifies stretching and bending as the two main energy components of membranes:
Here, σ is surface tension, i.e., the empirical parameter describing the energy penalty for area variations, κ is the bending modulus, κ̅ is the Gaussian modulus and K=(c1+c2), KG=c1c2 and K0 are the extrinsic, Gaussian and spontaneous curvature of the lipid bilayer, respectively, with c1 and c2 as the inverse of the two principal curvature radii R1 and R2.
This equation is at the centre of all continuum theories of lipid membranes and has been described and reviewed in length in several occasions and we refer the reader to several remarkable texts for further details.163–165
The historical importance of the Canham-Helfrich formulation cannot be overstated. It not only led to the correct prediction of the topology of the membrane shape of red blood cells,166 but it also provided a simple theoretical framework for experimentalists to interpret a wide range of observations, leading to a golden era of membrane science.
By measuring the empirical parameters in eqn (23) using various experimental techniques,167–172 a remarkable property of membranes emerges naturally: while stretching membranes is relatively expensive in terms of energetic cost (especially since it linearly depends on the size of the membrane), membrane deformations are only one order of magnitude bigger than thermal energy, with bending moduli of cellular membranes of approximately 20 kbT.169,171 Hence, membranes are stable against thermal fluctuations (i.e. they do not break or fall apart) but soft enough to be deformed and remodelled by proteins. This property is crucial for several cellular processes, and especially signalling, allowing the formation of transport vesicles that transfer materials both between intracellular compartments and between different cells.
Despite being formulated in the 70's, the modelling of membranes using continuum approaches remains nowadays an active area of research and developments have mostly focused on the ability to incorporate microscopic details into what is essentially a phenomenological description. For example, the derivation of the expression for membrane fluctuations starting from the Helfrich Hamiltonian has allowed the derivation of the bending modulus κ using flickering spectroscopy173–178 while other derivations allow its determination via scattering techniques179–181 or physical pulling of membrane tethers.182–184 In addition, internal degrees of freedom of lipids, including lipid tilting185–191 or lipid shape192–195 have been included to derive higher-order energetic terms. Also, protein-membrane interactions, especially for what concerns the role of protein insertion in membrane fission or membrane curvature sensing, have been also investigated in the last few years.187,196,197
Another interesting area that is receiving some attention recently is the use of MD simulations to derive the model parameters of the continuum treatment.193,198–206 These efforts look very promising, not only to further expand our understanding of the properties of lipid membranes, but also to expose current limitations of available MD force fields to describe large-scale membrane properties (these limitations will be described in the next Sections).
In summary, continuum models of membranes have been instrumental to understand the physical properties of lipid assemblies and they have dramatically helped the development of this research field. They nowadays represent a gold standard to interpret both in vitro and in vivo experiments and to test the accuracy of bottom-up microscopic approaches on large-scale membrane properties. Yet, despite massively elaborated mathematical attempts to incorporate microscopic details into the equations, the inevitable assumptions that are intrinsic of phenomenological approaches prevent a faithful and accurate description of the chemical properties of lipid assemblies.
4.2 Atomistic models
To overcome the intrinsic limitations of continuum models to investigate the chemical properties of lipid and membranes, molecular simulations are an obvious and powerful alternative methodology. The first MD simulations of lipid assemblies date back to the early 80's,207–209 and all of them shared some level of “coarsening” with respect to fully atomistic models, with simplified lipid head groups and hydrocarbon chains and no explicit solvent. Even though very limited in size (<100 lipids) and time (few picoseconds) scales, these seminal works allowed the investigation of the microscopic properties of individual lipids, and were already quite successful in reproducing key properties of lipid tails.
The first simulations of fully solvated phospholipid bilayers appeared in the early 90's.210–217 These works showed that atomistic MD simulations of membranes were feasible (up to hundreds of picoseconds) and represented a powerful tool to study microscopic properties of membranes, including the slow-down of water diffusion in the proximity of lipids, membrane properties in different thermodynamic phases (liquid-crystalline, gel), or the ordering of lipid chains in agreement with deuterium NMR experiments.218,219
Since then, atomistic MD simulations have become a standard tool to investigate molecular properties of lipids with Ångström-level accuracy.67,220–222 Current atomistic simulations span scales of the order of tens of nanometres and up to microseconds in length, and they have shown the capability of accurately reproducing several experimental parameters, including area per lipid, deuterium order parameter, lipid packing or lipid diffusion. Perhaps more importantly, they have become a standard approach to explicitly investigate the interactions between both integral membrane proteins and peripheral proteins with membranes and lipids.223–226
Yet, despite their widespread use, atomistic simulations of membranes are far from trivial to set-up and run, with multiple hidden potential pitfalls. For the uninitiated and the “expert” alike, the most discouraging aspect of atomistic simulations of lipid assemblies is the prohibitively large number of available force fields. A not comprehensive list includes Berger,227 CHARMM36,228–230 CHARMM36 United-Atom,231,232 GAFF,88 Lipid14,87 Slipid,233 MacRog,234 Poger,235 Kukol,236 Chiu et al.,237 Rabinovich et al.,238 Högberg et al.,89 Ulmschneider et al.,239 Tjörnhammar et al.240
These force fields vary in the strategy used for their parameterization and, as a consequence, they give slightly different results, having been optimized to reproduce some specific membrane properties and inevitably providing less-than-ideal results for other non-target properties. In addition, their use with non-native software platforms may be problematic,241 mostly because of the different ways in which energy interactions (and especially long-range terms) are numerically treated in the different software applications for molecular dynamics.
While a consensus on the performance of these force fields is far from being established, there is some room for optimism: first, sustained efforts by a large numbers of researchers are contributing to the their constant improvement, thus providing hope for a possible convergence in their accuracy; second, community-wide efforts to objectively assess their performances are starting to emerge (see ref. 241 or http://nmrlipids.blogspot.fr for more details) and it is expected that these studies will provide indications for further developments.
In general, because of the larger number of available experimental data and of a somewhat simpler chemistry, microscopic properties of the hydrophobic core of lipids are better reproduced than those of their polar counterpart.242,243 This observation hints at a much larger problem, i.e. the accuracy of lipid force fields in combination with other, independently developed, parameterizations, such as those for water molecules, ions or amino acids.
Since each parameterization has intrinsic errors per se (consider, as an example, the large number of available water models and their associated systematic errors),244,245 there is no guarantee that combining different force fields will lead to a serendipitous cancellation of errors. As such, membrane properties of the lipid-water interface, i.e. of the lipid polar heads, will depend significantly on the particular water model used. To partially overcome this problem, each lipid force field is generally developed to work in combination with a specific water model and to counterbalance its pre-existing inaccuracies.228,246
This approach is suitable for lipid-water interactions but generates ripple effects when dealing with other molecules such as ions. For example, Na+ ions appear to have a higher binding affinity for membranes in simulations in comparison with experiments for almost all force fields (http://nmrlipids.blogspot.fr). This may lead to substantial errors in the phase transition temperature of bilayers since ions may promote lipid ordering via their (in this case, excessive) binding.247–249
Even though frequently and willingly neglected, this issue becomes most prominent in the case of MD simulations of protein-membrane interactions. This kind of simulations, that were first reported in the 90's,250 are nowadays very common.226,251 Yet, both the compatibility and accuracy of MD simulations using different lipid and protein force fields has been rarely thoroughly investigated,252 mostly due to the huge sampling required to converge free energies in such complex systems.
Beyond the obvious choice of mixing force fields with similar parameterization strategies (i.e. CHARMM lipids with CHARMM proteins or AMBER lipids with AMBER proteins) some mixed combinations have been suggested (such as OPLS253 or AMBER proteins254 with Berger lipids). Recent works wishing to estimate the translocation free energy of individual peptides into model lipid bilayers appear the most promising avenue to gain insight into this issue, but published results appear somewhat contradictory so far.255–259
In summary, MD simulations of lipid assemblies are nowadays an established tool that can provide remarkable insights into the molecular structure and dynamics of model membranes. Even though some limitations persist, their ability to reproduce experimental data on membrane properties is truly amazing. Most importantly, the remaining drawbacks are under intense scrutiny, suggesting that future developments will lead to further improvements in their accuracy. In this context, the most challenging area appears to be that of protein–membrane interactions, a field that will probably remain the most significant application area for atomistic simulations of lipid assemblies.
4.3 Coarse-grained models
Remarkable progress in both software and hardware has allowed expanding the time-scales for atomistic MD simulations of membranes by four orders of magnitude in two decades, from the hundreds of picoseconds reported in the first studies of fully solvated phospholipid bilayers,210–217 up to microseconds in current simulations.93,94,260,261
On the other hand, the lateral dimensions of MD simulations of lipid bilayers have only marginally increased, remaining confined to box sizes of approximately 10×10 nanometres. This limitation not only prevents investigating large-scale membrane remodelling phenomena that are crucial in cellular processes but it also does not permit a direct comparison between atomistic bottom-up numerical simulations and those continuum theories that had been historically so successful in investigating membrane properties at larger scales.
To overcome this shortcoming of atomistic simulations, and in order to retain sufficient chemical accuracy in the description of mesoscopic membrane properties, coarse-grain MD simulations have been extensively developed in the last two decades, and they have proven to be remarkably successful towards the investigation of multiple crucial membrane-mediated cellular phenomena.40,50,53,61,262
The first step in CG MD simulations of lipid assemblies is the choice of a suitable mapping between a fully atomistic representation and a coarse grain one. This mapping is not univocal and several choices have been proposed, with the minimal requirement of being able to distinguish between the lipid polar head and the acyl chains with different CG beads. Regardless of the mapping, however, all CG models share some intrinsic limitations that must be taken into account before indulging into a more detailed description of the different CG strategies.
First of all, the mapping between an atomistic representation and a coarser one involves the loss of degrees of freedom. This implies that certain physical interactions between system components are no longer present. In the case of a water molecule, for example, even the simple (and computationally still quite expensive) representation with a single bead implies that we lose the capability of describing the dipolar moment that is characteristic of water molecules. A two-beads representation would be able to retain dipolar interactions but would still inevitably lose the energetic terms deriving from hydrogen bonding. Thus, even though a careful parameterization of the interactions between CG beads may compensate for those missing interactions, special care must be taken when interpreting the results from CG simulations, and especially when establishing their range of validity and the systematic errors associated to the results.
In addition, the neglect of atomic degrees of freedom in the CG simulations results in smoother energy landscapes, giving rise to a much faster system dynamics. While this aspect is beneficial with respect to the exploration of the phase space of the system, it presents the inevitable drawback of preventing a straightforward interpretation of time scales in CG simulations. Even though strategies to recover system dynamics from the CG simulations have been proposed, usually involving the comparison with atomistic simulations, it is important to remember that the lower-dimensional energy landscape of CG simulations is not meant to reproduce realistically dynamical processes.
An analogous situation arises for what concerns the ability of CG models to correctly reproduce system entropy. According to the Boltzmann formalism, the entropy of a system depends on the number of available microscopic phase space microstates. Since several degrees of freedom are discarded in the CG representation, the entropy of a system may thus be (sometimes dramatically) wrong. Even though the overall free energy can be set as to reproduce that of atomistic systems or of experimental measurements (essentially thanks to enthalpy compensation), special care must be taken when interpreting entropy-dependent properties such as, for example, temperature dependent phenomena.
Finally, a crucial open question regarding CG methods is their portability. Since the parameterization of CG interactions involves a specific training set, regardless of the strategy used, it is up to debate whether these interactions would retain their validity in significantly different conditions, such as temperature, ionic strength or, especially for lipid bilayers, phase transitions. Thus, changing the simulation conditions may require, if not an entire reparameterisation, at least the careful validation of the underlying assumptions, usually via comparison with atomistic MD data or, better, relevant experimental observations.
As CG simulations of lipids and membranes become more and more widespread and accessible (“routine”), it is easier to lose track of how a specific model has been parameterized and of its main capabilities and limitations. We will thus briefly review the history of CG modelling approaches to study lipid assemblies and we will especially focus on the most popular CG force fields that are somewhat able to retain chemical accuracy, providing a direct link between atomistic properties of lipid bilayers and mesoscale properties of membranes that can be investigated using continuum approaches.
The first CG MD simulations of membranes described lipids as a set of Lennard-Jones particles, distinguishing between polar (for water and polar heads) and apolar (for acyl chains) particles.263,264 These early simulations were able to observe the spontaneous self-assembly of lipids into micellar structures and their arrangement at the water/oil interface, thus characterizing structurally their behaviour as surfactants. Despite the capability to observe spontaneous self-assembly and to qualitatively reproduce basic membrane properties such as bending rigidity or protrusion modes,41 these models were not able to investigate length scales much larger than those available for atomistic MD simulations.
A promising alternative to reach larger time and length scales was dissipative particle dynamics (DPD), a methodology akin to MD simulations where only three types of forces are present: a conserved soft repulsion force, pairwise dissipation forces and pairwise random forces. Thanks to the use of soft repulsion forces, DPD simulations can tolerate a much larger time step while still preserving momentum and providing the correct hydrodynamic behaviour, thus allowing for longer simulations times and bigger system sizes. However, the mapping used in these simulations remains really coarse.42,262,265,266
Alternatively, solvent-free models able to describe large-scale membrane remodelling processes using MD simulations have been developed by several groups.54,137,199,267–271
While these models have been successfully applied to curvature mediated-interactions (also in the presence of proteins or nanoparticles), thus to scales of the order of several tens (and sometimes up to hundreds) nanometres69,193,272–275 they still lack the chemical accuracy to establish a direct link with the different properties of the various lipid species that are typically investigated using atomistic simulations.
A recently proposed promising alternative is the solvent-free Highly Coarse-Grained (HCG) lipid model proposed by Voth and co-workers.276,277 This method can distinguish between different lipid species, such as DLPC, DOPC and DOPS, and is able to reproduce both elastic and structural properties of the various lipid species in comparison with reference all-atom simulations. This method is based on the so-called multi-scale coarse grain (MS-CG) methodology previously proposed by the same authors,44,48,49 where the interactions between CG particles are derived from all-atom simulations using a force matching approach based on a variational protocol.
This methodology has been recently applied to investigate membrane-remodelling processes by peripheral proteins278 and appears to be a promising compromise to investigate large-scale properties while still being able to distinguish between different lipid species. However, given the low resolution of the method, with individual lipids described using only three or four beads, several chemical-dependent properties of lipid assemblies remain beyond the capability of this approach.
More generally, all solvent-free CG methods have intrinsic limitations when it comes to describing explicitly processes that ultimately depend on the partitioning of molecules between water and oil, a key characteristic of lipids and surfactants. Thus, even though solvent-free CG methods have shown the capability to reproduce structural properties of membranes, more accurate thermodynamic properties require the explicit treatment of water–lipids interactions.
The two most successful CG force fields that have been trying to mimic most of the chemical properties of lipids are the MARTINI,279 developed by Marrink and coworkers and the SDK model280 developed by Klein and colleagues. Both methods use a finer mapping, with phospholipids described with approximately ten to twenty CG beads, depending on the acyl chain length, and an explicit treatment of the surrounding solvent (Fig. 12).
The MARTINI force field, developed as a substantial improvement from previous CG attempts281 mostly uses a mapping of four heavy atoms into one CG bead, including for water molecules. A finer mapping is used for chemical moieties of subtle structural properties, such as ring compounds that are important in some lipid species, including cholesterol and phosphoinositides.
In the model, four main types of CG beads are introduced: polar, nonpolar, apolar and charged. Each of these particle types is in addition characterized by some subtype specifications, mainly its hydrogen-bond capability (acceptor, donor, both or none) and its overall polarity (on a 1–5 scale). With few exceptions, all particles have the same size and the interactions between them depend solely on the pair-pair strength interaction (in a classical 6–12 Lennard-Jones fashion) and on their charges (in the electrostatic term).
This approach allows distinguishing between several chemical compounds, while still keeping the number of different CG beads manageable. As an example, a phospholipid molecule (phosphocholine for example, see Fig. 12) is described by two charged particles representing the choline and phosphate groups, respectively, two polar particles describing the glycerol moieties, and several apolar particles modelling the two acyl chains, with the total number depending on the length of the acyl chains and consistent with the 4 to 1 mapping.
Thanks to the relatively high number of different particles, subtler chemical properties that are crucial in lipid chemistry can be investigated, such as the influence of lipid mono- or poly-unsaturation, or the properties of different lipid polar heads (even fine ones such as the difference between phosphocholine and phosphoethanolamine).
Once the atomistic to CG mapping and the potential energy terms (harmonic potential for bonds and angles, 6–12 Lennard-Jones and electrostatic with a relative dielectric constant of 15) are set, the crucial aspect of this CG procedure is the determination of the numerous intra- and inter-molecular parameters.
In the MARTINI strategy, bonded terms are parameterized such as to reproduce values obtained from atomistic simulations, while non-bonded parameters are set to reproduce, for all the individual particle types specified in the mapping, experimentally-derived thermodynamic properties (sometimes replaced by atomistic free energy calculations if the experimental data are not available), and in particular free energies of hydration, free energies of vaporization and the partitioning free energies between water and a number of organic phases.
The main advantage of this empirical approach is its immediate transferability: since any new molecule is built from the same set of building blocks, it is reasonable to assume that the force field will be able to describe, at least semi-quantitatively, its thermodynamic properties. On the other hand, the main drawback is that if a specific property (and especially structural ones) needs to be accurately described, there is no a priori guarantee that the MARTINI force field will be able to do so with the desired level of accuracy.
In particular, this parameterization approach has shortcomings in reproducing correctly the phase diagram of water, including, for example, water/vapor interfacial tension,279 and, as a consequence, it has flaws for what pertains the investigation of surfactant properties of lipid monolayer at the air/water interface.61,280 Improvements of the MARTINI water model have been proposed via the use of a polarizable force field,282 but polarizable CG force fields remain poorly investigated so far in the field of CG simulations.
Despite its limitations, the MARTINI model, especially thanks to its connections with the GROMACS molecular dynamics software package283,284 and to its user-friendly implementation (see http://md.chem.rug.nl/cgmartini), is by far the most used CG model for lipids assemblies to date and it has been proficiently used to investigate a large number of membrane and lipid properties. The successes of the MARTINI force field have been recently described in an excellent review;61 for our purposes, it is important to underline that, in large part thanks to the remarkable efficiency of modern MD software, this model has been recently shown the ability to investigate large-scale membrane remodelling processes with unprecedented chemical accuracy, thus providing a more accurate description of phenomena that were either traditionally investigated using continuum approaches, with low-resolution CG models or simply not addressed using theoretical or numerical approaches.
Amongst them, the investigation of realistic, multi-component membranes,68,285 entire influenza virions,286 curvature-dependent properties,200,201,287–290 monolayer collapse291–293 and membrane remodelling processes such as membrane fusion294–298 or fission.299
Despite those early successes, it is important to remember that the four to one mapping poses intrinsic limitations, and especially for what pertains those phenomena in which large scale properties emerge as the macroscopic read-out of much finer and detailed molecular properties, such as, for example, those involving hydrogen bonding. Thus, care must always be taken when assessing the predictive power of CG simulations and further validation against experimental properties is recommended whenever possible.
Another chemically accurate CG force field for MD simulations of lipid assemblies is the Shinoda-DeVane-Klein (SDK) force field.280 This model is the improved version of the earlier CG model developed by Shelley et al.51 and utilizes a three heavy atoms to one CG bead mapping, thus providing a slightly higher resolution than the MARTINI (Fig. 12).
While the initial model by Shelley et al. was solely parameterised on atomistic simulations, the SDK model uses atomistic simulations exclusively for the parameterisation of bonded interactions (analogously to the MARTINI strategy). Non-bonded interactions, on the other hand, are parameterised not only with transfer free energies (like in the MARTINI), but also on density and interfacial properties such as surface tension.
Unlike MARTINI, however, the SDK model utilizes a 12–4 Lennard-Jones potential for all pair interactions involving water beads52 and a 9–6 Lennard-Jones potential for all other pairs. With this choice, together with the explicit treatment of long-range Coulomb interactions, water behaviour, and especially its phase properties, density and compressibility, is reproduced fairly well within the intrinsic limitations of the three to one mapping.
In addition, instead of defining a number of different CG beads and building up each individual molecule choosing from these basic building blocks, the SDK approach consists in subdividing each molecule into its main chemical constituents (while keeping the usual three to one mapping) and then fitting to the interactions of the new resulting parameters based on a mix of thermodynamic (mainly density, surface tension and solvation free energies) and atomistic simulations data.
As it is expected from its finer mapping and more sophisticated parameterization procedure, this model is generally more accurate than MARTINI, especially for what pertains properties at the lipid-water interface. On the other hand, the unusual functional form of Lennard-Jones interactions (that is not supported by most current MD software) and the need for a careful parameterization every time a new molecule is introduced, make this force field less appealing to the non-initiated.
Nevertheless, the authors have reported so far a large range of applications on lipid and membrane systems of sizes comparable to those studied using MARTINI, including spontaneous vesicle self-assembly,53 monolayer collapse,280 vesicle to micelle transformation300 and membrane fusion.301,302 The SDK model thus represents a powerful and promising approach to investigate finely tuned chemical properties of large lipid assemblies when cheaper but less accurate CG models turn out to be unsuccessful.
4.4 A prototypical example: membrane curvature
The investigation of cellular processes in which membrane curvature plays a prominent role is not only a relevant topic in membrane biology, but also a perfect scenario to assess the performances of continuum models and of bottom-up molecular simulations.
In the last few years, a large number of in vitro and in vivo studies have suggested that membrane curvature plays a major role in regulating multiple cellular phenomena,303–305 including vesicular trafficking, nuclear pore assembly or autophagocytosis.
At the same time, recent MD simulations have shown the capability of dealing with system sizes and curvatures that are comparable with those found in intracellular transport vesicles and that have been traditionally addressed solely using continuum methods.306
Because of system sizes, these investigations are clearly not feasible with atomistic MD simulations: for example a lipid vesicle of a radius of approximately 25 nm would contain around 18 000 lipids, corresponding to approximately 2 500 000 atoms using a fully atomistic force field, even before taking into account the unavoidable presence of water molecules. However, thanks to progresses both in hardware and in software technologies, such systems are nowadays computationally tractable using CG models such as the MARTINI or the SDK.
The main advantage of particle-based models over continuum treatments is their capability of being able to describe simultaneously membrane curvature and lipid chemistry, as well as their interplay: how do specific lipids promote/prevent curvature stresses? And how do lipids respond to curvature stresses, for example by partitioning between the inner and the outer monolayer of highly curved vesicles?
On the other hand, since the investigation of the interplay between curvature and lipid chemistry is not straightforward using AA simulations, validation of the CG methodology in this context requires a special protocol.
To do so, Vanni and colleagues took advantage of the unique properties of a class of amphipathic peptides, the Amphipathic Lipid Packing Sensors (ALPS) motifs, which extensive biochemical analyses had identified as accurate sensors of both lipid chemistry and membrane curvature.305
In particular, these peptides showed a marked sensitivity to both the number of monounsaturated acyl chains, with binding increasing moving from saturated acyl chains to monounsaturated ones, and to the “size” of the lipid polar head, with higher binding in the presence of lipids containing smaller polar heads, such as diacylglycerol or phosphatidylethanolamine. Using atomistic simulations, the authors managed to show that these variations in lipid chemistry promote the formation of lipid-packing defects in membranes307 and that ALPS motifs specifically bind to pre-existing packing-defects via the insertion of hydrophobic residues.308
This observation paved the way for the careful validation of CG simulations (in this particular case using the MARTINI force field) by evaluating their ability to reproduce lipid-packing defects measured in atomistic simulations for several flat bilayers of different compositions.290
On the one hand, the high correlation between the data obtained from CG and atomistic simulations indicated that the used CG model was, in this case, able to reproduce with high accuracy this specific property of lipid assemblies (Fig. 13). On the other hand, the very good agreement between the simulations and the observed binding of the ALPS motif to lipid bilayers of identical compositions in vitro, suggested a possible methodology to validate the CG force field also for curved bilayers, where comparison against all-atom simulations would have not been possible.
Analysis of lipid-packing defects in curved bilayers as a function of lipid composition revealed the crucial role of lipid chemistry in the adaptation of lipid bilayers to membrane curvature and, in turn, of ALPS binding to curved membranes.
In particular, CG simulations suggested, in agreement with experiments, that membrane curvature and lipid composition synergize to promote the formation of lipid packing defects, i.e. those lipid compositions that promoted packing defects in flat bilayers also did so in highly curved bilayers. This observation further confirmed only weak coupling between lipid shape and leaflet curvature, in agreement with fluorescence measurements309 and in contrast with several predictions from continuum models.
4.5 Multi-scale hybrid particle/field models
Even though the molecular representation of lipids with molecular-type approaches (atomistic or CG) has intrinsic advantages, as shown in the previous sections, mesoscale models have the possibility of exploring significantly larger time-space dimensionalities. We present here a very brief resume of hybrid approaches combining CG and density-field approaches, which represent the counterpart of QM/MM and AA/CG models introduced in the previous sections.
In these approaches, soft matter systems are modelled using self-consistent field (SCF) theory. In detail, systems are still formed by explicit molecular objects. The bodies, however, are not interacting directly with each other via usual two-body terms. Instead, molecular segments are decoupled from each other, and posed under the effect of a static external field.310 A molecule in SCF theory is considered to be interacting with neighbouring partners only through an averaged density field. Marcelja was the first to propose a simple field model for lipid molecules in 1975.311 There, head groups of the lipids are treated as a boundary to which the tails of the lipid molecules are anchored. The sampling of degrees of freedom corresponding to intra-molecular rearrangement is made using Rotational Isomeric State. The molecular segments are coupled to an anisotropic aligning potential, which determines the organization of the lipid layer.311
In a rigorous formulation, the Hamiltonian of a system composed of N molecules is split into two parts as H0(Γ)+W(Γ). Where H0 is the reference Hamiltonian for an ideal system composed by N non-interacting molecules, but still responding to all intra-molecular forces, and W is an external potential causing perturbations from the ideal behaviour. The Canonical partition function is thus written as:
The statistical distribution of particles ϕ(r;Γ) at position r for a given microscopic configuration Γ will have a probability that depends on W. W can thus be defined as a functional of ϕ (W=W[ϕ]), such that a statistical distribution of “ideal” particles equal to that of the real system minimises the energy.
Equilibration and dynamics of soft-matter systems can be thus obtained by free particles that move in the external field W that self consistently responds to reproduce equilibrated particle distributions (for more technical details, one can refer to specialised reviews like312–314 ).
Combination of particle and field representations has been investigated by Daoulas et al.315,316 and applied to Monte Carlo simulations of CG models of synthetic polymers.317,318 More recently, also molecular dynamics simulations have been combined with self-consistent field description.319,320
The use of such models can be coupled straightforwardly with CG representation of lipids. In fact, CG reduces the number of different species present in the system, and thus facilitates the calibration of field responses for the different particles.321,322 Both lamellar and non lamellar (including reverse micelle and micellar phases) phases are well reproduced with these models without any geometrical assumptions.
The computational efficiency of this approach allows at the same time models with a resolution close to atomistic and simulations on very large length and timescales.323 For example, these models have been applied to study the interaction and the dynamical exchange of block copolymer chains between a spherical micelle (functioning as drug nanocarrier) and a lipid bilayer (as model of cell surface) also in the presence of drug molecules (iboprufen) in the micelle core. Simulations of 12 nm large micelles with membrane bilayers over several microseconds of simulations could be achieved on home-cluster facilities.324
Due to their computational efficiency, particle-field approaches are gaining popularity in the field.312–314 Hybrid models can be more generally used with respect to pure field models and much less assumptions need to be adopted. Several CG models using this hybrid representations for biocompatible polymers,325 biomembranes,314,321,322,325,326 vesicles,327 proteins,310,328 block copolymers,329–332 and colloidal particles333,334 have been reported in the recent literature, paving a new way toward closing the gap between atomistic and mesoscopic dimensionalities.
5 Final remarks
Molecular simulations are one of the most promising tools for basic understanding of biological phenomena at the sub-cellular dimensionality. In these years the combined efforts of several groups are rapidly pushing forward the frontier of the field. All-atom simulations remain the most reliable tool, allowing the description of the complex interplay between direct two-body and indirect many body interactions at the chemical resolution level. This methodology is currently a mature research field, able to faithfully reproduce a wide range of protein and membrane properties that can be investigated experimentally and providing a microscopic structural view of membranes that would not be possible otherwise.
However, even though efforts to improve the accuracy of atomistic simulations are ongoing, and especially for what concerns the interaction of lipids with polar molecules and proteins, the main limitation of atomistic simulations is their inability, at least so far, to investigate mesoscopic scales and thus establish a direct connection between molecular properties and large-scale phenomena.
Several approaches are being proposed to go beyond the yet continuously expanding time- and size-bottlenecks that limits the applicability of atomistic models, mostly on a case-by-case basis. For example, continuum models have been crucial to the development of the field of theoretical investigations of membranes, not only establishing a conceptual framework to interpret a large number of experiments, but also providing the basic vocabulary through which membranes are still described nowadays.
To provide a direct, bottom-up, representation of large-scale phenomena starting from the individual properties of their chemical components, CG models are an area currently undergoing intense and fascinating development. Since the mapping from an atomistic representation to a coarser one can be implemented in several ways, several CG models have been proposed so far, with each one of them possessing its own advantages and limitations.
The most promising approaches in this context appear to be those CG models that are able to retain the key aspects of protein and lipid chemistry while still being sufficiently computationally-efficient to investigate large-scale processes that have been traditionally investigated with more drastic coarse-graining approaches or with continuum methodologies.
These methods are allowing the description of large molecular assemblies with an unprecedented resolution, not only as a complement to experimental studies, but also suggesting new investigations of cellular processes at all levels. While limitations in describing water-mediated processes and membrane-protein interactions still remain, CG simulations of protein and lipid assemblies are quickly becoming a mature research field.
Finally, integration of multiple scales at different resolution may provide a best-compromise strategy to include the required information at the necessary detail without increasing costs. Implementations both at the hybrid AA/CG and at the particle/field levels may yield very interesting results in the future.
Whether the next progresses in addressing current limitations of CG methods will originate from new CG approaches or from improved CG parameters and multi-resolution potentials to investigate proteins and membranes will be a fascinating thing to watch.
MC acknowledges the support of the Norwegian Research Council through the CoE Centre for Theoretical and Computational Chemistry (CTCC) Grant Nos. 179568/V30 and 171185/V30. SV acknowledges Sandra Lacas-Gervais and the Centre Commun de Microscopie Appliquée, Université Nice Sophia Antipolis for providing the electron microscopy images in Fig. 11. We thank Matteo Dal Peraro for providing Fig. 9, and Giuseppe Milano for support on particle/field simulations of membranes. We finally thank Patrick Fuchs and Giacomo Fiorin for critically reading the manuscript.