This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
The COVID-19 pandemic has spread worldwide. However, as soon as the first vaccines—the only scientifically verified and efficient therapeutic option thus far—were released, mutations combined into variants of SARS-CoV-2 that are more transmissible and virulent emerged, raising doubts about their efficiency. This study aims to explain possible molecular mechanisms responsible for the increased transmissibility and the increased rate of hospitalizations related to the new variants. A combination of theoretical methods was employed. Constant-pH Monte Carlo simulations were carried out to quantify the stability of several spike trimeric structures at different conformational states and the free energy of interactions between the receptor-binding domain (RBD) and angiotensin-converting enzyme II (ACE2) for the most worrying variants. Electrostatic epitopes were mapped using the PROCEEDpKa method. These analyses showed that the increased virulence is more likely to be due to the improved stability to the S trimer in the opened state, in which the virus can interact with the cellular receptor, ACE2, rather than due to alterations in the complexation RBD-ACE2, since the difference observed in the free energy values was small (although more attractive in general). Conversely, the South African/Beta variant (B.1.351), compared with the SARS-CoV-2 wild type (wt), is much more stable in the opened state with one or two RBDs in the up position than in the closed state with three RBDs in the down position favoring the infection. Such results contribute to understanding the natural history of disease and indicate possible strategies for developing new therapeutic molecules and adjusting the vaccine doses for higher B-cell antibody production.
Keywords: SARS-CoV-2, mutations, conformational states, coronavirus, electrostatic interactions, epitopes, binding affinity, protein–protein interactions
Effects of the conformational states of the SARS-CoV-2 spike protein variants on their electrostatic stability and their impact on virulence.
Severe acute respiratory syndrome corona virus-2 (SARS-CoV-2), the new beta coronavirus responsible for the COVID-19 pandemic, infected more than 100 million people and killed more than two million, approximately 110 times more than the 2009 H1N1 pandemic officially killed within 1 year (1, 2). Even though the numbers are alarming, the data is not being collected evenly due to different coping strategies adopted worldwide. Moreover, the test results are not blindly reliable. For example, the RT-PCR test, considered trustworthy, was found to have high false-negative rates due to insufficient cellular material and different viral load kinetics of SARS-CoV-2, depending on the patient and sampling timing (3–5). The numbers of the COVID-19 deaths, as a result, are underestimated and often far from reality. The 2009 H1N1 pandemic mortality, for example, was 15 times higher than the number shown by laboratory-confirmed results (2).
Both SARS-CoV-2 and the other coronavirus SARS-CoV-1, responsible for the 2003 Severe Acute Respiratory Syndrome pandemic in China, use a very similar mechanism to promote the fusion of viral and cellular membranes to infect the human cell (6). They both interact with the Angiotensin Converting Enzyme II (ACE2) receptor through the Spike (S) glycoprotein receptor-binding domain, which shares 74% of sequence identity—see Jaimes et al. (7)—with a nearly identical binding conformation and similar affinities to promote the cell entry (6, 8). Virus entry is also related to the presence of N-linked glycans around the post-fusion spike protein structure, which probably act in a protective role against host immune responses (9). Although the virus already has some natural immune escape mechanisms, mutations can increase this escape, meaning that people who have already been infected could remain susceptible to reinfection, such as what happened in Manaus, Brazil (10). All viruses mutate as they replicate numerous times. Although coronaviruses have a relatively efficient proofreading mechanism (11), as many mutations have already been reported. For instance, the “Global Initiative on Sharing All Influenza Data” (GISAID) (12) repository has more than 804 k (17 March 2021) genome assemblies available in their “official hCoV-19 Reference Sequence.” They adopted the high-quality genome sequence “hCoV-19/Wuhan/WIV04/2019,” isolated from a clinical sample at the Wuhan Jinyintan Hospital in Hubei Province on December 30, 2019, as the official reference sequence.
Typically, point mutations with a defective genome do not represent any health issue. However, mutations driven by an adaptive evolution are an advantage for the virus, and as such, they represent a possible increased risk to human health, raising doubts about the efficiency of both the antibody therapies and the developed vaccines. The major concern arises from the knowledge that small differences in genetic material can substantially alter the properties of the viral proteins and offer extra advantages to viruses, such as higher ability to transmit or the capacity to escape the control of antibodies (whether produced due to a previous infection, received via intravenous administration or stimulated by vaccination) (13). Due to the rising fear of possible consequences that new variants may bring to the outcome of the pandemic, local outbreaks have been studied, and some of the variants responsible for them are being classified as variants of concern (VOCs) by some health organizations from around the world, such as Centers for Disease Control and Prevention (CDC) and European Centre for Disease Prevention and Control (ECDC) (14, 15).
To date, there are currently three principal VOCs considered: (a) 501Y.V1, also called VOC 202012/01 and B.1.1.7, (b) B.1.351, also called 501Y.V2, and (c) P.1 (15). Recently, the World Health Organization (WHO) labeled them as Alpha, Beta and Gamma, respectively. All these three variants are of concern because of the specific mutations that increased the transmissibility of the virus and its harmful effects on society. 501Y.V1, a variant detected in the United Kingdom and approximately 50% more transmissible, carries eight mutations on the Spike glycoprotein homotrimer. However, three of them are particularly worrisome—N501Y, meaning that residue number 501 had an asparagine replaced by a tyrosine, P681H, and the deletion of residues 69 (histidine) and 70 (valine) (16–19).
B.1.351, on the other hand, was first detected in South Africa and had, apart from N501Y (included in the three VOCs), the mutations K417T and E484K at the receptor-binding domain (RBD), the same key mutations present in the P.1 lineage, detected in Brazil (15, 20). Even though the impact of these mutations on the course of the pandemic is still partially unknown, different authors have elucidated some key aspects of the mutations mentioned. N501Y, for example, is located in the RBD and may increase ACE2 binding and transmissibility, while the infectivity rises by the deletion 69/70 (21, 22). The other two mutations found in both B.1.351 and P.1, E484K and K417N, have a crucial role in the viral escape, preventing the neutralization by some antibodies (10, 20). Although individually these mutations might generate changes in the trimeric properties, it is known that the combination of E484K, K417N, and N501Y mutations cause a greater conformational change in the RBD than N501Y or E484K alone (23). Another worrisome modification in B.1.351 and P.1 is the mutations D614G, once 614G variant, and ORF1ab 4715L, which were proven to be related to higher fatality rates (24).
The main clinical aspects of COVID-19, unlike efficient treatment strategies, are well documented and known in the literature, at least for the wild-type virus. The illness usually starts with nonspecific symptoms, such as fever, persistent cough, and fatigue (25). However, loss of smell, taste, and sense, delirium, skipped meals, and gastrointestinal symptoms, like abdominal pain and diarrhea, can also be considered to identify individuals infected with SARS-CoV-2 (26). In a more advanced stage, constituting the severe form of the disease, shortness of breath can also be observed, which usually leads to hospitalization. However, the clinical aspects of COVID-19 caused by the new variants are still being elucidated.
Experimental data suggest that P.1 (or Gamma) and B.1.351 (or Beta) are partially or even totally resistant to antibodies developed for the treatment of COVID-19 and are inefficiently inhibited by the serum from convalescent individuals (27). Moreover, based on analysis from the “New and Emerging Respiratory Virus Threats Advisory Group” (28), from the United Kingdom, the infection caused by the B.1.1.7 variant is associated with a higher rate of hospitalization and death when compared to the wild type of SARS-CoV-2 (SARS-CoV-2 wt) (28). All these results suggest that the increased transmissibility and a potential antigenic escape are the reason why there is a resurgence of cases and why individuals, previously infected with the wild type, are only partially protected against the VOC (27, 29).
Indeed, among several possible interdependent mechanisms that can increase viral transmissivity (e.g., increased viral shedding, the longer interval of contagiousness, people mobility, wearing masks), a couple of them are directly related to mutations: (a) an increased binding affinity between the viral adhesins with specific cell receptors (in this case, the RBD-ACE2 complexation, that defines the virus virulence and possible increased infectivity), (b) an increased environmental stability of the viral proteins and the virion particle, (c) a higher availability of human cell receptors by either direct genetic conditions [e.g., the concentration of receptors ACE2 tends to be more elevated in men (30)] or due to comorbidities [e.g., a lower stomachal pH in patients with Barrett's esophagus induces a higher expression of ACE2 (31)], (d) an increased interaction with co-receptors, and (e) immune evasion (32–35). Each of these individual factors and also their combinations have been discussed in the literature. For instance, there are studies investigating other possible interactions for the spike protein to enter the cell (36, 37), genetic factors affecting the viral virulence [e.g., the ACE2 polymorphisms can also influence the virus entry in the host cell and individual susceptibility (38)], and the immune evasion (39–41).
The interaction between viral proteins and cell receptors can vary from virus to virus and for different mutations (33, 42). The spike RBD comparisons for its binding affinity with ACE2 for both SARS-CoV-1 and SARS-CoV-2 (8, 43–45) have been reported. However, the data is still lacking for the new variants of SARS-CoV-2, quantifying their impact on virulence. Starr and co-authors partially addressed this question in their landmark experimental research work where they performed a deep mutational scanning of SARS-CoV-2 RBD (46). They provided information about the effect of single mutations on the binding, stability, and expression (46). However, nothing is known about double or triple substitutions. Moreover, the spike trimer must undergo a complex mechanism to make the RBD available for a proper binding with ACE2. Cryo-electron microscopy (CryoEM) structures of the SARS-CoV-1 spike trimer revealed that at least one chain of the homotrimer has to be at the “up” position (“open” conformation) as a prerequisite conformational state for the RBD-ACE2 interaction (47). At least a two-step “expose–dock-like” mechanism is needed first to allow the whole homotrimeric structure to perform all the conformational adjustments (first step of this complex process) before steric clashes (seen for the Spike “close” conformation) are removed to allow the complexation RBD-ACE2 to happen (second step) (43, 47). Up to three receptors, ACE2 can be bound one by one (48). Since several mutations present in the new variants occurred at the spike protein outside the RBD region (e.g., for the 501Y.V2 variant, L18F, D80A, D215G, R246I, D614G, and A701V), it is expected that they have an impact on the “up” and “down” mechanism of the exposing step. Also, the number of titratable amino acids involved in these mutations suggests an electrostatic dependence. Nevertheless, the pH effects are somehow contradictory or, at least, not fully understood. On one side, pH does not seem to be particularly important to trigger the conformational changes from the “down” to the “up” state (49). For example, no significant conformational changes were observed in CryoEM at lower pH (pH 5.6) in comparison with neutral-pH (pH 7.2) for SARS-CoV-1 (48). On another side, Zhou et al. (50) concluded that an immune evasion could be facilitated for SARS-CoV-2 by the low pH “down” confirmation because of a pH-dependent refolded region located at the spike–interdomain interface, consisting of residues 824–858, that exhibited structural modifications and RBD-mediated positioning of the trimer apex (50).
Following a previous study on the interactions of the RBDs of SARS-CoV-1 and SARS-CoV-2 spike proteins and the human cell receptor ACE2 (43), we investigated the interactions of the RBD of the new most worrying variants (at present) and other mutations with ACE2. Using constant-pH biophysical simulation methods, the binding affinities between the RBDs of these variants with ACE2 were quantified at different pH regimes and the electrostatic epitopes (EEs) of each case mapped and compared. For this analysis, the spike RBD was assumed to be ready for the interaction at the proper conformational state (i.e., only the second step of the “expose–dock-like” mechanism was studied). Another important aspect explored in this study was the electrostatic stability of the different possible conformational states of the trimer [all S chains at the “down” state (DDD), one chain at the “up” state, and two others at the “down” state (UDD), and two chains at the “up” state and one “down” state (DUU)] for some variants at all pH regimes. Such combined data can provide answers at the molecular level for the increased transmissivity of the new variants observed in clinical practice, explaining the faster spread of them, the increased number of hospitalizations, and a tendency to affect younger patients.
The field of virology is widely explored and enriched by computational approaches. It involves from bioinformatics tools and machine learning methods to biophysical simulations to understand the various aspects of viral functioning, including its immunology, pathogenesis, structural and molecular biology of the virus proteins (51–53). Complementing experimental studies, computational tools allow, for example, the understanding of molecular mechanisms of the virus, such as the comprehension of capsid proteins assembly (assembly intermediates are still difficult to be obtained by lab experiments), the quantification of the biomolecular interactions of the host–pathogen system, the prediction of conventional epitopes and EEs, the understanding of increased virulence for different strains, and the development of specific molecular binders for diagnosis, treatment, and prevention (43, 54–60).
Biophysical simulations of virus systems, based on computational molecular simulation methods such as Monte Carlo (MC) (61, 62) and classical Molecular Dynamics (MD) (61, 63) can take advantage of their long-recorded success for probing the thermodynamic, dynamic, and interactive properties of biomolecules in pharmaceuticals [see (64, 65) for reviews]. Here, a fast constant-pH MC scheme (66, 67) is applied to identify important residues for host-pathogen interactions and to clarify the intermolecular interactions involving the RBD of S proteins of SARS-CoV-1, 2 and the South African variant (B.1.351) and the electrostatic stabilities (68) of the spike homotrimers for the DDD, DUU and UDD conformational states at all solution pHs. Additionally, other recent variants like the Brazilian P.1 (alias of B.1.1.28.1), the Californian (B.1.427/B.1.429), the New York (B.1.526), the Indian double mutations E484Q and L452R (B.1.617), and the SARS-CoV-2 mink-associated variant strain (Y453F) were included in our analyses. Due to the emergence of more than one nomenclature for these variants of the SARS-CoV-2 virus, the WHO provided quite recently a new naming system based on the Greek alphabet: Alpha is used to refer to B.1.1.7 (UK), Beta to B.1.351 (South African), Gamma to P.1 (Brazilian), Kappa to B.1.617.1 (the Indian double mutant), Epsilon to B.1.427/B.1.429 (Californian) and Iota to B.1.526 (New York).
In the present study, several molecular systems were investigated by employing the SARS-CoV-1, 2 and the new variants of the S RBD proteins with ACE2 (simulation set 1). Also, the electrostatic stability of the whole spike homotrimeric protein at different conformational states was studied (simulation set 2). A scheme of these two simulation sets is given in Figure 1 . The three-dimensional coordinates of these macromolecules necessary for all these simulations were obtained from different sources:
Schematic representation for the two simulation sets of this study. Simulation set 1 represents complexation simulations at constant-pH where the interaction between the S receptor-binding domain (RBD) protein of SARS-CoV-1, 2 and the the new variants (B.1.351, P.1, and the mink variant) with the human receptor angiotensin-converting enzyme II (ACE2) were investigated. Simulation set 2 represents simulations used to estimate the electrostatic stability of whole spike homotrimeric proteins at different conformational states (DDD, UDD, and DUU, respectively) for all solution pHs. Spike wt proteins from SARS-CoV-1, 2 and its B.1.351 (Beta WHO new label) were investigated.
(a) The SARS-CoV-1 S RBD wt protein (RBD1wt): It was extracted from the RCSB Protein Data Bank (PDB) (69), where it was deposited with the PDB id 2AJF (chain E, resolution 2.9 Å, pH 7.5) and found complexed with ACE2 (chain A) – see Figure 2 .
Crystal structure of the SARS-CoV-1 S RBD (PDB id 2AJF, chain E) and the modeled SARS-CoV-2 S RBD (wildtype). See text for details regarding the modeling aspects. These macromolecules are shown, respectively, in blue and red in a ribbon representation. The root-mean-square deviation (RMSD) between these structures is equal to 0.64 Å.
(b) The SARS-CoV-2 S RBD wt protein (RBD2wt): For the sake of comparison with a previous study (43), we used the coordinates obtained from the comparative modeling of the three-dimensional protein structure built up at the SWISS-MODEL workspace (YP_009724390.1) based on the NCBI reference sequence NC_045512 (70). See Reference (43) for the details. The root-mean-square deviation (RMSD) of atomic positions between this modeled structure for the RBD of SARS-CoV-2 S wt protein and the available one for SARS-CoV-1 (PDB id 2AJF) is 0.64 Å. A comparison with experimental SARS-CoV-2 S RBD structures deposited after this theoretical model was proposed [e.g., PDB ids 6W41 (resolution 3.08 Å, pH 4.6) and 6YM0 (resolution 4.36 Å, pH 8)], which revealed similar RMSDs (0.51–0.56 Å) to this one (0.64Å). This is an important feature showing that the sequence-structure relationship is robust enough to handle even mutations while still preserving the overall fold of the molecules. A previous study have also indicated this feature (71). Additional runs were performed with RBDs extracted from the PBD ids 6VSB (prefusion open state with one RBD at the up position) and 6VXX (close state). These are CryoEM coordinates obtained with a resolution of 3.46 and 2.8 Å, respectively. We shall refer to them as RBD2wt′(6VSB) and RBD2wt′(6VXX).
(c) The SARS-CoV-2 S RBD variants (RBD2variant): The coordinates for all studied variants were modeled by the simple replacement of amino acids from the SARS-CoV-2 S RBD wt structure followed by an energy minimization using “UCSF Chimera 1.14” (72). Rotamers using the Dunbrack 2010 library with the highest probabilities were selected for each case. The minimization was performed with default parameters also considering the H-bonds. Replaced amino acids are a) N501Y, K417N, and E484K (20, 46, 73, 74) for the South African (SA) variant (RBD2SA), b) K417T, E484K, N501Y (75–78) for the Brazilian (BR) P.1 (RBD2BR), c) Y453F (79) for the “mink” (RBD2m) strain, d) N501Y (41, 75) for the UK B.1.1.7 variant (RBD2UK), e) E484K and N501Y (80) for the New York (NY) B.1.526 variant (RBD2NY), and f) L452R (81, 82) for the Californian (CA) B.1.427/B.1.429 (RBD2Ca) strain. Note that both the SA/B.1.351 (or Beta), the Brazilian P.1 (Gamma), and the NY/B.1.526 (Iota) variants share two common mutations (E484K and N501Y). The third mutation at the RBD for the BR and SA variants is absent in the NY variant and occurs at K417 that is replaced by N and T, respectively, for the BR and SA variants, and does not alter the electrostatic properties of the RBD. Both ASN and THR have the same physical–chemical characteristics being polar with uncharged side chains, indicating that these mutations are a natural adaptation that helps the virus. The E484 was also quite recently found in a double mutation (E484Q and L452R) in India (RBD2I).
(d) The SARS-CoV-2 S homotrimer wt protein (Strimer2wt): Structural data are now abundant for the spike proteins (or their parts), providing us with a rich level of information never seen before. To date, 151 experimental structures are available at the PDB with a diversity of resolution, conformational states, and bound forms (apo and holo with different partners). They were solved by either x-ray crystallography or CryoEM. A good compilation of these available structures can be found at “Universal Protein Resource” (UniProt) under the id P0DTC2 (83). Several computer simulations were also performed, expanding even more information extracted from these experimental structures (84–86). Despite this amazing work in such a short time, new variants are also surging in a highly competitive time frame, and key physical chemistry parameters as the solution pH have not been explored in full detail yet. From this ample source of available structural coordinates, we decided to use the configurations extracted from the MD trajectories generated at Poma's lab (85) as input structures for our present calculations. This research group has characterized the structural and energetic differences between the DDD, DUU, and UDD conformations of the SARS-CoV-2 spike wt trimer at pH 7. The advantages of this choice are: i) standardization of all physical–chemical conditions used to obtain the structures at the three conformational states (different experimental structures were solved at distinct conditions) in the absence of any additional chemical (i.e., in a genuine electrolyte solution), ii) all structures from these MD trajectories were complete (no missing amino acids) and already minimized while the experimental structures have missing residues (in fact, the CryoEM structures contain several missing amino acids particularly at the RBD), iii) this set of configurations includes thermal, structural fluctuations, iv) configurations extracted from different simulation replicas allow us to estimate the standard-deviations (SDs) in our measurements. All these issues avoid the introduction of additional artifacts in the calculations. We also performed simulations with the experimental structure given by the PDB id 7A94 (CryoEM SARS-CoV-2 Spike homotrimer wt with one ACE2 bound, resolution 3.9 Å, pH 8). This configuration (at the UDD conformational), the most populated conformational state experimentally observed among the bound ones (48), allows us to test an experimental structure and see how the binding of ACE2 changes the stability of the complex S-ACE2.
(e) The SARS-CoV-2 S homotrimer SA variant (Strimer2SA): By assuming valid the robust sequence-structure relationship to preserve the overall fold of the molecules upon point mutations (as mentioned above), the coordinates for all studied variants were modeled as described above for item (c) by the simple replacement of amino acids from the SARS-CoV-2 S wt trimer structures at the corresponding three different conformational states. Replaced amino acids are for the SA B.1.351 variants, D614G, N501Y, K417N, E484K, L18F, D80A, D215G, R246I, and A701V (20, 46, 73, 74).
(f) The SARS-CoV-1 S homotrimer wt protein (Strimer1wt): As done for the SARS-CoV-2 B.1.351 variant, deletions and amino acids substitutions were performed in the SARS-CoV-2 wt homotrimer following the sequence of the SARS-CoV-1 S protein as given by Uniprot id P59594 (83). This introduces an additional approximation in outcomes whose effect is assumed to be small due to the high identity (at the sequence level) between them and the strong sequence-structure relationship mentioned above. Additional calculations were performed with the experimental structures given by PDB ids 6ACC (DDD conformational state) and 6ACD (UDD conformational state).
As indicated above between parenthesis, we shall refer to these systems (a)–(f) as RBD1wt, RBD2wt (and RBD2wt'), RBD2variant (variant = SA, BR, CA, NY, UK, I or m), Strimer2wt, Strimer2SA, and Strimer1wt, respectively. Calculations performed with a different input structure will have their PDB ids included as a subscript (e.g., Strimer1wt(6ACC) for the SARS-CoV-1 spike wt protein using the PDB id 6ACC). The human receptor ACE2 needed for all simulations of set 1 was obtained from the PDB id 2AJF (chain A) as demonstrated in a previous study (43).
All PDB files were edited before the calculations. Missing regions in these proteins were built up using the “UCSF Chimera 1.14” interface (72) of the program “Modeler” with default parameters (87). Water molecules and hetero atoms were completely removed from all used files. Glycosylation sites were not included (they are also often truncated in the experiments) due to the incompatibility of highly flexible molecules with a rigid protein model and all the uncertainties and arbitrariness involved in it (88). Cysteines involved in sulfur bridges were fixed with GLYCAM web tools (89). The “UCSF Chimera 1.14” package (72) was employed for all molecular visualizations and representations. Some images were generated by CoV3D (90). When appropriate, it is indicated in the captions of the figures. For some analysis, it was necessary to determine structural interfaces. This was done with the online server “PDBePISa” (91) with default options.
The linear sequences of the SARS-CoV-1 and 2 S proteins are available in UniProt with the ids P59594 and P0DTC2 for SARS-CoV-1 and 2 (wt and the B.1.351 variant), respectively. The S1 subunit that binds the virion to the cell membrane receptor is the cleaved chain between residues 14 and 667 for SARS-CoV-1, and between residues13 and 685 for SARS-CoV-2. To our knowledge, there is no information yet of if the mutations have affected the cleaved region for SARS-CoV-2. Alignments of pairwise sequences were obtained by the EMBOSS Needle server (92) with default settings. They are shown in Supplementary Figures 1–3. The identity and similarity between SARS-CoV-1 and 2 wt are 64.2 and 78.6%, respectively. The RBD corresponds to positions 306–527 and 319–541 for SARS-CoV-1 and 2, respectively. Both identity (I) and similarity (S) are higher for this specific structural region (I = 73.1% and S = 82.1%). On the other hand, the identity and similarity between SARS-CoV-1 and the B.1.351 variant are 68.2 and 75.5%, respectively. The identity and similarity between SARS-CoV-2 wt and the B.1.351 variant are 98.4 and 99.0%, respectively. This comparison shows that the B.1.351 variant of SARS-CoV-2 has an identity for the RBD slightly closer to SARS-CoV-1 than SARS-CoV-2 wt (68.2% against 64.2%).
Despite the wide availability of molecular simulation models at different scales (64, 93–96), the so-called coarse-grained (CG) models are the most cost-effective ones, due to (i) the large number of atoms (implicitly) involved in each of the studied systems (e.g. the SARS-CoV-2 S wt homotrimer has 3,363 amino acids), (ii) the electrostatic coupling between a large number of titratable groups in these macromolecular structures (e.g., 246 groups per chain of the SARS-CoV-2 wt homotrimer), (iii) the need to repeat the calculations both at several different physical–chemical conditions (140 different pH conditions per system), for different protein conformations (e.g., 10 input structures for each conformational state of the homotrimer) and several viral protein systems (more than 7 systems as described below, see item 2.1), (iv) the estimation of the free energy of interactions based on a histogram method where the statistics on each histogram bin requires longer simulation runs for a proper sampling, and (v) the number of simulation replicates to guarantee their numerical convergence. These simplified computer models allow a lower computational cost to explore the main physical characteristics of a system with a small number of parameters (54, 97, 98). Though it may not be so explicit, all these calculations still require extensive computational resources.
A simple CG for protein–protein interactions has been devised using a rigid body description of the macromolecules and successfully applied to study several different biomolecular systems (54, 97–100). The main feature of this model is the inclusion of a fast and accurate description of the pH effects using the fast proton titration scheme—FPTS (66, 67, 96, 101). Ideally, the coupling between the proton equilibria and conformational changes should be described by a constant-pH molecular dynamics (CpH MD) scheme. Nevertheless, convergence, especially of the electrostatic properties, is still a critical issue of such methods for a single and small protein (102–105). The CPU costs are prohibitive to study all the systems and conditions mentioned above, even for fast CpH MD approaches as our OPEP6 (105). Conversely, an interesting observation is that key dynamical properties significantly relate to electrostatic property variations, as recently demonstrated for flaviviruses (42). Therefore, we followed the constant-pH MC (CpH MC) strategy as used before for host-pathogen interactions (43, 59). In such a rigid model, internal degrees of freedom are not considered, which implies that our electrostatic model cannot assess the transition features of any reorganized structures of dynamic loops. For radial averaged properties, such as free energy of interactions, it has virtually no significant effect. Curiously, no significant conformational changes induced by pH have been seen in the spike protein in the experimental research carried out by Song and co-authors (49). Other models are also available in the literature. Each one has its characteristics, pros, and cons. For instance, Yu et al. are currently developing a SARS-CoV-2 virion CG model (solved by MD) that provides a full description of the virus due to its ability to adapt and incorporate new details of the molecules as they are released. However, pH effects are not fully included in their model, while it is done in our present work.
For the present CpH CG model, each group of atoms defining an amino acid is converted into a single charged Lennard-Jones (LJ) sphere of valence zi (a function of the pH of the solution) and radius (Ri)—the values for each class of amino acid is taken from the study of Persson et al. (99). The centers of mass of the spheres created are used to arrange them accordingly with their experimental three-dimensional structures (as specified above). In order to obtain the valences and allow the variation of the amino acids depending on the pH during the simulation, FPTS was employed, whose physicochemical basis and explanation can be found in previous publications (66, 67, 101, 102).
For all simulations of set 1 (i.e., for the complexation RBD-ACE2—see Figure 1 ), two proteins (here, the RBD and the ACE2) are placed in an open cylinder simulation box to allow for forward and backward translations in one axis combined with rotational movements in any direction. As for the simulation data, the static dielectric constant of the medium (εs) was 78.7 to mimic an aqueous solution (assuming a temperature of 298 K), the radius (rcyl) of the simulation cell used was 150 Å, and height (lcyl) was 200 Å. Furthermore, the salt particles and the added counter-ions were represented using an electrostatic screening term as follows: for two ionizable amino acids i and j, the screening is given by [exp(–κrij)], where κ is the modified inverse Debye length, and rij is the distance between particles (54, 97, 98, 102). For simulations of set 2 (i.e., the stability of different homotrimers), only one macromolecule was included in the simulation cell.
The electrostatic interactions [uel(rij)] between any two ionizable amino acids of valences zi and zj are given by
u e l = z i z j e 2 4 π ϵ 0 ϵ r i j e x p ( − κ r i j ) ,where e (the elementary charge) is 1.602 × 10–10 C and ε0 is the dielectric constant of the vacuum (ε0 = 8.854 ×10 −12 C 2 /Nm 2 ). Except for the ionizable amino acids charges—which were defined by the FPTS (66, 67)—all the others were fixed neutral and kept constant during all simulation runs. Further details can be found on Poveda-Cuevas et al. (54), Barroso da Silva et al. (97), Delboni and Barroso da Silva (98), and Mendonça et al. (100).
Hydrophobic effect, van der Waals interactions, and excluded volume repulsion can also affect protein-protein interactions (54, 98, 99, 106). A simple way to at least incorporate the main contributions of these interactions is using an LJ term [uvdw(rij)] between the amino acids (54, 98, 101). For any two amino acids (either charged or not), the calculation for the LJ term is given by
u v d w = 4 ε L J [ ( σ i j r i j ) 12 − ( σ i j r i j ) 6 ] ,where εLJ regulates the intensity of the attractive forces in the system (54, 97, 98) and σij (= Ri + Rj)/2 is the LJ diameter for a pair of residues, represented by i and j, when they are in contact. The choice of εLJ is somehow arbitrary, although it has been used in many works a universal value of .124 kJ/mol (97, 98, 101, 107), corresponding to a Hamaker constant of ca. 9kBT (where kB = 1,380 ×10 −23 m 2 kg s −2 K −1 is the Boltzmann constant, and T the temperature, in Kelvin) for amino acid pairs (98, 99, 108). The direct impact is that the calculated magnitude of the free energy of interactions can be either under or overestimated, as we discussed before (43). For instance, the absolute numbers might be shifted when compared with results obtained using other force field descriptions. In principle, when experimental second virial coefficients are known for the same physical–chemical conditions and system, proper calibration of εLJ can be obtained for this very specific situation. This is not the case for these spike proteins. Alternatively, results can be interpreted in relative terms, comparing the measurements between similar systems and conditions. This solves the possible problem of ambiguity in the interpretation of the obtained data.
The size of the amino acid beads also affects the LJ contributions. For the sake of consistency with the adopted εLJ value, all Ri's were taken from Persson et al. (99). Each amino acid has a specific value of Ri (e.g., RTYR = 4.1 Å, RGLU = 3.8 Å. For this pair, 2σij = RTYR+RGLU = 7.9 Å) which allows the description of mostly macromolecular hydrophobic moments (109). Therefore, the simulations should correctly generate the docking orientation at short separation distances (54).
By bringing equations 1 and 2 together, one can obtain the total interaction energy of the system (whether charged or neutral) for a given configuration [U(k>)]:
U ( < r k >) = 1 2 ∑ i = 1 N ∑ j = 1 N ( u e l ( r i j ) + u v d w ( r i j ) ) ,where k> is the position of the amino acids and N is the total number.
The results were obtained using the Metropolis MC sampling performed with physiological ionic strength (NaCl at 150 mM) at different pH regimes. Calculations were performed with the Faunus biomolecular simulation package (110), where the FPTS is implemented (66). For the complexation study described in the simulation set 1, following a previous study (43), the pHs 7.0 and 4.6 were chosen, respectively, due to the need to understand the behavior of the system in physiologic pH level conditions and the acidic pH of the endosomal environment. This is a reasonable choice to keep the general features of the present study even though the precise value of the pH in human cells can be slightly different from these numbers (111). For all simulations from set 2, pH was varied from 0 to 14 with an increment of 0.1 units of pH to explore all possible pH conditions. This pH range was also used for the epitopes mapping by the PROCEEDpKa method (54)—see below. As indicated by Figure 1 , the main outcomes of these CpH simulations were the averaged total protein charge (total >), the averaged partial charge of each titratable group (aa >>), and the averaged protein dipole moment (< μ >). These quantities are also conveniently expressed in units of the elementary charge: the averaged total protein charge number (Z = total >/e), the averaged valence of each titratable group (zaa = aa >/e), and the averaged protein dipole number moment (< μo > = |∑ziri|). Free energy of interactions [βw(r)] were estimated from radial distribution functions [g(r)] between the centers of the proteins [βw(r) = –ln g(r), where β = 1/kBT]. r is the separation distance between the centers.
After preparing the molecular systems for the simulations as described above, equilibration and production runs were performed. Even invoking all the approximations in the CG model, these simulation runs demanded high computational resources due to (i) the large number of titratable groups involved in the system with strong electrostatic coupling; (ii) the free energy barriers of the systems; (iii) the need to fill all histogram bins used for the calculation of g(r) during sampling; and (iv) longer runs to the decrease statistical noises in the βw(r) data (54, 97, 98). All simulations from set 1 required at least 3.0 10 9 MC steps at the production phase. Simulations from set 2 could be well performed with 10 8 MC steps at the production phase. SDs were estimated by the use of at least three replicates per simulated system. Some systems were further explored with additional replicates.
Electrostatic properties are well known to be of great importance in biomolecular interactions. Indeed, they strongly depend on the spatial distribution of intramolecular and intermolecular charges, environmental conditions, such as pH or salt concentration that can vary significantly in different cellular compartments (54, 67). Due to its intrinsic long-range characteristics and the electrostatic coupling between ionizable residues, key amino acids responsible for the host-pathogen interaction can include groups outside the classical view of the epitope–paratope interface. Such broader definition, including inner titratable residues that can also take part in the interplay of interactions, has been called “electrostatic epitopes” (EEs). They can be efficiently mapped by a computational strategy called “PROCEEDpKa” (PRediction Of electrostatic Epitopes basedED on pKa shifts) (54). This allows the identification of all ionizable residues of macromolecules that do drive the biomolecular interactions. The difference between the numbers of classical (based on the “key and lock” view) and electrostatic epitopes will be higher for systems with a stronger electrostatic coupling between superficial ionizable residues with the for a given solution pH, protein system and conformational stateinner ones. Among other advantages [see Poveda-Cuevas et al. (54)], “PROCEEDpKa” includes the pH and ionic strength dependence that can dramatically affect the complexation process of the host–pathogen interactions. This is particularly important for the antibody–antigen interface with a peculiar electrostatic pattern richer in titratable amino acids (54).
The electrostatic stability of the different spike homotrimers (Strimer1wt, Strimer2wt, and Strimer2SA) at three distinct conformational states (DDD, UDD, and DUU) as a function of solution pH were estimated using the electrostatic free energy (ΔGelec). This physical quantity was calculated in terms of Coulombic contributions from the individual titratable groups for a given protein structure in a specific conformation and physical–chemical conditions (68). As proposed by Ibarra-Molero and coauthors (112), ΔGelec (in kJ/mol) can be given as
G e l e c ≈ − 1 2 1389 78.7 z i z j r i j e x p ( − r i j κ ) ,where rij is the separation distance between the ionizable sites i and j, as defined by the spike homotrimer conformation, all charge numbers, zk, are the averaged ones obtained from the titration studies, i.e., zk = aa >/e for a given pH, protein system and conformational state. All zκ values as a function of the solution pH were obtained from the FPTS calculations. κ was fixed at 7.86 Å to correspond to an electrolyte solution at 1:1 ratio with 150 mM NaCl.
An essential step in cell invasion by a virus is the interaction with a cell receptor. The receptor used by SARS-CoV-1 has been known since 2003: the angiotensin-converting enzyme II (ACE2) (113). Because of the great similarity between the viral proteins, the same human receptor was promptly proposed to be used by the virus responsible for the COVID-19 pandemic too (114). This has been repeatedly confirmed by different studies (6, 38, 43, 114, 115). Modifications in this molecular region (which can be visualized in Supplementary Figure 4), either by mutations in the Spike protein or by alterations in the receptor itself, can change the free energy of the interaction and result in higher or lower affinity. This is easily seen for some mutations. For example, the mutation E484K presented at least in three new variants (B1.1.351, P.1, and B.1.526) has an acid residue (GLU) replaced by a basic one (LYS). This substitution changes the physical–chemical nature of the residue 484 and implies a complete inversion of its electrostatic interactions. GLU has its electrical charge (in elementary units) varying from −1 (when fully deprotonated) to 0 (when fully protonated), while LYS has a zero charge number when deprotonated and is positively charged (+1) when protonated. All neighboring ionizable groups might also be affected by this inversion. This kind of evolutive viral signature suggests that the virus uses electrostatic properties to increase its virulence. When comparing the main simulated physical-chemical quantities of the RBD proteins, we can see the effect of the substitutions of the amino acids both at the protein charge number level (varying from +2.1 to +4.1, see Supplementary Table 1) and the dipole number moment level (varying from 31 to 90, see Supplementary Table 1). These results given between parenthesis were obtained by the FPTS at pH 7.0 and are summarized in Supplementary Table 1 for all studied variants. Data for pH 4.6 is included in this table too. Being the ACE2 receptor negatively charged at pH 7, the attractive charge–charge interaction will be stronger for most variants favoring the RBD-ACE2 complexation. The differences in the dipole moments reveal that the binding orientation can also be altered for some systems.
This preliminary and simplest physical–chemical analysis was investigated with more quantitative details of the complexation process. As a continuation of a previous study on the molecular interactions of SARS-CoV-1 and 2 wt S RBDs done at the beginning of the pandemic (43), we investigated in the present work the binding association of the S RBD proteins new variants to ACE2. The RBD was assumed to be exposed and ready for the docking phase [i.e., the RBDs were out of the homotrimeric S protein, assuming that the missed parts of the whole trimer do not interfere with the binding as suggested by the available crystallographic data (43)]. The mink, SA (B.1.351), the BR (P.1), the UK (B.1.1.7), CA (B.1.427/B.1.429), the NY (B.1.526), the India (B.1.617) RBDs complexations with ACE2 were studied using our CpH MC protein–protein simulations at pH 4.6 and 7.0. The wild type with a single E484K mutation was also included in this study. Free energy of interactions [βw(r)] calculated using the potential of mean forces sampled during the simulation runs for these systems are shown in Figure 3 at physiological salt concentration. The estimated maximum SDs obtained by comparing results from at least three replicates are .01 for βw(r).
Free energy profiles for the interaction of RBD proteins with the cellular receptor ACE2. The simulated free energy of interactions [βw(r)] between the centers of the RBD proteins from SARS-CoV-1, SARS-CoV-2 (wt), SARS-CoV-2 (mink), SARS-CoV-2 (SA), SARS-CoV-2 (CA), SARS-CoV-2 (NY), SARS-CoV-2 (UK), SARS-CoV-2 (I), and SARS-CoV-2 (E484K) and the cellular receptor ACE2 are given at pH 4.6. The source of the three-dimensional structures of these proteins is explained in the text and referred to as RBD1wt, RBD2wt, RBD2m, RBD2SA, RBD2CA, RBD2NY, RBD2UK, RBD2I, and RBD2E484K, respectively. Salt concentration was fixed at 150 mM. Data for the complexes with the wildtype proteins (RBD1wt-ACE2 and RBD2wt-ACE2) was given in an earlier study (8). Simulations started with the two molecules placed at random orientation and separation distance. Results for SARS-CoV-1, SARS-CoV-2 (wt), SARS-CoV-2 (mink), SARS-CoV-2 (SA), SARS-CoV-2 (CA), SARS-CoV-2 (NY), SARS-CoV-2 (UK), SARS-CoV-2 (I), and SARS-CoV-2 (E484K) are shown as black, red, green, blue, dark purple, pink, light purple, cyan, and continuous orange lines, respectively. (A) Full plot. (B) The well depth region of the βw(r) for each studied complex.
Simulations confirmed the binding of all RBDs to ACE2, as can be seen by the negative values of βw(r) around 50Å. In comparison with the SARS-CoV-2 S RBD wt protein, the SARS-CoV-1 S RBD protein has the strongest tendency to bind to ACE2 in our simulations, as it was previously reported in some theoretical and experimental studies (43, 116). However, this is not a complete consensus in the literature (6, 116–120). In one of the first experimental comparisons, Tian and co-workers provided quite similar binding affinities (KD = 15.2 nM for SARS-CoV-1 S RBD and KD = 15.0 nM for SARS-CoV-2 S RBD, where KD is the equilibrium dissociation constant), showing that SARS-CoV-2 S RBD-ACE2 had a slightly stronger affinity (117). Conversely, Brielle and others compiled a set of experimental measurements suggesting the higher binding affinity of SARS-CoV-1 S RBD–ACE2 (KD ~1.5–10.0 nM) (KD ~1.5–10.0 nM) comparable to the binding affinity of SARS-CoV-2 S RBD–CACE2 (KD = ~1.2–14.7 nM)—see Brielle et al. (116). This discrepancy can have different sources (e.g., use of other structural coordinates, molecular dynamical runs not long enough to sample larger structural fluctuations, glycosylation, presence of other monomers of the ACE2, use of the homotrimer instead of a single RBD, constant-charge vs. constant-pH simulations, different experimental assays, other physical–chemical conditions, etc.) (118). Calculations performed with RBD coordinates extracted from CryoEM structures will suffer from the lack of coordinates for this critically important region. For instance, the prefusion S homotrimer with a single RBD “up” as given by the PDB id 6SVB (chain A) has several missing amino acids at the RBD (INT….KVGGN….LFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYF….NG….).
Although these missing amino acids can be fully reconstructed by homology modeling, it becomes difficult to precisely identify if the final model provides more reliable coordinates for the complexation studies than the one built-up using the SARS-CoV-1 RBD as the template (with fewer unknown coordinates). Testing RBDs extracted from the CryoEM structures (completed with homology modeling) can result in binding affinities equivalent to what was measured for SARS-CoV-1 or slightly superior—see Supplementary Figure 5. As expected and reported before (97, 102), there is a dependence on the used structural coordinates for the protein–protein calculations. A different set of coordinates for ACE2 could also affect the possible structural rearrangements during the binding. Constant-charge MD simulations would not be a definitive answer, as indicated by a previous study that also predicted a higher affinity of SARS-CoV-1 RBD–ACE2 binding (116). An ideal solution would be applying CpH MC simulations to all variants but this approach is still nonfeasible at present. Instead, we opted to use as the input structure the RBD modeled as before (43) and focus on the aspects given by the effects of pH and the different substitutions of the amino acids, which is the main aim of the present study. It is a consensus from different studies that electrostatic interactions drive the RBD-ACE2 complexation in both cases (43, 118), which means that CpH models (as used in this work), even with other approximations, should better capture the main physics of the system.
Moreover, despite the good precision in the protein–protein simulations [0.01 units of βw(r) as mentioned above], the differences between the potential depths for each protein sequence are small. Considering all the intrinsic approximations assumed in this study (including possible fluctuations due to structural dynamics), we think it is safer to conclude that these outcomes indicate tendencies given by the differences in the linear sequences. Therefore, this analysis showed a slight tendency toward a higher affinity for ACE2 by all studied new variants. The highest affinity was found for the RBD2NY. In a crescent order, we have RBD2wt < RBD2mink < RBD2SA and RBD2BR < RBD2I < RBD2UK < RBD2CA < RBD2NY. The BR variant P.1 (data not included in Figure 3 because it was on the top of the plot of the SA case) behaves identically to the SA variant (B.1.351) due to the presence of the same key mutations in both of them—E484K and N501Y—at the receptor-binding motif. The mutations K417N/T do not make any difference for these two variants in terms of their binding affinities, at least captured by our CpH CG model. These substitutions on K417 might work to inhibit the ACE2-affinity (46). The effect of the absence of the mutation K417 in the NY variant can also be seen. This enhances the RBD-ACE2 affinity, especially when compared with the two other variants (SA and BR) that share the same substitutions but have either K417N or K417T to decrease it. The key mutations at the RBD region for the different variants included in this study are illustrated in Supplementary Figure 7. The increased affinity between RBD2UK and ACE2 has recently been seen in another computational approach (121). This result also revealed that the binding affinity of these new variants is approaching and enhancing the affinity measured for SARS-CoV-1 that was more virulent.
Most viruses use a drop in pH to trigger their host penetration (122), including some coronaviruses (123). For SARS-CoV-2, as we mentioned above, the pH effects are somehow contradictory. To further study the effects of pH in the RBDs-ACE2 complexation, calculations were also performed at pH 7 for all the variants. Data for the SARS-CoV-2 wt was obtained before in a previous study (43). As shown in Supplementary Figures 6A,B, the increased pH slightly favors the attraction for all of them. The same relative behavior between the strains seen at pH 4.6 is essentially reproduced at the physiological pH conditions. In the two studied pH regimes, the NY variant has the strongest binding affinity. Recent experimental data in the preprint format confirms the same trend for the wild-type virus (50). Surprisingly, the low pH does not seem essential for the viral cell invasion for the variants. Indeed, it was suggested before that an acid regime did not appear to act as a direct trigger of this entry process for SARS-CoV-1 (124). However, the low pH could be necessary for an additional player in the process, i.e., acidic protease involvement and membrane-fusion activity (125). pH not being so critical for the RBD-ACE2 complexation offers the SARS viruses and their variants an opportunity to easily and rapidly infect the human cells. This characteristic might also contribute to increasing their infectivity, as noted before (43).
Table 1 summarizes the main theoretical results together with experimental data for single mutations. For the sake of clarity, mutations are also listed in this table. Although the experimentally available data was not obtained exactly under the same conditions, it can be seen that they are qualitatively similar to the theoretical data. The presence of more than one mutation does not seem additive. For this set of studied cases, both N501Y and Y453F are mutations that experimentally resulted in the highest RBD-ACE2 affinities (46). The theoretical predictions confirm that the N501Y mutation increases this affinity. The difference between this strain and the NY and CA variants seen in our calculations is within the estimated errors (0.01). Conversely, the Y453F mutation is equivalent to the wild type in the theoretical predictions when the estimated errors are considered. Despite quantitative discrepancies, it is clear that the new variants tend to be more virulent than SARS-CoV-2 wt due to the molecular properties of their more evolutionary adapted RBDs.
Estimated binding affinities for RBD–ACE2 interactions.
RBD structure | Mutation(s) | Predicted binding tendencies (pH 4.6) | Experimental relative binding affinity for each single mutation (*) |
---|---|---|---|
RBD2mink | Y453F | 0.01 | 0.25 |
RBD2SA | K417N E484K N501Y | 0.02 | −0.45 0.06 0.24 |
RBD2BR | K417T E484K N501Y | 0.02 | −0.26 0.06 0.24 |
RBD2CA | L452R | 0.12 | 0.02 |
RBD2NY | E484K N501Y | 0.13 | 0.06 0.24 |
RBD2UK | N501Y | 0.11 | 0.24 |
RBD2I | L452R E484Q | 0.10 | 0.02 0.03 |
RBD2E484K | E484K | 0.10 | 0.06 |